Hamish wrote:
> r.in.mat and r.out.mat are littered with "sizeof(long) == 4"
> assumptions.
The MATv4 file format specifies this (IIRC, it's been a while now & I'd
have to look it up), at least for the header but I think for the arrays
as well. The file header specifies which endianness the data that
follows was written in, it allows either.
You're getting confused. sizeof(long) is whatever the compiler says it
is; the MATv4 format has no say in the matter.
If you mean that the fields are supposed to be 4-byte integers, that's
a different matter. In that case, the code needs to use 4-byte
integers, not "long".
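To illustrate the distinction, here is a minimal sketch, assuming a C99 compiler that provides <stdint.h>. The names FIELD_BYTES, field_t, long_matches_field and check_field_width are made up for this example; they are not taken from r.in.mat or r.out.mat.

```c
#include <assert.h>
#include <stdint.h>

/* On-disk width of one integer field: a property of the file format. */
#define FIELD_BYTES 4

/* In-memory type for such a field (hypothetical name): exactly 32 bits
 * on every platform that provides int32_t. */
typedef int32_t field_t;

/* Non-zero only if "long" happens to match the on-disk width with this
 * compiler. Typically true on 32-bit and false on 64-bit Unix systems. */
int long_matches_field(void)
{
    return sizeof(long) == FIELD_BYTES;
}

/* Aborts if the in-memory type ever disagrees with the on-disk width. */
void check_field_width(void)
{
    assert(sizeof(field_t) == FIELD_BYTES);
}
```

The point is only that the width comes from the format and the fixed-width type, never from whatever sizeof(long) happens to be.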
> Also, AFAICT, r.out.mat always writes the output in the
> system's byte-order,
as specified by the format, byte order used is recorded in the header,
But there's no requirement that it's the same as the system's
byte-order, right?
> and r.in.mat just assumes that the file is in the system's byte-order
> (it checks, but doesn't do anything in the event of a mismatch).
In the event of a mismatch it triggers a warning that this is "TODO" and
the rest will likely not succeed. I prefer that to a G_fatal_error(), it
encourages help with debugging. It is likely that more of those warnings
are needed for other endian/64bit permutations. The situation is also
mentioned in the r.in.mat help page. I'd rather invest the time fixing
the problem vs. going to great lengths to add more elaborate tests to
provide "sorry," messages.
Right. So explicit [de]serialisation will prevent all of those
problems while eliminating the need to check the system's byte order.
> Both of those programs need to be substantially re-written.
I welcome help. Bitwise operations are not my forte.
Converting an integer to 4 bytes, little-endian:
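    void serialise_int32_le(unsigned char *buf, long x)
    {
        int i;

        /* shift as unsigned: right-shifting a negative long is
           implementation-defined */
        for (i = 0; i < 4; i++)
            buf[i] = ((unsigned long) x >> (i * 8)) & 0xFF;
    }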
Converting an integer to 4 bytes, big-endian:
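    void serialise_int32_be(unsigned char *buf, long x)
    {
        int i;

        /* shift as unsigned: right-shifting a negative long is
           implementation-defined */
        for (i = 0; i < 4; i++)
            buf[3-i] = ((unsigned long) x >> (i * 8)) & 0xFF;
    }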
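Converting 4 bytes, little-endian, to an integer:
    long deserialise_int32_le(const unsigned char *buf)
    {
        unsigned long x = 0;
        int i;

        for (i = 0; i < 4; i++)
            x |= (unsigned long) buf[i] << (i * 8);
        /* sign-extend from bit 31 in case long is wider than 32 bits */
        return (long) ((x ^ 0x80000000UL) - 0x80000000UL);
    }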
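Converting 4 bytes, big-endian, to an integer:
    long deserialise_int32_be(const unsigned char *buf)
    {
        unsigned long x = 0;
        int i;

        for (i = 0; i < 4; i++)
            x |= (unsigned long) buf[3-i] << (i * 8);
        /* sign-extend from bit 31 in case long is wider than 32 bits */
        return (long) ((x ^ 0x80000000UL) - 0x80000000UL);
    }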
These work regardless of whether the system is big- or little-endian
or whether x is 32 or 64 bits.
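For a format whose header records the byte order of the data, the same idea lets the reader choose the decoding from the file rather than from the host. A rough sketch, assuming C99 fixed-width types; decode_int32 and its big_endian parameter are hypothetical and not part of r.in.mat:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 4-byte two's-complement integer from buf.
 * big_endian is non-zero if the FILE (not the host) stores the most
 * significant byte first, e.g. as indicated by a flag in its header. */
int32_t decode_int32(const unsigned char *buf, int big_endian)
{
    uint32_t u = 0;
    int i;

    assert(buf != NULL);

    for (i = 0; i < 4; i++) {
        /* file byte i has weight 8*i (little) or 8*(3-i) (big) */
        int shift = big_endian ? 8 * (3 - i) : 8 * i;

        u |= (uint32_t) buf[i] << shift;
    }

    /* Map the 32-bit pattern to a signed value without relying on
     * implementation-defined unsigned-to-signed conversion. */
    if (u <= 0x7FFFFFFFu)
        return (int32_t) u;
    return -(int32_t) (~u & 0x7FFFFFFFu) - 1;
}
```

The caller would read 4 bytes with fread() into an unsigned char buffer and pass the flag taken from the header; nothing in the function depends on the CPU.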
[r.out.bin]
> If you change the semantics of that flag so that the absence of the -s
> switch means little-endian while the presence of the flag means
> big-endian, r.out.bin doesn't need to know the host's byte order, and
> a given r.out.bin command achieves the same result (file in big-endian
> format or file in little-endian format) regardless of the system's
> byte order.
mmph. don't make the confusion worse; change the flag's letter to
something else and loudly warn -s is superseded. Are you advocating that
the default mode should not write out in the native byte order?!
That's correct. The system's byte order is irrelevant for file
formats.
Is that really "expected behavior"? I would think it preferable to have -b and
-l flags to force big or little if you want them, otherwise go native.
-b and -l make sense, but the default should be one or other
regardless of the CPU type. The system's byte order is irrelevant for
file formats.
(but -b is taken for BIL, "-l" should be avoided as it looks the same as
"-1" in some fonts, and -e,-E has issues for mingw32 people...?)
> As for backwards compatibility, making the default byte order (no -s
> flag) little-endian means that anyone using x86 (i.e. most users) will
> be unaffected.
that's pretty crap for the non x86 crowd (who are the ones most likely
to need that flag in the first place).
Probably not. Most of the DEMs I've come across (e.g. ETOPO30) are in
big-endian format, so it's the x86 users who need -s.
> IOW, I've yet to come across a situation which actually has a
> legitimate reason to know the system's byte order.
my feelings are: seamless at the user end is good. ambiguity is bad.
Exactly.
A specific command should achieve a specific result without the user
first having to run "uname -m" then find out whether SPARC is big- or
little-endian so that they know whether or not to use -s.
There should be one option for creating little-endian files and
another for big-endian files. The system's byte order is irrelevant.
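As a sketch of what that means for the writing side (the enum and function names here are invented for illustration, and are not existing r.out.bin code or options), the byte order becomes an explicit argument chosen by the user's flag:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte order requested for the output FILE; hypothetical names. */
enum byte_order { ORDER_LITTLE, ORDER_BIG };

/* Store x into buf[0..3] in the requested order. The host's own byte
 * order never enters into it. */
void encode_int32(unsigned char *buf, int32_t x, enum byte_order order)
{
    uint32_t u = (uint32_t) x;  /* well-defined: reduces modulo 2^32 */
    int i;

    assert(buf != NULL);

    for (i = 0; i < 4; i++) {
        /* byte with weight 8*i goes to position i (little) or 3-i (big) */
        int pos = (order == ORDER_BIG) ? 3 - i : i;

        buf[pos] = (unsigned char) ((u >> (8 * i)) & 0xFF);
    }
}
```

With this, a given command line produces the same bytes on x86 and on SPARC; the flag selects ORDER_LITTLE or ORDER_BIG and the default is simply whichever of the two the program documents.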
> BTW, when it comes to floating-point values, the situation isn't as
> simple as big- or little-endian. On some systems, FP values may use a
> different byte order to integers, or double-precision FP values may
> have the 32-bit halves in a different order than the order of bytes
> within a word.
fun fun fun
Yep. Which is presumably why libgis uses XDR for FP rasters.
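For reference, XDR stores a single-precision value as 4 bytes of IEEE 754, most significant byte first. A hand-rolled sketch of that encoding follows; it is not how libgis does it, and it assumes the host's float is IEEE 754 binary32 with the same byte order as its 32-bit integers, which is exactly the assumption that fails on the odd systems mentioned above. encode_float_be is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write f into buf[0..3] as big-endian IEEE 754 single precision.
 * ASSUMPTION: float is binary32 and shares the byte order of uint32_t
 * on this host. A real XDR library hides this detail. */
void encode_float_be(unsigned char *buf, float f)
{
    uint32_t u;

    assert(buf != NULL);
    assert(sizeof(float) == sizeof(uint32_t));

    /* copy the bit pattern; a pointer cast would break aliasing rules */
    memcpy(&u, &f, sizeof u);

    buf[0] = (unsigned char) ((u >> 24) & 0xFF);
    buf[1] = (unsigned char) ((u >> 16) & 0xFF);
    buf[2] = (unsigned char) ((u >> 8) & 0xFF);
    buf[3] = (unsigned char) (u & 0xFF);
}
```

Once the bit pattern is in an integer, the rest is the same shifting as for the int32 case.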
--
Glynn Clements <glynn@gclements.plus.com>