[GRASS-dev] configure: testing arch endian

Hello,

I would like to add a macro into aclocal.m4/configure.in to test for
architecture byte order. This would then define something like
G_BIGENDIAN 0/1 in include/config.h.in.

I think this would be better than testing for it constantly with
G_is_little_endian(). This would also keep it from being reimplemented
in libraries that are not interdependent (eg. lib/gis,
lib/vector/dglib).

I noticed that endian is not even considered in lib/image. I'll fix
that if I get the all-clear for the above (or some variation thereof).

Comments? Caveats?

--
Brad Douglas <rez touchofmadness com> KB8UYR
Address: 37.493,-121.924 / WGS84 National Map Corps #TNMC-3785

> I would like to add a macro into aclocal.m4/configure.in to test for
> architecture byte order. This would then define something like
> G_BIGENDIAN 0/1 in include/config.h.in.
>
> I think this would be better than testing for it constantly with
> G_is_little_endian().

what's so bad about that? If it's in a loop or somewhere it would be a
performance hit, just save the result to a variable at the start of the
module.

> This would also keep it from being reimplemented
> in libraries that are not interdependent (eg. lib/gis,
> lib/vector/dglib).
>
> I noticed that endian is not even considered in lib/image. I'll fix
> that if I get the all-clear for the above (or some variation thereof).
>
> Comments? Caveats?

what about the case of "universal" binaries on OSX, where the endianness
at run time is not necessarily the endianness at configure/compile time?

Hamish

On Mon, 2006-05-08 at 14:48 +1200, Hamish wrote:

> > I would like to add a macro into aclocal.m4/configure.in to test for
> > architecture byte order. This would then define something like
> > G_BIGENDIAN 0/1 in include/config.h.in.
> >
> > I think this would be better than testing for it constantly with
> > G_is_little_endian().
>
> what's so bad about that? If it's in a loop or somewhere it would be a
> performance hit, just save the result to a variable at the start of the
> module.

In the interest of simplification and code reduction...

> > This would also keep it from being reimplemented
> > in libraries that are not interdependent (eg. lib/gis,
> > lib/vector/dglib).
> >
> > I noticed that endian is not even considered in lib/image. I'll fix
> > that if I get the all-clear for the above (or some variation thereof).
> >
> > Comments? Caveats?
>
> what about the case of "universal" binaries on OSX, where the endianness
> at run time is not necessarily the endianness at configure/compile time?

http://developer.apple.com/documentation/Porting/Conceptual/PortingUnix/compiling/chapter_4_section_3.html

According to Apple, we should include <endian.h> and test
__BIG_ENDIAN or __LITTLE_ENDIAN. That's even simpler. Everything is
handled by the '--host=', '--build=', and '--target=' configure
switches.

Maybe change G_is_little_endian() to something like this?
{
#ifdef __LITTLE_ENDIAN
    return 1;
#elif __BIG_ENDIAN
    return 0;
#else /* probably redundant case */
    return -1;
#endif
}
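For comparison, the run-time alternative needs no configure machinery at
all. A minimal sketch of such a probe (illustrative only; not the actual
G_is_little_endian() source):

```c
#include <string.h>

/* Hypothetical stand-in for G_is_little_endian(): look at the first
 * byte of a known integer value at run time. */
int probe_is_little_endian(void)
{
    unsigned int one = 1U;
    unsigned char first;

    memcpy(&first, &one, 1);    /* lowest-addressed byte */
    return first == 1;          /* 1 = little-endian, 0 = big-endian */
}
```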

--
Brad Douglas <rez touchofmadness com> KB8UYR
Address: 37.493,-121.924 / WGS84 National Map Corps #TNMC-3785

Brad Douglas wrote:

> I would like to add a macro into aclocal.m4/configure.in to test for
> architecture byte order. This would then define something like
> G_BIGENDIAN 0/1 in include/config.h.in.
>
> I think this would be better than testing for it constantly with
> G_is_little_endian(). This would also keep it from being reimplemented
> in libraries that are not interdependent (eg. lib/gis,
> lib/vector/dglib).
>
> I noticed that endian is not even considered in lib/image. I'll fix
> that if I get the all-clear for the above (or some variation thereof).
>
> Comments? Caveats?

Personally, I would suggest just writing code which doesn't rely upon
the system's byte order; i.e. explicitly convert byte-arrays to words
using left-shift+OR and vice-versa using right-shift+AND.

Apart from endianness issues, this can also prevent alignment
problems. Explicit [de]serialisation doesn't require the byte stream
to be word-aligned (on platforms other than x86, this matters).

Looking at the places which currently use G_is_little_endian():

  lib/ogsf/gsd_img.c
  lib/ogsf/gsd_img_ppm.c
  lib/ogsf/gsd_img_tif.c
  raster/r.in.mat/main.c
  raster/r.out.bin/main.c
  raster/r.out.mat/main.c

The first three indicate a design flaw in gsd_getimage(), namely that
it's assuming that "unsigned long" is 4 bytes. That's the root of the
recently-reported "NVIZ image dump crashes on 64-bit systems" bug.

The solution there is to simply treat the buffer as an arrays of
unsigned char; there's no need to serialise words or to know the
system's byte order.

r.in.mat and r.out.mat are littered with "sizeof(long) == 4"
assumptions. Also, AFAICT, r.out.mat always writes the output in the
system's byte-order, and r.in.mat just assumes that the file is in the
system's byte-order (it checks, but doesn't do anything in the event
of a mismatch).

Both of those programs need to be substantially re-written.

Finally, the only reason that r.out.bin needs to know the endianness
is due to the brain-damaged -s flag. Rather than allowing the user to
specify directly whether the file is written in big- or little-endian
order, the user has a choice of "the same order as this system" (no -s
flag) or "the opposite order to this system" (-s flag given).

In practical terms, this means that the user has to figure out the
system's byte order in order to determine whether or not to use the -s
flag.

If you change the semantics of that flag so that the absence of the -s
switch means little-endian while the presence of the flag means
big-endian, r.out.bin doesn't need to know the host's byte order, and
a given r.out.bin command achieves the same result (file in big-endian
format or file in little-endian format) regardless of the system's
byte order.

As for backwards compatibility, making the default byte order (no -s
flag) little-endian means that anyone using x86 (i.e. most users) will
be unaffected.

IOW, I've yet to come across a situation which actually has a
legitimate reason to know the system's byte order.

BTW, when it comes to floating-point values, the situation isn't as
simple as big- or little-endian. On some systems, FP values may use a
different byte order to integers, or double-precision FP values may
have the 32-bit halves in a different order than the order of bytes
within a word.

--
Glynn Clements <glynn@gclements.plus.com>

> r.in.mat and r.out.mat are littered with "sizeof(long) == 4"
> assumptions.

The MATv4 file format specifies this (IIRC, it's been a while now & I'd
have to look it up), at least for the header but I think for the arrays
as well. The file header specifies which endianness the data that
follows was written in, it allows either.

If there are int/TYPE_CELL operations which are making this assumption,
missing casts, or are likely to be promoted to something other than
expected then yes that's something to look into.

> Also, AFAICT, r.out.mat always writes the output in the
> system's byte-order,

as specified by the format, byte order used is recorded in the header,

> and r.in.mat just assumes that the file is in the system's byte-order
> (it checks, but doesn't do anything in the event of a mismatch).

In the event of a mismatch it triggers a warning that this is "TODO" and
the rest will likely not succeed. I prefer that to a G_fatal_error(); it
encourages help with debugging. It is likely that more of those warnings
are needed for other endian/64bit permutations. The situation is also
mentioned in the r.in.mat help page. I'd rather invest the time fixing
the problem vs. going to great lengths to add more elaborate tests to
provide "sorry," messages.

> Both of those programs need to be substantially re-written.

I welcome help. Bitwise operations are not my forte.
Anyone using r.*.mat on a G5? No one has complained so far, but if
anyone does have problems the code is commented re. the possible use
of uint32 and I've tried to leave breadcrumbs along the way.

[r.out.bin]

> If you change the semantics of that flag so that the absence of the -s
> switch means little-endian while the presence of the flag means
> big-endian, r.out.bin doesn't need to know the host's byte order, and
> a given r.out.bin command achieves the same result (file in big-endian
> format or file in little-endian format) regardless of the system's
> byte order.

mmph. don't make the confusion worse; change the flag's letter to
something else and loudly warn -s is superseded. Are you advocating that
the default mode should not write out in the native byte order?! Is that
really "expected behavior"? I would think it preferable to have -b and
-l flags to force big or little if you want them, otherwise go native.
(but -b is taken for BIL, "-l" should be avoided as it looks the same as
"-1" in some fonts, and -e,-E has issues for mingw32 people...?)

> As for backwards compatibility, making the default byte order (no -s
> flag) little-endian means that anyone using x86 (i.e. most users) will
> be unaffected.

that's pretty crap for the non x86 crowd (who are the ones most likely
to need that flag in the first place).

> IOW, I've yet to come across a situation which actually has a
> legitimate reason to know the system's byte order.

my feelings are: seamless at the user end is good. ambiguity is bad.

> BTW, when it comes to floating-point values, the situation isn't as
> simple as big- or little-endian. On some systems, FP values may use a
> different byte order to integers, or double-precision FP values may
> have the 32-bit halves in a different order than the order of bytes
> within a word.

fun fun fun

Hamish

I agree - at least that endian checking should be done at runtime, if the program needs to know. From my point of view:

Mac OS X PPC/Intel universal binaries. You can build a universal binary in one go by adding dual -arch options to all compile and link commands. This works just fine for any Mac application; they don't have endian issues, and Apple has some info on writing endian-independent code. But when endian is set during configure for that build, it's fixed for the architecture you're building on (say PPC), and the other arch's (Intel, then) build is broken.

What I end up doing then is config-make for my arch, config another copy and hack any endian settings to the other arch, make that, then lipo the two together. I actually replace endian settings with a test for a macro that I supply at build time, so I can toggle all installed dependency headers (defaults to big-endian for now).
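A compile-time test can still work per-architecture here: Apple's
compilers define __BIG_ENDIAN__ or __LITTLE_ENDIAN__ separately for each
-arch pass, so each slice of a fat binary gets the right answer. A
sketch (the run-time fallback is an assumption for toolchains that
define neither macro):

```c
/* Per-slice endianness check: __BIG_ENDIAN__ / __LITTLE_ENDIAN__ are
 * set independently for each -arch compilation on Apple's compilers,
 * so this stays correct inside a universal binary. */
int slice_is_big_endian(void)
{
#if defined(__BIG_ENDIAN__)
    return 1;
#elif defined(__LITTLE_ENDIAN__)
    return 0;
#else
    /* Fallback for other toolchains: probe at run time. */
    unsigned int one = 1U;
    return *(const unsigned char *)&one == 0;
#endif
}
```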

On May 7, 2006, at 11:06 PM, Glynn Clements wrote:

> Brad Douglas wrote:
>
> > I would like to add a macro into aclocal.m4/configure.in to test for
> > architecture byte order. This would then define something like
> > G_BIGENDIAN 0/1 in include/config.h.in.
> >
> > I think this would be better than testing for it constantly with
> > G_is_little_endian(). This would also keep it from being reimplemented
> > in libraries that are not interdependent (eg. lib/gis,
> > lib/vector/dglib).
> >
> > I noticed that endian is not even considered in lib/image. I'll fix
> > that if I get the all-clear for the above (or some variation thereof).
> >
> > Comments? Caveats?
>
> Personally, I would suggest just writing code which doesn't rely upon
> the system's byte order; i.e. explicitly convert byte-arrays to words
> using left-shift+OR and vice-versa using right-shift+AND.

-----
William Kyngesburye <kyngchaos@kyngchaos.com>
http://www.kyngchaos.com/

"Oh, look, I seem to have fallen down a deep, dark hole. Now what does that remind me of? Ah, yes - life."

- Marvin

Brad Douglas wrote:

> According to Apple, we should include <endian.h> and test
> __BIG_ENDIAN or __LITTLE_ENDIAN. That's even simpler. Everything is
> handled by the '--host=', '--build=', and '--target=' configure
> switches.

<endian.h> isn't standard, so we would need a configure test, and an
alternative when the header isn't found.
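For what it's worth, autoconf already ships a stock byte-order probe, so
the configure-side test doesn't need <endian.h> at all. A configure.in
sketch (AC_C_BIGENDIAN is standard autoconf; whether it fits GRASS's
build is for the list to judge):

```m4
dnl configure.in fragment: probe the target's byte order at configure
dnl time; defines WORDS_BIGENDIAN in config.h on big-endian targets.
AC_C_BIGENDIAN
```

C code can then test #ifdef WORDS_BIGENDIAN, though, as noted earlier in
the thread, any configure-time answer is baked in per build, which is
exactly what bites OSX universal binaries.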

--
Glynn Clements <glynn@gclements.plus.com>

Hamish wrote:

> > r.in.mat and r.out.mat are littered with "sizeof(long) == 4"
> > assumptions.
>
> The MATv4 file format specifies this (IIRC, it's been a while now & I'd
> have to look it up), at least for the header but I think for the arrays
> as well. The file header specifies which endianness the data that
> follows was written in, it allows either.

You're getting confused. sizeof(long) is whatever the compiler says it
is; the MATv4 format has no say in the matter.

If you mean that the fields are supposed to be 4-byte integers, that's
a different matter. In that case, the code needs to use 4-byte
integers, not "long".

> > Also, AFAICT, r.out.mat always writes the output in the
> > system's byte-order,
>
> as specified by the format, byte order used is recorded in the header,

But there's no requirement that it's the same as the system's
byte-order, right?

> > and r.in.mat just assumes that the file is in the system's byte-order
> > (it checks, but doesn't do anything in the event of a mismatch).
>
> In the event of a mismatch it triggers a warning that this is "TODO" and
> the rest will likely not succeed. I prefer that to a G_fatal_error(); it
> encourages help with debugging. It is likely that more of those warnings
> are needed for other endian/64bit permutations. The situation is also
> mentioned in the r.in.mat help page. I'd rather invest the time fixing
> the problem vs. going to great lengths to add more elaborate tests to
> provide "sorry," messages.

Right. So explicit [de]serialisation will prevent all of those
problems while eliminating the need to check the system's byte order.

> > Both of those programs need to be substantially re-written.
>
> I welcome help. Bitwise operations are not my forte.

Converting an integer to 4 bytes, little-endian:

  void serialise_int32_le(unsigned char *buf, long x)
  {
    int i;
    for (i = 0; i < 4; i++)
      buf[i] = (x >> i*8) & 0xFF;
  }
  
Converting an integer to 4 bytes, big-endian:
  
  void serialise_int32_be(unsigned char *buf, long x)
  {
    int i;
    for (i = 0; i < 4; i++)
      buf[3-i] = (x >> i*8) & 0xFF;
  }
  
Converting 4 bytes, little-endian, to an integer:

  long deserialise_int32_le(const unsigned char *buf)
  {
    int i;
    long x = 0;
    for (i = 0; i < 4; i++)
      x |= ((long)buf[i] << i*8);
    return x;
  }
  
Converting 4 bytes, big-endian, to an integer:

  long deserialise_int32_le(const unsigned char *buf)
  {
    int i;
    long x = 0;
    for (i = 0; i < 4; i++)
      x |= ((long)buf[3-i] << i*8);
    return x;
  }

These work regardless of whether the system is big- or little-endian
or whether x is 32 or 64 bits.
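That property is easy to sanity-check; a self-contained round trip (same
little-endian pair as above, with the loop index declared):

```c
/* Glynn's little-endian pair, reproduced with the loop index declared. */
static void serialise_int32_le(unsigned char *buf, long x)
{
    int i;
    for (i = 0; i < 4; i++)
        buf[i] = (x >> i * 8) & 0xFF;
}

static long deserialise_int32_le(const unsigned char *buf)
{
    int i;
    long x = 0;
    for (i = 0; i < 4; i++)
        x |= (long)buf[i] << i * 8;
    return x;
}

/* The byte layout is fixed by the code, not by the host: buf[] always
 * comes out little-endian, and the round trip restores the value. */
static int round_trip_ok(void)
{
    unsigned char buf[4];
    long v = 0x12345678L;

    serialise_int32_le(buf, v);
    return buf[0] == 0x78 && buf[1] == 0x56
        && buf[2] == 0x34 && buf[3] == 0x12
        && deserialise_int32_le(buf) == v;
}
```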

[r.out.bin]
> > If you change the semantics of that flag so that the absence of the -s
> > switch means little-endian while the presence of the flag means
> > big-endian, r.out.bin doesn't need to know the host's byte order, and
> > a given r.out.bin command achieves the same result (file in big-endian
> > format or file in little-endian format) regardless of the system's
> > byte order.
>
> mmph. don't make the confusion worse; change the flag's letter to
> something else and loudly warn -s is superseded. Are you advocating that
> the default mode should not write out in the native byte order?!

That's correct. The system's byte order is irrelevant for file
formats.

> Is that really "expected behavior"? I would think it preferable to have -b and
> -l flags to force big or little if you want them, otherwise go native.

-b and -l make sense, but the default should be one or other
regardless of the CPU type. The system's byte order is irrelevant for
file formats.

> (but -b is taken for BIL, "-l" should be avoided as it looks the same as
> "-1" in some fonts, and -e,-E has issues for mingw32 people...?)

> > As for backwards compatibility, making the default byte order (no -s
> > flag) little-endian means that anyone using x86 (i.e. most users) will
> > be unaffected.
>
> that's pretty crap for the non x86 crowd (who are the ones most likely
> to need that flag in the first place).

Probably not. Most of the DEMs I've come across (e.g. ETOPO30) are in
big-endian format, so it's the x86 users who need -s.

> > IOW, I've yet to come across a situation which actually has a
> > legitimate reason to know the system's byte order.
>
> my feelings are: seamless at the user end is good. ambiguity is bad.

Exactly.

A specific command should achieve a specific result without the user
first having to run "uname -m" then find out whether SPARC is big- or
little-endian so that they know whether or not to use -s.

There should be one option for creating little-endian files and
another for big-endian files. The system's byte order is irrelevant.

> > BTW, when it comes to floating-point values, the situation isn't as
> > simple as big- or little-endian. On some systems, FP values may use a
> > different byte order to integers, or double-precision FP values may
> > have the 32-bit halves in a different order than the order of bytes
> > within a word.
>
> fun fun fun

Yep. Which is presumably why libgis uses XDR for FP rasters.

--
Glynn Clements <glynn@gclements.plus.com>

On Mon, 2006-05-08 at 23:26 +0100, Glynn Clements wrote:

> Converting an integer to 4 bytes, little-endian:
>
>   void serialise_int32_le(unsigned char *buf, long x)
>   {
>     int i;
>     for (i = 0; i < 4; i++)
>       buf[i] = (x >> i*8) & 0xFF;
>   }
>
> Converting an integer to 4 bytes, big-endian:
>
>   void serialise_int32_be(unsigned char *buf, long x)
>   {
>     int i;
>     for (i = 0; i < 4; i++)
>       buf[3-i] = (x >> i*8) & 0xFF;
>   }
>
> Converting 4 bytes, little-endian, to an integer:
>
>   long deserialise_int32_le(const unsigned char *buf)
>   {
>     int i;
>     long x = 0;
>     for (i = 0; i < 4; i++)
>       x |= ((long)buf[i] << i*8);
>     return x;
>   }
>
> Converting 4 bytes, big-endian, to an integer:
>
>   long deserialise_int32_le(const unsigned char *buf)
>   {
>     int i;
>     long x = 0;
>     for (i = 0; i < 4; i++)
>       x |= ((long)buf[3-i] << i*8);
>     return x;
>   }
>
> These work regardless of whether the system is big- or little-endian
> or whether x is 32 or 64 bits.

Understood re: endian programming issues. Should it go into SUBMITTING?

Does it make sense to have these functions, string/debug, and memory
related functions in a separate library? More trouble than it's worth?

I think we could benefit from having a small library (libcommon?) that
all libraries would link against (where appropriate), rather than having
a number of libraries link to libgis or duplicate efforts where it
really isn't necessary.

--
Brad Douglas <rez touchofmadness com> KB8UYR
Address: 37.493,-121.924 / WGS84 National Map Corps #TNMC-3785

Brad Douglas wrote:

> Understood re: endian programming issues. Should it go into SUBMITTING?

SUBMITTING isn't supposed to be a tutorial on C programming (even if
many contributors badly need one).

> Does it make sense to have these functions, string/debug, and memory
> related functions in a separate library? More trouble than it's worth?

I'm not sure that a library makes sense; it's probably better for
import/export modules to do it themselves. Apart from anything else,
it allows the compiler to inline the code; such modules will typically
be [de]serialising arrays rather than individual integers.

> I think we could benefit from having a small library (libcommon?) that
> all libraries would link against (where appropriate), rather than having
> a number of libraries link to libgis or duplicate efforts where it
> really isn't necessary.

Anything which is part of GRASS should link against libgis for core
functionality such as memory allocation, error handling, accessing
files within the database directory, etc. There aren't many cases
where a program which doesn't use anything from libgis warrants being
included as part of GRASS.

--
Glynn Clements <glynn@gclements.plus.com>

On Wed, 2006-05-10 at 02:43 +0100, Glynn Clements wrote:

> Brad Douglas wrote:
>
> > Understood re: endian programming issues. Should it go into SUBMITTING?
>
> SUBMITTING isn't supposed to be a tutorial on C programming (even if
> many contributors badly need one).

I don't entirely agree. A lot of this consists of undocumented,
overarching design decisions/goals. You may find this self-evident,
but I do not. :-(

--
Brad Douglas <rez touchofmadness com> KB8UYR
Address: 37.493,-121.924 / WGS84 National Map Corps #TNMC-3785

> > Understood re: endian programming issues. Should it go into
> > SUBMITTING?
>
> SUBMITTING isn't supposed to be a tutorial on C programming (even if
> many contributors badly need one).

FWIW, this is what the Linux Kernel's version looks like:
  http://lxr.linux.no/source/Documentation/CodingStyle

Hamish

On Wed, 2006-05-10 at 18:29 +1200, Hamish wrote:

> > > Understood re: endian programming issues. Should it go into
> > > SUBMITTING?
> >
> > SUBMITTING isn't supposed to be a tutorial on C programming (even if
> > many contributors badly need one).

> FWIW, this is what the Linux Kernel's version looks like:
>   http://lxr.linux.no/source/Documentation/CodingStyle

I believe GRASS should have a similar document that is a bit more
expansive and covering design decisions that affect coding style. I
think it would cut down on repeat questions.

I have no issues producing the first draft.

--
Brad Douglas <rez touchofmadness com> KB8UYR
Address: 37.493,-121.924 / WGS84 National Map Corps #TNMC-3785

Brad:

> Does it make sense to have these functions, string/debug, and memory
> related functions in a separate library? More trouble than it's
> worth?

Glynn:

> I'm not sure that a library makes sense; it's probably better for
> import/export modules to do it themselves. Apart from anything else,
> it allows the compiler to inline the code; such modules will typically
> be [de]serialising arrays rather than individual integers.

They are sufficiently abstruse to encourage subtle mistakes, so having
them predefined somewhere would be nice. ... What if the endian
transforms were written as macros (or inline functions) that could be
#included from a header file when needed? Does that solve both problems?
The alternative is cut&paste into each module and hope that down the
road no one changes one version but not the others.

Glynn:

> SUBMITTING isn't supposed to be a tutorial on C programming
> (even if many contributors badly need one).

Many of us are researchers who know a little C programming, not the
other way around. We are lucky to have a few experts around to help
point out the errors of our ways, but still we need all the help and
on-the-job training we can get. It may be redundant and beyond the scope
of the SUBMITTING document to recreate K&R in full, but it wouldn't hurt
to throw in a small reference section at the end of the document
pointing to some background info on safe string handling, pointers, memory
allocation, etc., plus links to other projects' SUBMITTING files.

me:

> FWIW, this is what the Linux Kernel's version looks like:
> http://lxr.linux.no/source/Documentation/CodingStyle

Brad:

> I believe GRASS should have a similar document that is a bit more
> expansive and covering design decisions that affect coding style. I
> think it would cut down on repeat questions.

Can you provide some examples of "design decisions"? Just curious.

Hamish

Hamish wrote:

> > Does it make sense to have these functions, string/debug, and memory
> > related functions in a separate library? More trouble than it's
> > worth?
> Glynn:
> > I'm not sure that a library makes sense; it's probably better for
> > import/export modules to do it themselves. Apart from anything else,
> > it allows the compiler to inline the code; such modules will typically
> > be [de]serialising arrays rather than individual integers.
>
> They are sufficiently abstruse to encourage subtle mistakes, so having
> them predefined somewhere would be nice. ...

I'll have to defer to the opinions of others on that. To me, they look
quite trivial.

It may be more clear to unroll the loops, i.e.:

  void serialize_int32_le(unsigned char *buf, long x)
  {
    buf[0] = (x >> 0) & 0xFF;
    buf[1] = (x >> 8) & 0xFF;
    buf[2] = (x >> 16) & 0xFF;
    buf[3] = (x >> 24) & 0xFF;
  }
  
  void serialize_int32_be(unsigned char *buf, long x)
  {
    buf[0] = (x >> 24) & 0xFF;
    buf[1] = (x >> 16) & 0xFF;
    buf[2] = (x >> 8) & 0xFF;
    buf[3] = (x >> 0) & 0xFF;
  }
  
  long deserialize_int32_le(const unsigned char *buf)
  {
    return buf[0] | (buf[1] << 8) | (buf[2] << 16) | ((long)buf[3] << 24);
  }
  
  long deserialize_int32_be(const unsigned char *buf)
  {
    return ((long)buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | buf[3];
  }

[There was a typo in my previous email; the last two functions both
had the _le suffix.]

> What if the endian
> transforms were written as macros (or inline functions) that could be
> #included from a header file when needed? Does that solve both problems?
> The alternative is cut&paste into each module and hope that down the
> road no one changes one version but not the others.

Both macros and inline functions have problems of their own. The main
problem with macros is that macro calls look like function calls but
don't necessarily behave like them; if this results in a bug, it can
be very hard to track down. The main problem with inline functions is
the compiler isn't guaranteed to inline them.

Between the two, I'd suggest macros, e.g.:

  #define serialize_int32_le(buf, x) \
  do { \
    (buf)[0] = ((x) >> 0) & 0xFF; \
    (buf)[1] = ((x) >> 8) & 0xFF; \
    (buf)[2] = ((x) >> 16) & 0xFF; \
    (buf)[3] = ((x) >> 24) & 0xFF; \
  } while(0)
    
  #define serialize_int32_be(buf, x) \
  do { \
    (buf)[0] = ((x) >> 24) & 0xFF; \
    (buf)[1] = ((x) >> 16) & 0xFF; \
    (buf)[2] = ((x) >> 8) & 0xFF; \
    (buf)[3] = ((x) >> 0) & 0xFF; \
  } while(0)
  
  #define deserialize_int32_le(buf) \
    ((buf)[0] | ((buf)[1] << 8) | ((buf)[2] << 16) | ((long)(buf)[3] << 24))
  
  #define deserialize_int32_be(buf) \
    (((long)(buf)[0] << 24) | ((buf)[1] << 16) | ((buf)[2] << 8) | (buf)[3])
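To make the macro hazard concrete: each use of (x) above re-expands the
argument text, so an argument with side effects is evaluated four times.
A small illustration (macro copied from above; the demo function is
hypothetical):

```c
/* serialize_int32_le exactly as above: (x) appears four times. */
#define serialize_int32_le(buf, x) \
do { \
    (buf)[0] = ((x) >> 0) & 0xFF; \
    (buf)[1] = ((x) >> 8) & 0xFF; \
    (buf)[2] = ((x) >> 16) & 0xFF; \
    (buf)[3] = ((x) >> 24) & 0xFF; \
} while (0)

/* Safe usage pattern: perform the side effect once, then expand. */
long demo_safe(unsigned char *buf, const long *vals)
{
    const long *p = vals;
    long v = *p++;              /* increment happens exactly once */

    serialize_int32_le(buf, v); /* macro sees a plain variable */

    /* serialize_int32_le(buf, *p++) would instead advance p four
     * times, serialising one byte from each of four elements. */
    return (long)(p - vals);    /* pointer advanced exactly once */
}
```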

> Glynn:
> > SUBMITTING isn't supposed to be a tutorial on C programming
> > (even if many contributors badly need one).

In retrospect, I'd like to apologise for the unnecessarily snarky tone
of that comment.

> Many of us are researchers who know a little C programming, not the
> other way around. We are lucky to have a few experts around to help
> point out the errors of our ways, but still we need all the help and
> on-the-job training we can get. It may be redundant and beyond the scope
> of the SUBMITTING document to recreate K&R in full, but it wouldn't hurt
> to throw in a small reference section at the end of the document
> pointing to some background info on safe string handling, pointers, memory
> allocation, etc., plus links to other projects' SUBMITTING files.

But what to include?

A SUBMITTING file is normally used for issues which are specific to
that particular project.

I'd suggest including a list of common errors, but there aren't really
any specific errors which occur frequently. Which isn't really
surprising given that GRASS consists of so many different types of
code.

--
Glynn Clements <glynn@gclements.plus.com>

On Thu, 2006-05-11 at 17:43 +1200, Hamish wrote:

> Glynn:
> > I'm not sure that a library makes sense; it's probably better for
> > import/export modules to do it themselves. Apart from anything else,
> > it allows the compiler to inline the code; such modules will typically
> > be [de]serialising arrays rather than individual integers.
>
> They are sufficiently abstruse to encourage subtle mistakes, so having
> them predefined somewhere would be nice. ... What if the endian
> transforms were written as macros (or inline functions) that could be
> #included from a header file when needed? Does that solve both problems?
> The alternative is cut&paste into each module and hope that down the
> road no one changes one version but not the others.

I suppose that is a middle ground. I don't see any of the macro
pitfalls really applying. If there are no objections, I'll add them to
include/gis.h.

Something along the lines of:

#define SERIALIZE_INT32_LE(buf, x) do { \
  buf[0] = (x >> 0) & 0xFF; \
  buf[1] = (x >> 8) & 0xFF; \
  buf[2] = (x >> 16) & 0xFF; \
  buf[3] = (x >> 24) & 0xFF; \
} while(0)

#define DESERIALIZE_INT32_LE(buf) (buf[0] | (buf[1] << 8) | (buf[2] << 16) | ((long)buf[3] << 24))

...

I'm really against re-writing the same code over and over again, no
matter how trivial (where constraints allow). Yeah, function calls are
expensive, but human error and keeping duplicated code in sync can be
far more expensive.

> Glynn:
> > SUBMITTING isn't supposed to be a tutorial on C programming
> > (even if many contributors badly need one).
>
> Many of us are researchers who know a little C programming, not the
> other way around. We are lucky to have a few experts around to help
> point out the errors of our ways, but still we need all the help and
> on-the-job training we can get. It may be redundant and beyond the scope
> of the SUBMITTING document to recreate K&R in full, but it wouldn't hurt
> to throw in a small reference section at the end of the document
> pointing to some background info on safe string handling, pointers, memory
> allocation, etc., plus links to other projects' SUBMITTING files.

Both you and Glynn have valid points, here. I would like to see a
separate chapter in the programmer's manual as a HOWTO/tutorial. We can
use the SUBMITTING* files as a starting point, where applicable.

It looks like some individuals have made some effort in this respect.
Some parts are written in more of a tutorial style (eg. vector
architecture page).

> me:
> > > FWIW, this is what the Linux Kernel's version looks like:
> > > http://lxr.linux.no/source/Documentation/CodingStyle
> Brad:
> > I believe GRASS should have a similar document that is a bit more
> > expansive and covering design decisions that affect coding style. I
> > think it would cut down on repeat questions.
>
> Can you provide some examples of "design decisions"? Just curious.

Decisions made in the past that the majority of us who haven't been here
for the long haul don't understand. Few things have been formalized and
there probably was no reason to do so early on, but GRASS has grown to
such a size that warrants formalization to some extent, IMHO.

Examples? A weak example is the decision to determine endianness at run
time. Or the decision to use lib/gis as a repository for functions
common to other libraries (memory allocation, etc.), then some of those
libraries not using lib/gis on account of "bloat".

If some of these things sound trivial, it's probably because I take a
very user-centric approach. I try to gear things to make the user
experience better/easier, perhaps to a fault at times.

Does this help explain my stance a bit better?

--
Brad Douglas <rez touchofmadness com> KB8UYR
Address: 37.493,-121.924 / WGS84 National Map Corps #TNMC-3785

I would like to humbly mention that the issues below are another reason why
GRASS probably needs something like a PSC and process for formalizing such
decisions now. It has grown from a CERL in-house project to an enormous,
exceedingly complex, wonderfully active, and very powerful geospatial
research tool. A year or so ago, before some code clean up, someone
(Hamish?) noted that GRASS is one of the largest open-source projects in
existence with respect to the size of its code base.

These features that make this such a great project and useful software also
make it increasingly complex to self-manage in an organized way. We all do
an impressive job of it, but a more structured way of coming to design
decisions would be helpful and will become increasingly important if the
project continues to grow and be dynamic. Maybe we need a fairly loose and
heterarchical kind of organization, but some kind of more formal process
would be helpful to me at least.

Michael
__________________________________________
Michael Barton, Professor of Anthropology
School of Human Evolution & Social Change
Center for Social Dynamics & Complexity
Arizona State University

phone: 480-965-6213
fax: 480-965-7671
www: http://www.public.asu.edu/~cmbarton

> From: Brad Douglas <rez@touchofmadness.com>
> Reply-To: <rez@touchofmadness.com>
> Date: Sat, 20 May 2006 15:02:41 -0700
> To: Hamish <hamish_nospam@yahoo.com>, Glynn Clements
> <glynn@gclements.plus.com>
> Cc: <grass-dev@grass.itc.it>
> Subject: Re: [GRASS-dev] endian & generic library
>
> Brad:
>
> > I believe GRASS should have a similar document that is a bit more
> > expansive and covering design decisions that affect coding style. I
> > think it would cut down on repeat questions.
>
> Can you provide some examples of "design decisions"? Just curious.
>
> Decisions made in the past that the majority of us who haven't been here
> for the long haul don't understand. Few things have been formalized and
> there probably was no reason to do so early on, but GRASS has grown to
> such a size that warrants formalization to some extent, IMHO.
>
> Examples? A weak example is the decision to determine endianness at run
> time. Or the decision to use lib/gis as a repository for functions
> common to other libraries (memory allocation, etc.), then some of those
> libraries not using lib/gis on account of "bloat".
>
> If some of these things sound trivial, it's probably because I take a
> very user-centric approach. I try to gear things to make the user
> experience better/easier, perhaps to a fault at times.
>
> Does this help explain my stance a bit better?
>
> --
> Brad Douglas <rez touchofmadness com> KB8UYR
> Address: 37.493,-121.924 / WGS84 National Map Corps #TNMC-3785

On Sat, 2006-05-20 at 16:09 -0700, Michael Barton wrote:

> I would like to humbly mention that the issues below are another reason why
> GRASS probably needs something like a PSC and process for formalizing such
> decisions now. It has grown from a CERL in-house project to an enormous,
> exceedingly complex, wonderfully active, and very powerful geospatial
> research tool. A year or so ago, before some code clean up, someone
> (Hamish?) noted that GRASS is one of the largest open-source projects in
> existence with respect to the size of its code base.
>
> These features that make this such a great project and useful software also
> make it increasingly complex to self-manage in an organized way. We all do
> an impressive job of it, but a more structured way of coming to design
> decisions would be helpful and will become increasingly important if the
> project continues to grow and be dynamic. Maybe we need a fairly loose and
> hierarchical kind of organization, but some kind of more formal process
> would be helpful to me at least.

Agreed. Completely.

Markus, what is the current status of the PSC and foundation?

--
Brad Douglas <rez touchofmadness com> KB8UYR
Address: 37.493,-121.924 / WGS84 National Map Corps #TNMC-3785