Dear devs,
I have implemented a new raster3d module called r3.stats.
This module calculates volume statistics for raster3d maps.
The functionality is mostly similar to r.stats, except for category support for raster3d maps (AFAIK categories are implemented in g3dlib but not used) and the output format (I don't see the necessity of a dozen format flags).
r3.stats calculates volume statistics based on subranges, like r.stats,
and volume statistics based on groups of unique values. Examples are available in the documentation of the module.
This module is implemented from scratch; it is not based on the code of r.stats, because r.stats has a bug in computing subrange statistics which I was not able to correct.
r3.stats is not as sophisticated as r.stats: because it does not use linked lists and hash tables, it can be very memory consuming, and I guess it is by far not as fast as r.stats. Because of the quadratic complexity of the subrange calculations, the number of subranges is limited to 1 - 100000.
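To make the quadratic cost concrete: in the naive approach every cell value is compared against every subrange, so the work grows as O(ncells * nsubranges). A minimal sketch of that pattern (the names and struct layout are mine for illustration, not the actual r3.stats code):

/* naive subrange counting: every cell scans all subranges linearly,
 * which is O(ncells * nsubranges) -- illustrative only, not r3.stats */
typedef struct {
    double min, max;    /* subrange boundaries */
    long count;         /* number of cells falling into this subrange */
} Subrange;

static void count_cells(const double *cells, int ncells,
                        Subrange *sub, int nsub)
{
    int i, j;

    for (i = 0; i < ncells; i++)
        for (j = 0; j < nsub; j++)
            if (cells[i] >= sub[j].min && cells[i] < sub[j].max) {
                sub[j].count++;
                break;
            }
}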
r3.stats uses heapsort to calculate the groups of equal values. Because of this, and the fact that r3.univar uses heapsort (and hopefully r.univar soon, too), I would like to add heapsort as a library function to GRASS:
extern int G_heapsort_int(int *array, int num);
extern int G_heapsort_float(float *array, int num);
extern int G_heapsort_double(double *array, int num);
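For illustration, here is a minimal sketch of how such a function could look (standard in-place heapsort; this is only a sketch of the idea, not necessarily the code I will submit):

/* sketch of the proposed G_heapsort_double(): in-place heapsort,
 * O(n log n) worst case, no extra memory needed */
static void sift_down(double *a, int start, int end)
{
    int root = start;
    double tmp;

    while (2 * root + 1 <= end) {
        int child = 2 * root + 1;

        if (child + 1 <= end && a[child] < a[child + 1])
            child++;                /* pick the larger child */
        if (a[root] >= a[child])
            return;                 /* heap property already holds */
        tmp = a[root];
        a[root] = a[child];
        a[child] = tmp;
        root = child;
    }
}

int G_heapsort_double(double *array, int num)
{
    int i;
    double tmp;

    /* build a max-heap over the whole array ... */
    for (i = num / 2 - 1; i >= 0; i--)
        sift_down(array, i, num - 1);

    /* ... then repeatedly move the maximum to the end */
    for (i = num - 1; i > 0; i--) {
        tmp = array[0];
        array[0] = array[i];
        array[i] = tmp;
        sift_down(array, 0, i - 1);
    }
    return 0;
}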
Important:
Please do not test r3.stats before I have submitted the data type patch I announced one day before.
> Because of the quadratic complexity of the subrange calculations, the
> number of subranges is limited to 1 - 100000.
It would be much better to add a warning in that case that the operation
will take some time, or to let the G_alloc() fns do their job and report
a memory error (preferably right at the start of the run).
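Something along these lines, just as a sketch (the slowness threshold is made up for illustration; G_warning() and G_malloc(), which calls G_fatal_error() on failure, are the existing gis lib calls):

#include <grass/gis.h>

/* sketch: do the big allocation right at the start, so a memory error
 * is reported before any time is wasted; warn if the run will be slow */
static double *setup(int nsubranges, double ncells)
{
    if (ncells * (double)nsubranges > 1e10)   /* arbitrary example threshold */
        G_warning("%d subranges: this operation may take some time",
                  nsubranges);

    /* allocate everything up front instead of limiting nsubranges;
     * G_malloc() aborts with G_fatal_error() if memory runs out */
    return (double *)G_malloc(nsubranges * sizeof(double));
}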
Some folks using GRASS will have access to supercomputers or big servers
with 28+GB RAM; we shouldn't artificially put limits on them just because
we don't have that hardware ourselves. Or maybe 16GB 64bit motherboards
will be common in the next few years; then we will have to revisit the
code to let that be used.
>> Because of the quadratic complexity of the subrange calculations, the
>> number of subranges is limited to 1 - 100000.
>
> It would be much better to add a warning in that case that the operation
> will take some time, or to let the G_alloc() fns do their job and report
> a memory error (preferably right at the start of the run).
I agree.
> Some folks using GRASS will have access to supercomputers or big servers
> with 28+GB RAM; we shouldn't artificially put limits on them just because
> we don't have that hardware ourselves. Or maybe 16GB 64bit motherboards
> will be common in the next few years; then we will have to revisit the
> code to let that be used.
I think RAM is not the limiting factor; the quadratic complexity
bothers me. I need to figure out what Michael Shapiro had in mind when he coded r.stats ... for me it was easier to code it by myself and
reinvent the wheel than to understand the code of r.stats.
Well, I have a binary tree search for the subranges in mind, and
r3.stats is still in an early development state, so there are
still a lot of things I can try out to reduce the memory usage and
the needed CPU cycles.
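As a sketch of that idea: with sorted subrange boundaries, a binary search finds the right subrange in O(log k) per cell instead of O(k), so the total cost drops from O(n*k) to O(n log k). This assumes sorted, non-overlapping subranges and is illustrative only; it is not in r3.stats yet:

/* find the subrange of a value via binary search over the sorted
 * boundaries b[0] < b[1] < ... < b[k] (k subranges, k+1 boundaries);
 * returns the subrange index, or -1 if the value is outside all of them */
static int find_subrange(const double *b, int k, double value)
{
    int lo = 0, hi = k;

    if (value < b[0] || value >= b[k])
        return -1;

    while (hi - lo > 1) {
        int mid = (lo + hi) / 2;

        if (value < b[mid])
            hi = mid;
        else
            lo = mid;
    }
    return lo;    /* value lies in [b[lo], b[lo+1]) */
}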