[GRASS-dev] s.cellstats replacement for GRASS 6

Hi,

LIDAR & swath bathymetry people [3million+ input points]: I have an
idea for a s.cellstats replacement. Will call it r.in.xyz (name is from
a 2001 wish by Roy Sanderson). Output map options will be the same as
r.univar.

The module will be quite memory intensive - for most stats binning into
a 10000x10000 region will take ~800mb RAM for a CELL,FCELL map & 1.6gb
for a DCELL. Simple stats (min,max,n,sum) would take half that amount.

A low memory option could cut complex stats memory requirements in half
(e.g. for parameter=mean do in two passes, first write "n" .tmp map to
disk before populating "sum" in a second pass). Another idea to save
memory is to cut the region up into segments and process the data in
multiple passes. I blindly guess that the second method is more
efficient.

There'll be a test for z-range so the module can be used iteratively to
build 3D raster slices (for r.to.rast3). Same memory/segment stratagem
could be employed to populate a 3D raster in multiple passes of a single
call to the module.

The gist is to read in a line, test if the point is in the region
bounds, then use column = int((input_x - west) / ew_res), row=same; to
figure which cell to increment. Finally write out raster as f(a[,b]),
row by row.

Maybe the existing s.cellstats code is more efficient than my brute
force way of keeping whole output map components in memory? The old
sites format isn't too far from ascii x,y,z, so if s.cellstats is
clearly a better method, maybe it is better to adapt that instead?
No idea.

Hamish

FYI. GMT implements several important filters for XYZ data that GRASS
lacks, for example binning data into a raster (be various means) grid
prior to actually importing the data. In some cases these can
substantially decrease the processing time of griding operations as
well as remove aliasing problems associated with some data set/griding
algorithm combinations.

A module like you suggest might benefit from some of these prefilters.
See the manual pages for GMT for FILTERING OF 1-D AND 2-D DATA at the
top of the GMT man pages:

http://gmt.soest.hawaii.edu/gmt/gmt_man.html

On 5/26/06, Hamish <hamish_nospam@yahoo.com> wrote:

Hi,

LIDAR & swath bathymetry people [3million+ input points]: I have an
idea for a s.cellstats replacement. Will call it r.in.xyz (name is from
a 2001 wish by Roy Sanderson). Output map options will be the same as
r.univar.

The module will be quite memory intensive - for most stats binning into
a 10000x10000 region will take ~800mb RAM for a CELL,FCELL map & 1.6gb
for a DCELL. Simple stats (min,max,n,sum) would take half that amount.

A low memory option could cut complex stats memory requirements in half
(e.g. for parameter=mean do in two passes, first write "n" .tmp map to
disk before populating "sum" in a second pass). Another idea to save
memory is to cut the region up into segments and process the data in
multiple passes. I blindly guess that the second method is more
efficient.

There'll be a test for z-range so the module can be used iteratively to
build 3D raster slices (for r.to.rast3). Same memory/segment stratagem
could be employed to populate a 3D raster in multiple passes of a single
call to the module.

The gist is to read in a line, test if the point is in the region
bounds, then use column = int((input_x - west) / ew_res), row=same; to
figure which cell to increment. Finally write out raster as f(a[,b]),
row by row.

Maybe the existing s.cellstats code is more efficient than my brute
force way of keeping whole output map components in memory? The old
sites format isn't too far from ascii x,y,z, so if s.cellstats is
clearly a better method, maybe it is better to adapt that instead?
No idea.

Hamish

_______________________________________________
grass-dev mailing list
grass-dev@grass.itc.it
http://grass.itc.it/mailman/listinfo/grass-dev

--
David Finlayson

On May 27, 2006, at 2:09 AM, Hamish wrote:

Hi,

LIDAR & swath bathymetry people [3million+ input points]: I have an
idea for a s.cellstats replacement. Will call it r.in.xyz (name is from
a 2001 wish by Roy Sanderson). Output map options will be the same as
r.univar.

Hamish, s.cellstats inputs and outputs a site file, what you are proposing is supposed
to read an ascii x,y,z and output a raster file? If that is true, you may want to look
also at s.to.rast2 in GRASS Add-ons: http://grass.itc.it/outgoing/
(it includes an option to include data from surrounding cells, so if there are
small holes in data, they can be filled-in).
I have used s.to.rast a lot with lidar data or the binning option that provides
the on-line LDART tool (http://ekman.csc.noaa.gov/TCM/) and I miss it a lot in GRASS6.
Your suggestion to go directly from ascii points to raster
would get around the problems that the current vector support has with large data sets
(otherwise I would have suggested to do v.in.ascii and add this to v.to.rast so that attributes
can be used instead of z as well).

The module will be quite memory intensive - for most stats binning into
a 10000x10000 region will take ~800mb RAM for a CELL,FCELL map & 1.6gb
for a DCELL. Simple stats (min,max,n,sum) would take half that amount.

A low memory option could cut complex stats memory requirements in half
(e.g. for parameter=mean do in two passes, first write "n" .tmp map to
disk before populating "sum" in a second pass). Another idea to save
memory is to cut the region up into segments and process the data in
multiple passes. I blindly guess that the second method is more
efficient.

I agree that processing it by segments is preferable.

There'll be a test for z-range so the module can be used iteratively to
build 3D raster slices (for r.to.rast3). Same memory/segment stratagem
could be employed to populate a 3D raster in multiple passes of a single
call to the module.

The gist is to read in a line, test if the point is in the region
bounds, then use column = int((input_x - west) / ew_res), row=same; to
figure which cell to increment. Finally write out raster as f(a[,b]),
row by row.

Maybe the existing s.cellstats code is more efficient than my brute
force way of keeping whole output map components in memory? The old
sites format isn't too far from ascii x,y,z, so if s.cellstats is
clearly a better method, maybe it is better to adapt that instead?

I have never used (or looked at the code of) s.cellstats or s.to.rast2,
so I cannot tell - let me know if it would be useful to try it out -
I have GRASS53 still running.
I have cc-d this to researchers working on the GEON project - we have discussed
the binning idea using quadtrees recently.
Andy may have some suggestions too,

Helena

No idea.

Hamish

_______________________________________________
grass-dev mailing list
grass-dev@grass.itc.it
http://grass.itc.it/mailman/listinfo/grass-dev

me:

I have an idea for a s.cellstats replacement. Will call it r.in.xyz
[...] Output map options will be the same as r.univar.

David:

FYI. GMT implements several important filters for XYZ data that GRASS
lacks, for example binning data into a raster (be various means) grid
prior to actually importing the data. In some cases these can
substantially decrease the processing time of griding operations as
well as remove aliasing problems associated with some data set/griding
algorithm combinations.

Which is essentially s.in.ascii + s.cellstats + s.to.rast in GRASS 5.

So existing work arounds include "use GRASS 5", use GMT xyz2grd software
mentioned above & import GMT NetCDF to GRASS with r.in.gdal, and GRASS
s.to.rast2 add-on/LDART tool as mentioned by Helena. I guess there is a
solution using R-stats as well?

David:

A module like you suggest might benefit from some of these prefilters.
See the manual pages for GMT for FILTERING OF 1-D AND 2-D DATA at the
top of the GMT man pages:

http://gmt.soest.hawaii.edu/gmt/gmt_man.html

I'll have a look, thanks.

Helena:

s.cellstats inputs and outputs a site file,

technically, yes; but as s.cellstats output is gridded data preformed
for s.to.rast input, the s.to.rast step was implied.

what you are proposing is supposed to read an ascii x,y,z and output a
raster file?

yes. s.in.ascii + s.cellstats + s.to.rast in one step.

If that is true, you may want to look also at s.to.rast2 in GRASS
Add-ons: http://grass.itc.it/outgoing/

will do.

(it includes an option to include data from surrounding cells, so if
there are small holes in data, they can be filled-in).

For r.in.xyz, I think this is best left to r.neighbors & r.fillnulls.
Will you lose data resolution that way? I have not done the algebra to
check, but guess for the simple stats you will not. I guess you could
weight result due to uneven "n" in surrounding cells for average calc.

I have used s.to.rast a lot with lidar data or the binning option
that provides the on-line LDART tool (http://ekman.csc.noaa.gov/TCM/)
and I miss it a lot in GRASS6.

A more s.cellstats-like alternative could be a v.in.xyz.grid module,
which culls vector points to a grid ready for v.to.rast. Advantage is
access to advanced stats that s.cellstats has but r.univar does not, and
each point could have full stats as 7-10 attribute columns. This could
even be a -v flag from r.in.xyz. I'd rather the solution be that
r.univar get advanced stats support (median,..) from new library fns &
we could reuse the method in r.in.xyz. Note v.in.ascii memory limit is
much smaller than r.in.xyz (single-pass) memory limit. (~ 1750^2 grid vs
~ 10000^2 grid)

Your suggestion to go directly from ascii points to raster would get
around the problems that the current vector support has with large
data sets

That is the motivation & idea.

(otherwise I would have suggested to do v.in.ascii and add this to
v.to.rast so that attributes can be used instead of z as well).

I don't dispute the general need, but in practice the only massive point
dataset I can think of with more than x,y,z is sonar signal strength.
If there was a need the r.in.xyz module could have xcolumn= ycolumn=
zcolumn= parameters so 4th+ column could be used as z to make the
raster. Non-numeric + non-categorical attributes for that many input
points will have to be left for the future.

Hamish