[GRASS5] Re: vector point dataset

I tried changing the library so that the spatial index is not
stored in a file and is built only when needed.

Test on vector : 149971 boundaries, 99972 areas, 99972 centroids

module                       | Old version | New version |
v.build                      | 55s         | 54s         |
v.distance (from 1 point)    | 6s          | 29s         |
v.distance (from 100 points) | 3m50s       | 4m30s       |

This means that a module which needs the spatial index takes about
5 times longer to get it ready than before, but the difference
matters less in real applications, because the index is usually
used more than once.
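
The trade-off in the timing table can be put into numbers. The following is only a small sketch that uses the four v.distance timings quoted above; the function name `overhead` is made up for illustration.

```python
# v.distance timings from the table above, converted to seconds
# as (old version, new version).
timings = {
    "v.distance (from 1 point)": (6, 29),
    "v.distance (from 100 points)": (3 * 60 + 50, 4 * 60 + 30),
}


def overhead(old_s, new_s):
    """Return (extra seconds, slowdown factor) of the new version."""
    return new_s - old_s, new_s / old_s


for module, (old_s, new_s) in timings.items():
    extra, factor = overhead(old_s, new_s)
    print(f"{module}: +{extra}s, {factor:.2f}x")
```

The absolute cost of building the index on demand stays in the range of tens of seconds, so its relative weight shrinks as the module does more searches per run.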

About 10 of the 60 modules need the spatial index.

We can either keep those 10 modules faster or save the space occupied
by the spatial index file.

What is your opinion?

Radim

Radim Blazek wrote:

It is normal.
The spatial index is important for points too, I think. Otherwise v.distance, for example,
must always go through all points. Say that you have another vector with 4,000,000 points
and you want to find the nearest point in the first one. Without the spatial index it
must do 4,000,000 x 4,000,000 checks.
The spatial index is stored as a tree of boxes, 6x8 bytes each, so 430 MB is possible.
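
A back-of-envelope check of these figures, as a sketch only. The point count and the 6x8 bytes per box come from the message above; the assumption of exactly one leaf box per point, with everything else in the tree ignored, is mine and gives only a lower bound.

```python
n_points = 4_000_000

# Brute force: every query point is compared with every candidate point.
brute_force_checks = n_points * n_points

# One bounding box as described above: 6 coordinates x 8 bytes.
box_bytes = 6 * 8

# Lower bound (assumption): one leaf box per point, ignoring internal
# tree nodes, child ids/pointers and partially filled nodes.
leaf_bytes = n_points * box_bytes

print(f"checks without index: {brute_force_checks:.1e}")
print(f"leaf boxes alone: {leaf_bytes / 1e6:.0f} MB")
```

The leaf boxes alone come to a bit under 200 MB, so an index file of several hundred MB is plausible once the rest of the tree structure is added.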

Brent Wood wrote:

I have a vector point dataset (ascii XYZ, the basis of a DEM).

I'm importing into GRASS5.7 with

cat <file> | v.in.ascii -Z output=nzxyz xcol=1 ycol=2 zcol=2 catcol=0

There are about 4,000,000 XYZ points.

The vector coor file is 150Mb. GRASS is now building the topology & index.
The topo file is almost 200Mb and the sidx is at 430Mb and growing.

Is this normal? It seems very excessive for a point dataset. A spatial
index & topology make more sense for line/polygon data; needing over 4x the
actual data volume to store this extra info doesn't look right somehow...

The system is a SuSE Linux 9.1 32 bit OS on A64 3500 with 1Gb memory.
Swapped out 1.4Gb & still using 800Mb main memory.

Any advice appreciated....

Brent Wood

Brent Wood wrote:

Thank you for following this up, 'tis appreciated.

Hmm.... I'm not using GRASS much, mostly GMT/QGIS/PostGIS, but I want to
do more with GRASS. I appreciate your comment on the value of indices for
points. I guess with modern hard drives the space is not a huge issue, and
the time taken to import the file & create the index is only a once off,
whereas queries are likely to be ongoing.

A typical use for me would be taking 120,000,000 point elevations and
building a DEM. I have mainly used GMT for this, so hope to simply import
the GMT netCDF grids (so far unsuccessfully), but I was also interested in
building the model with GRASS, as it supposedly has some excellent tools
for this. I've tried a few times, but so far I have been unable to get a
DEM built by GRASS.

GMT takes about 3 hrs on a fast PC. The same box with GRASS has been
running for 15hrs with no result. I do need to look into this more, & it
is not directly in answer to your question, but is background to what I'm
trying to achieve with GRASS.

As long as GRASS was doing what it should, & the index files are useful I
have no problem with them being built.

Can you be more specific about which GRASS modules you use?
By default v.in.ascii creates an attribute table, which is slow because
it uses SQL to write data to a database. Try running v.in.ascii
with the '-t' flag.

v.surf.rst and v.surf.idw both open the vector with support files
(topology, spatial index, category index). We could also consider
reading vectors without support files. Currently the category index is
not used anyway, but it could be useful if only a small part of the
input points is used. The RST library used by v.surf.rst also uses topology
to get nodes for connected lines, so that line end points are not duplicated.

In general, vector modules usually open vectors with support files,
which can be slow for large files. We should check for every module
whether this is necessary. It may be that even modules which select
features by box could be faster without the spatial index if they read
all features only once (v.surf.*, v.to.rast, ...), but modules which
need to search many times certainly need the spatial index
(e.g. v.distance, v.select, ...).
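
The difference between reading everything once and searching many times can be sketched as follows. This is a toy illustration, not GRASS code: `GridIndex` is a hypothetical uniform-grid index standing in for the real tree of boxes, and the point data is random.

```python
import random
from collections import defaultdict


def select_by_box_scan(points, box):
    """Linear scan: test every point against the query box."""
    xmin, ymin, xmax, ymax = box
    return sorted(i for i, (x, y) in enumerate(points)
                  if xmin <= x <= xmax and ymin <= y <= ymax)


class GridIndex:
    """Toy spatial index: points bucketed into square cells (not an R-tree)."""

    def __init__(self, points, cell=1.0):
        self.points = points
        self.cell = cell
        self.buckets = defaultdict(list)
        for i, (x, y) in enumerate(points):
            self.buckets[self._key(x, y)].append(i)

    def _key(self, x, y):
        return (int(x // self.cell), int(y // self.cell))

    def select_by_box(self, box):
        """Visit only the cells that overlap the query box."""
        xmin, ymin, xmax, ymax = box
        cx0, cy0 = self._key(xmin, ymin)
        cx1, cy1 = self._key(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for i in self.buckets.get((cx, cy), ()):
                    x, y = self.points[i]
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(i)
        return sorted(hits)


random.seed(0)
points = [(random.uniform(0, 100), random.uniform(0, 100))
          for _ in range(10_000)]
index = GridIndex(points)  # cost paid once, like building the spatial index
box = (10.0, 10.0, 12.5, 13.0)
assert index.select_by_box(box) == select_by_box_scan(points, box)
```

A module that touches every feature exactly once gains nothing from paying the build cost, while a module that issues many small box queries recovers it quickly.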

Until now we were trying to get things working; now we can also
look more closely at performance.

Something I'm not aware of is the approach used by GRASS57 for accessing
data from a PostGIS table. If the points were stored in PostGIS and
accessed by GRASS, I presume GRASS would not build an index. I have not
yet tried to build a DEM using GRASS to work with points in PostGIS.

You can define an external data source with v.external. v.external
also creates support files, but you can also create the link to external
data in a text editor. That is not a solution however, because v.surf.*
will ask you to build topology.

Radim

I have submitted the changes: the spatial index is not saved to
a file, it is built only if necessary. It is built automatically
if Vect_select_* is called, but it is probably better
to call Vect_build_spatial_index from the module (as in d.what.vect).
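
The build-on-demand behaviour described above can be sketched in a few lines. This is not the GRASS vector library: `VectorMap`, `build_spatial_index` and `select_by_x_range` are hypothetical names that only mirror the roles of Vect_build_spatial_index and Vect_select_*, and the "index" is a plain sorted list.

```python
class VectorMap:
    """Hypothetical stand-in for an open vector map (not the GRASS API)."""

    def __init__(self, points):
        self.points = list(points)
        self._index = None       # nothing is built or loaded at open time
        self.index_builds = 0    # counter, only to make the behaviour visible

    def build_spatial_index(self):
        """Build the index explicitly; a no-op if it already exists."""
        if self._index is None:
            # Placeholder index: point ids sorted by x coordinate.
            self._index = sorted(range(len(self.points)),
                                 key=lambda i: self.points[i][0])
            self.index_builds += 1

    def select_by_x_range(self, xmin, xmax):
        """A selection call triggers the build on first use."""
        self.build_spatial_index()
        return [i for i in self._index if xmin <= self.points[i][0] <= xmax]


vmap = VectorMap([(3.0, 1.0), (1.0, 2.0), (2.0, 5.0)])
assert vmap.index_builds == 0            # opening the map costs nothing
print(vmap.select_by_x_range(1.5, 3.0))  # first query builds the index
print(vmap.select_by_x_range(0.0, 1.5))  # later queries reuse it
```

Calling the build step explicitly from the module keeps the cost visible at a known place instead of hiding it inside the first selection call.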

Another change: it took a long time to free the memory occupied by support
structures, and usually that is not necessary. So now, by default,
support structures are not released when a vector is closed.
This reduced the time for v.build in the previous example from 54s
to 39s.

It is possible to use Vect_set_release_support if necessary
(for example v.clean). Let me know about other cases where
Vect_set_release_support should be added.

Radim

Brent Wood wrote:

On Tue, 16 Nov 2004, Radim Blazek wrote:

I tried changing the library so that the spatial index is not
stored in a file and is built only when needed.

Test on vector : 149971 boundaries, 99972 areas, 99972 centroids

module                       | Old version | New version |
v.build                      | 55s         | 54s         |
v.distance (from 1 point)    | 6s          | 29s         |
v.distance (from 100 points) | 3m50s       | 4m30s       |

This means that a module which needs the spatial index takes about
5 times longer to get it ready than before, but the difference
matters less in real applications, because the index is usually
used more than once.

About 10 of the 60 modules need the spatial index.

We can either keep those 10 modules faster or save the space occupied
by the spatial index file.

What is your opinion?

Thank you for following this up, 'tis appreciated.

Hmm.... I'm not using GRASS much, mostly GMT/QGIS/PostGIS, but I want to
do more with GRASS. I appreciate your comment on the value of indices for
points. I guess with modern hard drives the space is not a huge issue, and
the time taken to import the file & create the index is only a once off,
whereas queries are likely to be ongoing.

A typical use for me would be taking 120,000,000 point elevations and
building a DEM. I have mainly used GMT for this, so hope to simply import
the GMT netCDF grids (so far unsuccessfully), but I was also interested in
building the model with GRASS, as it supposedly has some excellent tools
for this. I've tried a few times, but so far I have been unable to get a
DEM built by GRASS.

GMT takes about 3 hrs on a fast PC. The same box with GRASS has been
running for 15hrs with no result. I do need to look into this more, & it
is not directly in answer to your question, but is background to what I'm
trying to achieve with GRASS.

As long as GRASS was doing what it should, & the index files are useful I
have no problem with them being built.

Something I'm not aware of is the approach used by GRASS57 for accessing
data from a PostGIS table. If the points were stored in PostGIS and
accessed by GRASS, I presume GRASS would not build an index. I have not
yet tried to build a DEM using GRASS to work with points in PostGIS.

Thanks,

    Brent

You can delete old sidx files manually or you can run v.build.all.

Radim

_______________________________________________
grass5 mailing list
grass5@grass.itc.it
http://grass.itc.it/mailman/listinfo/grass5

Brent,

can you write me more about what you are doing?
Which version of GRASS are you using, what are your inputs
(ascii sites, vector points), what is the resolution and size
of the resulting grid, which program are you using to create the DEM
(s.to.rast, s.surf.idw, s.surf.rst or their v.* versions for 5.7),
and where exactly are you having problems.
We have a project (with authors of r.terraflow) for computing massive DEMs
and your data set looks like the size that we are aiming at, but there may be
other bottlenecks in GRASS than gridding so we need to look at those too.

Thank you,

Helena

Do you know which method is used by GMT to build the model?

On Thu, 18 Nov 2004 01:22, Helena wrote:

We have a project (with authors of r.terraflow) for computing massive DEMs
and your data set looks like the size that we are aiming at, but there may
be other bottlenecks in GRASS than gridding so we need to look at those
too.

I'm working with bathymetry data from multibeam echosounders, typically
working with 10s of millions of points to create grids a few thousand by a
few thousand cells.

Is this project likely to be of interest to me (or vice versa)?

>> A typical use for me would be taking 120,000,000 point elevations and
>> building a DEM. I have mainly used GMT for this, so hope to simply
>> import the GMT netCDF grids (so far unsuccessfully)

I've basically given up on creating the DEM in GRASS and do all the work in
MBSystem, which generates GMT format grids.

I haven't been able to import GMT netCDF grids (haven't really tried) but have
no trouble importing grids in what MBSystem describes as
"GMT format id 1: Native binary single precision floats in scanlines with
leading grd header" using 'r.in.bin -hf'. I think MBSystem also describes
this format as "binary file (GMT version 1 GRD file)".

I'm not real familiar with GMT, but this might be of help to you.

Regards
Gordon

--

Gordon Keith
Programmer/Data Analyst
Marine Acoustics
CSIRO Marine Research
http://www.marine.csiro.au

God showed his love for us by sending his only Son into the world,
so that we might have life through him.
  -- 1 John 4:9

On Thu, 18 Nov 2004 10:07, Brent Wood wrote:

I do occasionally work with multibeam data, tho that is peripheral at
present. I use Walter Smith's new S&S/GEBCO blended global topographic
grid for much of my bathymetric background work, where it involves regions
NIWA does not have its own data. I think there are around 230,000,000
points/cells in this dataset (1 minute grid). I don't know if it is
published yet, but it was posted on the GMT list.

I use the Australian bathymetry and topography grid, .01 degree data for 8-52
S 102-172E, about 30M points, for backgrounds. Grass handles this data fairly
well.

Importing it was very straightforward as it is already gridded, so v.to.rast
works fine so long as the region is set correctly.
g.region n=-7.995 s=-52.005 w=101.995 e=172.005 res=.01
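
How such region bounds follow from the data extent can be sketched like this, assuming the gridded points are cell centres so that the region edges sit half a cell outside the outermost points. The function name and the returned dictionary are made up for illustration; only the extent (8-52 S, 102-172 E) and the 0.01 degree resolution come from the message.

```python
def region_from_cell_centres(lat_min, lat_max, lon_min, lon_max, res):
    """Region edges for a grid whose points are cell centres.

    Edges sit half a cell outside the outermost centres; rows and cols
    count the centres along each axis.
    """
    half = res / 2
    rows = round((lat_max - lat_min) / res) + 1
    cols = round((lon_max - lon_min) / res) + 1
    return {
        "n": round(lat_max + half, 6),
        "s": round(lat_min - half, 6),
        "e": round(lon_max + half, 6),
        "w": round(lon_min - half, 6),
        "rows": rows,
        "cols": cols,
    }


# 8-52 S, 102-172 E at 0.01 degree; southern latitudes are negative.
region = region_from_cell_centres(-52.0, -8.0, 102.0, 172.0, 0.01)
print(region)
print("cells:", region["rows"] * region["cols"])
```

Under these assumptions the grid has 4401 rows and 7001 columns, about 30.8 million cells, which is consistent with the "about 30M points" mentioned above.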

v.to.rast works fairly quickly. The other methods of creating a raster from
points don't seem to cope at all with large data sets.

Actually, looking back, I used v.to.rast on the 20 different 1.5M point files
the data came in. I can't remember if that was just because the data came
that way or grass had problems with a single 30M point file.

Regards
Gordon

We have a project (with authors of r.terraflow) for computing massive
DEMs and your data set looks like the size that we are aiming at, but
there may be other bottlenecks in GRASS than gridding so we need to
look at those too.

[sorry if I'm mixing email threads here]

while hunting around in google for some digital sidescan sonar test data
so everyone can play along at home (didn't find any), I found this --
maybe of use to the sidescan folks:
  http://www.omg.unb.ca/~jhc/SwathEd.html

LIDAR data is an analogy re. data density and issues you might want to
think about while processing into a DEM and I think any GRASS method for
dealing with one dataset would deal with the other as well.

lidar sample data here:
  http://agassiz.la.asu.edu:8080/lservlet

Note along with x,y,z there might also be a return signal strength
number and timestamp to deal with.

Helena, Laura:
Creating massive DEMs into TINs - are you using Radial Basis Functions
for picking the optimum position and number of mesh points?

see e.g.,
  http://wwwradig.in.tum.de/people/buck/RBF/

(non-commercial-use C++ library + source)
[perhaps could be talked into GPL?]

best,
Hamish

On Thu, 18 Nov 2004 15:24, Hamish wrote:

I found this --
maybe of use to the sidescan folks:
http://www.omg.unb.ca/~jhc/SwathEd.html

It's very nice software. We'd love to be able to afford to be "sponsors of the
Chair in Ocean Mapping" but it's well outside our reach.

That's why we use MBSystem, it is GPL.

Regards
Gordon

On Thu, Nov 18, 2004 at 09:29:54AM +1100, Gordon Keith wrote:

> >> A typical use for me would be taking 120,000,000 point elevations and
> >> building a DEM. I have mainly used GMT for this, so hope to simply
> >> import the GMT netCDF grids (so far unsuccessfully)

I've basically given up on creating the DEM in GRASS and do all the work in
MBSystem, which generates GMT format grids.

I haven't been able to import GMT netCDF grids (haven't really tried) but have
no trouble importing grids in what MBSystem describes as
"GMT format id 1: Native binary single precision floats in scanlines with
leading grd header" using 'r.in.bin -hf'. I think MBSystem also describes
this format as "binary file (GMT version 1 GRD file)".

I'm not real familiar with GMT, but this might be of help to you.

Did you try a fresh GDAL (r.in.gdal then)?
http://gdal.org/frmt_various.html#netCDF
"This driver is primarily intended to provide a mechanism for grid interchange with the GMT package".

Markus