[GRASS-user] v.to.db/v.what.rast on large vector sets

Dear grass-users

First of all, sorry for "spamming" this user list recently with questions. I don't know any GRASS users I could ask directly, so this list is my only source of support, and I am happy it works so well. My thanks to all of you for the recommendations given!

Now my issue:
I have a dataset of 30 million vector points that need attributes added: the coordinates and the values of approx. 60 rasters (25 m resolution, covering Switzerland). While processing the raster data was fast, this final join is very slow. Just adding the x and y coordinates is at 14% after 15 hours of processing, and v.to.db is still using 100% CPU. Is this expected behavior or a problem with my system (GRASS 7.0 with SQLite on Ubuntu 15.04)?
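
The coordinate step boils down to these two commands (a minimal sketch; the map name is a placeholder for my real dataset):

#minimal sketch of the slow step; "points_ch" is a placeholder name
v.db.addcolumn map=points_ch columns="x double precision, y double precision"
v.to.db map=points_ch opt=coor columns="x,y"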

Regards,
Patrick

On Mon, Aug 10, 2015 at 9:51 AM, patrick s. <patrick_gis@gmx.net> wrote:

> Dear grass-users
>
> First of all, sorry for "spamming" this user list recently with questions.

That's fine!

> I don't know any GRASS users I could ask directly, so this list is my only
> source of support, and I am happy it works so well. My thanks to all of
> you for the recommendations given!
>
> Now my issue:
> I have a dataset of 30 million vector points that need attributes added:
> the coordinates and the values of approx. 60 rasters (25 m resolution,
> covering Switzerland). While processing the raster data was fast, this
> final join is very slow. Just adding the x and y coordinates is at 14%
> after 15 hours of processing, and v.to.db is still using 100% CPU. Is this
> expected behavior or a problem with my system (GRASS 7.0 with SQLite on
> Ubuntu 15.04)?

For easier testing, could you provide a simple command line example?
Ideally with the North Carolina dataset?

To simulate 30 million vector points, just set the raster resolution to
centimeters or the like.
An example with far fewer points is also fine; scaling it up to more
points is easier than writing it from scratch...
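
For instance (a sketch; v.random can also generate the test points directly, so adjust npoints to taste):

#sketch: build a large random point set in the NC dataset
g.region raster=elevation
v.random output=pt_test npoints=1000000 --o
v.db.addtable map=pt_test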

thanks
Markus

Here is the example in the North Carolina dataset, using random data. It is a shell script that loops the process; I hope this is OK.
Scaling can be done via the variables in the first lines. It would be great to get feedback on processing times or other experiences with large datasets.

Patrick

####################################################
RES=100 #resolution
RAST=10 #nr of rasters to match
PTNR=100 #nr of points to create

#____INIT__________________________________________________________
g.region rast=elevation_shade res=$RES -a #keep the extent, change the resolution
v.random out=pt_grid npoints=$PTNR --o
v.db.addtable map=pt_grid

#generate rasters
for ((i=1; i<=RAST; i++))
do
     echo "creating mymap${i}"
     r.mapcalc expr="mymap${i}=rand(1,1000)" --o -s #random integer raster
done

#____join to points________________________________________________
#add x,y
v.db.addcolumn map=pt_grid columns="x double precision, y double precision"
v.to.db map=pt_grid opt=coor columns="x,y"

#iterate through all rasters
#NOTE The column type is taken from the raster's datatype, so a wrongly
#coded raster silently produces a wrongly typed column.
#NOTE Rasters can be of type CELL (integer), FCELL (single precision)
#or DCELL (double precision).
for i in `g.list type=rast`
do
     echo "adding layer '$i'"
     eval `r.info -g $i`
     if [ "$datatype" = "CELL" ]
     then
         v.db.addcolumn map=pt_grid column="$i integer"
     else #DCELL or FCELL
         v.db.addcolumn map=pt_grid column="$i double precision"
     fi
     v.what.rast map=pt_grid rast=$i col=$i
done

##########################################################
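
A side note: an alternative I have not tested at scale is r.what, which should be able to sample several rasters in one pass over a points map instead of calling v.what.rast once per raster (the results would still have to be bulk-loaded back into the attribute table):

#sketch: sample all test rasters in a single pass; r.what prints one
#line per point with the sampled raster values
MAPS=`g.list type=rast pattern="mymap*" separator=comma`
r.what map=$MAPS points=pt_grid separator="|" > samples.txt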


Dear List


I am coming back to you with my problem using v.to.db and v.what.rast on very large datasets. My original post is quoted above; meanwhile I have tested v.to.db at different sizes, and it seems to have a scaling issue even when using SQLite. Querying a random sample at different sizes gives the following processing times:
#1,000 points     => real 0m0.403s
#10,000 points    => real 0m1.395s
#100,000 points   => real 0m18.171s
#1,000,000 points => real 42m54.718s

Running the process for 7 million points takes more than 48 hours. The step from 100,000 to 1,000,000 points costs roughly 140 times as long, so the behavior looks closer to quadratic than linear. Interestingly, the db.execute command is very fast (16 seconds for 1 million rows). In the example below, v.out.ascii also writes the data out very slowly for large datasets, possibly because it adds east and north again.

My system runs Ubuntu 15.04 (on an SSD) with GRASS 7.0.1 and 16 GB of RAM; the data is stored on a second, non-SSD hard drive. The run with 1 million points uses 100% of one CPU core but leaves 40% of memory free.

Does this mean I am simply CPU-bound on large datasets and need to process in chunks? Or is the bottleneck something inside v.to.db, as opposed to db.execute?
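
One workaround I am considering, sketched below and untested at full scale: export the coordinates once with v.out.ascii (the default point output is east|north|category), turn them into UPDATE statements, and let db.execute run them in bulk from a file:

#sketch of a bulk-SQL alternative to v.to.db; assumes the default
#point output order east|north|cat
v.out.ascii input=randmap format=point separator="|" | \
awk -F'|' '{printf "UPDATE randmap SET E=%s, N=%s WHERE cat=%s;\n", $1, $2, $3}' > updates.sql
db.execute input=updates.sql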

Any help is appreciated

Patrick

#############################################
###Testcode for v.to.db using NorthCarolina
size=1000000 #used as variable below

#clean for multiple runs
g.remove type=rast name=randmap -f
g.remove type=vect name=randmap -f

#random raster and conversion to vector
g.region raster=elevation -p
r.random elevation raster_output=randmap n=$size
r.to.vect input=randmap output=randmap type=point

#add x,y
v.db.addcolumn map=randmap columns="E double precision, N double precision, E_calctest integer"
time v.to.db map=randmap opt=coor columns="E,N"
db.execute sql="UPDATE randmap SET E_calctest=E+N"

#save as csv: This adds east and north again and is also slow
v.out.ascii input=randmap output=testdata.csv format=point sep=tab columns=* --o -c
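
If chunking turns out to be the answer, one way to test it (a sketch, assuming cat values run densely from 1; untested):

#sketch: extract a slice by category range, then time v.to.db on it
v.extract input=randmap output=randmap_slice cats=1-100000 --o
time v.to.db map=randmap_slice opt=coor columns="E,N"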
