[GRASS-user] v.generalize: does it take forever?

Hello all,

Context: I've loaded some shapefiles into PostGIS, containing
information about the Amazon forest. For reference, the SQL script is
around 6 GB.

Problem: I managed to import, clean, and dissolve properly, but when I
run the generalization, by my estimates it would take almost a year
to complete.

I also noticed that neither GRASS nor PostGIS is capable of parallel
processing...

Question: Am I using the correct tool for that? Is there a way to
speed up the processing?

For reference, the commands I've used (GRASS 7.0 beta4, 22 Dec 2014):

v.in.ogr -e --verbose input="pg:host=localhost (...)" layer=ap10
output=ap10 snap=1e-6
v.clean -c --verbose input=ap10 output=ap10c tool=bpol,break,rmsa type=line
v.dissolve --verbose input=ap10c column=tc_2010 output=ap10d --overwrite

Try #1) v.generalize --verbose --overwrite input=ap10d output=ap10r
method=reduction threshold=0.00025
Try #2) v.generalize --verbose --overwrite input=ap10d output=ap10g
method=douglas threshold=0.00025
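For readers unfamiliar with method=douglas: it is Douglas-Peucker line simplification. Below is a minimal sketch of that algorithm, as an illustration only; it is not GRASS's C implementation, which additionally re-checks topology after each simplified line.

```python
# Minimal Douglas-Peucker sketch (the idea behind v.generalize method=douglas).
def point_segment_dist(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    cx, cy = ax + t * dx, ay + t * dy
    return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5

def douglas_peucker(points, threshold):
    if len(points) < 3:
        return points
    # Find the vertex farthest from the chord between the endpoints.
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= threshold:
        return [points[0], points[-1]]      # drop all interior vertices
    left = douglas_peucker(points[: idx + 1], threshold)
    right = douglas_peucker(points[idx:], threshold)
    return left[:-1] + right                # merge without repeating the split point

line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(line, 1.0))  # -> [(0, 0), (2, -0.1), (3, 5), (7, 9)]
```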

Thanks in advance and happy holidays

F

-=--=-=-
Fábio Augusto Salve Dias
http://sites.google.com/site/fabiodias/

On Sun, Dec 28, 2014 at 8:04 PM, Fábio Dias <fabio.dias@gmail.com> wrote:

Hello all,

Context: I've loaded some shapefiles into PostGIS, containing
information about the Amazon forest. For reference, the SQL script is
around 6 GB.

How many polygons do you have there approximately?

Problem: I managed to import, clean, and dissolve properly, but when I
run the generalization, by my estimates it would take almost a year
to complete.

This will also depend on the generalization method you selected.

I also noticed that neither GRASS nor PostGIS is capable of parallel
processing...

Yeah, hot topic for 2015 :-) Indeed, worth a thesis in my view!

Question: Am I using the correct tool for that? Is there a way to
speed up the processing?

For reference, the commands I've used (GRASS 7.0 beta4, 22 Dec 2014):

(Glad you use beta4, so we have a recent code base to check.)

v.in.ogr -e --verbose input="pg:host=localhost (...)" layer=ap10
output=ap10 snap=1e-6

--> Please tell us how many polygons or lines the layer "ap10" contains.

v.clean -c --verbose input=ap10 output=ap10c tool=bpol,break,rmsa type=line

--> Not sure but should type be "boundary" or "line"?

v.dissolve --verbose input=ap10c column=tc_2010 output=ap10d --overwrite

--> How many polygons does "ap10d" contain?

Try #1) v.generalize --verbose --overwrite input=ap10d output=ap10r
method=reduction threshold=0.00025
Try #2) v.generalize --verbose --overwrite input=ap10d output=ap10g
method=douglas threshold=0.00025

I suppose that method=douglas is faster than method=reduction?

What is the projection you are working with? Given the threshold and
assuming LatLong, I get a short distance:

GRASS 7.1.svn (latlong):~ > g.region -g res=0.00025 -a
n=4.15
s=-16.369
w=-76.23975
e=-45.1125
nsres=0.00025
ewres=0.00025
...

GRASS 7.1.svn (latlong):~ > g.region -m...
nsres=27.64959657
ewres=27.21883498
...

Probably you want to try a larger threshold first?
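The degree-to-metre conversion above can be sanity-checked with a back-of-the-envelope spherical approximation (g.region -m uses the actual ellipsoid, so the values differ slightly):

```python
import math

# Rough check of what a 0.00025 degree threshold means in metres.
# One degree of latitude is roughly 111,320 m on a spherical Earth;
# a degree of longitude shrinks with cos(latitude).
def degrees_to_metres(deg, lat_deg):
    ns = deg * 111_320.0                                     # north-south
    ew = deg * 111_320.0 * math.cos(math.radians(lat_deg))   # east-west
    return ns, ew

# Mid-latitude of the region above (n=4.15, s=-16.369) is about -6.1 degrees.
ns, ew = degrees_to_metres(0.00025, -6.1)
print(f"nsres ~ {ns:.1f} m, ewres ~ {ew:.1f} m")  # close to g.region's ~27.6 / ~27.2
```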

Markus

On Wed, Dec 31, 2014 at 12:23 PM, Markus Neteler <neteler@osgeo.org> wrote:

On Sun, Dec 28, 2014 at 8:04 PM, Fábio Dias <fabio.dias@gmail.com> wrote:

Hello all,

Context: I've loaded some shapefiles into PostGIS, containing
information about the Amazon forest. For reference, the SQL script is
around 6 GB.

How many polygons do you have there approximately?

That depends. The information is separated by states. In this case, AP
corresponds to the state of Amapá, which is the smallest one,
data-wise, with only ~70 MB. The state of Pará has 2.3 GB.
Ideally, I would generalize the information as a whole, not each state
independently, so I don't get gaps etc. The whole thing, for one of
the 4 years available, has around 5M polygons (counting from PostGIS;
I do not have the data imported in GRASS at the moment. I'm importing,
but it will take a while). The other years have more polygons, and it
wouldn't be unreasonable to expect around 10M.

Problem: I managed to import, clean, and dissolve properly, but when I
run the generalization, by my estimates it would take almost a year
to complete.

This will also depend on the generalization method you selected.

Yes, but in a minor way, as I'll detail in the next part.

I also noticed that neither GRASS nor PostGIS is capable of parallel
processing...

Yeah, hot topic for 2015 :-) Indeed, worth a thesis in my view!

I fussed over the v.generalize code, thinking about pthread
parallelization. The geometry part of the code is *really* fast and
could easily be parallelized to run even faster. But, according to the
profile google-perftools gave me, the real bottleneck is inside the
check_topo function (which uses static vars and inserts a new line
into the vector, not only checks whether it breaks topology - I got
stuck in there for a while due to the misnomer). More specifically, in
the R-tree function used to check if one line intersects other lines.

I commented out the check_topo call and it ran a whole lot faster. The
result, obviously, was really bad and topologically messed up, but it
confirmed that it is indeed the bottleneck.
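To illustrate what that R-tree buys (and why it is where the time goes): a bounding-box prefilter rejects most candidate pairs before any exact intersection test is run. A toy sketch with made-up data:

```python
import random

# Sketch of the bounding-box prefilter an R-tree provides: before running
# an exact intersection test on two lines, compare their bounding boxes;
# only pairs with overlapping boxes need the expensive test.
random.seed(42)

def bbox(line):
    xs = [p[0] for p in line]
    ys = [p[1] for p in line]
    return min(xs), min(ys), max(xs), max(ys)

def bboxes_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# 200 short random segments scattered over a 100x100 extent.
lines = []
for _ in range(200):
    x, y = random.uniform(0, 100), random.uniform(0, 100)
    lines.append([(x, y), (x + random.uniform(0, 2), y + random.uniform(0, 2))])

boxes = [bbox(l) for l in lines]
total_pairs = 200 * 199 // 2
candidates = sum(bboxes_overlap(boxes[i], boxes[j])
                 for i in range(200) for j in range(i + 1, 200))
print(total_pairs, candidates)  # the vast majority of pairs never reach the exact test
```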

Question: Am I using the correct tool for that? Is there a way to
speed up the processing?

For reference, the commands I've used (GRASS 7.0 beta4, 22 Dec 2014):

(Glad you use beta4, so we have a recent code base to check.)

v.in.ogr -e --verbose input="pg:host=localhost (...)" layer=ap10
output=ap10 snap=1e-6

--> Please tell us how many polygons or lines the layer "ap10" contains.

ap10 was just a 'toy' dataset to try out the script. It is
considerably smaller than the real dataset. The PostGIS table of this
data has 50k records/polygons.
v.info for it at: http://pastebin.com/8RZELd8p

v.clean -c --verbose input=ap10 output=ap10c tool=bpol,break,rmsa type=line

--> Not sure but should type be "boundary" or "line"?

I tried combinations and variations; I'm not that sure either. My
PostGIS data is composed of polygons. It is land-use classification
data (or something like that; I'm not that familiar with
geo-nomenclature).

v.dissolve --verbose input=ap10c column=tc_2010 output=ap10d --overwrite

--> How many polygons does "ap10d" contain?

120k boundaries (http://pastebin.com/8RZELd8p)

Try #1) v.generalize --verbose --overwrite input=ap10d output=ap10r
method=reduction threshold=0.00025
Try #2) v.generalize --verbose --overwrite input=ap10d output=ap10g
method=douglas threshold=0.00025

I suppose that method=douglas is faster than method=reduction?

With the full dataset, both were painfully slow. And by slow, I mean
more than 24 h without even printing the 1% progress message.

What is the projection you are working with? Given the threshold and
assuming LatLong, I get a short distance:

EPSG:4674. It is indeed lat-long.
The idea is to have multiple generalizations as different tables in
PostGIS and to fetch data from the correct table using the current zoom
level in the web interface (Google Maps based). I considered serving
the map using WMS/GeoServer and also rendering on the client using
node.js (io.js now, apparently) and TopoJSON.

GRASS 7.1.svn (latlong):~ > g.region -g res=0.00025 -a
n=4.15
s=-16.369
w=-76.23975
e=-45.1125
nsres=0.00025
ewres=0.00025
...

GRASS 7.1.svn (latlong):~ > g.region -m...
nsres=27.64959657
ewres=27.21883498
...

Probably you want to try a larger threshold first?

Empirically, that value removed only the jagged edges, so it was a
good first generalization. My idea was that, afterwards, I'd increase
the threshold and generate more generalizations.

thanks again,
F

On Wed, Dec 31, 2014 at 5:20 PM, Fábio Dias <fabio.dias@gmail.com> wrote:

On Wed, Dec 31, 2014 at 12:23 PM, Markus Neteler <neteler@osgeo.org> wrote:

On Sun, Dec 28, 2014 at 8:04 PM, Fábio Dias <fabio.dias@gmail.com> wrote:

Hello all,

Context: I've loaded some shapefiles into PostGIS, containing
information about the Amazon forest. For reference, the SQL script is
around 6 GB.

How many polygons do you have there approximately?

That depends. The information is separated by states. In this case, AP
corresponds to the state of Amapá, which is the smallest one,
data-wise, with only ~70 MB. The state of Pará has 2.3 GB.
Ideally, I would generalize the information as a whole, not each state
independently, so I don't get gaps etc.

Makes sense.

The whole thing, for one of
the 4 years available, has around 5M polygons (counting from PostGIS;
I do not have the data imported in GRASS at the moment. I'm importing,
but it will take a while). The other years have more polygons, and it
wouldn't be unreasonable to expect around 10M.

I would avoid the PostGIS step and import the shapefiles directly into GRASS.

Problem: I managed to import, clean, and dissolve properly, but when I
run the generalization, by my estimates it would take almost a year
to complete.

This will also depend on the generalization method you selected.

Yes, but in a minor way, as I'll detail in the next part.

I also noticed that neither GRASS nor PostGIS is capable of parallel
processing...

Yeah, hot topic for 2015 :-) Indeed, worth a thesis in my view!

I fussed over the v.generalize code, thinking about pthread
parallelization. The geometry part of the code is *really* fast and
could easily be parallelized to run even faster. But, according to the
profile google-perftools gave me, the real bottleneck is inside the
check_topo function (which uses static vars and inserts a new line
into the vector, not only checks whether it breaks topology - I got
stuck in there for a while due to the misnomer). More specifically, in
the R-tree function used to check if one line intersects other lines.

The check_topo function cannot be executed in parallel because 1)
topology must not be modified for several boundaries in parallel, and 2)
data are written to disk, and disk I/O is by nature not parallel.
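The split described here can be sketched as follows, with hypothetical function names (not the v.generalize internals): the per-line geometry work is independent and could run in a pool, while the topology check and rewrite stay in one sequential phase.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the per-line geometry work (e.g. simplification):
# keep every other vertex plus the last one.
def simplify(line):
    return line[::2] + [line[-1]] if len(line) > 2 else line

# Stand-in for the serial phase: the topology check and
# Vect_rewrite_line()-style write must happen one boundary at a time.
def rewrite_sequentially(simplified):
    out = []                      # stand-in for the vector map on disk
    for line in simplified:
        out.append(line)          # topology check + rewrite would go here
    return out

lines = [[(i, 0), (i, 1), (i, 2), (i, 3)] for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    simplified = list(pool.map(simplify, lines))   # parallel geometry phase
result = rewrite_sequentially(simplified)          # serial topology/IO phase
```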

I commented out the check_topo call and it ran a whole lot faster. The
result, obviously, was really bad and topologically messed up, but it
confirmed that it is indeed the bottleneck.

Question: Am I using the correct tool for that? Is there a way to
speed up the processing?

For reference, the commands I've used (GRASS 7.0 beta4, 22 Dec 2014):

(Glad you use beta4, so we have a recent code base to check.)

A pity you use beta4 - please use current trunk, because there are a
few improvements in trunk that are not in beta4. v.generalize should
be quite a bit faster in trunk than in beta4.

I suppose that method=douglas is faster than method=reduction?

Yes.

With the full dataset, both were painfully slow. And by slow, I mean
more than 24 h without even printing the 1% progress message.

What is the projection you are working with? Given the threshold and
assuming LatLong, I get a short distance:

EPSG:4674. It is indeed lat-long.

That does not matter.

The idea is to have multiple generalizations as different tables in
PostGIS and to fetch data from the correct table using the current zoom
level in the web interface (Google Maps based). I considered serving
the map using WMS/GeoServer and also rendering on the client using
node.js (io.js now, apparently) and TopoJSON.

As you mentioned above, the whole dataset should be generalized at
once to avoid gaps and overlapping parts.

Probably you want to try a larger threshold first?

No, rather try a very small threshold first.

Markus M

On Wed, Dec 31, 2014 at 5:20 PM, Fábio Dias <fabio.dias@gmail.com> wrote:

I fussed over the v.generalize code, thinking about pthread
parallelization. The geometry part of the code is *really* fast and
could easily be parallelized to run even faster. But, according to the
profile google-perftools gave me, the real bottleneck is inside the
check_topo function (which uses static vars and inserts a new line
into the vector, not only checks whether it breaks topology - I got
stuck in there for a while due to the misnomer). More specifically, in
the R-tree function used to check if one line intersects other lines.

The function used in check_topo is Vect_line_intersection(), which does
much more than just testing for intersections. The process could be
made much faster if Vect_line_check_intersection() were modified so
that connections by end points are ignored. But I don't know if
this would break other modules or other functionality.
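A toy version of the proposed check, ignoring segments that merely share an end point (connected boundaries always do). This is an illustration only, not the Vect_line_check_intersection() code, and for simplicity it does not handle collinear overlaps.

```python
# 2D cross product of (a - o) and (b - o); its sign tells which side of
# the line through o and a the point b lies on.
def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 properly cross each other."""
    if {p1, p2} & {q1, q2}:       # connected by an end point: not a conflict
        return False
    d1, d2 = cross(q1, q2, p1), cross(q1, q2, p2)
    d3, d4 = cross(p1, p2, q1), cross(p1, p2, q2)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

print(segments_cross((0, 0), (2, 2), (0, 2), (2, 0)))  # True: proper crossing
print(segments_cross((0, 0), (2, 2), (2, 2), (4, 0)))  # False: shared end point
```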

Markus M

Attached is a PDF generated with google-perftools for v.generalize,
using g7b4. I'm running it again for trunk.
-=--=-=-
Fábio Augusto Salve Dias
http://sites.google.com/site/fabiodias/


(attachments)

v.gen.profile.pdf (10.9 KB)

As promised: the profile of v.generalize, as of r63952.
(The data might not be exactly the same; I might have run v.clean somewhere.)

I still have the raw profiles, if anyone wants them.

F
-=--=-=-
Fábio Augusto Salve Dias
http://sites.google.com/site/fabiodias/


(attachments)

tr.pdf (10.1 KB)

Just for further reference, v.dissolve takes around 24 h on this
dataset. I'll post the v.info of both as soon as it is finished.

Any other ideas? I have a fairly powerful server at my disposal, but
I'm out of ideas...
-=--=-=-
Fábio Augusto Salve Dias
http://sites.google.com/site/fabiodias/


Original:
GRASS 7.1.svn (brasil):~ > v.info map=tc10
+----------------------------------------------------------------------------+
| Name: tc10 |
| Mapset: terraclass |
| Location: brasil |
| Database: /home/externo/fabioasd/grass |
| Title: |
| Map scale: 1:1 |
| Name of creator: fabioasd |
| Organization: |
| Source date: Sat Jan 3 23:38:40 2015 |
| Timestamp (first layer): none |
|----------------------------------------------------------------------------|
| Map format: native |
|----------------------------------------------------------------------------|
| Type of map: vector (level: 2) |
| |
| Number of points: 0 Number of centroids: 5323741 |
| Number of lines: 0 Number of boundaries: 12889264 |
| Number of areas: 5573197 Number of islands: 1332382 |
| |
| Map is 3D: No |
| Number of dblinks: 1 |
| |
| Projection: Latitude-Longitude |
| |
| N: 5:16:18.443667N S: 18:02:29.687783S |
| E: 43:59:58.760386W W: 73:59:29.009623W |
| |
| Digitization threshold: 0 |
| Comment: |
| |
+----------------------------------------------------------------------------+

After dissolve:

+----------------------------------------------------------------------------+
| Name: tc10d |
| Mapset: terraclass |
| Location: brasil |
| Database: /home/externo/fabioasd/grass |
| Title: |
| Map scale: 1:1 |
| Name of creator: fabioasd |
| Organization: |
| Source date: Sat Jan 3 23:38:40 2015 |
| Timestamp (first layer): none |
|----------------------------------------------------------------------------|
| Map format: native |
|----------------------------------------------------------------------------|
| Type of map: vector (level: 2) |
| |
| Number of points: 0 Number of centroids: 5120039 |
| Number of lines: 0 Number of boundaries: 12641473 |
| Number of areas: 5369494 Number of islands: 1366321 |
| |
| Map is 3D: No |
| Number of dblinks: 1 |
| |
| Projection: Latitude-Longitude |
| |
| N: 5:16:18.443667N S: 18:02:29.687783S |
| E: 43:59:58.760386W W: 73:59:29.009623W |
| |
| Digitization threshold: 0 |
| Comment: |
| |
+----------------------------------------------------------------------------+
-=--=-=-
Fábio Augusto Salve Dias
http://sites.google.com/site/fabiodias/


Another interesting update...
I believed that doing a dissolve before generalizing would speed up
the process, because it would remove a lot of edges. The data is very
segmented - stitching would be the right term, I suppose.

Turns out that belief is really wrong. The really expensive part of
the code is checking whether the new line intersects other lines. To
reduce the comparisons, it checks the bounding boxes.
By dissolving, I turned all the small lines into really, really big
ones. Then all bounding boxes intersect and the algorithm does a whole
lot more comparisons...
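A toy illustration of this effect with made-up geometry: many short boundaries have small bounding boxes that rarely overlap, while a few long dissolved boundaries have large boxes that overlap their neighbours, so the bounding-box prefilter rejects far fewer pairs.

```python
def bbox(line):
    xs = [p[0] for p in line]
    ys = [p[1] for p in line]
    return min(xs), min(ys), max(xs), max(ys)

def overlapping_pairs(lines):
    """Count pairs whose bounding boxes overlap (the candidates for the
    expensive exact intersection test)."""
    boxes = [bbox(l) for l in lines]
    n = len(boxes)
    return sum(boxes[i][0] <= boxes[j][2] and boxes[j][0] <= boxes[i][2]
               and boxes[i][1] <= boxes[j][3] and boxes[j][1] <= boxes[i][3]
               for i in range(n) for j in range(i + 1, n))

# 100 short horizontal segments laid out on a 10x10 grid...
short = [[(x, y), (x + 0.5, y)] for x in range(10) for y in range(10)]
# ...versus 10 long "dissolved" zigzag lines, each spanning the whole extent.
long_ = [[(0, y), (5, y + 2), (9.5, y)] for y in range(10)]

print(overlapping_pairs(short))  # 0 of 4950 pairs: all boxes are disjoint
print(overlapping_pairs(long_))  # 17 of 45 pairs overlap
```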

750 minutes of processing, 5% progress, reduction method.

F

-=--=-=-
Fábio Augusto Salve Dias
http://sites.google.com/site/fabiodias/


Another info update (and shameless bump):
around 18M primitives and 130M vertices.

And I'm left wondering... has nobody tried to generalize this amount
of data yet? Or am I going about this the wrong way?

I even mounted the grassdata dir as a ramdisk to try to speed up the
process, but it is still way too slow...

Thanks for everything,
F
-=--=-=-
Fábio Augusto Salve Dias
http://sites.google.com/site/fabiodias/


On Sun, Jan 4, 2015 at 10:45 PM, Fábio Dias <fabio.dias@gmail.com> wrote:

As promised, profile of v.generalize, as of r63952.
(The data might not be exactly the same, I might have run v.clean somewhere).

Thanks for your thorough code analysis!

My initial guess was wrong: Vect_line_intersection2() is not the
limiting factor. The R-tree is also used to feed
Vect_line_intersection2(), but there it seems to be no bottleneck. The
real limit was Vect_rewrite_line() and the functions it calls.

I have optimized the GRASS vector library in trunk r64032 and added
another topology check to v.generalize in trunk r64033. The profile of
v.generalize now shows that it is limited by disk I/O speed (on my
laptop with a standard laptop-like spinning HDD), which means that the
algorithms are, under the test conditions, close to their optimum.
This picture might change as soon as you use a high-performance server
or a SSD.

The speed improvement is non-linear: for small datasets such as the
official GRASS sample datasets, you won't notice a difference. For one
tile of Terraclass, the processing should be about 2-4 times faster
than before. For the full Terraclass dataset, it could be >10 times
faster than before. You will need to wait until, say, 10% of the
processing is done in order to estimate the total processing time:
each line takes a different amount of time to simplify, so the total
time is not simply 100 x the time for the first 1%.
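[Editorial note: for readers following along, method=douglas refers to Douglas-Peucker simplification, where the threshold is roughly the maximum allowed deviation, in map units, of a dropped vertex from the simplified line. A minimal 2D sketch of the algorithm, not the GRASS implementation (which additionally guards topology):]

```python
import math

def _point_seg_dist(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, threshold):
    """Recursively keep only vertices that deviate more than `threshold`
    from the chord between the first and last point."""
    if len(points) < 3:
        return list(points)
    # find the vertex farthest from the chord
    dmax, idx = -1.0, 0
    for i in range(1, len(points) - 1):
        d = _point_seg_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= threshold:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:idx + 1], threshold)
    right = douglas_peucker(points[idx:], threshold)
    return left[:-1] + right
```

The recursion depth depends on the shape of each individual line, which is one reason the per-line simplification time varies so much.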

Another user has applied v.generalize to NLCD2011 and it took nearly 2
months. Your dataset is probably a bit smaller, but the Terraclass
shapefiles are full of errors. If you want to fix these errors, this
will take some time.

I recommend testing the new v.generalize on a subregion of
Terraclass first. Only if the processing speed and the results are
acceptable, proceed with the full dataset. Otherwise, please report.

Markus M


I have optimized the GRASS vector library in trunk r64032 and added
another topology check to v.generalize in trunk r64033. The profile of
v.generalize now shows that it is limited by disk I/O speed (on my
laptop with a standard laptop-like spinning HDD), which means that the
algorithms are, under the test conditions, close to their optimum.
This picture might change as soon as you use a high-performance server
or a SSD.

Then I should do a profile on my current setup. My grassdata dir is
not on disk but on a mounted ramdisk, i.e. it is backed by RAM and
therefore really, really fast. It should be interesting.
By the way, setting one up is really easy, at least on Linux, and it
should really improve performance for big datasets. Obviously you'd
need a big machine too, but a big nail needs a big hammer.

cd ~
mkdir -p grassdata
sudo mount -t tmpfs -o size=512M tmpfs grassdata

In my case, the machine has 128 GB of RAM, so I made a 32 GB ramdisk
(size=32G instead of the 512M in the example above). Each vector
directory is about 6 GB, so that is plenty.
Of course, the data will be lost if you shut down or reboot the
machine, so extra care is needed.
I did not compare the results with and without the ramdisk, btw.

The speed improvement is non-linear: for small datasets such as the
official GRASS sample datasets, you won't notice a difference. For one
tile of Terraclass, the processing should be about 2-4 times faster
than before. For the full Terraclass dataset, it could be >10 times
faster than before. You will need to wait until, say, 10% of the
processing is done in order to estimate the total processing time:
each line takes a different amount of time to simplify, so the total
time is not simply 100 x the time for the first 1%.

Indeed, but it was a (very) rough approximation.

Another user has applied v.generalize to NLCD2011 and it took nearly 2
months. Your dataset is probably a bit smaller, but the Terraclass
shapefiles are full of errors. If you want to fix these errors, this
will take some time.

You know this dataset? The errors are really bugging me. They are
mostly due to the process/tools typically used to produce it. We have
passed along a request for a more topologically correct approach;
maybe in the next iteration. But I'll start another thread shortly to
ask for advice regarding these errors :)

I recommend testing the new v.generalize on a subregion of
Terraclass first. Only if the processing speed and the results are
acceptable, proceed with the full dataset. Otherwise, please report.

Testing before deploying? Where's the fun in that? :)
Joking aside, I did that before trying the full dataset. I did,
however, interrupt the processing to start over with the new trunk
version, because you said it would be faster. And indeed it is, thank
you very much.
By not dissolving beforehand and additionally running v.clean
tool=break on the original data, I've reduced the processing time from
more than 30h for 1% to 24h for 11%. With the latest release, 9% in 18h.

However, this whole thing got me thinking about what you said in an earlier message:

The check_topo function can not be executed in parallel because 1)
topology must not be modified for several boundaries in parallel, 2)
data are written to disk, and disk IO is by nature not parallel.

Well, there's not much we can do about disk IO. On high-end servers
(again, big hammers) it shouldn't really be a bottleneck or lock the
threads for long; between the disk speed and the cache, each thread
would barely block. Assuming the "vector access" functions are thread
safe (which I think they eventually will be; IMHO that would be the
first step toward making the whole software parallel-capable), we
could allow parallel changes to the topology by carefully choosing
which lines are considered at a time. One simple example might be
lines whose bounding boxes do not intersect. I'm not sure how much
overhead this would cause, or whether it would be worth it.
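[Editorial note: the bounding-box idea above could be sketched roughly like this. A hypothetical greedy pass, just to illustrate the scheduling concept; names are made up, and a real implementation would reuse the R-tree GRASS already maintains instead of the O(n*k) scan below.]

```python
def bbox(line):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a polyline."""
    xs = [p[0] for p in line]
    ys = [p[1] for p in line]
    return (min(xs), min(ys), max(xs), max(ys))

def boxes_overlap(a, b):
    """True if two boxes overlap or touch."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def independent_batch(lines):
    """Greedily pick indices of lines whose bounding boxes are pairwise
    disjoint; in principle these could be simplified in parallel, while
    the remaining lines wait for a later batch."""
    picked, boxes = [], []
    for idx, line in enumerate(lines):
        bb = bbox(line)
        if all(not boxes_overlap(bb, other) for other in boxes):
            picked.append(idx)
            boxes.append(bb)
    return picked
```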

Thanks again,

F

On Sat, Jan 10, 2015 at 7:23 PM, Fábio Dias <fabio.dias@gmail.com> wrote:

I have optimized the GRASS vector library in trunk r64032 and added
another topology check to v.generalize in trunk r64033. The profile of
v.generalize now shows that it is limited by disk I/O speed (on my
laptop with a standard laptop-like spinning HDD), which means that the
algorithms are, under the test conditions, close to their optimum.
This picture might change as soon as you use a high-performance server
or a SSD.

Then I should do a profile on my current setup.

I have updated v.generalize again in trunk r64067. Please test the
latest version.

[...] the Terraclass
shapefiles are full of errors. If you want to fix these errors, this
will take some time.

You know this dataset? The errors are really bugging me. They are
mostly due to the process/tools typically used to produce it. We have
passed along a request for a more topologically correct approach;
maybe in the next iteration. But I'll start another thread shortly to
ask for advice regarding these errors :)

I know the Terraclass dataset a bit; I used some tiles for testing. I
was not able to import any of my test tiles without errors (even after
years of thinking about the conversion of non-topological vectors to
topological vectors). Terraclass data are based on PRODES data, which
I know pretty well. The PRODES classification also comes as shapefiles
that are likewise full of errors, but those I managed to remove by
carefully choosing the snapping threshold for v.in.ogr.

By not dissolving beforehand and additionally running v.clean
tool=break on the original data, I've reduced the processing time from
more than 30h for 1% to 24h for 11%. With the latest release, 9% in 18h.

9% in 18h seems promising.

However, this whole thing got me thinking about what you said in an earlier message:

The check_topo function can not be executed in parallel because 1)
topology must not be modified for several boundaries in parallel, 2)
data are written to disk, and disk IO is by nature not parallel.

Well, there's not much we can do about disk IO.

We can sometimes reduce disk IO here and there (which I did in some of
my recent changes).

Markus M

Hello,

Hopefully my last question regarding v.generalize and speeding up the process.

Context:

I have multiple years of data that need to be generalized. For each
year, I need a number of different generalizations (specific number
TBD).

Question:

What would be the best way to do that in parallel? One mapset for each
year? Can I run multiple v.generalizes on the same input with
different outputs?

My first thought was to run completely separate GRASS processes for
each simplification, but I didn't find a way to make GRASS look
somewhere other than .grass / .grass70 for its configuration
files...

Thanks again

F
-=--=-=-
Fábio Augusto Salve Dias
ICMC - USP
http://sites.google.com/site/fabiodias/


On Wed, Jan 14, 2015 at 3:54 PM, Fábio Dias <fabio.dias@gmail.com> wrote:
...

What would be the best way to do that in parallel? One mapset for each
year? Can I run multiple v.generalizes on the same input with
different outputs?

Yes sure.

My first thought was to run completely separate GRASS processes for
each simplification, but I didn't find a way to make GRASS look
somewhere other than .grass / .grass70 for its configuration
files...

Maybe take a look at this approach
http://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs#Grid_Engine

but even sending different v.generalize jobs to the background (&)
should work if you have enough RAM.

markusN
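[Editorial note: one common way to run the fully separate GRASS sessions discussed above is to give each process its own GISRC file, since GRASS reads its session settings from the file named by the GISRC environment variable. A rough Python sketch; the location/mapset and map names below are illustrative, not from this thread.]

```python
import os
import tempfile

def make_session(gisdbase, location, mapset):
    """Write a private GISRC file for one background job and return its
    path plus an environment dict to launch GRASS modules with."""
    fd, gisrc = tempfile.mkstemp(prefix="gisrc_", text=True)
    with os.fdopen(fd, "w") as f:
        f.write(f"GISDBASE: {gisdbase}\n")
        f.write(f"LOCATION_NAME: {location}\n")
        f.write(f"MAPSET: {mapset}\n")
    env = dict(os.environ)
    env["GISRC"] = gisrc
    return gisrc, env

# Usage sketch (map/mapset names are made up): each job gets its own
# mapset, so several v.generalize runs can proceed side by side.
#
# import subprocess
# gisrc, env = make_session(os.path.expanduser("~/grassdata"),
#                           "amazon", "gen_2010")
# subprocess.Popen(["v.generalize", "input=ap10d", "output=ap10g",
#                   "method=douglas", "threshold=0.00025", "--overwrite"],
#                  env=env)
```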

On Sun, Jan 11, 2015 at 11:32 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:


By not dissolving beforehand and additionally running v.clean
tool=break on the original data, I've reduced the processing time from
more than 30h for 1% to 24h for 11%. With the latest release, 9% in 18h.

9% in 18h seems promising.

As of trunk r64234, the simplification itself should be done within
minutes (heavy optimization, only updating those parts of the vector
topology that actually get changed). Please test.

Markus M


Hi,

Running r64249, with several jobs in parallel using &. It seems to be
considerably slower. More than 100h, and no 1% printed yet. To be
fair, I'm not entirely sure I'll see it when it prints: 10
v.generalize jobs are running (5 for each year), plus 1 v.in.ogr for
2012. That v.in.ogr has been running for almost 100h too. I'm loading
the shps directly, as advised way, way back in this thread.

AFAIK, no disk is being used; the whole thing is cached (after more
than 24h of processing, cumulative iotop shows only a few MB
written/read). I'm no longer using a ramdisk for the grassdata dir.

However, it appears to be considerably slower, probably because of the
parallel jobs.

My question then would be: considering the thread I saw about sqlite,
should I be using something else as the backend? When does it start to
make sense to change it?

F

-=--=-=-
Fábio Augusto Salve Dias
ICMC - USP
http://sites.google.com/site/fabiodias/


On Sun, Jan 25, 2015 at 6:11 PM, Fábio Dias <fabio.dias@gmail.com> wrote:

Hi,

Running r64249, with several jobs in parallel using &. It seems
to be considerably slower.

Very strange. Are you using trunk or GRASS 7.0?

More than 100h, and no 1% printed yet. To be fair, I'm not entirely
sure I'll see it when it prints: 10 v.generalize jobs are running (5
for each year), plus 1 v.in.ogr for 2012. That v.in.ogr has been
running for almost 100h too. I'm loading the shps directly, as advised
way, way back in this thread.

What exactly do you mean with "loading shps directly"? For
v.generalize, you should import them with v.in.ogr.

What about memory consumption on your system? With 10 v.generalize + 1
v.in.ogr on such a big dataset, quite a lot of memory would be used.

Markus M



On Mon, Jan 26, 2015 at 9:30 AM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:

On Sun, Jan 25, 2015 at 6:11 PM, Fábio Dias <fabio.dias@gmail.com> wrote:

Hi,

Running r64249, with several jobs in parallel using &. It seems
to be considerably slower.

Very strange. Are you using trunk or GRASS 7.0?

Here, v.generalize on a TerraClass tile is down from 25 minutes to 13 seconds.
