[GRASS5] Re: [bug #3877] (grass) r.to.vect: severe memory leaks, I'm helpless

On Mon, 2005-12-05 at 13:38 +0100, Radim Blazek via RT wrote:

please read old mails on this problem. I don't have time to explain it
again and again. AFAIK there are no big memory leaks.

Is it acknowledged by Grass developers that a machine freeze on a 5 million
vector points file is a BUG (no matter what the reason is)?

If it is acknowledged, can we expect it to be fixed? When - soon, within a
month, within a year? Or is it going to be a "feature" and left as is?

Maciek


On Mon, 5 Dec 2005, Maciek Sieczka wrote:

On Mon, 2005-12-05 at 13:38 +0100, Radim Blazek via RT wrote:
> please read old mails on this problem. I don't have time to explain it
> again and again. AFAIK there are no big memory leaks.

Is it acknowledged by Grass developers that a machine freeze on a 5 million
vector points file is a BUG (no matter what the reason is)?

If it is acknowledged, can we expect it to be fixed? When - soon, within a
month, within a year? Or is it going to be a "feature" and left as is?

I think this is unfair. There has been progress in GRASS 6 on this, and
the vector architecture is much stronger than it was in GRASS 5 for
moderate and large data sets, but not for XXL.

Have you considered using GRASS 5, which has sites, a very much simpler
data model for points? Have you considered tiling your data - reading
portions of your data and patching the resulting spline surfaces? Once you
have the surface, you can transfer it to GRASS 6, because as yet the
raster storage data model is effectively unchanged.

This is not a bug, it is a mis-match of data models and intentions. While
accepting that freezing (meaning causing total OS failure, or rather
occupation of all machine resources? - I don't think that a non-root user
on a sensible OS can freeze the system so that a hard shutdown (pull
power) is required) is unfortunate, it is usually caused by 100% CPU use
and swapping caused by memory being fully occupied. In well-written
software, like GRASS 6 vector or R, say, there is a balance between how
things are written, perceived needs, and user perceptions. The GRASS 6
vector data model handles areas and lines pretty well, but because it is
trying harder on these, is not well suited to XXL points data sets. My
understanding is that the authors of the *.rst programs themselves also
use GRASS 5, among other things because its sites data model is very
simple.

Best wishes,

Roger


--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand@nhh.no

On Mon, 2005-12-05 at 19:11 +0100, Roger Bivand wrote:

On Mon, 5 Dec 2005, Maciek Sieczka wrote:

On Mon, 2005-12-05 at 13:38 +0100, Radim Blazek via RT wrote:
> please read old mails on this problem. I don't have time to explain it
> again and again. AFAIK there are no big memory leaks.

Is it acknowledged by Grass developers that a machine freeze on a 5 million
vector points file is a BUG (no matter what the reason is)?

If it is acknowledged, can we expect it to be fixed? When - soon, within a
month, within a year? Or is it going to be a "feature" and left as is?

I think this is unfair.

No. This is a question. I've got a task to accomplish which I currently
can't do in Grass6. I'm asking what the chances are that my problem will be
fixed soon, or ever. I have about 3 months or so for my task. I need to know
where I stand instead of "I don't have time to explain". And how do I even
know if those "old mails" reflect the current state?

It's Radim's answer that is unfair, not my reply. Or maybe I'm doing
something rude by reporting bugs and I get what I deserve?

There has been progress in GRASS 6 on this, and
the vector architecture is much stronger than it was in GRASS 5

Am I saying it isn't progressing? I'm saying it's still not good enough.
But I'm happy with any single improvement that takes place. Sorry if I
don't express it enough. Sure, it is easier to point out errors than good
things. But anyway, this is a devel list and a bugtracker - a place mainly
for discussing problems.

for moderate and large data sets, but not for XXL.

What's XXL? I need to reproject a detailed 5m DEM of just one national
park. To do it properly, it has to be transformed into vector points, which
will be reprojected, and then a DEM in the new projection will be
reinterpolated. Unfortunately, reprojecting a DEM as a raster yields
distortions, AFAIK.

Have you considered using GRASS 5, which has sites, a very much simpler
data model for points?

I had had bad experiences with vector points in Grass 6 before, so trying
sites in 5.4 was actually the first thing I did. Although I managed to
transform my 50 million cell DEM into sites and reproject those, s.surf.rst
crashed on such a dataset. Since that was on an 8GB P4 with plenty of swap,
I didn't even try it at home on my 1GB RAM machine - I wonder if I could
get even that far.

And I didn't report the bug in s.surf.rst because Grass 5 is no longer
maintained by the core dev crew.

Then I tried 6.1, wondering how far I could go and hoping that when I
encounter problems, I'm more likely to be helped, i.e. the bug would be
fixed. Or that at least my experience and the bug report would be somehow
appreciated. That was silly, I see.

Have you considered tiling your data - reading
portions of your data and patching the resulting spline surfaces?

I would like to avoid it. Is it a good idea to mosaic a DEM? Won't there
be artifacts at the connections?

Once you
have the surface, you can transfer it to GRASS 6, because as yet the
raster storage data model is effectively unchanged.

This is not a bug,

That's a very tolerant approach toward bug definition.

it is a mis-match of data models and intentions.

Do you mean that the fact Grass is not able to handle even 5 million vector
points (one tenth of my whole possible dataset) is something normal?
What are 5 million points? A 2236x2236 grid, or ten 500 km GPS tracks at
1 m intervals. Something a serious GIS vector model should handle perfectly.

While
accepting that freezing (meaning causing total OS failure, or rather
occupation of all machine resources?

The latter.

- I don't think that a non-root user
on a sensible OS can freeze the system so that a hard shutdown (pull
power) is required) is unfortunate, it is usually caused by 100% CPU use
and swapping caused by memory being fully occupied.

That was the case, like I described it - both 1GB RAM and 1GB swap were
used, total hang, even the mouse pointer froze, I had to reset.

In well-written
software, like GRASS 6 vector

That's your perception. From my point of view the Grass vector model is not
well written yet. Of course, one might say "send us a patch", and "limited
man power". Unfortunately, I'm not able to send a patch. All I can do for
Grass is to test the code, report bugs, help in the bugtracker a bit, and
help other users when I have some time (it is cool to show off a little,
isn't it?). Which I do. Not as much as I would like to, but anyway, I do
contribute a bit. And even if I didn't, I would still deserve a decent
reply.

or R, say, there is a balance between how
things are written, perceived needs, and user perceptions. The GRASS 6
vector data model handles areas and lines pretty well, but because it is
trying harder on these, is not well suited to XXL points data sets.

Again, are 5 million points XXL? If it were the raster engine in Grass
failing on 5 million cells, would you still say it is XXL? Or would that be
a serious bug in the raster engine?

My
understanding is that the authors of the *.rst programs themselves also
use GRASS 5, among other things because its sites data model is very
simple.

Like I said, I managed to transform to sites and reproject them on an
8GB beast, but s.surf.rst crashed anyway. I can't recall the error message;
I assumed it was pointless anyway since Grass 5 is no longer maintained,
AAMOF. But I could retry to reproduce the error if it would make sense
(i.e. if there is any chance it would be fixed in 5.4).

Best wishes,

Thanks for your interest.

Best regards,
Maciek


On Mon, 5 Dec 2005, Maciek Sieczka wrote:

On Mon, 2005-12-05 at 19:11 +0100, Roger Bivand wrote:
> On Mon, 5 Dec 2005, Maciek Sieczka wrote:
>
> > On Mon, 2005-12-05 at 13:38 +0100, Radim Blazek via RT wrote:
> > > please read old mails on this problem. I don't have time to explain it
> > > again and again. AFAIK there are no big memory leaks.
> >
> > Is it acknowledged by Grass developers that a machine freeze on a 5 million
> > vector points file is a BUG (no matter what the reason is)?
> >
> > If it is acknowledged, can we expect it to be fixed? When - soon, within a
> > month, within a year? Or is it going to be a "feature" and left as is?
>
> I think this is unfair.

No. This is a question. I've got a task to accomplish which I currently
can't do in Grass6. I'm asking what the chances are that my problem will be
fixed soon, or ever. I have about 3 months or so for my task. I need to know
where I stand instead of "I don't have time to explain". And how do I even
know if those "old mails" reflect the current state?

It's Radim's answer that is unfair, not my reply. Or maybe I'm doing
something rude by reporting bugs and I get what I deserve?

No, you have (your perception of) a real problem, and are trying to find a
feasible solution. Radim's work on the vector data model has helped with
many problems, but not this one. Getting at a problem like this is, as you
know well, very layered. You have an input DEM, which you need to warp
with high precision in the z-dimension to a different spatial reference
system.

Assuming that the input DEM is an exact representation of the
sub-vegetation surface as it was when it was surveyed, warping will
introduce some errors and interpolating will introduce others. You have
chosen to transform the raster points (about 50M) to the target spatial
reference system and interpolate, I guess because you tried warping and
found the error unacceptable. You have three months (but there is snow on
the ground now), so field surveying to establish a baseline for error in
one or several trial plots is still feasible, maybe?

But I'm not aware of surveyed or interpolated DEMs that are without
measurement error themselves - David Unwin has a nice statement in one of
his books about the sobering effect of comparing field survey elevations
from leveling and DEM values. So what we are looking for is a way of
getting from the input DEM to the output DEM without introducing
systematic error (like the artefacts at patch/tile boundaries) and without
adding too much to the error already present.

Given that GRASS 6 vector points for even 10% of the data set are a
problem, is it possible to establish the relative performance of warping
versus interpolation on - say - a number of 1% sample plots? Does r.proj
give similar outcomes to gdalwarp? I guess you've looked at all of this,
and I apologize for thinking aloud. I just feel that getting to the main
question of making sure output map errors are not systematic is quite
difficult, and not obvious.
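
For what it's worth, that comparison could be scripted roughly like this in
a GRASS 6 shell (a sketch only: 'dem5m', 'dem_gdal.tif', the location name
'src_loc' and the EPSG code are placeholders, and the gdalwarp step assumes
the source DEM has first been exported to GeoTIFF):

  # in the target location, with the region set to a small 5 m test window
  r.proj input=dem5m location=src_loc mapset=PERMANENT output=dem_rproj method=cubic
  # the same window done outside GRASS on the exported GeoTIFF:
  #   gdalwarp -t_srs EPSG:XXXX -r cubic -tr 5 5 dem5m.tif dem_gdal.tif
  r.in.gdal input=dem_gdal.tif output=dem_gdal
  r.mapcalc "diff = dem_rproj - dem_gdal"
  r.univar map=diff    # mean/stddev of the disagreement between the two warpers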

Since the area of interest has fairly large changes of elevation over
short distances in some parts, it might even be possible to thin out
uninformative points (say those within some threshold of their
neighbours) where the relief is not detailed, keeping points that are
"needed" for interpolation.

> There has been progress in GRASS 6 on this, and
> the vector architecture is much stronger than it was in GRASS 5

Am I saying it isn't progressing? I'm saying it's still not good enough.
But I'm happy with any single improvement that takes place. Sorry if I
don't express it enough. Sure, it is easier to point out errors than good
things. But anyway, this is a devel list and a bugtracker - a place mainly
for discussing problems.

> for moderate and large data sets, but not for XXL.

What's XXL? I need to reproject a detailed 5m DEM of just one national
park. To do it properly, it has to be transformed into vector points, which
will be reprojected, and then a DEM in the new projection will be
reinterpolated. Unfortunately, reprojecting a DEM as a raster yields
distortions, AFAIK.

> Have you considered using GRASS 5, which has sites, a very much simpler
> data model for points?

I had had bad experiences with vector points in Grass 6 before, so trying
sites in 5.4 was actually the first thing I did. Although I managed to
transform my 50 million cell DEM into sites and reproject those, s.surf.rst
crashed on such a dataset. Since that was on an 8GB P4 with plenty of swap,
I didn't even try it at home on my 1GB RAM machine - I wonder if I could
get even that far.

And I didn't report the bug in s.surf.rst because Grass 5 is no longer
maintained by the core dev crew.

Where crash means segmentation fault or complete occupation of machine
resources? Again, I'm unsure whether all the points are essential to reach
a result without larger and systematic errors.

Then I tried 6.1, wondering how far I could go and hoping that when I
encounter problems, I'm more likely to be helped, i.e. the bug would be
fixed. Or that at least my experience and the bug report would be somehow
appreciated. That was silly, I see.

Not silly, but still a difference between what you expected the
combination of software and hardware to carry out and what other users and
developers have seen as being their priority.

> Have you considered tiling your data - reading
> portions of your data and patching the resulting spline surfaces?

I would like to avoid it. Is it a good idea to mosaic a DEM? Won't there
be artifacts at the connections?

Yes, but are they larger (use wide overlaps and average the values) than the
errors already in the data? If yes, we are stuck; if no, there is a way
forward. And recall that *.rst and other interpolators for this kind of
data use a (very) small moving window over the data anyway, so tiling and
patching are happening anyway. The key thing is the scale of the errors and
whether they are systematic.
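
A sketch of how the tiling might be driven, for one tile (the tile bounds,
the 200 m apron, the "x|y|z" ASCII layout and the *.rst parameters are all
made up here, and the module parameters are from memory of the GRASS 6
manual pages, so treat this as a sketch rather than a recipe):

  W=400000; E=405000; S=5600000; N=5605000; OV=200
  # clip the reprojected ASCII points to the tile plus its apron
  awk -F'|' -v w=$((W-OV)) -v e=$((E+OV)) -v s=$((S-OV)) -v n=$((N+OV)) \
      '$1>=w && $1<=e && $2>=s && $2<=n' pts.txt > tile1.txt
  v.in.ascii input=tile1.txt output=tile1 fs="|" x=1 y=2 \
      columns="x double precision, y double precision, z double precision"
  # interpolate over the padded extent so the tile edges are supported
  g.region w=$((W-OV)) e=$((E+OV)) s=$((S-OV)) n=$((N+OV)) res=5
  v.surf.rst input=tile1 zcolumn=z elev=tile1_dem tension=40 smooth=0.5
  # after all tiles are done, average the overlapping cells, e.g.
  #   r.series input=tile1_dem,tile2_dem,... output=dem_new method=average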

> Once you
> have the surface, you can transfer it to GRASS 6, because as yet the
> raster storage data model is effectively unchanged.
>
> This is not a bug,

That's a very tolerant approach toward bug definition.

A bug is when software does not do what it is designed to do, leaving
quite a margin for interpretation in what users/developers think it is
supposed to do. Things like deleting files when not asked to are bugs, as
are overwriting objects in memory or seg-faults from freeing unallocated
pointers. Those are more "objective", even though they can be very hard to
find (valgrind helps), but differences in understanding of purpose are not
bugs in my view.

> it is a mis-match of data models and intentions.

Do you mean that the fact Grass is not able to handle even 5 million vector
points (one tenth of my whole possible dataset) is something normal?
What are 5 million points? A 2236x2236 grid, or ten 500 km GPS tracks at
1 m intervals. Something a serious GIS vector model should handle perfectly.

Each of these data points is carrying information, so the question is how
much you need to handle to deal with the problem - and this varies very
much indeed. I have no idea which commercial GIS could interpolate your
data or warp them perfectly, but as you've seen, perfect isn't a word I
associate with the natural world; it seems to fit virtual reality better! If
your 5m grid data are really very accurate (and the accuracy is invariant
across the whole area), then going by patches or warping are options in
GRASS, but as you've demonstrated, neither interpolating in GRASS 5 (what
did gdb say when s.surf.rst failed?) nor handling the 50M points in GRASS
6 in one mouthful seem to work. Any solution is going to be messy: even if
s.surf.rst had run in GRASS 5, how would we know that the same tension
parameter should be applied across the whole map? Is it at all feasible to
divide the map up into zones of similar roughness (more rough meaning
more careful choice of *.rst parameters)?

> While
> accepting that freezing (meaning causing total OS failure, or rather
> occupation of all machine resources?

The latter.

> - I don't think that a non-root user
> on a sensible OS can freeze the system so that a hard shutdown (pull
> power) is required) is unfortunate, it is usually caused by 100% CPU use
> and swapping caused by memory being fully occupied.

That was the case, like I described it - both 1GB RAM and 1GB swap were
used, total hang, even the mouse pointer froze, I had to reset.

You can run GRASS from the command line without a GUI - even though
response time may be very slow, having one terminal (say Ctrl-Alt-F1) for
you and GRASS, and another where you are logged in with the process number
of what you are running already keyed into a kill -9, should stay
accessible. Mice die, but the CLI still runs, like Duracell rabbits.
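
A related safeguard is to cap the module's address space before starting it,
so it dies with an out-of-memory error rather than dragging the whole
machine into swap (the 900000 kB figure is only a placeholder for "a bit
less than physical RAM", and the module call is just an example):

  # in the shell that will run the module
  ulimit -v 900000
  r.to.vect input=dem5m output=dempts feature=point
  # or, from a second text console (Ctrl-Alt-F1), find and kill it by hand:
  #   ps -C r.to.vect -o pid,vsz,rss,cmd
  #   kill -9 <pid>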

> In well-written
> software, like GRASS 6 vector

That's your perception. From my point of view the Grass vector model is not
well written yet. Of course, one might say "send us a patch", and "limited
man power". Unfortunately, I'm not able to send a patch. All I can do for
Grass is to test the code, report bugs, help in the bugtracker a bit, and
help other users when I have some time (it is cool to show off a little,
isn't it?). Which I do. Not as much as I would like to, but anyway, I do
contribute a bit. And even if I didn't, I would still deserve a decent
reply.

I'm trying (in both senses, I'm sure), but this is a serious problem (the
error propagation is what needs controlling), so finding out how the
errors behave for different approaches seems justified (and you have
certainly thought of this; your many earlier constructive and positive
contributions to the list show that you are serious about what you do).
Please forgive the length of my reply, but I still think that you may be
able to use the software as it is to get an acceptable result, and that
doing it all at once is not the only option.

> or R, say, there is a balance between how
> things are written, perceived needs, and user perceptions. The GRASS 6
> vector data model handles areas and lines pretty well, but because it is
> trying harder on these, is not well suited to XXL points data sets.

Again, are 5 million points XXL? If it were the raster engine in Grass
failing on 5 million cells, would you still say it is XXL? Or would that be
a serious bug in the raster engine?

It is quite a lot of data, certainly far more than the people who wrote
much of the software thought of handling.

> My
> understanding is that the authors of the *.rst programs themselves also
> use GRASS 5, among other things because its sites data model is very
> simple.

Like I said, I managed to transform to sites and reproject them on an
8GB beast, but s.surf.rst crashed anyway. I can't recall the error message;
I assumed it was pointless anyway since Grass 5 is no longer maintained,
AAMOF. But I could retry to reproduce the error if it would make sense
(i.e. if there is any chance it would be fixed in 5.4).

Because the *.surf.rst programs are closely related, and are associated
with published research, I think their authors are the best people to
comment on this. I recall that you were in touch with them about faults in
September.

Roger

>
> Best wishes,

Thanks for your interest.

Best regards,
Maciek


--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand@nhh.no

Roger and Radim,

I don't really want to piss anyone off but this is my opinion.

First, I searched the archives for the old mails and was unable to
find any references to r.to.vect halting the system.

Second, if this is a problem that has come up repeatedly, this is an
indication that work is needed in this area. Probably updating the
documentation outlining this problem would be a good idea so someone
doesn't have to reply with more than "Look here" or "RTFM".

Third, as a former database programmer, I would say that when a
program fails, it should fail early, loudly (with abundant information
to help solve the problem), and nicely (not halting the system). AFAICT
Maciek was using r.to.vect as the program was intended, converting a
raster to a vector. If there is some sort of artificial file size
issue, that should be clearly stated in the documentation, or better
yet the program should check this first and fail without wasting the
user's time or computing resources. Based on these reasonable user- and
developer-based criteria, this is a bug. That said, I don't know what the
issues are surrounding this process, but I took a brief look at the
vector specification in the developers' documentation this afternoon out
of curiosity and I can't see why this conversion would need to be written
in a way where file size or even computing resources should be an issue
for point files.

I appreciate the work that Radim has put into the vector portion of
GRASS. I'm sure he is busy just like everyone else, thus point 2 above.

Roger, thanks for the constructive suggestions. It is always a pleasure to
read your posts on any list; they are always helpful and constructive.

T
--
Trevor Wiens
twiens@interbaun.com

The significant problems that we face cannot be solved at the same
level of thinking we were at when we created them.
(Albert Einstein)

I really sympathize with your current predicament, Maciek. Just to
clear things up a little, I think that what Radim meant is that there are
no memory *leaks*. It's just that the memory *requirements* of your
problem are well beyond the current capabilities of GRASS's architecture.
You just don't have enough memory to perform the operation; the machine
goes into swapping out everything, including your X server, and never
comes out of it before you kill it.

The only workaround I can think of is

1) LOTS of swap and
2) LOTS of patience.

This is Linux, right? You can try to add swap files (to more than the
8GB you already tried) and then, knowing that disk access is on the
order of 10^3 times slower than RAM, some timings on small datasets
might allow you to estimate the time required for the full run to complete.
Then launch it at night, or over a weekend. I've been through this kind of
hassle with other applications, and that approach has worked for me before.
Especially the "patience" part :-)

But anyway, we should indeed put these memory-full-related problems on
the TODO list for 7.0.

Best wishes.

Daniel.

On 12/5/05, Maciek Sieczka <werchowyna@epf.pl> wrote:

On Mon, 2005-12-05 at 13:38 +0100, Radim Blazek via RT wrote:
> please read old mails on this problem. I don't have time to explain it
> again and again. AFAIK there are no big memory leaks.

Is it acknowledged by Grass developers that a machine freeze on a 5 million
vector points file is a BUG (no matter what the reason is)?

If it is acknowledged, can we expect it to be fixed? When - soon, within a
month, within a year? Or is it going to be a "feature" and left as is?

Maciek


--
-- Daniel Calvelo Aros

How about a FAQ on the grass web site?
Any volunteer? Just scrolling through the mailing lists and putting the
answers together.
All the best.
pc

At 01:17, Tuesday 06 December 2005, Trevor Wiens has probably written:

Roger and Radim,

I don't really want to piss anyone off but this is my opinion.

First, I searched the archives for the old mails and was unable to
find any references to r.to.vect halting the system.

Second, if this is a problem that has come up repeatedly, this is an
indication that work is needed in this area. Probably updating the
documentation outlining this problem would be a good idea so someone
doesn't have to reply with more than "Look here" or "RTFM".

...
--
Paolo Cavallini
email+jabber: cavallini@faunalia.it
www.faunalia.it
Piazza Garibaldi 5 - 56025 Pontedera (PI), Italy Tel: (+39)348-3801953

How about a FAQ on the grass web site?
Any volunteer? Just scrolling through the mailing lists and putting the
answers together.

Already on the wiki:

http://grass.gdf-hannover.de/twiki/bin/view/GRASS/WebHome

The wiki is open to all to contribute. Just create yourself an account.

Hamish

> for moderate and large data sets, but not for XXL.

What's XXL?

t-shirt speak for extra-extra-large.

I need to reproject a detailed 5m DEM of just one national park. To do it
properly, it has to be transformed into vector points, which will be
reprojected, and then a DEM in the new projection will be reinterpolated.
Unfortunately, reprojecting a DEM as a raster yields distortions, AFAIK.

give it a try, then compare. It might not be that bad in practice. For a
non-categorical map, try 'r.proj method=cubic' for a better result.
Randomly select test points (v.random -> v.what.rast; v.proj ->
v.what.rast) for a simple quantitative comparison of error.
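
Spelled out, the check could look something like this (a sketch: 'dem5m',
'dem_rproj' and the location/mapset names are placeholders, and v.random
needs a table added before v.what.rast can write to it):

  # in the source location: sample the original DEM at 1000 random points
  v.random output=check n=1000
  v.db.addtable map=check columns="src double precision, prj double precision"
  v.what.rast vector=check raster=dem5m column=src
  # in the target location: bring the points across and sample the reprojected DEM
  v.proj input=check location=src_loc mapset=PERMANENT output=check
  v.what.rast vector=check raster=dem_rproj column=prj
  v.db.select map=check columns=src,prj    # difference the two columns, e.g. in R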

> If it is acknowledged, can we expect it to be fixed? When - soon,
> within a month, within a year? Or is it going to be a "feature" and left
> as is?

the official Free software answer to this FAQ: "It will be done when it
is ready, sooner if you help." (c.1992 ref Linux 1.0)

And I didn't report the bug in s.surf.rst because Grass 5 is no longer
maintained by the core dev crew.

severe bugs will be given a cursory look, data corruption bugs will be
explored, and if reported to the bug tracker (in the correct category;
5.4 is still there) any normal [unfixed] bugs listed will at minimum
give others seeing the same bug a sense of solidarity and closure. Minor
bugs will likely be ignored, unless trivial and/or a patch is supplied.
i.e. GRASS 5.4 is still maintained, just not actively.

> This is not a bug,
That's a very tolerant approach toward bug definition.

This is a known limitation in the data model. There is no error in the
code, it just wasn't designed to deal with a dataset that large.
The GRASS 6 vector model was designed before the popularization of LIDAR
and multibeam sonar. Anything over 3 million points or so will use up
all your memory. Your OS should see this and kill the offending
application when this happens. All solutions and work-arounds suggested
to date have not been satisfactory, so the issue remains open until we
can figure out something better. Who knows when that will be? As far as
I understand from Radim's explanations, there appears to be no quick
fix.

An outstanding issue which can be resolved is a port of s.cellstats to
GRASS 6. This would help ameliorate the problem for some people.
Or a preprocessor utility...

> - I don't think that a non-root user
> on a sensible OS can freeze the system so that a hard shutdown (pull
> power) is required) is unfortunate, it is usually caused by 100% CPU
> use and swapping caused by memory being fully occupied.

That was the case, like I described it - both 1GB RAM and 1GB swap were
used, total hang, even the mouse pointer froze, I had to reset.

The point being that your OS should have killed the process that was
causing the problem. GRASS just asks the OS for more memory. If the OS
keeps giving a program more memory after it has already run out and is
gasping for air, we can't help that.
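
On Linux the overcommit behaviour is tunable, by the way. As root, something
like this makes allocations fail once the commit limit is hit, so the module
exits with an error instead of swapping the box to death (the 80% ratio is
just an example):

  sysctl -w vm.overcommit_memory=2    # refuse allocations beyond the commit limit
  sysctl -w vm.overcommit_ratio=80    # limit = swap + 80% of RAM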

That's your perception. From my point of view the Grass vector model is
not well written yet.

It is beautifully written to do what it was intended to do. The problem
is that you (and others) want it to do something that it was not
intended to do. It can still call itself a good GIS vector engine in
good conscience, but it is not all things.

Also, I would suggest that this sort of comment is not the best way to
motivate a designer to make changes for you. Remember that nothing kills
the volunteer spirit faster than demands and insulting the creator's
baby.

If this problem affects enough people (and developers) it will
eventually affect someone who can a) fix it or b) pay someone else to
fix it; and then it will be fixed. This problem seems to affect a number
of people, so there is hope. Can't promise you more than that.

Trevor:

First, I searched the archives for the old mails and was unable to
find any references to r.to.vect halting the system.

No, you won't find it looking for r.to.vect; the same issue was mostly
discussed with respect to creating vector points with v.in.ascii. Try as
search terms: "LIDAR" "memory" "v.in.ascii" "valgrind"; for emails from
Radim, Helena, and myself.

Second, if this is a problem that has come up repeatedly, this is an
indication that work is needed in this area.

I think we all agree that is true. But we don't know of a good solution
yet.

Probably updating the documentation outlining this problem would be a
good idea

Yes, it would be good to mention in the v.in.ascii and r.to.vect help
pages that creating several million data points may use large amounts of
memory, which may cause problems.

Note that once the import issue is solved, other modules start to have
problems with these huge datasets too.

Third, as a former database programmer, I would say that when a
program fails, it should fail early, loudly (with abundant information
to help solve the problem), and nicely (not halting the system).
AFAICT Maciek was using r.to.vect as the program was intended,
converting a raster to a vector. If there is some sort of
artificial file size issue, that should be clearly stated in the
documentation, or better yet the program should check this first and
fail without wasting the user's time or computing resources. Based on
these reasonable user- and developer-based criteria, this is a bug.
That said, I don't know what the issues are surrounding this process,
but I took a brief look at the vector specification in the developers'
documentation this afternoon out of curiosity and I can't see why this
conversion would need to be written in a way where file size or
even computing resources should be an issue for point files.

Band-aid approach (curing the symptom, not the cause):
Once the number of points is known, a calculation of memory use could be
done (~300 bytes per vector point? - best create a test point & sizeof()
rather than hardcode "300"). G_malloc() and G_free() could be
called as a test, which will call G_fatal_error() if the dataset is too
huge to complete. I don't think this reduces the need for a solution,
just makes the failure friendlier.
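
At the user level a crude stand-in for that check can live in a wrapper
script; the ~300 bytes/point is the guess above, not a measured constant,
and the file name and column layout are placeholders:

  npoints=$(wc -l < pts.txt)
  need_kb=$(( npoints * 300 / 1024 ))
  have_kb=$(awk '/^(MemFree|SwapFree):/ {sum += $2} END {print sum}' /proc/meminfo)
  if [ "$need_kb" -gt "$have_kb" ]; then
      echo "import needs ~${need_kb} kB, only ${have_kb} kB free - aborting" >&2
      exit 1
  fi
  v.in.ascii input=pts.txt output=pts fs="|" x=1 y=2 z=3 -z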

Another angle of attack is to get that 300 bytes per point number lower.

10 million points should be possible given enough swap space, the
current model, and today's hardware; 50 million is really pushing it.
From the valgrind analysis the growth seems to be linear:
  http://bambi.otago.ac.nz/hamish/grass/memleak/v.in.ascii/

hint for adding more swap space to linux on the fly:
  http://grass.itc.it/pipermail/grassuser/2002-February/006070.html
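
i.e. roughly, as root (size and path are whatever fits your disk):

  dd if=/dev/zero of=/var/tmp/swapfile bs=1M count=4096    # 4 GB
  mkswap /var/tmp/swapfile
  swapon /var/tmp/swapfile
  # and when the job is done:
  #   swapoff /var/tmp/swapfile && rm /var/tmp/swapfile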

Hamish

I wrote:

If this problem affects enough people (and developers) it will
eventually affect someone who can a) fix it or b) pay someone else to
fix it; and then it will be fixed. This problem seems to affect a number
of people, so there is hope. Can't promise you more than that.

well what do you know, a first step just 8 days ago:

http://grass.itc.it/pipermail/grass-commit/2005-November/019366.html

Hamish

On 12/5/05, Maciek Sieczka <werchowyna@epf.pl> wrote:

On Mon, 2005-12-05 at 13:38 +0100, Radim Blazek via RT wrote:
> please read old mails on this problem. I don't have time to explain it
> again and again. AFAIK there are no big memory leaks.

Is it acknowledged by Grass developers that a machine freeze on a 5 million
vector points file is a BUG (no matter what the reason is)?

It is not a bug, it is a feature - a bad feature.

If it is acknowledged, can we expect it to be fixed? When - soon, within a
month, within a year? Or is it going to be a "feature" and left as is?

I suggested a solution. Unfortunately, those who need to work with larger
datasets do not have the will to fix it, and I don't have enough time
to work on something I don't currently need.

Radim

Thanks to all folks who replied! I appreciate your comments a lot! Sorry
if I insulted anyone. Sure, nobody likes to be treated in an arrogant and
dismissive way.

On Tue, 2005-12-06 at 21:29 +1300, Hamish wrote:

> > for moderate and large data sets, but not for XXL.
>
> What's XXL?

t-shirt speak for extra-extra-large.

I was asking what Roger meant by "XXL datasets"; maybe I was not
explicit enough :).

We do have t-shirts in Poland. However, it is true we prefer hunting our
dinosaurs naked :).

> I need to reproject a detailed 5m DEM of just one national park. To do it
> properly, it has to be transformed into vector points, which will be
> reprojected, and then a DEM in the new projection will be reinterpolated.
> Unfortunately, reprojecting a DEM as a raster yields distortions, AFAIK.

give it a try, then compare. It might not be that bad in practice. For a
non-categorical map, try 'r.proj method=cubic' for a better result.
Randomly select test points (v.random -> v.what.rast; v.proj ->
v.what.rast) for a simple quantitative comparison of error.

Whatever the resampling method, there will be artifacts when reprojecting
an fp raster. For nearest neighbour my fp DEM will get a systematic error
detectable on e.g. aspect maps; for bilinear and more advanced methods this
systematic error will decrease, but the original elevations will get
distorted. I don't want either.

> > If it is acknowledged, can we expect it to be fixed? When - soon,
> > within a month, within a year? Or is it going to be a "feature" and left
> > as is?

the official Free software answer to this FAQ: "It will be done when it
is ready, sooner if you help." (c.1992 ref Linux 1.0)

So I'm helping the way I can. Testing, reporting. Sorry if it's not enough.
We are all busy.

> And I didn't report the bug in s.surf.rst because Grass 5 is no longer
> maintained by the core dev crew.

severe bugs will be given a cursory look, data corruption bugs will be
explored, and if reported to the bug tracker (in the correct category;
5.4 is still there) any normal [unfixed] bugs listed will at minimum
give others seeing the same bug a sense of solidarity and closure. Minor
bugs will likely be ignored, unless trivial and/or a patch is supplied.
i.e. GRASS 5.4 is still maintained, just not actively.

Ok, when I have time to reproduce the problem in 5.4, I will report.

> > This is not a bug,
> That's a very tolerant approach toward bug definition.

This is a known limitation in the data model. There is no error in the
code, it just wasn't designed to deal with a dataset that large.
The GRASS 6 vector model was designed before the popularization of LIDAR
and multibeam sonar. Anything over 3 million points or so will use up
all your memory. Your OS should see this and kill the offending
application when this happens. All solutions and work-arounds suggested
to date have not been satisfactory, so the issue remains open until we
can figure out something better. Who knows when that will be? As far as
I understand from Radim's explanations, there appears to be no quick
fix.

You are talking from the developer's point of view, myself from the
user's. We will never agree here.

An outstanding issue which can be resolved is a port of s.cellstats to
GRASS 6. This would help ameliorate the problem for some people.
Or a preprocessor utility...

Grass 6 was supposed to be sites-free. If sites functionality gets into
Grass 6, the point vector engine will be in danger of being less used,
tested and fixed. This we don't want. IMHO functional sites import, and
maybe export, should be all we need here. The vector engine should instead
be fixed to handle any dataset. Besides, although my problem refers to
vector points, I guess it would also appear for other geometry types
when millions of objects are involved - or am I wrong here?

> > - I don't think that a non-root user
> > on a sensible OS can freeze the system so that a hard shutdown (pull
> > power) is required) is unfortunate, it is usually caused by 100% CPU
> > use and swapping caused by memory being fully occupied.
>
> That was the case, like I described it - both 1GB RAM and 1GB swap were
> used, total hang, even the mouse pointer froze, I had to reset.

The point being that your OS should have killed the process that was
causing the problem. GRASS just asks the OS for more memory. If the OS
keeps giving a program more memory after it has already run out and is
gasping for air, we can't help that.

> That's your perception. From my point of view the Grass vector model is
> not well written yet.

It is beautifully written to do what it was intended to do. The problem
is that you (and others) want it to do something that it was not
intended to do. It can still call itself a good GIS vector engine in
good conscience, but it is not all things.

Also, I would suggest that this sort of comment is not the best way to
motivate a designer to make changes for you. Remember that nothing kills
the volunteer spirit faster than demands and insulting the creator's
baby.

Nothing kills the volunteer user's spirit for testing and reporting like
his effort not being taken into account. Most Grass developers know it and
appreciate bug reports, even if they are sometimes silly or wrong. Look at
it this way: lots of people use Grass. All users must have problems. How
many report bugs in a reasonable way at present, excluding developers? 10?
Out of the hundreds or thousands using Grass? Can Grass afford losing
these 10?

If this problem affects enough people (and developers) it will
eventually affect someone who can a) fix it or b) pay someone else to
fix it; and then it will be fixed. This problem seems to affect a number
of people, so there is hope. Can't promise you more than that.

Thanks.

Trevor:
> First, I searched the archives for the old mails and was unable to
> find any references to r.to.vect halting the system.

No, you won't find it looking for r.to.vect; the same issue was mostly
discussed with respect to creating vector points with v.in.ascii. Try as
search terms: "LIDAR" "memory" "v.in.ascii" "valgrind"; for emails from
Radim, Helena, and myself.

None of those emails provides an answer as to if and when the problem may
be fixed. At least I can't find one. Maybe it is there, but I don't see it.

> Second, if this is a problem that has come up repeatedly, this is an
> indication that work is needed in this area.

I think we all agree that is true. But we don't know of a good solution
yet.

Thanks,
Maciek


On Tue, 2005-12-06 at 22:32 +1300, Hamish wrote:

well what do you know, a first step just 8 days ago:

http://grass.itc.it/pipermail/grass-commit/2005-November/019366.html

Cool. Can anybody say how many points without topology I can import
using this modified v.in.ascii?

Maciek


I haven't tried this particular patch, but I tried something very
similar and got over 20 million points with very low memory usage, so it
should scale beyond that. In 5.4, my largest set was over 500 million.

Even though v.in.ascii will be able to import large sets, other vector
modules (v.surf.rst, for example) would need to be modified to handle
the case when topology isn't built. It can be done, but as Radim
suggested, you end up with a sites model and a vector model, which Grass
6+ is trying to avoid.

I have some familiarity with working with large sets with v.surf.rst
and v.in.ascii and I can try to look into possible "fixes", though I'm
reluctant to use the word "fix". You can turn off the topology and get
rst and ascii import working for your particular application, but you
break the vector model as a result.

-Andy

On Tue, 2005-12-06 at 12:41 +0100, Maciek Sieczka wrote:

On Tue, 2005-12-06 at 22:32 +1300, Hamish wrote:
> well what do you know, a first step just 8 days ago:
>
> http://grass.itc.it/pipermail/grass-commit/2005-November/019366.html

Cool. Can anybody say how many points without topology I can import
using this modified v.in.ascii?

Maciek


On Tue, 6 Dec 2005, Maciek Sieczka wrote:

On Tue, 2005-12-06 at 22:32 +1300, Hamish wrote:
> well what do you know, a first step just 8 days ago:
>
> http://grass.itc.it/pipermail/grass-commit/2005-November/019366.html

Cool. Can anybody say how many points without topology I can import
using this modified v.in.ascii?

Maciek

Timings on a 1.5GHz P4 (1GB), 2D points and single int value

500 points topo < 1 second
500 points notopo < 1 second
50K points topo 17 seconds
50K points notopo 10 seconds
1M points topo 403 seconds - dbf process > 200MB, v.in.ascii > 300MB
1M points notopo 216 seconds - dbf process > 200MB

so -b saves about half the time for 1M points, but the dbf process is
still large and with more points will lead to swapping - this is writing
out the data to the dbf file. Maybe a real DBMS would relieve this, I
don't know. So even with -b, you are still constrained by memory, and
simple modules like v.info can't handle the -b case:

GRASS 6.1.cvs (tull):~/tmp/gpts > v.info gptsb
ERROR: Cannot open old vector gptsb@rsb on level 2
GRASS 6.1.cvs (tull):~/tmp/gpts > v.info gpts
+----------------------------------------------------------------------------+
| Layer: gpts Organization: |
| Mapset: rsb Source Date: |
| Location: tull Name of creator: |
| Database: /home/rsb/topics/grassdata |
| Title: |
| Map Scale: 1:1 |
| Map format: native |
|----------------------------------------------------------------------------|
| Type of Map: Vector (level: 2) |
| |
| Number of points: 1000000 Number of areas: 0 |
| Number of lines: 0 Number of islands: 0 |
| Number of boundaries: 0 Number of faces: 0 |
| Number of centroids: 0 Number of kernels: 0 |
| |
| Map is 3D: 0 |
| Number of dblinks: 1 |
| |
| Projection: UTM (zone 0) |
| N: 56.000 S: 46.000 |
| E: 25.000 W: 5.000 |
| B: 0.000 T: 0.000 |
| |
| Digitize threshold: 0.00000 |
| Comments: |
| |
+----------------------------------------------------------------------------+

Data generated in R:

set.seed(051206)
v <- rpois(1000000, 6)
x <- runif(1000000, 5, 25)
y <- runif(1000000, 46, 56)
cat(file="gpts.txt", paste(x, "|", y, "|", v, "\n", sep=""))
cat(file="gptsS.txt", paste(x[1:50000], "|", y[1:50000], "|", v[1:50000],
"\n", sep=""))
cat(file="gptsXXS.txt", paste(x[1:500], "|", y[1:500], "|", v[1:500],
"\n", sep=""))


--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand@nhh.no