[GRASS5] 5.7.1

Hi,

I was wondering if there was interest in releasing a 5.7.1 snapshot - for
stability purposes, taken from just before the freetype / i18n
modifications were added in. Some folks (specifically Debian) only want
new packages a couple of times a year and would much rather work from an
"official" release. I think the recent changes to 5.7-cvs need some time to
settle before we have a new release. 5.7.0 is now 6 months old.

Could this be done without too much work?
cvs checkout -D ...?

thanks,
Hamish

Hamish wrote:

Hi,

I was wondering if there was interest in releasing a 5.7.1 snapshot - for
stability purposes, taken from just before the freetype / i18n
modifications were added in. Some folks (specifically Debian) only want
new packages a couple of times a year and would much rather work from an
"official" release. I think the recent changes to 5.7-cvs need some time to
settle before we have a new release. 5.7.0 is now 6 months old.

Could this be done without too much work?
cvs checkout -D ...?

We must make beta versions before any official release. There were
6 betas for 5.7.0, and it took about 1 month.

I would like to start 6.0.0 beta releases soon (in a few weeks).
On my TODO list for < 6.0.0 there is only:
- db_create_index for cat cols (simple)
- grant select to group and public (almost done)
- include -> include/grass (volunteer? tools/install-header?)
- map overwriting - grass variable also for rasters, default no
                    (volunteer?)
- WIND - WIND3 sync ?
- v.overlay - join tables
- v/r.random add cats
- optional fflush in vectlib (use only after Vect_write_line on level2)
- all vector modules (help me!):
     - check option and flag names (native speakers welcome)
     - check if table is copied
     - check multicats behaviour, especially more cats in the same layer
- options:
     - short description/prompt (for GUI)?
     - description for each item in options list of option
        (for manual and GUI)
- region lock ?

It is not necessary to have everything we want in 6.0.0.
We cannot change/remove modules/options/libraries/behaviour
during the 6.x line, but we can add new features.
For 6.0.0, it is more important to remove everything that
seems to be bad (but not the whole source) than
to add new features.

During the beta testing we can add new modules (hopefully v.db.select,
v.db.alter, v.db.table?, db.in.ascii, d.mrast, v.join, ...), but we should not
change any existing module/lib, I think.

Let me know about other things which must be done before 6.0.0.

Radim

The rst programs need to be tested - there was a problem with reading the vector
points taking extremely long (I believe this was fixed), but there may be other things;
I will try to get to it this week. I haven't used it yet.

Helena


Helena wrote:

The rst programs need to be tested - there was a problem with reading the vector
points taking extremely long (I believe this was fixed), but there may be other things;
I will try to get to it this week. I haven't used it yet.

v.surf.rst, for example, was added 2 years ago. If it was not tested
in 2 years, it seems people are not interested enough. If it is slow, it will
be slow in 6.0.0; anybody can fix it for 6.1, just as anybody could have
fixed it in the last 2 years.

The idea of 6.0.0 is to give people what is available now.
5.7 has many new features, but people don't use them because
it is marked as a development version.

What people contribute will be in GRASS. It does not make sense to wait
for features which nobody ever writes.

Radim

A new data set for testing the GRASS 5.7 volume support has been added
to the GRASS Sample datasets (http://grass.itc.it/data.html):
it includes Slovakia DEM and precipitation data.
You can get it directly from here:
http://mpa.itc.it/grasstutor/data_menu2nd.phtml
See sample visualizations here: http://grass.itc.it/grid3d/index.html
(images at the top of the page).
The volume visualization still needs some "tuning",
so below are some hints on how to make the images.

Let us know if there are any serious problems
(but read the hints below before complaining that you don't see
anything).
                                                                                            
Helena
-----------------------------------------------------------
Quick tour:
The map 'dem500' is used for a DEM.
There is a 3D file 'precip3d.500z50' that can be used
for isosurfaces and cross-sections.
                                                                                            
nviz el=dem500
                                                                                            
Then load the volume (Panel -> Volume -> New).
                                                                                            
To see the isosurfaces, use values between 600 and 1300
(e.g. 700 and 1000 with a transparency of approx. 100, and
a third isosurface at 1300 with no transparency).

Set the polygon resolution to 3 or less to render the volume
(note that for resolution 1 you may need to wait a little,
depending on your machine's speed).

Helena wrote:

it is not v.surf.rst that is the problem, it is v.in.sites that takes extremely
long even for medium-sized site files. So it is hard to test it when one cannot
get one's data imported - this was already mentioned on the list.

The idea of 6.0.0 is to give people what is available now.

What is available now is still a development version,
at least for the applications that involve substantial work with point data
(Jaro says that he has to use 5.4 because he just can't get the work done in 5.7).
But I think that at least the developers should use 5.7 extensively so that it
can be released and provide the functionality that people expect.
If all of us currently involved keep working at this pace, we can have a much
better version that could be released for Christmas - it is really mostly small
things that are needed; the biggest problem that really prevents me from using
it is converting larger site files.

5.7 is used more by normal users than by developers, and they want a stable
version in their distributions. If 5.7 is slow with large data sets, it does
not mean that it cannot become a stable version. It is OK with me to write in
the announcement that 6.0 is not suitable for making DEMs from datasets >
?00000 points.

Users working with large datasets did not seem to be interested in 5.7
development and testing.

I have about 35 items on the TODO > 6.0 list, and most of them have a higher
priority (for me and most users, I believe) than optimisation for large point
data sets.

Can you post here the times consumed by:
s.in.ascii
v.in.ascii -t
v.in.ascii / dbf
v.in.ascii / postgres
insert attributes using insert statements + db.execute (in one step)
   / dbf
   / postgres
v.in.sites / dbf
v.in.sites / postgres
v.surf.rst layer=0 (i.e. z coor)
v.surf.rst / dbf
v.surf.rst / postgres
s.surf.rst (5.4)

so that we can better identify a problem?

How many values (columns) do you need for LIDAR interpolation, zcol only or scol too?

Radim
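For anyone willing to collect these numbers, a rough timing harness could look
like the sketch below. It is only an illustration: the file bench.txt, its
pipe-separated x|y|cat|z layout and the map names are all made up, and the
argument names follow the v.in.ascii and v.surf.rst examples used elsewhere in
this thread. Which backend gets timed (dbf or postgres) depends on the current
db.connect settings.

# geometry only, no attribute table
time v.in.ascii -t -z out=bench_t fs='|' xcol=1 ycol=2 catcol=3 zcol=4 < bench.txt

# same data, with the attribute table created through the current driver
time v.in.ascii -z out=bench_tab fs='|' xcol=1 ycol=2 catcol=3 zcol=4 \
     columns='x double, y double, cat int, z double' < bench.txt

# interpolation from the z coordinate vs. from the table column
time v.surf.rst input=bench_t elev=bench_elev1 layer=0
time v.surf.rst input=bench_tab elev=bench_elev2 zcol=z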

Jaro Hofierka wrote:

Markus Neteler wrote:

so I created a small elevation site file - it imported nicely; v.info and v.univar
worked OK, but v.surf.rst interpolated categories instead of fp values
by default (I guess that is what I need to fix).
And there are a lot of little things, such as various messages, e.g.
Reading lines from vector map ... 100%
Reading nodes from vector map ... 100%
that may be confusing for users.

Helena,

This combination works for me:

v.in.sites input=u_sm output=u2
v.surf.rst input=u2 elev=u2 tension=20 zcol=flt1 scol=flt2

i.e. it looks like you did not set flt1, and instead you interpolated the
category field, which is the default option.

The real problem is that the devi option doesn't produce a vector file with attributes.

The reason is that the sites (map) has no structure defined (only individual
sites), so it is impossible to create the table when the sites (->vector) map is
opened. It could be possible to create the table when the first site is written,
and I'll consider that for 6.0.

I don't remember if I fixed this in the updated v.surf.rst[.cv], but it should
work for a CV file. I found only old code for the site format there.
I am looking now at the recent grass5.7 code and I see 2 "site" commands
(s.in.ascii and s.out.ascii). Is this a change in the anticipated transition
from the old site format to vectors?

site commands are not compiled and will not appear in the stable release.
s.in.ascii and s.out.ascii can only be used for sites-vector
library testing.

Radim

Radim Blazek wrote:

5.7 is used more by normal users than by developers, and they want a stable
version in their distributions. If 5.7 is slow with large data sets, it does
not mean that it cannot become a stable version. It is OK with me to write in
the announcement that 6.0 is not suitable for making DEMs from datasets >
?00000 points.

I don't think hundreds of thousands of points are large datasets. They are
quite common if you do real projects. I routinely work with millions of
points. This must be fixed, otherwise many people (including me) will have to
stick to 5.4 for a longer time than previously thought.

Jaro


Radim,

I did some testing for v.in.sites, importing a file x|y|%z with a gradually
increasing number of points. 5000 and 8000 point files are no problem; with
200000-300000 points (11MB) it reads the sites in seconds, writes the number of
points and then does something for 5 minutes (this is still OK), but with a
627000 point file it freezes the machine, and I believe that it is swapping; I
could not even kill it, and I cannot even get through the "Transfering sites to
vector file" step to get the number of points.
If it is swapping, it will depend on the size of your memory when this happens.
I will post some larger point files on the web if you don't have any.

And Jaro is right - this is not some special advanced developer issue -
site files with over a million data points are now commonplace, and you can
get plenty of them for free on the Web. We cannot have a GRASS release which
freezes your machine with a point file of a size that is now common. (You may
remember recent emails from users talking about tens or even hundreds of
millions of points.)
Below, I have inserted your recent exchange with Jaro - I am wondering whether
there may be something related going on in v.in.sites, as we had with reading
point vector data in v.surf.rst.

Helena

>>> Also I had a couple of problems in grass57. Currently I am not sure if
>>> these problems are associated with my version of grass57 or it is more
>>> general. For example, I couldn't read input vector file with ~200K
>>> points,

>> You cannot read existing vector? Where? In v.surf.rst?
>> What does it mean 'couldn't read'? Does it crash or is it slow?

> It stops without any progress. Smaller files were OK.

It was slow because IL_vector_input_data_2d() was using
db_select_value() for each line/node, which is especially slow
if there is no index on the category column in the table.

I have replaced db_select_value() with db_select_CatValArray() +
db_CatValArray_get_value_int/double(), which reduced the time
for vector loading (v.surf.rst until the end of IL_vector_input_data_2d())
for 200000 points to 42s on 2x1.5GHz. Progress is printed for lines and nodes,
but for now there is no way to print progress for the (slower)
db_select_CatValArray().

Radim
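For readers without the C source at hand, the change can be pictured at the
shell level with db.select. This is a loose analogy only: 'mytable', the column
names and the record count are made up, and the real change sits inside the
vector/DBMI libraries, not in any module.

# Old pattern: one SQL query per point, roughly what db_select_value()
# did for each line/node - painful when the category column has no index.
for cat in `seq 1 200000`; do
    echo "SELECT z FROM mytable WHERE cat = $cat" | db.select
done

# New pattern: one query for the whole cat/value column, resolved by
# in-memory lookups afterwards - what db_select_CatValArray() now provides.
echo "SELECT cat, z FROM mytable" | db.select > catval.txt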


Jaro Hofierka wrote:

5.7 is used more by normal users than by developers, and they want a stable
version in their distributions. If 5.7 is slow with large data sets, it does
not mean that it cannot become a stable version. It is OK with me to write in
the announcement that 6.0 is not suitable for making DEMs from datasets >
?00000 points.

I don't think hundreds of thousands of points are large datasets. They are
quite common if you do real projects. I routinely work with millions of
points. This must be fixed, otherwise many people (including me) will have to
stick to 5.4 for a longer time than previously thought.

OK, they can stick with 5.4.

Radim


Any help is welcome.

Radim

Revised TODO < 6.0:

- include -> include/grass (volunteer? tools/install-header?)
- map overwriting - grass variable also for rasters, default no
                    (volunteer?)
- WIND - WIND3 sync
- v/r.random add cats
- all vector modules (help me!):
     - check option and flag names (native speakers welcome)
     - check if table is copied
     - check multicats behaviour, especially more cats in the same layer
- options:
     - short description/prompt (for GUI)?
     - description for each item in options list of option
        (for manual and GUI)

Radim

I know this started out as an s.in.ascii discussion. But it is focused on
GRASS 5.7, and v.in.ascii is the more relevant module to be considering here.

I've imported hundreds of thousands of points using v.in.ascii. I don't have
a time, but I remember about 15-30 minutes for about a half-million
points. I'm using a Mac G5 single processor at 1.8 GHz and 512 MB RAM. This
is a nice, but not extremely high-powered system. While I'd like it to go
faster, I don't think this is too bad.

v.in.ascii can also import sites files. Just use a text editor to get rid of
the # and % characters. I haven't used v.in.sites with enough real data to
get a feel for whether or not it is faster. However, v.in.ascii has more
options for controlling how the sites are to be translated to vector points.

Michael


______________________________
Michael Barton, Professor of Anthropology
School of Human, Evolution and Social Change
Arizona State University
Tempe, AZ 85287-2402
USA

voice: 480-965-6262; fax: 480-965-7671
www: http://www.public.asu.edu/~cmbarton
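To make the sites-file clean-up Michael describes concrete: assuming an old
sites dump laid out as x|y|#cat %value, the # and % markers can be stripped
with sed instead of a text editor and the result piped straight into
v.in.ascii. This is only a sketch with made-up file and map names (Markus posts
essentially the same pipeline further down the thread); the columns= list has
to match the field order of the cleaned-up lines.

grep -v '^#' lidar.sites | sed 's+#++g' | sed 's+ %+|+g' | \
    v.in.ascii -z out=lidar_pts fs='|' xcol=1 ycol=2 catcol=3 zcol=4 \
               columns='x double, y double, cat int, value double'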

Michael Barton wrote:

I know this started out as an s.in.ascii discussion. But it is focused on
GRASS 5.7, and v.in.ascii is the more relevant module to be considering here.

s.in.ascii never had this problem; it is the table creation that is causing it (see below).

I've imported hundreds of thousands of points using v.in.ascii. I don't have
a time, but I remember about 15-30 minutes for about a half-million
points. I'm using a Mac G5 single processor at 1.8 GHz and 512 MB RAM. This
is a nice, but not extremely high-powered system. While I'd like it to go
faster, I don't think this is too bad.
v.in.ascii can also import sites files. Just use a text editor to get rid of
the # and % characters. I haven't used v.in.sites with enough real data to
get a feel for whether or not it is faster. However, v.in.ascii has more
options for controlling how the sites are to be translated to vector points.

Michael, thanks for the hint - I tried v.in.ascii on the same data set -
with the -t option it imported in seconds; without -t it went into swapping and
I had to kill it. So apparently it is the table creation that eats up the memory.
Radim, can the table creation be done on smaller subsets of the data to avoid the
swapping? Or maybe there is a simpler solution?

Helena


Helena,

Given that Jaro mentioned s.in.ascii bombing out with a dataset half the
size of mine and you mentioned v.in.ascii bombing out with table creation,
this makes me wonder about the dataset it is reading, or possibly a recent
change in GRASS.

My data has a total of 15 fields: cat, x,y, and 12 others. All are double
precision (no text or integer). One file has 133,200 points and the other
495,000 points. I had no trouble with either. I read these in with a
September 2004 version of GRASS 5.7 using v.in.ascii.

Maybe this information will help debug the problem.

Michael


Michael Barton wrote:

Helena,

Given that Jaro mentioned s.in.ascii bombing out with a dataset half the
size of mine

I don't think that Jaro had a problem with s.in.ascii.

  and you mentioned v.in.ascii bombing out with table creation,

this makes me wonder about the dataset it is reading, or possibly a recent
change in GRASS.

we are doing it completely independently, and I have different data than he does.
But my data sets were from DOS, so I wanted to make sure that was not the problem,
and I ran v.in.sites on a lidar site file with 1.5 million points (55MB) that was
created by GRASS 5.3 - I should not have done it, because it took me one hour just
to get it killed. Radim, if you can prevent it from swapping and rather give an
error message, that would help me a lot for now.

I put the GRASS-generated site file that I tried with v.in.ascii here:
  http://skagit.meas.ncsu.edu/~helena/grasswork/grassprobl/lidar99.gz
(you can use it with an x,y location, or get the ready-to-use location here:
http://mpa.itc.it/grasstutor/data_menu2nd.phtml
Jockey's Ridge LIDAR data)

If you have some time and could try it on your Mac and let me know how it goes,
that would be great.
(Lubos just bought a Mac laptop, as several of his physics colleagues
did, and he loves it and is trying to talk me into getting one too, so if
you have any advice - off the grass5 list - I will appreciate it.)

My data has a total of 15 fields: cat, x,y, and 12 others. All are double
precision (no text or integer). One file has 133,200 points and the other
495,000 points. I had no trouble with either. I read these in with a
September 2004 version of GRASS 5.7 using v.in.ascii.

I used 5.7 downloaded just a few days ago - I wanted to try the new scripts
v.in.sites.all and v.convert.all to get all my data into 5.7 at once.
I am not sure what has changed between September and now.

Helena


Helena,

I just made a test with lidaratm2.txt.gz from
http://mpa.itc.it/grasstutor/data_menu2nd.phtml

wc -l lidaratm2.txt
1158424 lidaratm2.txt

(-> 1.1 million points)

cat lidaratm2.txt | time v.in.ascii -z out=mypoints zcol=3 catcol=0 fs=',' columns='x double, y double, z double'

   97.05 user
   7.50 system
1:52.08 elapsed
...
0 swaps <--- 750MB RAM

v.info mypoints
...
| Number of points: 1158424 Number of areas: 0 |
| Number of lines: 0 Number of islands: 0 |
| Number of boundaries: 0 Number of faces: 0 |
| Number of centroids: 0 Number of kernels: 0 |
| |
| Map is 3D: 1 |
| Number of dblinks: 0 |
| |
| Projection: Längen- und Breitengrad (zone 0) |
| N: 35.969 S: 35.950 |
| E: -75.621 W: -75.640 |
| B: -1.937 T: 200.935

cat /proc/cpuinfo |grep MHz
cpu MHz : 2533.639

It's acceptable (since you have to do it one time only).

Did you try
s.out.ascii file | v.in.ascii ... ?

Markus

Helena,
(cc grass5)

On Sun, Dec 05, 2004 at 04:07:22PM -0500, Helena wrote:

Markus,

as I said already, all we need for now is to prevent the code from going into
swapping and freezing the user's machine when the file is too big for their
memory (this seems to be needed for both v.in.sites and v.in.ascii run without
flags).

I don't think that we can prevent GRASS from swapping, as that is
done by the operating system (however, this is better discussed on 'grass5'
with the other experts).

I tried v.in.sites on
http://mpa.itc.it/grasstutor/data_menu2nd.phtml
nclidar-utm.tar.gz: Ready to use GRASS LOCATION in UTM (39.5MB) for North Carolina

and it swaps too much for me as well (1GB RAM). There seems to be a memory
leak either in v.in.sites or the underlying DBMI engine.

Radim is out of town currently, and I have zero experience to track
down memory leaks.
Maybe someone skilled could replicate a local sites file n times and
use v.in.sites? No need to download the large nclidar-utm.tar.gz for that.
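One way to do that replication without downloading anything big, reusing the
site_lists path convention from the pipeline below; the name 'smallsites', the
factor of 20 and the assumption that comment lines start with '#' are only
placeholders for the sketch.

N=20
DIR="`g.gisenv GISDBASE`/`g.gisenv LOCATION_NAME`/`g.gisenv MAPSET`/site_lists"

grep '^#' "$DIR/smallsites" > "$DIR/bigsites"            # keep comment lines once
for i in `seq 1 $N`; do
    grep -v '^#' "$DIR/smallsites" >> "$DIR/bigsites"    # repeat the data lines
done

v.in.sites input=bigsites output=bigsites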

If there is enough memory, it runs just fine. The problem is that I cannot
import the files created in 5.3 into 5.7 on the same machine - I am not trying
bigger files, I am just asking for 5.7 to be able to read the same size site
file as 5.3 without buying a new computer or installing more memory, and if
that is not possible, to have the program say that.

I clearly understand what you wrote.
But if v.in.sites fails, maybe v.in.ascii is the better choice?

cat `g.gisenv GISDBASE`/`g.gisenv LOCATION_NAME`/`g.gisenv MAPSET`/site_lists/lidar99 |\
     grep -v '^#' | sed 's+#++g' | sed 's+ %+|+g' | sed 's+ $++g' |\
     v.in.ascii -z out=lidar99 xcol=1 ycol=2 catcol=3 zcol=4 \
                columns='cat int, x double, y double, height double' fs='|'

If this approach works for you, we could even retire v.in.sites and make
the above code a script with an identical name.

We will look at the vector code with Jaro and maybe others that I work with to
see what we can do to get around this issue, both for importing the site file
(skipping the table? do I need it for sites in the way it is done right now?)
and for reading the points in v.surf.rst. But this has been an unexpected issue
and it is delaying the updates to v.surf.rst.

As for trying the s.out.ascii | v.in.ascii combination - yes, I tried it, as
you might have noticed in my previous emails - it reads it without problems if
I use the -t flag.

It should also work without the -t flag.

I will try -z; this may help because there will be no attributes in case I
don't have a variable smoothing parameter. But I really don't want to run any
of the v.in.* programs until there is an error message and the program ends in
some decent way when it runs out of memory and starts swapping.

I assume that there is a (small, but accumulating with large data sets)
memory leak somewhere in the vector engine/DBMI. But not being a real
programmer, I don't have the slightest idea how to track this down.

Markus

On Mon, 6 Dec 2004, Markus Neteler wrote:


I tried v.in.sites on
http://mpa.itc.it/grasstutor/data_menu2nd.phtml
nclidar-utm.tar.gz: Ready to use GRASS LOCATION in UTM (39.5MB) for North Carolina

and it swaps too much for me as well (1GB RAM). There seems to be a memory
leak either in v.in.sites or the underlying DBMI engine.

Radim is out of town currently, and I have zero experience to track
down memory leaks.

Recently Professor Brian Ripley applied valgrind to contributed packages
in the R project with positive results; valgrind is not invasive and may
point to memory leaks: http://valgrind.kde.org/. A copy of his posting:

https://stat.ethz.ch/pipermail/r-devel/2004-November/031264.html

"Valgrind is easy to use. Valgrind uses dynamic binary translation, so you
don't need to modify, recompile or relink your applications. Just prefix
your command line with valgrind and everything works."

It sounds a bit as though trying this here might bring rewards?

Roger


--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand@nhh.no
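For the case at hand, Roger's suggestion amounts to something like the line
below (v.in.sites arguments as used earlier in the thread; the map names are
placeholders, and the run will be much slower than usual). Markus posts a real
valgrind run against v.in.ascii further down.

valgrind --tool=memcheck --leak-check=yes \
    v.in.sites input=lidar99 output=lidar99_test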

On Mon, Dec 06, 2004 at 06:58:12PM +0100, Roger Bivand wrote:


Recently Professor Brian Ripley applied valgrind to contributed packages
in the R project with positive results; valgrind is not invasive and may
point to memory leaks: http://valgrind.kde.org/. A copy of his posting:

https://stat.ethz.ch/pipermail/r-devel/2004-November/031264.html

"Valgrind is easy to use. Valgrind uses dynamic binary translation, so you
don't need to modify, recompile or relink your applications. Just prefix
your command line with valgrind and everything works."

It sounds a bit as though trying this here might bring rewards?

Roger,

thanks for your suggestion! 'urpmi valgrind' brought it to me (Mdk10.0, V2.1.0) and
I could run it easily (it is naturally pretty slow):

cat ~/grass57/lidaratm3.txt | valgrind --tool=memcheck --leak-check=yes v.in.ascii \
    -z out=mypoints zcol=3 catcol=0 fs=',' columns='x double, y double, z double'
[...]
==16082== 12 bytes in 1 blocks are definitely lost in loss record 1 of 16
==16082== at 0x4002ADAB: malloc (vg_replace_malloc.c:160)
==16082== by 0x4023F599: Vect_new_list (list.c:29)
==16082== by 0x4023477D: Vect_build_nat (build_nat.c:454)
==16082== by 0x40233016: Vect_build_partial (build.c:138)
==16082==
==16082==
==16082== 32 bytes in 2 blocks are definitely lost in loss record 3 of 16
==16082== at 0x4002ADAB: malloc (vg_replace_malloc.c:160)
==16082== by 0x40235979: Vect__new_cats_struct (cats.c:50)
==16082== by 0x40235935: Vect_new_cats_struct (cats.c:39)
==16082== by 0x804B95E: points_to_bin (points.c:186)
==16082==
==16082==
==16082== 60 bytes in 3 blocks are definitely lost in loss record 4 of 16
==16082== at 0x4002ADAB: malloc (vg_replace_malloc.c:160)
==16082== by 0x4023E319: Vect__new_line_struct (line.c:50)
==16082== by 0x4023E2D5: Vect_new_line_struct (line.c:39)
==16082== by 0x804B953: points_to_bin (points.c:185)
==16082==
==16082==
==16082== 1214 bytes in 22 blocks are definitely lost in loss record 6 of 16
==16082== at 0x4002ADAB: malloc (vg_replace_malloc.c:160)
==16082== by 0x40694840: G_malloc (alloc.c:23)
==16082== by 0x406AE3B4: G__location_path (location.c:83)
==16082== by 0x406AE2DC: G_location_path (location.c:43)
==16082==
==16082==
==16082== 1904 bytes in 4 blocks are definitely lost in loss record 8 of 16
==16082== at 0x4002ADAB: malloc (vg_replace_malloc.c:160)
==16082== by 0x4027CCAD: RTreeNewNode (node.c:47)
==16082== by 0x4027C3E5: RTreeNewIndex (index.c:27)
==16082== by 0x4025F8F8: dig_spidx_init (spindex.c:38)
...

The output looks useful (I tried it with a small file and will re-run with a
large one to catch the real problem).

Thanks for the hint; hopefully others will try it as well.

Markus