[GRASSLIST:6666] Overriding awk locales

Jose_Gomez-Dans · April 28, 2005, 4:02pm

Hi!
This is not a GRASS-specific question, but it crops up in GRASS use,
and I don't know how to solve it. I want to use v.in.ascii to load
some points. These points are in a file, and I have used awk '{printf
"%16.6f|%16.6f|%16.6f\n", $1,$2,$3}' to generate my files. However,
due to the locale that I have, the numbers have a comma "," as a
separator for decimals, and GRASS expects a dot ".". (sorry for the
artistic punctuation :D).

I can do several things: change the locale to C, change the locale for
awk only (or make awk override any locale values it finds), or GRASS
should be able to work out that I'm using another locale with a
different number representation. Up to now, I use sed to change the
commas into dots, and seems to be working, but I thought I'd mention
it, in case the locale issue is in fact a GRASS bug.

Jose

Camilo_Alcantara · April 28, 2005, 2:39pm

On Thu, 28 Apr 2005, Jose Gomez-Dans wrote:

> Hi!
> This is not a GRASS-specific question, but it crops up in GRASS use,
> and I don't know how to solve it. I want to use v.in.ascii to load
> some points. These points are in a file, and I have used awk '{printf
> "%16.6f|%16.6f|%16.6f\n", $1,$2,$3}' to generate my files. However,
> due to the locale that I have, the numbers have a comma "," as a
> separator for decimals, and GRASS expects a dot ".". (sorry for the
> artistic punctuation :D).

just try:

awk '{printf"%16.6f|%16.6f|%16.6f\n", $1,$2,$3}' | sed 's/\,/\./g'

> I can do several things: change the locale to C, change the locale for
> awk only (or make awk override any locale values it finds), or GRASS
> should be able to work out that I'm using another locale with a
> different number representation. Up to now, I use sed to change the
> commas into dots, and seems to be working, but I thought I'd mention
> it, in case the locale issue is in fact a GRASS bug.
>
> Jose

--
Pedro Camilo Alcántara Concepción
-----------------
camilo @ pcbiol.posgrado.unam.mx
camiloalcantara @ hotmail.com
-----------------
pcbiol.posgrado.unam.mx Thursday Apr 28 2005 08:05:00 CST
-----------------
Deep down, we scientists are lucky people:
we get to play at whatever we like for our whole lives.
-- Lee Smolin, American theoretical physicist and cosmologist.

Jose_Gomez-Dans · April 28, 2005, 4:31pm

On 4/28/05, Pedro Camilo Alcántara Concepción
<camilo@pcbiol.posgrado.unam.mx> wrote:

> > some points. These points are in a file, and I have used awk '{printf
> > "%16.6f|%16.6f|%16.6f\n", $1,$2,$3}' to generate my files. However,
> just try:
>
> awk '{printf"%16.6f|%16.6f|%16.6f\n", $1,$2,$3}' | sed 's/\,/\./g'

As I mentioned in my previous e-mail, that is exactly what I did.
Which works. BUT, on my machine, v.in.ascii scans the whole file first
(8.3M records!) and then starts importing it. The memory v.in.ascii
uses starts to mount up, until around 3.7M records have been read in.
At around this point, the kernel kills v.in.ascii due to excessive
memory usage (I use Linux).

The v.in.ascii line is:
v.in.ascii -zt output=<lala> fs="|" x=1 y=2 z=3 cols="x double
precision, y double precision, z double precision"

I know that if you use a dbf table, the memory usage grows (see
<http://www.intevation.de/rt/webrt?serial_num=2903&display=History>).
I use 6.0.0-1 (DebianGIS packages). Anyone got any ideas on what to do
or how I can provide extra debugging information for people to have a
look at?
Thanks!
José

H_B · April 29, 2005, 2:17am

> > some points. These points are in a file, and I have used awk
> > '{printf "%16.6f|%16.6f|%16.6f\n", $1,$2,$3}' to generate my
> > files. However,
> just try:
>
> awk '{printf"%16.6f|%16.6f|%16.6f\n", $1,$2,$3}' | sed 's/\,/\./g'

v.in.garmin has the same problem. Using sed or tr to swap "," and "." doesn't
work well if you have multiple columns, some with real commas.

Others have sorted this out: run
export LC_ALL=C
first to make awk print "." as the decimal separator.
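
A minimal sketch of the per-command variant, in case exporting LC_ALL for the whole session is not wanted: putting LC_ALL=C in front of a single command changes the locale for that process only. The sample values and the name/number record are made up for illustration.

```shell
# Hypothetical sample record: x y z, whitespace-separated.
sample='-2714620 1477320 13.29'

# LC_ALL=C in front of the command applies to this awk process only,
# so printf emits "." as the decimal separator whatever the session locale is.
formatted=$(printf '%s\n' "$sample" |
    LC_ALL=C awk '{printf "%16.6f|%16.6f|%16.6f\n", $1, $2, $3}')
echo "$formatted"

# Why a blanket comma-to-dot swap is fragile: a genuine comma in a text
# column (made-up record) gets rewritten along with the decimal comma.
damaged=$(printf '%s\n' 'Smith, John|12,5' | sed 's/,/./g')
echo "$damaged"
```

The first command leaves the rest of the shell session in its original locale; the second shows the text column being altered together with the number.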

> As I mentioned in my previous e-mail, that is exactly what I did.
> Which works. BUT, on my machine, v.in.ascii scans the whole file first
> (8.3M records!) and then starts importing it. The memory v.in.ascii
> uses starts to mount up, until around 3.7M records have been read in.
> At around this point, the kernel kills v.in.ascii due to excessive
> memory usage (I use Linux).
>
> The v.in.ascii line is:
> v.in.ascii -zt output=<lala> fs="|" x=1 y=2 z=3 cols="x double
> precision, y double precision, z double precision"
>
> I know that if you use a dbf table, the memory usage grows (see
> <http://www.intevation.de/rt/webrt?serial_num=2903&display=History>).
> I use 6.0.0-1 (DebianGIS packages). Anyone got any ideas on what to do
> or how I can provide extra debugging information for people to have a
> look at?

forgot to close that bug, now done.

I think some minor memory leaks remain; 1.15 million points use about
430 MB of RAM or so. You may have to import in chunks.
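
A rough sketch of what importing in chunks could look like, assuming the standard split utility is used to cut the input file. The file names, chunk size and map names are made up, and the v.in.ascii calls are only printed here (remove the echo to run them inside a GRASS session). Whether this keeps memory use low enough has not been tested.

```shell
# Hypothetical demo input: 5 records stand in for the 8.3M-record file.
workdir=$(mktemp -d)
for i in 1 2 3 4 5; do
    printf '%d.000000|%d.000000|%d.000000\n' "$i" "$i" "$i"
done > "$workdir/points.txt"

# Cut the file into pieces of at most 2 lines each
# (for real data something like 1000000 would be more sensible).
split -l 2 "$workdir/points.txt" "$workdir/chunk_"

# Print one v.in.ascii call per chunk; the output map names are made up.
for f in "$workdir"/chunk_*; do
    echo v.in.ascii -zt input="$f" output="pts_$(basename "$f")" fs="|" x=1 y=2 z=3
done

# Count the chunks that were produced.
n_chunks=$(ls "$workdir"/chunk_* | wc -l | tr -d ' ')
echo "chunks: $n_chunks"
```

The per-chunk maps would then presumably have to be combined into one map afterwards, for example with v.patch.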

[time passes]

> Regarding my previous message, and while trawling through the
> developers list, I have found that the memory consumption issue in
> GRASS 6.0 seems to have been fixed (see
> http://grass.itc.it/pipermail/grass5/2005-March/017834.html).

Yes, already fixed in the CVS version.

> How can I incorporate the changes into 6.0.0?

You'll need to recompile the 6.1-CVS snapshot from source. I can provide
instructions for doing this on Debian if you require.

Hamish

Jose_Gomez-Dans · April 29, 2005, 12:31pm

On 4/29/05, Hamish <hamish_nospam@yahoo.com> wrote:

> Please report results of importing 8.3m points with the fixed v.in.ascii
> + DBF driver to the list. It hasn't been tested for something this big
> yet that I know of.

OK, I tried the CVS version
(grass6.1.cvs-i686-pc-linux-gnu-23_04_2005), and importing the data
works up to around 3580000 points. At that point, memory usage is very
high, the mouse cursor is not responsive, and the system freezes with
significant disk activity (swapping). Eventually, the process is
killed by the kernel. I run this version on an AMD Sempron processor
with Linux kernel 2.6.10. The command line I entered was
v.in.ascii -zt input=<input> output=<output> x=1 y=2 z=3

A sample input line:
-2714620.000000| 1477320.000000| 13.290000

Note that I did not test the dbf driver (yet). I will do so shortly
and report back on that.

Many thanks
J

Jose_Gomez-Dans · April 29, 2005, 12:53pm

On 4/29/05, Jose Gomez-Dans <jgomezdans@gmail.com> wrote:

> OK, I tried the CVS version
> (grass6.1.cvs-i686-pc-linux-gnu-23_04_2005), and importing the data
> works up to around 3580000 points. At that point, memory usage is very
> high, the mouse cursor is not responsive, and the system freezes with
> significant disk activity (swapping). Eventually, the process is
> killed by the kernel. I run this version on an AMD Sempron processor
> with Linux kernel 2.6.10. The command line I entered was
> v.in.ascii -zt input=<input> output=<output> x=1 y=2 z=3
>
> A sample input line:
> -2714620.000000| 1477320.000000| 13.290000
>
> Note that I did not test the dbf driver (yet). I will do so shortly
> and report back on that.

Taking away the t option (i.e., using the dbf driver) results in the
process being killed after around 3.6M samples have been read in, so
not much change here.
Jose