[GRASS-user] Large File Support (LFS)

John wrote:

g.region -p

...

rows: 14000
cols: 11000
cells: 154000000

for type = float this will use 154000000 * 4 bytes ~= 616 MB
(587 MiB) of RAM for method=min (floats are 4 bytes each).
It will be twice that for mean, so memory and the percent= option
are not an issue here. (memory use depends on the region rows x
columns, not the number of input points)
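
The estimate above can be reproduced with shell integer arithmetic. This is only a sketch: the helper name estimate_mib is made up for illustration, and it counts just one grid of rows x cols cells, ignoring any other overhead the module may have.

```shell
# estimate_mib ROWS COLS BYTES_PER_CELL
# Hypothetical helper: size of one in-memory grid in MiB (integer, rounded down).
estimate_mib() {
    echo $(( $1 * $2 * $3 / 1024 / 1024 ))
}

# One grid of 4-byte floats for the region printed by g.region above.
estimate_mib 14000 11000 4
```

For the 14000 x 11000 region this prints 587; passing 8 as the last argument gives the "twice that" figure mentioned for method=mean.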

2.1Gb file
r.in.xyz -s -g input=/home/bob2/rawdata/clean/test4.asc
output=test4 method=min type=FCELL fs=, x=6 y=7 z=8 zscale=1.0
percent=100
Unable to open input file </home/bob2/rawdata/clean/test4.asc>

...

So anything above 2Gb fails, even using percent=25, which for
the 2.1Gb file should have had an effect if memory was an
issue.

as Glynn explained, I was wrong: glibc's fopen() does care about
file size if it is not used on a fully 64bit software stack.

if you have a 32bit machine, the solution/workaround for this is
to pipe from stdin instead. try this:

cat /home/bob2/rawdata/clean/test4.asc | r.in.xyz input=- \
  output=test4 method=min fs=, x=6 y=7 z=8

or this:

r.in.xyz input=- output=test4 method=min fs=, x=6 y=7 z=8 < \
  /home/bob2/rawdata/clean/test4.asc

It does not really use the memory (or swap file) when using
'-s' flag.

right, the scan flag only has to remember a single set of
min/max values for x,y,z. that's just a few bytes. if it sees
a smaller/larger value it can forget the old one and just
remember the current leader.
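
To illustrate why a scan needs almost no memory, here is a small awk sketch of the same idea. It is not the r.in.xyz code; the function name scan_extent is hypothetical, and it assumes comma-separated input with x, y, z in columns 1, 2 and 3. Only six values are kept, no matter how many lines are read.

```shell
# scan_extent: read "x,y,z" lines on stdin, print min and max of each column.
# Illustration only; keeps just the current leaders, never the whole dataset.
scan_extent() {
    awk -F, '
        NR == 1 { xmin = xmax = $1; ymin = ymax = $2; zmin = zmax = $3; next }
        {
            # compare numerically, but remember the value as it was written
            if ($1 + 0 < xmin + 0) xmin = $1
            if ($1 + 0 > xmax + 0) xmax = $1
            if ($2 + 0 < ymin + 0) ymin = $2
            if ($2 + 0 > ymax + 0) ymax = $2
            if ($3 + 0 < zmin + 0) zmin = $3
            if ($3 + 0 > zmax + 0) zmax = $3
        }
        END {
            if (NR > 0) {
                print "x: " xmin " " xmax
                print "y: " ymin " " ymax
                print "z: " zmin " " zmax
            }
        }'
}

# Three sample points (made-up values) piped in on stdin.
printf '1,5,10\n3,2,7\n2,9,12\n' | scan_extent
```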

Memory hangs around 45%, swap file=0% but the CPU hits 100%
varying between processors (watched in system monitor).
If '-s' flag is removed to create a raster then it uses 100%
of memory

ok, then you should employ the percent= option until it does
not. (start with percent=50 to make it run in 2 passes)
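
As a rough guide for picking a value, the sketch below assumes the map is processed in ceil(100 / percent) passes, which matches the "percent=50 gives 2 passes" remark but has not been checked against the module source. The helper name passes_for_percent is made up for illustration.

```shell
# passes_for_percent PERCENT
# Assumption: number of passes = ceil(100 / PERCENT), using integer arithmetic.
passes_for_percent() {
    echo $(( (100 + $1 - 1) / $1 ))
}

for p in 100 50 25; do
    echo "percent=$p -> $(passes_for_percent "$p") pass(es)"
done
```

Smaller percent= values trade run time (more passes over the input) for a smaller share of the region held in memory at once.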

and <2Gb files start to process fine (didn't continue
processing, to save time in testing), but >2Gb give the same
message: 'Unable to open input file...'

try sending the data via stdin rather than reading the file
directly, as in the examples above.

This 32bit machine only has 1Gb RAM (the 64bit machine has
2Gb RAM and never had a problem), but (as observed in system
monitor) I don't think memory is an issue for scanning the
data with the '-s' flag.

right, as scanning doesn't have to remember much at all.

I compiled and installed yesterday (20/03/10) following the
specific Ubuntu page, but added yes to largefile support (and
ignored the slight typo over folder locations, 'grass_current
/ grass_trunk'.)

you mean the wiki page?

... so changing to "input=-" should make it work on the 32bit
machine.

regards,
Hamish

Hamish wrote:

...
if you have a 32bit machine, the solution/workaround for this is
to pipe from stdin instead. try this:

cat /home/bob2/rawdata/clean/test4.asc | r.in.xyz input=- \
  output=test4 method=min fs=, x=6 y=7 z=8

or this:

r.in.xyz input=- output=test4 method=min fs=, x=6 y=7 z=8 < \
  /home/bob2/rawdata/clean/test4.asc

...

> and <2Gb files start to process fine (without -s flag)(didn't continue
> processing, to save time in testing), but >2Gb give the same
> message: 'Unable to open input file...'

try sending the data via stdin rather than reading the file
directly, as in the examples above.

Nice one, cheers. That solves the challenge.

> I compiled and installed yesterday (20/03/10) following the
> specific Ubuntu page, but added yes to largefile support (and
> ignored the slight typo over folder locations, 'grass_current
> / grass_trunk'.)

you mean the wiki page?

Yep: http://grass.osgeo.org/wiki/Compile_and_Install_Ubuntu#Dependencies

  "download latest source code from GRASS SVN repository in a directory on the
system (e.g. /usr/local/src)

  svn checkout https://svn.osgeo.org/grass/grass/branches/releasebranch_6_4 \
grass_current

  Above command places GRASS' source code in /usr/local/src/grass_trunk...."

For the text to remain consistent in the example (and for beginners at
compiling and using Linux, e.g. me!) this should fix any confusion:

svn checkout https://svn.osgeo.org/grass/grass/branches/releasebranch_6_4 \
/usr/local/src/grass_trunk

Thanks

John

John wrote:

I compiled and installed yesterday (20/03/10) following the
specific Ubuntu page, but added yes to largefile support (and
ignored the slight typo over folder locations, 'grass_current
/ grass_trunk'.)

wiki page now updated. ("trunk" is reserved for unrestrained main-line
development, currently that is grass 7.x. Bugfix-only release branches
are forked off from trunk every now and then when we want to stabilize
before a new release.)

thanks for pointing it out,
Hamish

Hamish wrote:

if you have a 32bit machine, the solution/workaround for this is
to pipe from stdin instead. try this:

cat /home/bob2/rawdata/clean/test4.asc | r.in.xyz input=- \
  output=test4 method=min fs=, x=6 y=7 z=8

Ok, this worked fine for r.in.xyz, thanks.

How about v.in.ascii (using sqlite)?
Is the solution to import multiple files with v.in.ascii and then v.patch?
Is this likely to work with LiDAR data where topology isn't built?

Just want to get an opinion before I spend some time on this again, or abandon
32bit compatibility for the scripts.

Cheers

John

Hamish:

> if you have a 32bit machine, the solution/workaround for this is
> to pipe from stdin instead.

...

Ok, this worked fine for r.in.xyz, thanks.

How about v.in.ascii (using sqlite)?

I assume v.in.ascii input= also tries to use glibc's fopen(), but yes
v.in.ascii can get its input from stdin as well, bypassing that problem.

I don't know if sqlite supports >2gb files on 32bit. Upgrading to the
latest versions of everything may help.

(remember that you only need a back end database if you want to store
more than simple x,y,z position data)

Currently GRASS's default is to maintain a single SQLite file per mapset.
This becomes problematic if you have many large datasets within the same
mapset and are trying to keep the $MAPSET/sqlite.db file smaller than 2gb.

Perhaps 1 DB file per map is as easy as pointing db.connect to a directory
instead of a single sqlite.db file? Or maybe it needs work in the driver
code. I'm not sure, you will have to experiment.

[I just tested, currently it only supports one file per connection,
pointing the sqlite driver to a dir results in an error. of course
you can run db.connect once per map to keep separate files, but that's
a bit more work (but easy enough in a script). once created maps remember
what their DB settings are; db.connect just sets the default which is
used by the map creation]

(or just switch over to the DBF driver, they seem to take about the same
amount of time to import a sample lidar set)
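
The "db.connect once per map" idea from the note above could be scripted roughly as follows. This is a sketch under assumptions: the function name import_with_own_db, the RUN dry-run variable, the map name, and the database directory are all made up for illustration, and the v.in.ascii options (fs=, z=3 -z) are simply borrowed from the examples in this thread. The data is fed through stdin so that the >2Gb fopen() problem is avoided as well.

```shell
# import_with_own_db SRC_FILE MAP_NAME DB_DIR
# Hypothetical helper: point the default sqlite connection at a per-map file,
# then import the points from stdin. Set RUN=echo to only print the commands.
RUN=${RUN:-}

import_with_own_db() {
    src=$1
    map=$2
    dbdir=$3
    # db.connect only sets the default used when the next map is created
    $RUN db.connect driver=sqlite database="$dbdir/$map.db"
    # no input= option: v.in.ascii reads the points from stdin
    $RUN v.in.ascii output="$map" fs=, z=3 -z < "$src"
}

# Example call (paths are placeholders), inside a running GRASS session:
#   import_with_own_db /data/tile1.txt tile1 /data/sqlite_dbs
```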

Is the solution to import multiple files with v.in.ascii and then
v.patch?

that's not needed. you could try something like:

cat file1.txt file2.txt file3.txt | \
  v.in.ascii out=all_points -zbt z=3 fs=,

(cat is really meant for concatenating many files, even if 90% of the
time it is just used to output a single file)

with -z, -b and -t flags you should be able to import many millions of
points, but I simply don't know how well GRASS's vector library supports
LFS. (if it fails please file a bug as it is a goal to support that)

to save on disk space I usually bzip2 (or gzip) big lidar text files then
use bzcat (or zcat) instead of plain cat to pipe them into the import
program.
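
A small wrapper can mix plain and compressed files in one stream. This is only a sketch: the function name stream_points is hypothetical, and it picks the decompressor purely from the file extension. It uses gzip -dc and bzip2 -dc, which do the same job as zcat and bzcat.

```shell
# stream_points FILE...
# Hypothetical helper: write each file to stdout, decompressing .gz and .bz2.
stream_points() {
    for f in "$@"; do
        case $f in
            *.bz2) bzip2 -dc "$f" ;;
            *.gz)  gzip -dc "$f" ;;
            *)     cat "$f" ;;
        esac
    done
}

# Example use inside a GRASS session (file names are placeholders):
#   stream_points file1.txt file2.txt.gz file3.txt.bz2 | \
#     v.in.ascii out=all_points -zbt z=3 fs=,
```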

Is this likely to work with LiDAR data where topology isn't
built?

a problem with both topology and databases is excessive memory use, as
each data point wants a small but finite amount of memory. when you get
bigger than approx. 3 million points RAM starts to be an issue.

So it is more likely to work well when topology is not built.
But without topology you are limited to what you can do with it of
course.

liblas is playing around with different spatial indexing schemes for
point data, we'll see what lessons they learn and can teach us. :)

experiment and let us know what you find out!

Hamish

Hamish wrote:

I don't know if sqlite supports >2gb files on 32bit.

It does.

with -z, -b and -t flags you should be able to import many millions of
points, but I simply don't know how well GRASS's vector library supports
LFS. (if it fails please file a bug as it is a goal to support that)

File offsets were limited to 32-bit until relatively recently. I don't
know whether the LFS stuff made it into 6.x.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

Hamish wrote:

with -z, -b and -t flags you should be able to import many millions of
points, but I simply don’t know how well GRASS’s vector library supports
LFS. (if it fails please file a bug as it is a goal to support that)

GRASS 7 vector libs have full LFS, GRASS 6.x vector libs don’t have it at all.

File offsets were limited to 32-bit until relatively recently. I don’t
know whether the LFS stuff made it into 6.x.

Not the vector LFS stuff, and I guess it won’t happen, too many changes required.

Markus M

Hi all,
I am trying to compile ps.output from the addons.
I've tried both the "make MODULE_TOPDIR= etc...." way, which works for other addons, as well as the g.extension way
(g.extension extension=ps.output svnurl=https://svn.osgeo.org/grass/grass-addons/ prefix=${GISBASE}).

I get this error in both ways, can someone help ??
---------------------------------------------------------------------------------------------------
Compiling ps.output...
make: *** No rule to make target `default', needed by
`first'. Stop.
ERROR: Compilation failed, sorry. Please check above error messages
---------------------------------------------------------------------------------------------------

thanks in advance!
Francesco

On Tue, Apr 20, 2010 at 3:15 PM, Francesco Mirabella <mirabell@unipg.it> wrote:

Hi all,
I am trying to compile ps.output from the addons.
I've tried both the "make MODULE_TOPDIR= etc...." way, which works for
other addons, as well as the g.extension way
(g.extension extension=ps.output
svnurl=https://svn.osgeo.org/grass/grass-addons/ prefix=${GISBASE}).

I get this error in both ways, can someone help ??
---------------------------------------------------------------------------------------------------
Compiling ps.output...
make: *** No rule to make target `default', needed by
`first'. Stop.
ERROR: Compilation failed, sorry. Please check above error messages
---------------------------------------------------------------------------------------------------

There was a bug in the Makefile since the line:
default: cmd
was missing.

Fixed in SVN (r41937), please try again.

Markus

Hi Markus,
now ps.output installation works fine!

thank you as always!

Francesco
