[GRASS-dev] [GRASS GIS] #1694: r.in.lidar tries to allocate way too much memory

#1694: r.in.lidar tries to allocate way too much memory
---------------------+------------------------------------------------------
Reporter: torsti | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 7.0.0
Component: Default | Version: svn-trunk
Keywords: | Platform: Linux
      Cpu: x86-64 |
---------------------+------------------------------------------------------
Trying to import a LAS dataset containing ~11 million points with
r.in.lidar I get the following error message: "ERROR: G_calloc: unable to
allocate 18446744073471563701 * 4 bytes of memory at main.c:528"

I know the dataset is large, but allocating ~64 exabytes of memory seems a
bit excessive.

Importing the same dataset with v.in.lidar works, and with 6.4.2 and
las2txt neither v.in.ascii nor r.in.ascii has any problems with the
same dataset.

The GRASS 7.0 version was revision 52573.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1694>
GRASS GIS <http://grass.osgeo.org>

#1694: r.in.lidar tries to allocate way too much memory
------------------------+---------------------------------------------------
Reporter: torsti | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Keywords: r.in.lidar | Platform: Linux
      Cpu: x86-64 |
------------------------+---------------------------------------------------
Changes (by martinl):

  * keywords: => r.in.lidar
  * component: Default => Raster

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1694#comment:1>
GRASS GIS <http://grass.osgeo.org>

Comment(by dnewcomb):

> I know the dataset is large, but allocating ~64 exabytes of memory seems
> a bit excessive.

That must be a recent development. I was able to use r.in.lidar on 7 x
3.3 billion point las files simultaneously on an 8 core computer to
compute point count and range using the 2012_04_21 svn snapshot without
excessive memory use. How big was the region, and how many cells were you
trying to process into?

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1694#comment:2>
GRASS GIS <http://grass.osgeo.org>

Comment(by hamish):

please check your region resolution, what does g.region say about the
number of rows and columns? memory use is directly tied to the region
resolution and the statistical aggregation method used. (which method?)
See the r.in.xyz man page for discussion about it.

on a positive note, I'm happy to see that the G_calloc() calculation for
how much memory it needs seems to handle & printf into the exabyte range
without overflowing, for some future time when the datasets are actually
that big :)

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1694#comment:3>
GRASS GIS <http://grass.osgeo.org>

Comment(by hamish):

> See the r.in.xyz man page for discussion about it.

(The choice of raster resolution in r.in.xyz and r.in.lidar has a profound
effect on the result, and must be made wisely. I typically do several
iterations at different raster resolutions and do some stats on the
(masked) results to find the optimal one. I have purposely avoided having
the modules make any attempt to choose that for you, since it is such a
dataset- and purpose-driven choice and needs the human operator to
consider factors beyond the numbers themselves.)

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1694#comment:4>
GRASS GIS <http://grass.osgeo.org>

Comment(by torsti):

The region the las file covers is 3000 by 3000 (meters) and the resolution
was 1x1.

The command:

{{{
r.in.lidar -o --overwrite input=R4133C4.laz output=R4133C4.las method=mean
}}}

After setting the resolution to 10 by 10 (g.region res=10) it still wants
almost the same amount of memory:
{{{
ERROR: G_calloc: unable to allocate 18446744072977286712 * 4 bytes of memory at main.c:528
}}}

With cellsize 20x20:

{{{
ERROR: G_calloc: unable to allocate 1964475953 * 4 bytes of memory at main.c:528
}}}

This is in the range of mortal computers, I just happen to be testing on a
machine too weak for this kind of processing ;)

With bigger cell sizes it runs, but the result is not really useful.

For small areas it runs fine on higher resolutions, e.g. a 100 by 100 area
with cellsize 1 by 1.

My issue is not that r.in.lidar can't be used on large datasets on
underpowered computers, I'm just wondering whether the 64 exabytes can be
the right amount of memory needed for cell sizes of 1x1 to 10x10 for a
total area of 3000mx3000m with an average point density a bit over 1 point
per square meter (11000000 points/ 9000000 m^2).

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1694#comment:5>
GRASS GIS <http://grass.osgeo.org>

Comment(by mmetz):

Replying to [comment:5 torsti]:
> The region the las file covers is 3000 by 3000 (meters) and the
> resolution was 1x1.

Can you provide the current region settings for the 1x1 resolution, i.e.
the output of g.region -p?

What matters is not the resolution alone but the number of rows and
columns in the current region, which are determined by the region extents
and the resolution. That is, you probably need to check and adjust the
region extents.
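
To make the relationship concrete, here is a minimal sketch of how extents and resolution together fix the number of rows and columns. The function name `region_shape` is hypothetical, and rounding to the nearest integer is an assumption that may differ from what GRASS does in edge cases. The first call uses the region values reported further down in this thread, the second a region matching the 3000 m by 3000 m tile.

```python
def region_shape(north, south, east, west, nsres, ewres):
    """Return (rows, cols) implied by region extents and resolution.

    Assumption: rows and columns are the extent divided by the
    resolution, rounded to the nearest integer.
    """
    rows = round((north - south) / nsres)
    cols = round((east - west) / ewres)
    return rows, cols


# Region as reported by `g.region -p` later in this thread.
full = region_shape(7776450.217, 6605838.902, 732907.723, 61686.152,
                    1.00000027, 0.99999936)

# Region limited to the extent of the LAS tile at 1 m resolution.
tile = region_shape(7152000, 7149000, 392000, 389000, 1, 1)

print(full)  # (1170611, 671222)
print(tile)  # (3000, 3000)
```

At the same 1 m resolution, the two regions differ by a factor of roughly 87000 in cell count, which is why changing the resolution alone helps so little.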
>
> The command:
>
{{{
r.in.lidar -o --overwrite input=R4133C4.laz output=R4133C4.las method=mean
}}}

You might try the percent option. By default the whole map is kept in
memory (percent=100).

>
> My issue is not that r.in.lidar can't be used on large datasets on
> underpowered computers, I'm just wondering whether the 64 exabytes can be
> the right amount of memory needed for cell sizes of 1x1 to 10x10 for a
> total area of 3000mx3000m with an average point density a bit over 1 point
> per square meter (11000000 points/ 9000000 m^2).

With the right region settings and making use of the percent option it
should be possible to import this dataset in no time. Instead of changing
only the resolution, you can try r.in.lidar on a subregion and there figure
out the resolution that provides the desired results. Then set the region
to cover the full input dataset (adjust extents, align to desired
resolution) and import the full dataset.

HTH,

Markus M

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1694#comment:6>
GRASS GIS <http://grass.osgeo.org>

Comment(by hamish):

as MarkusM asked, what does `g.region -p` say?

can you turn debug level to 2? (`g.gisenv set="DEBUG=2"`, then back to 0
to turn it off)

for a 3000x3000 computational region and method=mean it should use
  3000*(3000+1)*4 * 2 / 1024000 = 70.3 MB
of RAM to hold the data, and complete in just a few seconds.
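
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the formula as written in this comment: the function name `mean_memory_bytes` is hypothetical, the 4 bytes per cell come from the "* 4 bytes" in the G_calloc messages, and the factor of 2 is taken from the formula (presumably the two arrays that method=mean keeps in memory).

```python
BYTES_PER_CELL = 4  # matches the "* 4 bytes" in the G_calloc error messages


def mean_memory_bytes(rows, cols, arrays=2):
    """Estimate bytes needed for method=mean, following the formula above.

    rows * (cols + 1) cells per array, 4 bytes per cell, 2 arrays.
    """
    return rows * (cols + 1) * BYTES_PER_CELL * arrays


small = mean_memory_bytes(3000, 3000)
print(small)                        # 72024000
print(round(small / 1024000, 1))    # 70.3, the figure quoted above
print(round(small / 1024 ** 2, 1))  # 68.7 when expressed in MiB
```

The divisor 1024000 is copied from the comment. Dividing by 1024 * 1024 instead gives about 68.7 MiB, which does not change the conclusion.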

does `las2txt | r.in.xyz input=-` work? (see LIDAR page in the grass wiki
for correct usage)

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1694#comment:7>
GRASS GIS <http://grass.osgeo.org>

#1694: r.in.lidar tries to allocate way too much memory
------------------------+---------------------------------------------------
Reporter: torsti | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Keywords: r.in.lidar | Platform: Linux
      Cpu: x86-64 |
------------------------+---------------------------------------------------

Comment(by torsti):

So memory allocation is based on the extent of the region and not the
bounding box of the LAS data, that explains a lot. That was my mistake
there! Still, to be on the safe side I've included the more detailed
information that was asked for.

g.region -p
{{{
D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
projection: 1 (UTM)
zone: 35
datum: etrs89
ellipsoid: grs80
north: 7776450.217
south: 6605838.902
west: 61686.152
east: 732907.723
nsres: 1.00000027
ewres: 0.99999936
rows: 1170611
cols: 671222
cells: 785739856642
}}}

lasinfo R4133C4.laz
{{{
---------------------------------------------------------
   Header Summary
---------------------------------------------------------

   Version: 1.2
   Source ID: 0
   Reserved: 0
   Project ID/GUID: '00000000-0000-0000-0000-000000000000'
   System ID: ''
   Generating Software: 'EspaEngine'
   File Creation Day/Year: 0/0
   Header Byte Size 227
   Data Offset: 329
   Header Padding: 2
   Number Var. Length Records: 1
   Point Data Format: 1
   Number of Point Records: 11064863
   Compressed: True
   Compression Info: LASzip Version 2.1r0 c2 50000: POINT10 2 GPSTIME11 2
   Number of Points by Return: 0 0 0 0 0
   Scale Factor X Y Z: 0.01 0.01 0.01
   Offset X Y Z: -0.00 -0.00 -0.00
   Min X Y Z: 389000.00 7149000.00 91.36
   Max X Y Z: 391999.99 7151999.99 139.61
   Spatial Reference:
None

None

...

}}}

I updated r.in.lidar to revision
[https://trac.osgeo.org/grass/changeset/52593 52593].

Both r.in.lidar and r.in.xyz complain about the amount of memory, because
the region is too big, but the amount of memory they ask for is not in the
exabyte range.

{{{
> r.in.lidar -o --overwrite input=R4133C4.laz output=R4133C4.las method=mean

D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
Over-riding projection check
D2/2: region.n=7776450.217000 region.s=6605838.902000 region.ns_res=1.000000
D2/2: region.rows=1170611 [box_rows=1170611] region.cols=671222
Current region rows: 1170611, cols: 671222
ERROR: G_calloc: unable to allocate 785741027253 * 4 bytes of memory at main.c:534
}}}

{{{
> las2txt --keep-classes 2 --parse xyz --delimiter="|" --input R4133C4.las --output=/tmp/las.tmp
> r.in.xyz input=/tmp/las.tmp output=R4133C4_ground_points

D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
D2/2: region.n=7776450.217000 region.s=6605838.902000 region.ns_res=1.000000
D2/2: region.rows=1170611 [box_rows=1170611] region.cols=671222
Current region rows: 1170611, cols: 671222
ERROR: G_calloc: unable to allocate 785741027253 * 4 bytes of memory at main.c:491
}}}
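
A note on the figures: the count in the original report (18446744073471563701) and the count reported here (785741027253) are consistent with each other if the cell count rows * (cols + 1) was at one point computed in a 32-bit signed integer and then converted to a 64-bit unsigned size. This is an inference from the numbers only. The ticket does not state it, and the thread does not say what changed in revision 52593. The sketch below emulates that wrap-around for the region shown above; the helper names are hypothetical.

```python
def wrap_int32(value):
    """Emulate storing an integer in a 32-bit signed C int."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def as_uint64(value):
    """Emulate converting a (possibly negative) int to a 64-bit size_t."""
    return value & 0xFFFFFFFFFFFFFFFF


rows, cols = 1170611, 671222      # from `g.region -p` above

exact = rows * (cols + 1)         # what a 64-bit computation gives
wrapped = wrap_int32(exact)       # what a 32-bit signed int would hold
reported = as_uint64(wrapped)     # what printing it as a 64-bit size shows

print(exact)     # 785741027253
print(wrapped)   # -237987915
print(reported)  # 18446744073471563701
```

Under that assumption, the "~64 exabytes" was never a real memory requirement for this region. The real requirement, 785741027253 cells of 4 bytes per array, is still far too large, which is why adjusting the region extents is the actual fix.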

After adjusting the extent to the BBOX of the LAS data:
{{{
g.region -a n=7152000 s=7149000 e=392000 w=389000 res=1
}}}
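
The extents in the command above follow from the lasinfo Min/Max values by snapping the bounding box outward to a multiple of the resolution. A minimal sketch of that step, assuming a hypothetical helper `snap_outward` and a grid aligned to multiples of the resolution:

```python
import math


def snap_outward(min_x, min_y, max_x, max_y, res):
    """Expand a bounding box outward to multiples of the resolution."""
    return {
        "n": math.ceil(max_y / res) * res,
        "s": math.floor(min_y / res) * res,
        "e": math.ceil(max_x / res) * res,
        "w": math.floor(min_x / res) * res,
    }


# Min/Max X Y as printed by lasinfo for R4133C4.laz.
ext = snap_outward(389000.00, 7149000.00, 391999.99, 7151999.99, res=1)

# Build the same g.region call that was used above.
cmd = "g.region -a n={n} s={s} e={e} w={w} res={res}".format(res=1, **ext)
print(cmd)  # g.region -a n=7152000 s=7149000 e=392000 w=389000 res=1
```

The script only prints the command; it does not call GRASS.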

r.in.lidar
{{{
D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
Over-riding projection check
D2/2: region.n=7152000.000000 region.s=7149000.000000 region.ns_res=1.000000
D2/2: region.rows=3000 [box_rows=3000] region.cols=3000
Reading data ...
D2/2: pass=1/1 pass_n=7152000.000000 pass_s=7149000.000000 rows=3000
D2/2: allocating n_array
D2/2: allocating sum_array
  100%
D2/2: pass 1 finished, 11064827 coordinates in box
Writing to map ...
  100%
D1/2: close R4133C4.las compressed
D1/2: G_find_raster2(): name=R4133C4.las mapset=PERMANENT
D1/2: G_find_raster2(): name=R4133C4.las mapset=PERMANENT
D1/2: G_find_raster2(): name=R4133C4.las mapset=PERMANENT
r.in.lidar complete. 11064827 points found in region.
D1/2: Processed 11064863 points
}}}

r.in.xyz
{{{
D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
D2/2: region.n=7152000.000000 region.s=7149000.000000 region.ns_res=1.000000
D2/2: region.rows=3000 [box_rows=3000] region.cols=3000
D2/2: estimated number of lines in file: 3224929
Reading data ...
D2/2: pass=1/1 pass_n=7152000.000000 pass_s=7149000.000000 rows=3000
D2/2: allocating n_array
D2/2: allocating sum_array
  100%
D2/2: pass 1 finished, 3135351 coordinates in box
Writing to map ...
  100%
D1/2: close R4133C4_ground_points compressed
D1/2: G_find_raster2(): name=R4133C4_ground_points mapset=PERMANENT
D1/2: G_find_raster2(): name=R4133C4_ground_points mapset=PERMANENT
D1/2: G_find_raster2(): name=R4133C4_ground_points mapset=PERMANENT
r.in.xyz complete. 3135351 points found in region.
D1/2: Processed 3135367 lines.
}}}

So everything seems to work now.

Sorry for the most likely unfounded & inaccurate bug report.

Cheers,
Torsti

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1694#comment:8>
GRASS GIS <http://grass.osgeo.org>

#1694: r.in.lidar tries to allocate way too much memory
----------------------+-----------------------------------------------------
  Reporter: torsti | Owner: grass-dev@…
      Type: defect | Status: closed
  Priority: normal | Milestone: 7.0.0
Component: Raster | Version: svn-trunk
Resolution: invalid | Keywords: r.in.lidar
  Platform: Linux | Cpu: x86-64
----------------------+-----------------------------------------------------
Changes (by mmetz):

  * status: new => closed
  * resolution: => invalid

Comment:

Replying to [comment:8 torsti]:
> So memory allocation is based on the extent of the region and not the
> bounding box of the LAS data, that explains a lot.

I have added a paragraph to the manuals of r.in.lidar and r.in.xyz that
emphasizes that.

>
> Sorry for the most likely unfounded & inaccurate bug report.

Let's say it was unclear documentation.

Closing ticket.

Markus M

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/1694#comment:9>
GRASS GIS <http://grass.osgeo.org>