[GRASS-user] Problem importing/viewing a ~160MB shapefile

Dear All, apologies for the cross-posting, but I am having trouble working
with a BIG shapefile (~160MB). I have already used that shapefile in the past
without problems, but now I can't use it.

In GRASS: I (almost) managed to import this BIG shapefile into GRASS, but
the process *gets stuck* somewhere at the "breaking intersections" point (...it
was running for 10 hours). I interrupted the process and the file was in
GRASS' database. I ran v.build but it is still problematic. I can't get
anything on display.

In QGIS: when I try to load the shapefile directly (not the GRASS
vector) my system freezes!! I haven't seen my Linux box frozen since I
don't remember when. What could cause such behaviour? Is there any way to
get the shapefile fixed outside of GRASS?

Thank you, Nikos

Nikos Alexandris wrote:

> Dear All, apologies for the cross-posting, but I am having trouble working
> with a BIG shapefile (~160MB). I have already used that shapefile in the past
> without problems, but now I can't use it.

? what changed? did you install a new version of GDAL/OGR?
I use 400mb shapefiles without problem..

> In GRASS: I (almost) managed to import this BIG shapefile into GRASS, but
> the process *gets stuck* somewhere at the "breaking intersections" point (...it
> was running for 10 hours). I interrupted the process and the file was in
> GRASS' database. I ran v.build but it is still problematic. I can't get
> anything on display.

It just takes a long time when lines are very long and unbroken. It
eventually gets there.

See Radim's vector TODO:
  http://trac.osgeo.org/grass/browser/grass/trunk/doc/vector/TODO#L242

and search the archives for the "Florida Coastline" problem.

the v.in.ogr spatial= option might help.

> In QGIS: when I try to load the shapefile directly (not the GRASS
> vector) my system freezes!! I haven't seen my Linux box frozen since I
> don't remember when. What could cause such behaviour?
> Is there any way to get the shapefile fixed outside of GRASS?

once loaded into GRASS you can run the v.split command to split the long
lines, then it is much faster (at least for other grass commands).

Hamish
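
A minimal sketch of how the spatial= suggestion could be applied to a
global dataset: import it tile by tile instead of in one pass. The helper
names, the 90-degree tile size and the coast_tile output prefix are
assumptions made for illustration, not something from this thread. The
script only prints v.in.ogr commands (using the dsn=, output= and spatial=
options) and does not run GRASS itself.

```python
def tile_bboxes(west, south, east, north, step):
    """Yield (xmin, ymin, xmax, ymax) tiles that cover the given extent."""
    y = south
    while y < north:
        x = west
        while x < east:
            # Clamp the last row/column so tiles never exceed the extent.
            yield (x, y, min(x + step, east), min(y + step, north))
            x += step
        y += step


def import_commands(shapefile, prefix, bboxes):
    """Build one v.in.ogr command string per tile (nothing is executed)."""
    commands = []
    for index, (xmin, ymin, xmax, ymax) in enumerate(bboxes):
        commands.append(
            "v.in.ogr dsn=%s output=%s_%d spatial=%s,%s,%s,%s"
            % (shapefile, prefix, index, xmin, ymin, xmax, ymax)
        )
    return commands


# Hypothetical usage: a lat-long world extent cut into 90-degree tiles.
for command in import_commands(
    "coastlines.shp", "coast_tile", tile_bboxes(-180, -90, 180, 90, 90)
):
    print(command)
```

Features that straddle a tile edge would probably be selected by more than
one tile, so the imported pieces may need de-duplication once they are
combined again.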

On Mon, 2008-10-13 at 17:07 -0700, Hamish wrote:

> Nikos Alexandris wrote:
> > Dear All, apologies for the cross-posting, but I am having trouble working
> > with a BIG shapefile (~160MB). I have already used that shapefile in the past
> > without problems, but now I can't use it.
>
> ? what changed? did you install a new version of GDAL/OGR?
> I use 400mb shapefiles without problem..

Yes. Last time I used this file was some months ago (but the file is the
same... :-?).

> > In GRASS: I (almost) managed to import this BIG shapefile into GRASS, but
> > the process *gets stuck* somewhere at the "breaking intersections" point (...it
> > was running for 10 hours). I interrupted the process and the file was in
> > GRASS' database. I ran v.build but it is still problematic. I can't get
> > anything on display.
>
> It just takes a long time when lines are very long and unbroken. It
> eventually gets there.
>
> See Radim's vector TODO:
>   http://trac.osgeo.org/grass/browser/grass/trunk/doc/vector/TODO#L242
>
> and search the archives for the "Florida Coastline" problem.

I am reading...

> the v.in.ogr spatial= option might help.
>
> > In QGIS: when I try to load the shapefile directly (not the GRASS
> > vector) my system freezes!! I haven't seen my Linux box frozen since I
> > don't remember when. What could cause such behaviour?
> > Is there any way to get the shapefile fixed outside of GRASS?
>
> once loaded into GRASS you can run the v.split command to split the long
> lines, then it is much faster (at least for other grass commands).

Thanks for the tip.

> Hamish

Regards, Nikos

On Mon, 2008-10-13 at 17:07 -0700, Hamish wrote:

[...]

Hamish and all,
back to this "old" post.

*********
Attempt 1
*********
# using gdal 1.5.3, grass6_devel
# import shapefile
v.in.ogr dsn=... out=...
# buffer overflow error (or segfault --- I can't recall).

*********
Attempt 2
*********
# using gdal-1.6.0beta2, grass6_devel
# importing the usual way
v.in.ogr dsn=coastlines.shp out=coastlines_longtime -ew --o

Projection of input dataset and current location appear to match
Layer: coastlines
WARNING: Column name changed: 'AREA' -> 'area'
WARNING: Column name changed: 'PERIMETER' -> 'perimeter'
[...]
Importing map 181148 features...
-----------------------------------------------------
Building topology for vector map <global_coastlines_longtime>...
Registering primitives...
181386 primitives registered
9764427 vertices registered
Building areas...
100%
181386 areas built
181386 isles built
Attaching islands...
100%
Attaching centroids...
100%
Number of nodes: 181386
Number of primitives: 181386
Number of points: 0
Number of lines: 0
Number of boundaries: 181386
Number of centroids: 0
Number of areas: 181386
Number of isles: 181386
Number of areas without centroid: 181386
-----------------------------------------------------
WARNING: Cleaning polygons, result is not guaranteed!
100%
Building topology for vector map <global_coastlines_longtime>...
Number of nodes: 181386
Number of primitives: 181386
Number of points: 0
Number of lines: 0
Number of boundaries: 181386
Number of centroids: 0
Number of areas: -
Number of isles: -
-----------------------------------------------------
Break polygons:
^[^[-----------------------------------------------------
Remove duplicates:
-----------------------------------------------------
Break boundaries:

# the file gets imported and, as can be seen from the output, "break
# polygons" and "remove duplicates" are successful
# the "^[^[" characters are from me pressing keys to check whether the
# system was still running
# "break boundaries" is still running even after more than 30 hours!

*********
Attempt 3
*********
# v.in.ogr -c ## takes some time but is successful

# v.split (vertices=3) ## I really have no idea which number is
# appropriate, or what it depends on. ### it works

# v.clean tool=break ## never ends, or it might need more than 2-3 days!?

I can open the file with QGIS preview (unstable) but not with QGIS
0.11.0. The latter freezes the whole system, as I have reported in my
first post about this.

Regards, Nikos
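
On the vertices= question: the option of v.split is documented as the
maximum number of vertices per segment, so a small value turns every long
boundary into a very large number of short pieces. The following is only a
toy illustration of that idea in plain Python. The function name and the
sample line are made up, and this is not GRASS's actual implementation.

```python
def split_by_vertices(coords, max_vertices):
    """Split a polyline into chunks of at most max_vertices vertices.

    Consecutive chunks share their end/start vertex so the line stays
    connected after splitting.
    """
    if max_vertices < 2:
        raise ValueError("a segment needs at least 2 vertices")
    segments = []
    start = 0
    last = len(coords) - 1
    while start < last:
        end = min(start + max_vertices - 1, last)
        segments.append(coords[start:end + 1])
        start = end  # the next chunk begins where this one ended
    return segments


# Hypothetical 7-vertex line, split with different limits.
line = [(float(i), 0.0) for i in range(7)]
for limit in (3, 4, 7):
    print(limit, len(split_by_vertices(line, limit)))
```

With vertices=3 every segment covers only two edges, so a map with the
9764427 vertices reported above would presumably end up with several
million segments. A larger value keeps the number of primitives lower while
still breaking up the very long boundaries.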

On Sat, 2008-11-29 at 15:12 +0100, Nikos Alexandris wrote:

[...]

Once again apologies for cross-posting. I face this strange behaviour:

While QGIS (unstable, revision 9711) can open and report GRASS' region
setting for a lat-long location and load/view the above mentioned
"coastline" dataset (both the shapefile and the GRASS vector, the one
after v.in.ogr -c and NOT after v.clean!!), GRASS reports the error:

g.region -p
ERROR: default region is invalid
       line 4: <south: 90:22:00.17015S>

No matter what I do (even "g.region help") GRASS gives this error. From
within QGIS, the "Edit current regions settings" dialog seems to just
work. However, I cannot use any g.region.* tool from QGIS'
GRASS-Toolbox.

Shrug!!

Nikos wrote:

> While QGIS (unstable, revision 9711) can open and report GRASS' region
> setting for a lat-long location and load/view the above mentioned
> "coastline" dataset (both the shapefile and the GRASS vector, the one
> after v.in.ogr -c and NOT after v.clean!!), GRASS reports the error:
>
> g.region -p
> ERROR: default region is invalid
>        line 4: <south: 90:22:00.17015S>

A latitude beyond 90 degrees cannot exist.

fix it in line 4 of $LOCATION/PERMANENT/DEFAULT_WIND or from within the
PERMANENT mapset run "g.region -s" to set the current (valid) region
to be the default one.

WRT large shapefiles: your buffer overflow/segfault probably has little
to do with the size of the file. Others regularly load much bigger
shapefiles into/out of grass. Large file errors typically start to show
themselves around the 2GB mark. You need to run a GDB backtrace to see
the cause.

the "florida coastline" problem (processing huge single polyline
boundaries) does not result in the program breaking. it is just an
inefficient method which takes a very very long time. It will not result
in a buffer overflow or a segfault. That is something different.

How about "v.in.ogr spatial=" ?

Hamish
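
To see why GRASS rejects that line, the degrees:minutes:seconds value can
be converted to decimal degrees and checked against the +/-90 limit. A
small sketch in plain Python: parse_dms and is_valid_latitude are
hypothetical helper names, not GRASS functions, and the parser only handles
the D:M:S plus hemisphere letter form shown in this thread.

```python
def parse_dms(value):
    """Convert 'D[:M[:S]]H' (H is one of N, S, E, W) to signed decimal degrees."""
    text = value.strip()
    hemisphere = text[-1:].upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("missing hemisphere letter in %r" % value)
    parts = [float(p) for p in text[:-1].split(":")]
    if len(parts) > 3:
        raise ValueError("too many fields in %r" % value)
    parts += [0.0] * (3 - len(parts))  # pad missing minutes/seconds
    degrees, minutes, seconds = parts
    decimal = degrees + minutes / 60.0 + seconds / 3600.0
    return -decimal if hemisphere in ("S", "W") else decimal


def is_valid_latitude(value):
    """A latitude must lie between 90S and 90N inclusive."""
    return -90.0 <= parse_dms(value) <= 90.0


# The first value is the one from the error message; the other two are
# the south and north edges of a valid lat-long region for comparison.
for edge in ("90:22:00.17015S", "90S", "83:37:59.82985N"):
    print(edge, round(parse_dms(edge), 6), is_valid_latitude(edge))
```

The rejected south edge works out to a little more than 90.36 degrees
south, which is why the default region is reported as invalid.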

On Sat, 2008-11-29 at 19:01 -0800, Hamish wrote:

> Nikos wrote:
> > While QGIS (unstable, revision 9711) can open and report GRASS' region
> > setting for a lat-long location and load/view the above mentioned
> > "coastline" dataset (both the shapefile and the GRASS vector, the one
> > after v.in.ogr -c and NOT after v.clean!!), GRASS reports the error:
> >
> > g.region -p
> > ERROR: default region is invalid
> >        line 4: <south: 90:22:00.17015S>

> A latitude beyond 90 degrees cannot exist.

What might be causing this "illegal" latitude value?

# the shapefile seems to be "legal"
# ogrinfo coastlines.shp -al -so

INFO: Open of
`/geo/geodata/world/coastlines/coastlines_HR/coastlines.shp'
      using driver `ESRI Shapefile' successful.

Layer name: coastlines
Geometry: Polygon
Feature Count: 181148
Extent: (-180.000000, -90.000000) - (180.000000, 83.633286)
Layer SRS WKT:
GEOGCS["GCS_WGS_1984",
    DATUM["WGS_1984",
        SPHEROID["WGS_1984",6378137.0,298.257223563]],
    PRIMEM["Greenwich",0.0],
    UNIT["Degree",0.0174532925199433]]
[...]

> fix it in line 4 of $LOCATION/PERMANENT/DEFAULT_WIND or from within the
> PERMANENT mapset run "g.region -s" to set the current (valid) region
> to be the default one.

Instead I created a new location (g.proj -c georef=TheShapefile
location=...) and it works.
# region report
# no rasters present, so the resolution is of no interest (right?)
g.region -p

projection: 3 (Latitude-Longitude)
zone: 0
datum: wgs84
ellipsoid: wgs84
north: 83:37:59.82985N
south: 90S
west: 180W
east: 180E
nsres: 8:40:53.991492
ewres: 18
rows: 20
cols: 20
cells: 400

> WRT large shapefiles: your buffer overflow/segfault probably has little
> to do with the size of the file. Others regularly load much bigger
> shapefiles into/out of grass. Large file errors typically start to show
> themselves around the 2GB mark. You need to run a GDB backtrace to see
> the cause.

I want to believe that the segfaults were related to Ubuntu 8.10 + gdal
1.5.3. I think I have found another, recent, reference about this in the
archive.

> the "florida coastline" problem (processing huge single polyline
> boundaries) does not result in the program breaking. it is just an
> inefficient method which takes a very very long time. It will not result
> in a buffer overflow or a segfault. That is something different.

* The shapefile I am trying to "work-out" is a coastline dataset as
well. But it never really goes through the v.in.ogr process or the
"v.in.ogr -c" + "v.split" + "v.clean tool=break" (see previous posts on
this thread of course). It always "hangs" at the "breaking boundaries"
step.

----------------------------------------------------------------------
* Question: What's the difference between a "polygon" and a "boundary"?

* Why is there a "break polygons" step during the "building" process? I
thought that there are *only* nodes & primitives: points, centroids, lines,
boundaries, areas, isles.
----------------------------------------------------------------------

* After stopping the process "Ctrl+C" v.info complains (naturally I
guess) about:

ERROR: Unable to open vector map <cstlns_global@PERMANENT> on level 2. Try
       to rebuild vector topology by v.build.

* Then, v.build warns:
WARNING: Coor files of vector map <cstlns_global@PERMANENT> is larger than
         it should be (29979221 bytes excess)

and starts working.

* Question: How long should I leave the system working? As long as it
takes? My last attempt was >30 hours.

> How about "v.in.ogr spatial=" ?
> Hamish

It works, at least over Greece :-). But this does not resolve the
"problem", does it? In case one needs the whole vector map (in my case a
global dataset) it shouldn't be necessary to import it step-wise.

Regards, Nikos

On Sun, 2008-11-30 at 13:42 +0100, Nikos Alexandris wrote:

On Sat, 2008-11-29 at 19:01 -0800, Hamish wrote:
> Nikos wrote:

[...]

> * The shapefile I am trying to "work-out" is a coastline dataset as
> well. But it never really goes through the v.in.ogr process or the
> "v.in.ogr -c" + "v.split" + "v.clean tool=break" (see previous posts on
> this thread of course). It always "hangs" at the "breaking boundaries"
> step.

[...]

> * After stopping the process "Ctrl+C" v.info complains (naturally I
> guess) about:
>
> ERROR: Unable to open vector map <cstlns_global@PERMANENT> on level 2. Try
>        to rebuild vector topology by v.build.
>
> * Then, v.build warns:
> WARNING: Coor files of vector map <cstlns_global@PERMANENT> is larger than
>          it should be (29979221 bytes excess)
>
> and starts working.

(Leaving QGIS-mailing list out)

v.build worked!!! For the first time.

Nikos