[pgrouting-users] Checking Network Validity (was "Nasty Crash")

Well, with the ice storm, the hard disk that should have arrived today has been delayed, so updating to an Ubuntu system with the latest pgRouting will have to wait.

I’ve been trying to clean up my road network, but I’m still getting the crash.
My data is the entire planet.osm OpenStreetMap dataset, loaded into PostGIS using osm2po.

There doesn’t seem to be a way to pin down the problem other than working through a list of possible network problems (and upgrading to the latest pgRouting).
Here are the problems I have thought of:

  • Zero costs or very short lengths
    I do not have negative costs or lengths.
    I did have some zero costs and lengths, so I have updated my view to set these to my geography lengths. Previously I used geometry lengths, but with global data I don’t think this is a good idea: planar geometry lengths distort with latitude, which will cause problems for routes that cross a wide range of latitudes (e.g. Chile, Russia, or Canada).
    Of course, computing geography lengths is currently very slow, but the result can be cached in its own column.
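To illustrate the latitude problem, here is a minimal sketch, assuming the geometries are stored as lon/lat (EPSG:4326), so that a planar geometry length comes out in degrees. It compares that naive length with a great-circle (haversine) distance. Haversine assumes a spherical Earth, whereas PostGIS geography lengths are computed on the spheroid, so real values will differ slightly; the function names are illustrative only.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def planar_deg(lon1, lat1, lon2, lat2):
    """Naive Euclidean length in degrees, i.e. a planar length on lon/lat coordinates."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


# The same 0.01-degree east-west hop at the equator and at 60 degrees north:
# the planar length is identical, but the real distance roughly halves.
for lat in (0.0, 60.0):
    print(lat, planar_deg(0.0, lat, 0.01, lat), round(haversine_m(0.0, lat, 0.01, lat), 1))
```

Because the planar length is the same at both latitudes while the ground distance is not, any cost derived from it would make high-latitude links look about twice as expensive as they are at 60°N.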

I’m using the dijkstra_sp_delta() function. If I understand correctly, this requires length, cost, and reverse cost fields, and I should set all three to the same value (which is what I’m doing)?
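For reference, here is a sketch of how cost and reverse_cost are commonly interpreted, following the convention documented for pgRouting’s pgr_dijkstra: a negative value means the link cannot be traversed in that direction, and equal values make a link two-way at the same price. It is not confirmed that the older delta function used here treats these columns identically, and the row layout and function names below are hypothetical.

```python
import heapq
from collections import defaultdict


def build_adjacency(links):
    """Directed adjacency list from (link_id, source, target, cost, reverse_cost) rows.

    A negative cost (or reverse_cost) means that direction is not traversable.
    """
    adj = defaultdict(list)
    for link_id, source, target, cost, reverse_cost in links:
        if cost >= 0:
            adj[source].append((target, cost, link_id))
        if reverse_cost >= 0:
            adj[target].append((source, reverse_cost, link_id))
    return adj


def dijkstra(adj, start, goal):
    """Return the cheapest total cost from start to goal, or None if unreachable."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight, _ in adj[node]:
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return None


links = [
    (1, 10, 20, 5.0, 5.0),   # two-way: cost == reverse_cost
    (2, 20, 30, 3.0, -1.0),  # one-way 20 -> 30
]
adj = build_adjacency(links)
print(dijkstra(adj, 10, 30))  # 8.0
print(dijkstra(adj, 30, 10))  # None: link 2 cannot be used backwards
```

Setting cost and reverse_cost equal, as described above, therefore behaves like an undirected network; one-way streets would need a negative value in the disallowed direction.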

Using geography lengths means I no longer have any zero costs, but I do have 48,000 links that are shorter than 1 metre, and 1,376 links that are shorter than 0.1 metres.
I could then remove all nodes that lie within 1 m of another node. I think this will require a Python script, since it involves renumbering.
(I’m actually thinking of creating a copy of my link table that I can modify without losing the original.)
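The node-merging step could be sketched as follows, assuming the node coordinates are in a metric projected CRS (lon/lat input would need a geodesic distance instead). It buckets nodes into a grid of tolerance-sized cells so only neighbouring cells are compared, merges close pairs with union-find, renumbers the surviving nodes 1..n, and drops links that collapse onto a single node. The data layout and the function name are hypothetical.

```python
import math
from collections import defaultdict


def merge_close_nodes(nodes, links, tol=1.0):
    """Merge nodes closer than `tol` and renumber the survivors 1..n.

    nodes: {node_id: (x, y)} in a metric (projected) CRS; an assumption.
    links: [(link_id, source, target, length)]
    Returns (new_nodes, new_links); links reduced to a single node are dropped.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smaller original id represents the group

    # Bucket nodes into tol-sized grid cells; close pairs are always in adjacent cells.
    grid = defaultdict(list)
    for n, (x, y) in nodes.items():
        grid[(math.floor(x / tol), math.floor(y / tol))].append(n)

    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for m in grid.get((cx + dx, cy + dy), ()):
                    for n in members:
                        if n < m and math.dist(nodes[n], nodes[m]) < tol:
                            union(n, m)

    # Renumber group representatives densely, in order of their original id.
    reps = sorted({find(n) for n in nodes})
    new_id = {r: i for i, r in enumerate(reps, start=1)}
    new_nodes = {new_id[r]: nodes[r] for r in reps}

    new_links = []
    for link_id, s, t, length in links:
        ns, nt = new_id[find(s)], new_id[find(t)]
        if ns != nt:  # skip links that became closed loops
            new_links.append((link_id, ns, nt, length))
    return new_nodes, new_links


nodes = {1: (0.0, 0.0), 2: (0.5, 0.0), 3: (10.0, 0.0), 4: (20.0, 0.0)}
links = [(100, 1, 2, 0.5), (101, 2, 3, 9.5), (102, 3, 4, 10.0)]
print(merge_close_nodes(nodes, links))
```

Merging is transitive: if A is within 1 m of B and B is within 1 m of C, all three collapse into one node even if A and C are further apart. Link lengths are carried over unchanged, so they may need recomputing after the merge.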

  • Links that start and end at the same node
    I don’t have any such links. Given my intended application, closed loops can be removed if I do find any.
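If loops do turn up, a check like this minimal sketch would separate them out before routing. It assumes links are (link_id, source, target, ...) tuples, which is a hypothetical layout.

```python
def split_closed_loops(links):
    """Separate links whose source and target are the same node from the rest.

    links: iterable of (link_id, source, target, ...) tuples.
    Returns (kept_links, loop_ids).
    """
    kept, loop_ids = [], []
    for link in links:
        link_id, source, target = link[:3]
        if source == target:
            loop_ids.append(link_id)
        else:
            kept.append(link)
    return kept, loop_ids


kept, loops = split_closed_loops([(1, 5, 5, 2.0), (2, 5, 6, 1.0)])
print(kept, loops)
```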

  • High node identifier values
    I have heard of problems with high node identifiers. What exactly counts as “too high”?
    Since I’m using the entire planet.osm, I am going to have a huge number of nodes, so could high node identifiers be a problem for me?
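One way to gauge how high the identifiers are is to compare the largest node ID with the number of distinct nodes, and with the ceiling of a PostgreSQL integer column (2,147,483,647). The idea that a solver might size its internal storage by the largest ID rather than by the node count is an assumption here, not something confirmed for pgRouting; the helper name and row layout are hypothetical.

```python
INT4_MAX = 2**31 - 1  # largest value a PostgreSQL `integer` column can hold


def node_id_report(links):
    """Summarise how large and how sparse the node ids in a link table are.

    links: iterable of (link_id, source, target, ...) tuples.
    """
    ids = set()
    for link in links:
        ids.update(link[1:3])  # source and target node ids
    max_id = max(ids)
    return {
        "node_count": len(ids),
        "max_id": max_id,
        "fits_int4": max_id <= INT4_MAX,
        # 1.0 means ids are dense (1..n); large values mean many unused ids
        "max_id_per_node": max_id / len(ids),
    }


print(node_id_report([(1, 1, 2), (2, 2, 1000)]))
```

A large `max_id_per_node` suggests renumbering the nodes densely could help if the solver’s memory use does scale with the largest ID.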

Are there any other problems I should look for?