[pgrouting-dev] APSP Implementation

Hi,

I am Jay Mahadeokar, Computer Science MTech Student at Indian Institute of Technology, Kanpur.

I am interested in contributing to pgRouting and was looking at APSP implementation. I have discussed this with Mr. Daniel Kastl and we thought it would be better to float the ideas on the dev mailing list.

For the APSP implementation, I read the boost documentation, and it has implemented APSP using Johnsons algorithm as well as Warshall’s Algorithm. Which one should we look at? To implement it for pgRouting, we will need to code a boost_wrapper function (which will call the actual boost apsp function) and one outer APSP function which will use the executor/SPI to query database and call the wrapper (similar to the dijkestra’s implementation). Please correct me if I am wrong.

It would also be helpful to come up with a concrete prototype for the function. The initial RFC is drafted here: http://www.pgrouting.org/rfc/rfc-03.html

Since I am new to this community, I would like to know the development tools and conventions used.

Regards,
-Jay Mahadeokar

On 12/1/2010 9:08 AM, Jay Mahadeokar wrote:

Hi,

I am Jay Mahadeokar, Computer Science MTech Student at Indian Institute
of Technology, Kanpur.

I am interested in contributing to pgRouting and was looking at APSP
implementation. I have discussed this with Mr. Daniel Kastl and we
thought it would be better to float the ideas on the dev mailing list.

Greeting Jay,

I think this would be an excellent addition to pgRouting. I have been interested in seeing an algorithm like this added, as it is a good precursor to feed data to TSP problem using a simulated annealing algorithm.

For the APSP implementation, I read the boost documentation, and it has
implemented APSP using Johnsons algorithm as well as Warshall's
Algorithm. Which one should we look at? To implement it for pgRouting,
we will need to code a boost_wrapper function (which will call the
actual boost apsp function) and one outer APSP function which will use
the executor/SPI to query database and call the wrapper (similar to the
dijkestra's implementation). Please correct me if I am wrong.

It looks like Johnsons algorithm might be a little faster on sparse graphs, but I'm not sure if you can extract the path between the pairs if you need that and I think that would be a valuable component to the algorithm, so the Warshall Algorithm might be better if we support that extraction capability.

I'm not sure how these algorithms work from the point of view that you have to extract ALL pairs from the graph, or whether you can just extract selected pairs at a reduced cost. The use case for this would be that you have 20 locations that you are interested in and you want the APSP between those 20 locations and not the whole graph.

I think some thought needs to go into:

1. what are the results? just a V^2 table of costs?

2. can we extract the paths? In pgRouting, this has some additional implications beyond is it physically possible to extract paths from the algorithm.
   a. typically queries do not store results, they only return sets
   b. extracting paths, implies computing assembling a graph, computing the results and holding the results for additional queries against them to extract paths.
   c. while this is potentially possible using temp tables, we do not currently do that

Anyway, these are some things to start a discussion and provoke some thoughts. I do not want to grow the scope of this project to the point that it is problematic, so we should focus on what can be done and plan for future additions if they are needed.

I look forward to working with you. Thank you for your interest in pgRouting.

Best regards,
   -Steve W

It would also be helpful to come up with a concrete prototype for the
function. The initial RFC is drafted here:
http://www.pgrouting.org/rfc/rfc-03.html

Since I am new to this community, I would like to know the development
tools and conventions used.
--
Regards,
-Jay Mahadeokar

_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

Hi Stephen,

Thanks for the quick input. Some queries in-line.Greeting Jay,

It looks like Johnsons algorithm might be a little faster on sparse graphs, but I’m not sure if you can extract the path between the pairs if you need that and I think that would be a valuable component to the algorithm, so the Warshall Algorithm might be better if we support that extraction capability.

I’m not sure how these algorithms work from the point of view that you have to extract ALL pairs from the graph, or whether you can just extract selected pairs at a reduced cost. The use case for this would be that you have 20 locations that you are interested in and you want the APSP between those 20 locations and not the whole graph.

Notations that I have used:
m - edges.
n - vertices.
v - vertices in subset.

Our postgres map data constitutes a planar graph right? If so, then the number of edges m = O(n). In that case, the dijextra Shortest path algorithm will take O(m + n log n) that is O(n logn) time.

Now, if we want to find APSP between small subset of vertices v, we can call dijextra v^2 times to get total complexity O(v^2* n log n). We will have a good bound until (v^2 log n) < n^2. I guess this will also return the paths between vertices.

When (v^2 log n) > n^2 we can use Warshall algo. Extracting paths will be complicated task.

I think some thought needs to go into:

  1. what are the results? just a V^2 table of costs?

  2. can we extract the paths? In pgRouting, this has some additional implications beyond is it physically possible to extract paths from the algorithm.
    a. typically queries do not store results, they only return sets
    b. extracting paths, implies computing assembling a graph, computing the results and holding the results for additional queries against them to extract paths.
    c. while this is potentially possible using temp tables, we do not currently do that

I did not properly understand point b above.

I am not sure if the boost implementation of Warshall returns a path matrix, http://www.boost.org/doc/libs/1_45_0/libs/graph/doc/floyd_warshall_shortest.html says that it will just return the distance matrix.

In theory, we can simultaneously create a n^2 path matrix along with the distance matrix and use that to extract paths. Pseudo code is present here: http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm. So, if we want to build the path matrix, we need to code Warshall from scratch, using maybe boost data structures.

If we have vertex ids of whole graph, the path matrix is sufficient to extract all paths (extracting one path will take O(n) time), and I think there is no need for additional queries to database.

Please correct me if I am missing something.

Anyway, these are some things to start a discussion and provoke some thoughts. I do not want to grow the scope of this project to the point that it is problematic, so we should focus on what can be done and plan for future additions if they are needed.

I look forward to working with you. Thank you for your interest in pgRouting.

Best regards,
-Steve W


pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev


Regards,
-Jay Mahadeokar

On 12/1/2010 3:16 PM, Jay Mahadeokar wrote:

Hi Stephen,

Thanks for the quick input. Some queries in-line.Greeting Jay,

    It looks like Johnsons algorithm might be a little faster on sparse
    graphs, but I'm not sure if you can extract the path between the
    pairs if you need that and I think that would be a valuable
    component to the algorithm, so the Warshall Algorithm might be
    better if we support that extraction capability.

    I'm not sure how these algorithms work from the point of view that
    you have to extract ALL pairs from the graph, or whether you can
    just extract selected pairs at a reduced cost. The use case for this
    would be that you have 20 locations that you are interested in and
    you want the APSP between those 20 locations and not the whole graph.

Notations that I have used:
m - edges.
n - vertices.
v - vertices in subset.

Our postgres map data constitutes a planar graph right? If so, then the
number of edges m = O(n). In that case, the dijextra Shortest path
algorithm will take O(m + n log n) that is O(n logn) time.

Now, if we want to find APSP between small subset of vertices v, we can
call dijextra v^2 times to get total complexity O(v^2* n log n). We will
have a good bound until (v^2 log n) < n^2. I guess this will also return
the paths between vertices.

Yes, I also thought of this point, but did not put complexity equations to the problem. I am glad you did.

When (v^2 log n) > n^2 we can use Warshall algo. Extracting paths will
be complicated task.

    I think some thought needs to go into:

    1. what are the results? just a V^2 table of costs?

    2. can we extract the paths? In pgRouting, this has some additional
    implications beyond is it physically possible to extract paths from
    the algorithm.
      a. typically queries do not store results, they only return sets
      b. extracting paths, implies computing assembling a graph,
    computing the results and holding the results for additional queries
    against them to extract paths.
      c. while this is potentially possible using temp tables, we do not
    currently do that

I did not properly understand point b above.

OK, I will try to explain. Typically the process of running a pgRouting query follows something along this pseudo code:

1. select a subset of all node that we think will be enough to solve our problem. This is done via a bounding box of the start and end nodes inw question and expanded somewhat in case the route wanders out of the bounding box.

2. these nodes and the edges connected to them are built into a boost graph

3. the boost graph is solved

4. the results are return to the SQL caller as a set of database records.

5. the boost graph is destroyed

So basically one input, one process, one output. Nothing is saved. If on the other hand we had a process like:

1. build graph
2. solve graph
3. return matrix of distances
4. hold onto graph because these might be addition requests the need the matrix
5. request path A-B from graph
6. request path D-H from graph
7. request ...
8. we are done with the graph so destroy it

Which is my assumption that we would need to do to extract paths, then we do not have a paradigm for this in SQL.

Now we may not need to do that, but I was assuming that there was information in the graph that was easy to extract about the paths that gave the distance returned in the distance matrix.

I am not sure if the boost implementation of Warshall returns a path
matrix,
http://www.boost.org/doc/libs/1_45_0/libs/graph/doc/floyd_warshall_shortest.html
says that it will just return the distance matrix.

In theory, we can simultaneously create a n^2 path matrix along with the
distance matrix and use that to extract paths. Pseudo code is present
here: http://en.wikipedia.org/wiki/Floyd–Warshall_algorithm. So,
if we want to build the path matrix, we need to code Warshall from
scratch, using maybe boost data structures.

I guess this is up to the use cases for this and maybe we should ask others to chime in based on their needs.

I think this really depends on the number of nodes that a user needs to work with. If you assume asymmetric directed graph, the you need to compute n^2 distances. for small say n<20 that is 400 routes and if at 1 sec per route it takes 6.67 minutes; n=100 it takes about 2.78 hours.

My use case would be to feed the results into TSP solver to order the nodes, then to extract the routes between the ordered nodes. I have used simulated annealing to solve TSP very fast.

http://imaptools.com/cgi-bin/tsp.cgi

Copy and paste the cities into the text area and compute the TSP. This demo application geocodes all the cities and comput the straight line distance between them and then solves the TSP problem. It would really be grate to compute the network paths before solving the TSP.

I know pgRouting has a TSP solver, but it requires the CGAL package I think which is hard to find and build on many systems.

If we have vertex ids of whole graph, the path matrix is sufficient to
extract all paths (extracting one path will take O(n) time), and I think
there is no need for additional queries to database.

I saw the path extraction code in the wiki page , but I need to go back over it in more detail.

Best regards,
   -Steve W

Please correct me if I am missing something.

    Anyway, these are some things to start a discussion and provoke some
    thoughts. I do not want to grow the scope of this project to the
    point that it is problematic, so we should focus on what can be done
    and plan for future additions if they are needed.

    I look forward to working with you. Thank you for your interest in
    pgRouting.

    Best regards,
      -Steve W
    _______________________________________________
    pgrouting-dev mailing list
    pgrouting-dev@lists.osgeo.org <mailto:pgrouting-dev@lists.osgeo.org>
    http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

--
Regards,
-Jay Mahadeokar

_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

When (v^2 log n) > n^2 we can use Warshall algo. Extracting paths will

be complicated task.

I think some thought needs to go into:

  1. what are the results? just a V^2 table of costs?

  2. can we extract the paths? In pgRouting, this has some additional
    implications beyond is it physically possible to extract paths from
    the algorithm.
    a. typically queries do not store results, they only return sets
    b. extracting paths, implies computing assembling a graph,
    computing the results and holding the results for additional queries
    against them to extract paths.
    c. while this is potentially possible using temp tables, we do not
    currently do that

I did not properly understand point b above.

OK, I will try to explain. Typically the process of running a pgRouting query follows something along this pseudo code:

  1. select a subset of all node that we think will be enough to solve our problem. This is done via a bounding box of the start and end nodes inw question and expanded somewhat in case the route wanders out of the bounding box.

  2. these nodes and the edges connected to them are built into a boost graph

  3. the boost graph is solved

  4. the results are return to the SQL caller as a set of database records.

  5. the boost graph is destroyed

So basically one input, one process, one output. Nothing is saved. If on the other hand we had a process like:

  1. build graph
  2. solve graph
  3. return matrix of distances
  4. hold onto graph because these might be addition requests the need the matrix
  5. request path A-B from graph
  6. request path D-H from graph
  7. request …
  8. we are done with the graph so destroy it

Which is my assumption that we would need to do to extract paths, then we do not have a paradigm for this in SQL.

Now we may not need to do that, but I was assuming that there was information in the graph that was easy to extract about the paths that gave the distance returned in the distance matrix.

Hey… Again, during step 3 above, if we build the path matrix along with the distance matrix for Warshall Algo, we can use that to extract all paths, we will not need any more queries to database. We also need not hold on to the graph! And the overall time is still O(n^3) :slight_smile:

I am not sure if the boost implementation of Warshall returns a path
matrix,
http://www.boost.org/doc/libs/1_45_0/libs/graph/doc/floyd_warshall_shortest.html
says that it will just return the distance matrix.

In theory, we can simultaneously create a n^2 path matrix along with the
distance matrix and use that to extract paths. Pseudo code is present
here: http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm. So,
if we want to build the path matrix, we need to code Warshall from
scratch, using maybe boost data structures.

I guess this is up to the use cases for this and maybe we should ask others to chime in based on their needs.

I think this really depends on the number of nodes that a user needs to work with. If you assume asymmetric directed graph, the you need to compute n^2 distances. for small say n<20 that is 400 routes and if at 1 sec per route it takes 6.67 minutes; n=100 it takes about 2.78 hours.

I guess assuming 1 sec per route is too slow for modern processors! If you are including the database query latency, then again Path Matrix will make sure we need to query database only once, so the rest of the work should be fast.

My use case would be to feed the results into TSP solver to order the nodes, then to extract the routes between the ordered nodes. I have used simulated annealing to solve TSP very fast.

http://imaptools.com/cgi-bin/tsp.cgi

Copy and paste the cities into the text area and compute the TSP. This demo application geocodes all the cities and comput the straight line distance between them and then solves the TSP problem. It would really be grate to compute the network paths before solving the TSP.

I know pgRouting has a TSP solver, but it requires the CGAL package I think which is hard to find and build on many systems.

Nice Application! I am not sure what is complexity of your algorithm. But if you want to compute actual paths before feeding the algo, I guess the number of nodes in the path will increase by a big factor which will slow down APSP significantly. (You may just consider Highways to get around this, could discuss this on a separate topic!)

If we have vertex ids of whole graph, the path matrix is sufficient to
extract all paths (extracting one path will take O(n) time), and I think
there is no need for additional queries to database.

I saw the path extraction code in the wiki page , but I need to go back over it in more detail.

Best regards,
-Steve W

Please correct me if I am missing something.

Anyway, these are some things to start a discussion and provoke some
thoughts. I do not want to grow the scope of this project to the
point that it is problematic, so we should focus on what can be done
and plan for future additions if they are needed.

I look forward to working with you. Thank you for your interest in
pgRouting.

Best regards,
-Steve W


pgrouting-dev mailing list

pgrouting-dev@lists.osgeo.org mailto:[pgrouting-dev@lists.osgeo.org](mailto:pgrouting-dev@lists.osgeo.org)

http://lists.osgeo.org/mailman/listinfo/pgrouting-dev


Regards,
-Jay Mahadeokar


pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev


pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev


Regards,
-Jay Mahadeokar

Jay,

Sounds like you have a good handle on all the pieces to this problem.
What do you need to get started :slight_smile:

-Steve

On 12/1/2010 6:21 PM, Jay Mahadeokar wrote:

When (v^2 log n) > n^2 we can use Warshall algo. Extracting paths will

        be complicated task.

            I think some thought needs to go into:

            1. what are the results? just a V^2 table of costs?

            2. can we extract the paths? In pgRouting, this has some
        additional
            implications beyond is it physically possible to extract
        paths from
            the algorithm.
              a. typically queries do not store results, they only
        return sets
              b. extracting paths, implies computing assembling a graph,
            computing the results and holding the results for additional
        queries
            against them to extract paths.
              c. while this is potentially possible using temp tables,
        we do not
            currently do that

        I did not properly understand point b above.

    OK, I will try to explain. Typically the process of running a
    pgRouting query follows something along this pseudo code:

    1. select a subset of all node that we think will be enough to solve
    our problem. This is done via a bounding box of the start and end
    nodes inw question and expanded somewhat in case the route wanders
    out of the bounding box.

    2. these nodes and the edges connected to them are built into a
    boost graph

    3. the boost graph is solved

    4. the results are return to the SQL caller as a set of database
    records.

    5. the boost graph is destroyed

    So basically one input, one process, one output. Nothing is saved.
    If on the other hand we had a process like:

    1. build graph
    2. solve graph
    3. return matrix of distances
    4. hold onto graph because these might be addition requests the need
    the matrix
    5. request path A-B from graph
    6. request path D-H from graph
    7. request ...
    8. we are done with the graph so destroy it

    Which is my assumption that we would need to do to extract paths,
    then we do not have a paradigm for this in SQL.

    Now we may not need to do that, but I was assuming that there was
    information in the graph that was easy to extract about the paths
    that gave the distance returned in the distance matrix.

Hey.. Again, during step 3 above, if we build the path matrix along
with the distance matrix for Warshall Algo, we can use that to extract
all paths, we will not need any more queries to database. We also need
not hold on to the graph! And the overall time is still O(n^3) :slight_smile:

        I am not sure if the boost implementation of Warshall returns a path
        matrix,
        http://www.boost.org/doc/libs/1_45_0/libs/graph/doc/floyd_warshall_shortest.html
        says that it will just return the distance matrix.

        In theory, we can simultaneously create a n^2 path matrix along
        with the
        distance matrix and use that to extract paths. Pseudo code is
        present
        here:
        http://en.wikipedia.org/wiki/Floyd–Warshall_algorithm. So,
        if we want to build the path matrix, we need to code Warshall from
        scratch, using maybe boost data structures.

    I guess this is up to the use cases for this and maybe we should ask
    others to chime in based on their needs.

    I think this really depends on the number of nodes that a user needs
    to work with. If you assume asymmetric directed graph, the you need
    to compute n^2 distances. for small say n<20 that is 400 routes and
    if at 1 sec per route it takes 6.67 minutes; n=100 it takes about
    2.78 hours.

I guess assuming 1 sec per route is too slow for modern processors! If
you are including the database query latency, then again Path Matrix
will make sure we need to query database only once, so the rest of the
work should be fast.

    My use case would be to feed the results into TSP solver to order
    the nodes, then to extract the routes between the ordered nodes. I
    have used simulated annealing to solve TSP very fast.

    http://imaptools.com/cgi-bin/tsp.cgi

    Copy and paste the cities into the text area and compute the TSP.
    This demo application geocodes all the cities and comput the
    straight line distance between them and then solves the TSP problem.
    It would really be grate to compute the network paths before solving
    the TSP.

    I know pgRouting has a TSP solver, but it requires the CGAL package
    I think which is hard to find and build on many systems.

Nice Application! I am not sure what is complexity of your algorithm.
But if you want to compute actual paths before feeding the algo, I guess
the number of nodes in the path will increase by a big factor which will
slow down APSP significantly. (You may just consider Highways to get
around this, could discuss this on a separate topic!)

        If we have vertex ids of whole graph, the path matrix is
        sufficient to
        extract all paths (extracting one path will take O(n) time), and
        I think
        there is no need for additional queries to database.

    I saw the path extraction code in the wiki page , but I need to go
    back over it in more detail.

    Best regards,
      -Steve W

        Please correct me if I am missing something.

            Anyway, these are some things to start a discussion and
        provoke some
            thoughts. I do not want to grow the scope of this project to the
            point that it is problematic, so we should focus on what can
        be done
            and plan for future additions if they are needed.

            I look forward to working with you. Thank you for your
        interest in
            pgRouting.

            Best regards,
              -Steve W
            _______________________________________________
            pgrouting-dev mailing list
        pgrouting-dev@lists.osgeo.org
        <mailto:pgrouting-dev@lists.osgeo.org>
        <mailto:pgrouting-dev@lists.osgeo.org
        <mailto:pgrouting-dev@lists.osgeo.org>>

        http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

        --
        Regards,
        -Jay Mahadeokar

        _______________________________________________
        pgrouting-dev mailing list
        pgrouting-dev@lists.osgeo.org <mailto:pgrouting-dev@lists.osgeo.org>
        http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

    _______________________________________________
    pgrouting-dev mailing list
    pgrouting-dev@lists.osgeo.org <mailto:pgrouting-dev@lists.osgeo.org>
    http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

--
Regards,
-Jay Mahadeokar

_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

Hi Jay,

First of all I want to say welcome to the community and that it’s great you want to join development of pgRouting!
To others on the list I want to say, that Jay contacted me already a few weeks ago asking for the possibility and conditions to participate in GSoC 2011 and do some contribution to pgRouting. I really appreciate that Jay contacted us so much in advance and I’m glad that he wants to bring in his skills and contribute to the project.

Well, Anton knows a lot better about the internals how pgRouting works. My comments won’t be very technical.
I mostly follow the discussion with interest and just wanted to add a few remarks:

Distance matrix:

I think the current problem is, that for calculating a distance matrix you would make a huge amount of single pgRouting shortest path queries. Each of these queries would involve to select a bounding box, load the data and find the shortest path. That is not very efficient and takes too much time for many points. Maybe it doesn’t need to be a second per query but still too slow.
With APSP you would just make one bounding box select and load the data to memory once, where you do then the calculation of all paths you need. This should be a lot faster, right? And it would also ensure that all paths are calculated based on the same road network status at a the time of the select by the way.

TSP

TSP currently doesn’t calculate a matrix with real distances, but APSP would make this possible then.
The new DARP solver takes as a third parameter a distance matrix, so it would be another function that would have a huge benefit from a APSP function. I think TSP could be rewritten in a similar way as DARP to be able to chose the way you want to calculate the distance matrix.

I think in general we also need to talk about

  • which parameters we want to pass to the APSP function
  • how the result of the function should look like
    pgRouting’s speciality is that you can make SQL statements be a parameter and also can later post-process your result in a SQL query.

Anton, this is probably the part someone new to PostgreSQL/PostGIS/pgRouting needs some advice.

Hope this makes sense.

Daniel

2010/12/2 Stephen Woodbridge <woodbri@swoodbridge.com>

Jay,

Sounds like you have a good handle on all the pieces to this problem.
What do you need to get started :slight_smile:

-Steve

On 12/1/2010 6:21 PM, Jay Mahadeokar wrote:

When (v^2 log n) > n^2 we can use Warshall algo. Extracting paths will

be complicated task.

I think some thought needs to go into:

  1. what are the results? just a V^2 table of costs?

  2. can we extract the paths? In pgRouting, this has some
    additional
    implications beyond is it physically possible to extract
    paths from
    the algorithm.
    a. typically queries do not store results, they only
    return sets
    b. extracting paths, implies computing assembling a graph,
    computing the results and holding the results for additional
    queries
    against them to extract paths.
    c. while this is potentially possible using temp tables,
    we do not
    currently do that

I did not properly understand point b above.

OK, I will try to explain. Typically the process of running a
pgRouting query follows something along this pseudo code:

  1. select a subset of all node that we think will be enough to solve
    our problem. This is done via a bounding box of the start and end
    nodes inw question and expanded somewhat in case the route wanders
    out of the bounding box.

  2. these nodes and the edges connected to them are built into a
    boost graph

  3. the boost graph is solved

  4. the results are return to the SQL caller as a set of database
    records.

  5. the boost graph is destroyed

So basically one input, one process, one output. Nothing is saved.
If on the other hand we had a process like:

  1. build graph
  2. solve graph
  3. return matrix of distances
  4. hold onto graph because these might be addition requests the need
    the matrix
  5. request path A-B from graph
  6. request path D-H from graph
  7. request …
  8. we are done with the graph so destroy it

Which is my assumption that we would need to do to extract paths,
then we do not have a paradigm for this in SQL.

Now we may not need to do that, but I was assuming that there was
information in the graph that was easy to extract about the paths
that gave the distance returned in the distance matrix.

Hey… Again, during step 3 above, if we build the path matrix along
with the distance matrix for Warshall Algo, we can use that to extract
all paths, we will not need any more queries to database. We also need
not hold on to the graph! And the overall time is still O(n^3) :slight_smile:

I am not sure if the boost implementation of Warshall returns a path
matrix,
http://www.boost.org/doc/libs/1_45_0/libs/graph/doc/floyd_warshall_shortest.html
says that it will just return the distance matrix.

In theory, we can simultaneously create a n^2 path matrix along
with the
distance matrix and use that to extract paths. Pseudo code is
present
here:
http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm. So,
if we want to build the path matrix, we need to code Warshall from
scratch, using maybe boost data structures.

I guess this is up to the use cases for this and maybe we should ask
others to chime in based on their needs.

I think this really depends on the number of nodes that a user needs
to work with. If you assume asymmetric directed graph, the you need
to compute n^2 distances. for small say n<20 that is 400 routes and
if at 1 sec per route it takes 6.67 minutes; n=100 it takes about
2.78 hours.

I guess assuming 1 sec per route is too slow for modern processors! If
you are including the database query latency, then again Path Matrix
will make sure we need to query database only once, so the rest of the
work should be fast.

My use case would be to feed the results into TSP solver to order
the nodes, then to extract the routes between the ordered nodes. I
have used simulated annealing to solve TSP very fast.

http://imaptools.com/cgi-bin/tsp.cgi

Copy and paste the cities into the text area and compute the TSP.
This demo application geocodes all the cities and comput the
straight line distance between them and then solves the TSP problem.
It would really be grate to compute the network paths before solving
the TSP.

I know pgRouting has a TSP solver, but it requires the CGAL package
I think which is hard to find and build on many systems.

Nice Application! I am not sure what is complexity of your algorithm.
But if you want to compute actual paths before feeding the algo, I guess
the number of nodes in the path will increase by a big factor which will
slow down APSP significantly. (You may just consider Highways to get
around this, could discuss this on a separate topic!)

If we have vertex ids of whole graph, the path matrix is
sufficient to
extract all paths (extracting one path will take O(n) time), and
I think
there is no need for additional queries to database.

I saw the path extraction code in the wiki page , but I need to go
back over it in more detail.

Best regards,
-Steve W

Please correct me if I am missing something.

Anyway, these are some things to start a discussion and
provoke some
thoughts. I do not want to grow the scope of this project to the
point that it is problematic, so we should focus on what can
be done
and plan for future additions if they are needed.

I look forward to working with you. Thank you for your
interest in
pgRouting.

Best regards,
-Steve W


pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
mailto:[pgrouting-dev@lists.osgeo.org](mailto:pgrouting-dev@lists.osgeo.org)
<mailto:pgrouting-dev@lists.osgeo.org
mailto:[pgrouting-dev@lists.osgeo.org](mailto:pgrouting-dev@lists.osgeo.org)>

http://lists.osgeo.org/mailman/listinfo/pgrouting-dev


Regards,
-Jay Mahadeokar


pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org mailto:[pgrouting-dev@lists.osgeo.org](mailto:pgrouting-dev@lists.osgeo.org)
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev


pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org mailto:[pgrouting-dev@lists.osgeo.org](mailto:pgrouting-dev@lists.osgeo.org)
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev


Regards,
-Jay Mahadeokar


pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev


pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev


Georepublic UG & Georepublic Japan
eMail: daniel.kastl@georepublic.de
Web: http://georepublic.de

Hi list,

Daniel is right - we mostly need a distance matrix, so I think there
should be at least an option to return distance matrix only and omit
extra work on extracting paths. Or should it be different function?
It would be nice if extracted paths could be in form of phRouting
'path' type with additional 'from' and 'to' columns.

Anton.

Hi Anton,

I think there will be both cases. DARP/TSP might only need the matrix at first. But when you got your optimized tour result you had to run again the shortest path search to retrieve a path, which causes two problems:

  • The network (conditions) might have changed if you run those path queries later.
  • It might be slower to later run single shortest path queries again independently
    So an option to return only the distances or together with the path would be best in my opinion.

Actually the distance would be just the length of the path, so that looks like “ST_length(path_geom)” to me. No?

Daniel

2010/12/2 Anton Patrushev <anton.patrushev@georepublic.de>

Hi list,

Daniel is right - we mostly need a distance matrix, so I think there
should be at least an option to return distance matrix only and omit
extra work on extracting paths. Or should it be different function?
It would be nice if extracted paths could be in form of phRouting
‘path’ type with additional ‘from’ and ‘to’ columns.

Anton.


pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev


Georepublic UG & Georepublic Japan
eMail: daniel.kastl@georepublic.de
Web: http://georepublic.de

On 12/1/2010 10:50 PM, Anton Patrushev wrote:

Hi list,

Daniel is right - we mostly need a distance matrix, so I think there
should be at least an option to return distance matrix only and omit
extra work on extracting paths. Or should it be different function?
It would be nice if extracted paths could be in form of phRouting
'path' type with additional 'from' and 'to' columns.

Agreed.

You can do a lot in the wrapper functions like joining to other tables or otherwise post processing the results that come back. The key to this is to have the necessary references to the other data.

I think Daniel made an excellent point earlier that I would like to hilite which is that we can build the graph once the process it multiple times in the code before we return the results.

So the process flow might look like this for a SOME PAIRS shortest path

build graph from bounding box
if num pairs is < some limit then
   loop thought pairs and solve shortest path
else
   use APSP algorithm
return results as:
from,to,distance[,array{from,node,node,...,to}]

or something like that.

A lot of good ideas.

-Steve

On 12/1/2010 11:01 PM, Daniel Kastl wrote:

Hi Anton,

I think there will be both cases. DARP/TSP might only need the matrix at
first. But when you got your optimized tour result you had to run again
the shortest path search to retrieve a path, which causes two problems:

    * The network (conditions) might have changed if you run those path
      queries later.

I guess this assumes the you are getting live feed data for traffic or something like that. What are you think here? Also be aware that the algorithms can select any of equal distance alternatives and the later if you single shortest path it might pick a different path, which may or may not be problematic.

    * It might be slower to later run single shortest path queries again
      independently

It would be interesting to enhance our single shortest path to be able to extract multiple single shortest paths based on a single graph build. So input might be something like

id, from, to
1, 27, 42
2. 101, 502
...

and the results would be

id, seq, from, to

So an option to return only the distances or together with the path
would be best in my opinion.
Actually the distance would be just the length of the path, so that
looks like "ST_length(path_geom)" to me. No?

I think we really need to get back a list of node or edges for the path not just a geometry. We can construct the geometry from the node list, but it is much harder to construct the node list from the geometry.

-Steve

Daniel

2010/12/2 Anton Patrushev <anton.patrushev@georepublic.de
<mailto:anton.patrushev@georepublic.de>>

    Hi list,

    Daniel is right - we mostly need a distance matrix, so I think there
    should be at least an option to return distance matrix only and omit
    extra work on extracting paths. Or should it be different function?
    It would be nice if extracted paths could be in form of phRouting
    'path' type with additional 'from' and 'to' columns.

    Anton.
    _______________________________________________
    pgrouting-dev mailing list
    pgrouting-dev@lists.osgeo.org <mailto:pgrouting-dev@lists.osgeo.org>
    http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

--
Georepublic UG & Georepublic Japan
eMail: daniel.kastl@georepublic.de <mailto:daniel.kastl@georepublic.de>
Web: http://georepublic.de/&gt;

_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

I think there will be both cases. DARP/TSP might only need the matrix at
first. But when you got your optimized tour result you had to run again
the shortest path search to retrieve a path, which causes two problems:

  • The network (conditions) might have changed if you run those path
    queries later.

I guess this assumes the you are getting live feed data for traffic or something like that. What are you think here? Also be aware that the algorithms can select any of equal distance alternatives and the later if you single shortest path it might pick a different path, which may or may not be problematic.

I actually don’t assume live feed data but the fact that pgRouting keeps the network in a database and it’s possible to change attributes for example, that are part of your cost parameter, at any time. Or maybe road links become unavailable during a long APSP query.

I think the library should ensure that when you make a query it reads the network data available at that time. If the query for processing a huge distance matrix takes one hour for example, but makes steadily new requests to the database, it could happen that some attributes change over time.
Whether this is problematic for an application or not depends on the application. In most cases it probably isn’t, that’s true. But pgRouting isn’t limited to road networks, so who knows what kind of applications will make use of it.

  • It might be slower to later run single shortest path queries again
    independently

It would be interesting to enhance our single shortest path to be able to extract multiple single shortest paths based on a single graph build. So input might be something like

This would be the same as k-Shortest Path, right?
http://www.pgrouting.org/rfc/rfc-04.html

Daniel


Georepublic UG & Georepublic Japan
eMail: daniel.kastl@georepublic.de
Web: http://georepublic.de

On 12/1/2010 11:36 PM, Daniel Kastl wrote:

        I think there will be both cases. DARP/TSP might only need the
        matrix at
        first. But when you got your optimized tour result you had to
        run again
        the shortest path search to retrieve a path, which causes two
        problems:

            * The network (conditions) might have changed if you run
        those path
              queries later.

    I guess this assumes the you are getting live feed data for traffic
    or something like that. What are you think here? Also be aware that
    the algorithms can select any of equal distance alternatives and the
    later if you single shortest path it might pick a different path,
    which may or may not be problematic.

I actually don't assume live feed data but the fact that pgRouting keeps
the network in a database and it's possible to change attributes for
example, that are part of your cost parameter, at any time. Or maybe
road links become unavailable during a long APSP query.

I think the library should ensure that when you make a query it reads
the network data available at that time. If the query for processing a
huge distance matrix takes one hour for example, but makes steadily new
requests to the database, it could happen that some attributes change
over time.
Whether this is problematic for an application or not depends on the
application. In most cases it probably isn't, that's true. But pgRouting
isn't limited to road networks, so who knows what kind of applications
will make use of it.

OK, these are good points. I tend to over look these because I view the graph as relatively static for my purposes, but you are right to not assume that is the case. Thanks.

            * It might be slower to later run single shortest path
        queries again
              independently

    It would be interesting to enhance our single shortest path to be
    able to extract multiple single shortest paths based on a single
    graph build. So input might be something like

This would be the same as k-Shortest Path, right?
http://www.pgrouting.org/rfc/rfc-04.html
<http://www.pgrouting.org/rfc/rfc-04.html&gt;

No, K-shortest paths is more like what google does when it gives you alternative routes. You start and end stay constant but you get the shortest, then the next shortest, etc

I'm thinking of the case for example where you solve a TSP and now you want to extract the shortest paths from A-B, B-C, ... , Z-A and you only want to load and build the graph once but get back all the routes. You could probably think of it as via points. like start-A-B-C-D-end.
You could run 5 separate queries to get all the route segments, or we could have a function that you pass it a set of path control points and it build one graph and resturns all the segments in one call. I thought you were referencing doing this in an earlier email. Regardless this would be much faster than dosing N separate queries and having to build the graph N times.

-Steve

Daniel

--
Georepublic UG & Georepublic Japan
eMail: daniel.kastl@georepublic.de <mailto:daniel.kastl@georepublic.de>
Web: http://georepublic.de/&gt;

_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

It would be interesting to enhance our single shortest path to be
able to extract multiple single shortest paths based on a single
graph build. So input might be something like

This would be the same as k-Shortest Path, right?
http://www.pgrouting.org/rfc/rfc-04.html
<http://www.pgrouting.org/rfc/rfc-04.html>

No, K-shortest paths is more like what google does when it gives you alternative routes. You start and end stay constant but you get the shortest, then the next shortest, etc

I’m thinking of the case for example where you solve a TSP and now you want to extract the shortest paths from A-B, B-C, … , Z-A and you only want to load and build the graph once but get back all the routes. You could probably think of it as via points. like start-A-B-C-D-end.
You could run 5 separate queries to get all the route segments, or we could have a function that you pass it a set of path control points and it build one graph and resturns all the segments in one call. I thought you were referencing doing this in an earlier email. Regardless this would be much faster than dosing N separate queries and having to build the graph N times.

You mean you request a route including points in between and return them all at once.
This makes sense, that’s true. I think there are many possibilities for addons. We should add it to the wishlist.

Daniel


Georepublic UG & Georepublic Japan
eMail: daniel.kastl@georepublic.de
Web: http://georepublic.de

On 12/2/2010 12:08 AM, Daniel Kastl wrote:

            It would be interesting to enhance our single shortest path
        to be
            able to extract multiple single shortest paths based on a single
            graph build. So input might be something like

        This would be the same as k-Shortest Path, right?
        http://www.pgrouting.org/rfc/rfc-04.html
        <http://www.pgrouting.org/rfc/rfc-04.html&gt;

    No, K-shortest paths is more like what google does when it gives you
    alternative routes. You start and end stay constant but you get the
    shortest, then the next shortest, etc

    I'm thinking of the case for example where you solve a TSP and now
    you want to extract the shortest paths from A-B, B-C, ... , Z-A and
    you only want to load and build the graph once but get back all the
    routes. You could probably think of it as via points. like
    start-A-B-C-D-end.
    You could run 5 separate queries to get all the route segments, or
    we could have a function that you pass it a set of path control
    points and it build one graph and resturns all the segments in one
    call. I thought you were referencing doing this in an earlier email.
    Regardless this would be much faster than dosing N separate queries
    and having to build the graph N times.

You mean you request a route including points in between and return them
all at once.
This makes sense, that's true. I think there are many possibilities for
addons. We should add it to the wishlist.

Yes, exactly. I'm not sure it requires a whole RFC, but we might want to add a wish list document to the RFC directory that would make it easy to add ideas to the wish list document.

I just looked at the RFCs and notice that we do not have one for the contraction highway algorithm. Did you notice that that was implemented for OSM? MoNav is the project.

-Steve

Let me make some proposal how the APSP function could look like.
(I mostly copy from Dijkstra: http://www.pgrouting.org/docs/1.x/dijkstra.html)


CREATE OR REPLACE FUNCTION apsp(
                                **sql** text,
                                <b>n_**ids**</b> text,
                                **m_ids** text,
                                directed boolean,
                                has_reverse_cost boolean)
        RETURNS SETOF <???>

With sql as …


SELECT id, source, target, cost FROM edge_table

… like for Dijkstra, and n_ids/m_ids as …


SELECT id FROM edge_table

to select the relevant points for the n-m distance matrix.

With a select (instead of a list of ID’s you can also select relevant points by attribute, update timestamp or whatever.

I can also think of an APSP that calculates all distances from 1 point to n points only. That’s why I added n_ids/m_ids.
In that case you could update an existing distance matrix (that you stored in some table for example) if only one or a few points changed.

One big question is what to return as a result.
And another question is if it will be possible to chose between Dijkstra, A-Star or Shooting*.

Daniel

2010/12/2 Daniel Kastl <daniel@georepublic.de>

I think there will be both cases. DARP/TSP might only need the matrix at
first. But when you got your optimized tour result you had to run again
the shortest path search to retrieve a path, which causes two problems:

  • The network (conditions) might have changed if you run those path
    queries later.

I guess this assumes the you are getting live feed data for traffic or something like that. What are you think here? Also be aware that the algorithms can select any of equal distance alternatives and the later if you single shortest path it might pick a different path, which may or may not be problematic.

I actually don’t assume live feed data but the fact that pgRouting keeps the network in a database and it’s possible to change attributes for example, that are part of your cost parameter, at any time. Or maybe road links become unavailable during a long APSP query.

I think the library should ensure that when you make a query it reads the network data available at that time. If the query for processing a huge distance matrix takes one hour for example, but makes steadily new requests to the database, it could happen that some attributes change over time.
Whether this is problematic for an application or not depends on the application. In most cases it probably isn’t, that’s true. But pgRouting isn’t limited to road networks, so who knows what kind of applications will make use of it.

  • It might be slower to later run single shortest path queries again
    independently

It would be interesting to enhance our single shortest path to be able to extract multiple single shortest paths based on a single graph build. So input might be something like

This would be the same as k-Shortest Path, right?
http://www.pgrouting.org/rfc/rfc-04.html

Daniel

Georepublic UG & Georepublic Japan
eMail: daniel.kastl@georepublic.de

Web: http://georepublic.de


Georepublic UG & Georepublic Japan
eMail: daniel.kastl@georepublic.de
Web: http://georepublic.de

Hi Daniel,

1. As I wrote few posts above - I'd like it to return just a set of
'path' rows with additional 'from' and 'to' columns.
2. It uses completely different algorithm, do Dijkstra etc. are irrelevant here.

Anton.

Hi Steve,

I don’t have enough time unfortunately to catch up with the number of ideas … also I’m missing technical expertise for some RFC’s :wink:
That’s why I collected them here as a reminder: https://github.com/pgRouting/website/issues (not sure “website” repository is the right place though)

I heard about some implementation, but don’t really remember to have heard about MoNav.

Daniel

2010/12/2 Stephen Woodbridge <woodbri@swoodbridge.com>

On 12/2/2010 12:08 AM, Daniel Kastl wrote:

It would be interesting to enhance our single shortest path
to be
able to extract multiple single shortest paths based on a single
graph build. So input might be something like

This would be the same as k-Shortest Path, right?
http://www.pgrouting.org/rfc/rfc-04.html
<http://www.pgrouting.org/rfc/rfc-04.html>

No, K-shortest paths is more like what google does when it gives you
alternative routes. You start and end stay constant but you get the
shortest, then the next shortest, etc

I’m thinking of the case for example where you solve a TSP and now
you want to extract the shortest paths from A-B, B-C, … , Z-A and
you only want to load and build the graph once but get back all the
routes. You could probably think of it as via points. like
start-A-B-C-D-end.
You could run 5 separate queries to get all the route segments, or
we could have a function that you pass it a set of path control
points and it build one graph and resturns all the segments in one
call. I thought you were referencing doing this in an earlier email.
Regardless this would be much faster than dosing N separate queries
and having to build the graph N times.

You mean you request a route including points in between and return them
all at once.
This makes sense, that’s true. I think there are many possibilities for
addons. We should add it to the wishlist.

Yes, exactly. I’m not sure it requires a whole RFC, but we might want to add a wish list document to the RFC directory that would make it easy to add ideas to the wish list document.

I just looked at the RFCs and notice that we do not have one for the contraction highway algorithm. Did you notice that that was implemented for OSM? MoNav is the project.

-Steve


pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev


Georepublic UG & Georepublic Japan
eMail: daniel.kastl@georepublic.de
Web: http://georepublic.de

2010/12/2 Anton Patrushev <anton.patrushev@georepublic.de>

Hi Daniel,

  1. As I wrote few posts above - I’d like it to return just a set of
    ‘path’ rows with additional ‘from’ and ‘to’ columns.

Yes, I remember. But it needs some discussion still, right?

  1. It uses completely different algorithm, do Dijkstra etc. are irrelevant here.

Yes, I agree. What I meant was, that we have those two different approaches:

  • Routing from vertex to vertex → Dijkstra, Astar
  • Routing from edge to edge → Shooting*
    Of course it would be nice to also have edge-to-edge routing in APSP if possible. That’s what I wanted to say.

Daniel

Anton.


pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org

http://lists.osgeo.org/mailman/listinfo/pgrouting-dev


Georepublic UG & Georepublic Japan
eMail: daniel.kastl@georepublic.de
Web: http://georepublic.de

On Thu, Dec 2, 2010 at 10:54 AM, Daniel Kastl <daniel@georepublic.de> wrote:

Let me make some proposal how the APSP function could look like.
(I mostly copy from Dijkstra: http://www.pgrouting.org/docs/1.x/dijkstra.html)

CREATE OR REPLACE FUNCTION apsp(
                                **sql** text,
                                <b>n_**ids**</b> text,
                                **m_ids** text,
                                directed boolean,
                                has_reverse_cost boolean)
        RETURNS SETOF <???>

With sql as …

SELECT id, source, target, cost FROM edge_table

… like for Dijkstra, and n_ids/m_ids as …

SELECT id FROM edge_table

to select the relevant points for the n-m distance matrix.

With a select (instead of a list of ID’s you can also select relevant points by attribute, update timestamp or whatever.

I can also think of an APSP that calculates all distances from 1 point to n points only. That’s why I added n_ids/m_ids.
In that case you could update an existing distance matrix (that you stored in some table for example) if only one or a few points changed.

Hey… The dijextras algo should be able to provide solution for single source shortest paths to all other points (1 to n points). The boost doc here: http://www.cs.brown.edu/~jwicks/boost/libs/graph/doc/dijkstra_shortest_paths.html states that: "If you provide a distance property map through the distance_map() parameter then the shortest distance from the source vertex to every other vertex in the graph will be recorded in the distance map. Also you can record the shortest paths tree in a predecessor map: for each vertex u in V, p[u] will be the predecessor of u in the shortest paths tree "

So, I think there is no need for doing APSP / Warshall O(n^3) time for this purpose, Dijextra will do that in O(m+nlog n). I am not fully aware of the pgRouting dijextra implementation, does it provide above functionality?

One more thing: Can we have a wiki page or something like that where we can put up our ideas, and we can keep on modifying the draft as the ideas are refined? I guess there are too many ideas in this thread and I am finding some difficulty to keep track of all of them all… Just a suggestion :slight_smile:

I think the library should ensure that when you make a query it reads the network data available at that time. If the query for processing a huge distance matrix takes one hour for example, but makes steadily new requests to the database, it could happen that some attributes change over time.
Whether this is problematic for an application or not depends on the application. In most cases it probably isn’t, that’s true. But pgRouting isn’t limited to road networks, so who knows what kind of applications will make use of it.

  • It might be slower to later run single shortest path queries again
    independently

It would be interesting to enhance our single shortest path to be able to extract multiple single shortest paths based on a single graph build. So input might be something like

This would be the same as k-Shortest Path, right?
http://www.pgrouting.org/rfc/rfc-04.html

Daniel

Georepublic UG & Georepublic Japan
eMail: daniel.kastl@georepublic.de

Web: http://georepublic.de


Georepublic UG & Georepublic Japan
eMail: daniel.kastl@georepublic.de
Web: http://georepublic.de


pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev


Regards,
-Jay Mahadeokar