[pgrouting-dev] Non-scheduled Routing algorithm

Kishore_Kumar · June 8, 2011, 8:21am

Hi,

I was working on solving the Non-scheduled Routing problem. I made notes in the GSoC spiral book and attached them here.

Non-scheduled Routing - It is primarily to be used for public transport routing where the absolute timetable of trips is not available or when the vehicles themselves don’t follow any schedule.

Type of Output:
There are two kinds of outputs I can imagine.
(a). List the Changeovers[1] that have maximum frequency and minimize the total travel time and then find Route numbers serving from one Changeover to another. Eg: http://busroutes.in/chennai/path/1859/1976/
(b). Find and return a single sequence of routes corresponding to least total travel time.

While output (a) gives more route numbers and calculates and dilutes the expected total travel time, output (b) gives more accurate total travel time.
The problem with output (b) is that it ignores any alternate routes serving any successive changeovers and only takes the one with least travel time.
Hence, I decided to go with output (a).

To get a list of changeovers, two graphs should be generated from GTFS input:

Frequency Graph - effective frequency between any two stops
Travel Time Graph - expected travel time between any two stops

And, the algorithm to find the changeovers:

Initially, store the direct travel time from source to destination as cutoff.
Explore the state-space of stops by exploring stops with small travel time first.
Update cutoff value as the shortest travel time to destination.
Prune the branches at cutoff.
After all branches are pruned, return the path corresponding to the cutoff travel time as the sequence of changeovers.

After the sequence of changeovers is calculated, it is trivial to get the list of routes that serve between them.

Performance:
Let n be the number of stops and m be the number of trips,

Generating Frequency Graph and Travel Time Graph from the GTFS input have an ϑ(n^2) space complexity and ϑ(n^2 * m) time complexity.
Finding the list of changeovers has an O(n!) time complexity[worst case] and o(n) in the best case.

I am no algo expert and cannot think of a better performing algorithm. Thoughts on this?

Thanks & Regards,
J Kishore kumar.

[1] - Changeover is an intermediate stop where the passenger gets down from one vehicle and boards another vehicle bound to a different route.

non-scheduled_routing.pdf (128 KB)

Kishore_Kumar · June 8, 2011, 8:24am

Mailman ate my Richtext message again.

On Wed, Jun 8, 2011 at 1:51 PM, Kishore Kumar <justjkk@gmail.com> wrote:

Hi,

I was working on solving the Non-scheduled Routing problem. I made
notes in the GSoC spiral book and attached them here.

Non-scheduled Routing - It is primarily to be used for public
transport routing where the absolute timetable of trips is not
available or when the vehicles themselves don't follow any schedule.

Type of Output:

There are two kinds of outputs I can imagine.

(a). List the Changeovers[1] that have maximum frequency and minimize
the total travel time and then find Route numbers serving from one
Changeover to another. Eg: http://busroutes.in/chennai/path/1859/1976/

(b). Find and return a single sequence of routes corresponding to
least total travel time.

While output (a) gives more route numbers and calculates and dilutes
the expected total travel time, output (b) gives more accurate total
travel time.

The problem with output (b) is that it ignores any alternate routes
serving any successive changeovers and only takes the one with least
travel time.

Hence, I decided to go with output (a).

To get a list of changeovers, two graphs should be generated from GTFS input:

* Frequency Graph - effective frequency between any two stops

* Travel Time Graph - expected travel time between any two stops

And, the algorithm to find the changeovers:

* Initially, store the direct travel time from source to destination as cutoff.

* Explore the state-space of stops by exploring stops with small
travel time first.

* Update cutoff value as the shortest travel time to destination.

* Prune the branches at cutoff.

* After all branches are pruned, return the path corresponding to the
cutoff travel time as the sequence of changeovers.

After the sequence of changeovers is calculated, it is trivial to get
the list of routes that serve between them.

Performance:

Let n be the number of stops and m be the number of trips,

* Generating Frequency Graph and Travel Time Graph from the GTFS input
have an ϑ(n^2) space complexity and ϑ(n^2 * m) time complexity.

* Finding the list of changeovers has an O(n!) time complexity[worst
case] and o(n) in the best case.

I am no algo expert and cannot think of a better performing algorithm.
Thoughts on this?

Thanks & Regards,

J Kishore kumar.

[1] - Changeover is an intermediate stop where the passenger gets down
from one vehicle and boards another vehicle bound to a different
route.

Jay_Mahadeokar · June 8, 2011, 10:28am

Hi Kishore,

My reply inline:

On Wed, Jun 8, 2011 at 1:51 PM, Kishore Kumar <justjkk@gmail.com> wrote:

Hi,

I was working on solving the Non-scheduled Routing problem. I made notes in the GSoC spiral book and attached them here.

Non-scheduled Routing - It is primarily to be used for public transport routing where the absolute timetable of trips is not available or when the vehicles themselves don’t follow any schedule.

Type of Output:
There are two kinds of outputs I can imagine.
(a). List the Changeovers[1] that have maximum frequency and minimize the total travel time and then find Route numbers serving from one Changeover to another. Eg: http://busroutes.in/chennai/path/1859/1976/
(b). Find and return a single sequence of routes corresponding to least total travel time.

While output (a) gives more route numbers and calculates and dilutes the expected total travel time, output (b) gives more accurate total travel time.
The problem with output (b) is that it ignores any alternate routes serving any successive changeovers and only takes the one with least travel time.
Hence, I decided to go with output (a).

To get a list of changeovers, two graphs should be generated from GTFS input:

Frequency Graph - effective frequency between any two stops

Travel Time Graph - expected travel time between any two stops

And, the algorithm to find the changeovers:

Initially, store the direct travel time from source to destination as cutoff.

Explore the state-space of stops by exploring stops with small travel time first.

Update cutoff value as the shortest travel time to destination.

Prune the branches at cutoff.

After all branches are pruned, return the path corresponding to the cutoff travel time as the sequence of changeovers.

Let me rephrase the problem. Please tell me if I understood it correctly…

So, the input to your algorithm is a graph, where nodes are the stops, and edges are the travel times between stops. The edges can be only traversed (i.e they actually exist) at certain time instances.

Ex - If frequency of edge u-> v is 2 hours. One departure was at 12:00 PM. Then if you arrive at u at 12:30 PM, then your changeover time at u will be 1:30 mins. Since, next departure is only possible at 2 pm.

So, I think this can be again reduced to a variant time-dependent dijkstra algorithm.

The only difference being - your travel_time is now constant for each edge, But your change_over time depends on the time at which your dijkstra search starts. So, the algorithm should again take O(m + n log n) time. You need to modify dijkstra to suit your need.

I hope I am not missing any point here. Please correct me if I am wrong. Also any clarification needed?

After the sequence of changeovers is calculated, it is trivial to get the list of routes that serve between them.

Performance:
Let n be the number of stops and m be the number of trips,

Generating Frequency Graph and Travel Time Graph from the GTFS input have an ϑ(n^2) space complexity and ϑ(n^2 * m) time complexity.

Finding the list of changeovers has an O(n!) time complexity[worst case] and o(n) in the best case.

I am no algo expert and cannot think of a better performing algorithm. Thoughts on this?

Thanks & Regards,
J Kishore kumar.

[1] - Changeover is an intermediate stop where the passenger gets down from one vehicle and boards another vehicle bound to a different route.

pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

–
Regards,
-Jay Mahadeokar

Kishore_Kumar · June 8, 2011, 1:19pm

Jay,
Thanks for commenting on this issue. Inline response below:

On Wed, Jun 8, 2011 at 3:58 PM, Jay Mahadeokar <jai.mahadeokar@gmail.com> wrote:

Ex - If frequency of edge u-> v is 2 hours. One departure was at 12:00 PM.
Then if you arrive at u at 12:30 PM, then your changeover time at u will be
1:30 mins. Since, next departure is only possible at 2 pm.

Not really. The data we are dealing with is non-scheduled trips data.
So, we cannot say the changeover time as 1:30 mins. Instead, consider
the service follows a Uniform Distribution and get the mean waiting
time as half of the headway seconds(i.e, Changeover time penalty = 1
hour). I'm also solving scheduled routing problem(easier than this)
but that is 2 weeks later.

So, I think this can be again reduced to a variant time-dependent dijkstra
algorithm.

The only difference being - your travel_time is now constant for each edge,
But your change_over time depends on the time at which your dijkstra search
starts. So, the algorithm should again take O(m + n log n) time. You need to
modify dijkstra to suit your need.

Again, this algorithm is not time-dependent(or I don't see it). So, a
non-scheduled routing query returns the same result independent of
whether the time of the day is 5am or 9am.

Let me try implement the algorithm tonight and get back if I encounter
a deadend.

Thanks,
J Kishore kumar.

Stephen_Woodbridge1 · June 8, 2011, 1:45pm

Hi Kishore,

What is a real life example of a non-scheduled trips?
Is this for example bus lines that run on a schedule, but you are trying to compute the best route and change overs independent of knowing a start time?

If so, would you want to deal with the fact the day time schedules might run at a higher frequency the say night time schedules or say rush hour schedules. I this case you might want more information about when this non-scheduled trip will take place so you can use the correct frequency tables.

-Steve

On 6/8/2011 9:19 AM, Kishore Kumar wrote:

Jay,
Thanks for commenting on this issue. Inline response below:

On Wed, Jun 8, 2011 at 3:58 PM, Jay Mahadeokar<jai.mahadeokar@gmail.com> wrote:

Ex - If frequency of edge u-> v is 2 hours. One departure was at 12:00 PM.
Then if you arrive at u at 12:30 PM, then your changeover time at u will be
1:30 mins. Since, next departure is only possible at 2 pm.

Not really. The data we are dealing with is non-scheduled trips data.
So, we cannot say the changeover time as 1:30 mins. Instead, consider
the service follows a Uniform Distribution and get the mean waiting
time as half of the headway seconds(i.e, Changeover time penalty = 1
hour). I'm also solving scheduled routing problem(easier than this)
but that is 2 weeks later.

So, I think this can be again reduced to a variant time-dependent dijkstra
algorithm.

The only difference being - your travel_time is now constant for each edge,
But your change_over time depends on the time at which your dijkstra search
starts. So, the algorithm should again take O(m + n log n) time. You need to
modify dijkstra to suit your need.

Again, this algorithm is not time-dependent(or I don't see it). So, a
non-scheduled routing query returns the same result independent of
whether the time of the day is 5am or 9am.

Let me try implement the algorithm tonight and get back if I encounter
a deadend.

Thanks,
J Kishore kumar.
_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

Jay_Mahadeokar · June 8, 2011, 4:16pm

On Wed, Jun 8, 2011 at 6:49 PM, Kishore Kumar <justjkk@gmail.com> wrote:

Jay,
Thanks for commenting on this issue. Inline response below:

On Wed, Jun 8, 2011 at 3:58 PM, Jay Mahadeokar <jai.mahadeokar@gmail.com> wrote:

Ex - If frequency of edge u-> v is 2 hours. One departure was at 12:00 PM.
Then if you arrive at u at 12:30 PM, then your changeover time at u will be
1:30 mins. Since, next departure is only possible at 2 pm.

Not really. The data we are dealing with is non-scheduled trips data.
So, we cannot say the changeover time as 1:30 mins. Instead, consider
the service follows a Uniform Distribution and get the mean waiting
time as half of the headway seconds(i.e, Changeover time penalty = 1
hour). I’m also solving scheduled routing problem(easier than this)
but that is 2 weeks later.

Hi Kishore,

Seems that I am not able to understand the problem statement clearly. Can you explain it in more detail? Maybe along with Steve’s query.

I am quite curious since you mentioned the worst-case complexity as O(n!). Almost all problems that exist today which are treated as hardest (NP-hard problems) can be solved in exponential time worst case i.e O(c^n). (Sorry for mentioning too much theory here )
Even this exponential bound is better than the factorial time bound! [1]. So, I have a feeling that the problem you are dealing with can be solved in better worst case bound.

Maybe you can give one example of input and the desired output which can make things clearer.

[1] http://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions

So, I think this can be again reduced to a variant time-dependent dijkstra
algorithm.

The only difference being - your travel_time is now constant for each edge,
But your change_over time depends on the time at which your dijkstra search
starts. So, the algorithm should again take O(m + n log n) time. You need to
modify dijkstra to suit your need.

Again, this algorithm is not time-dependent(or I don’t see it). So, a
non-scheduled routing query returns the same result independent of
whether the time of the day is 5am or 9am.

Let me try implement the algorithm tonight and get back if I encounter
a deadend.

Thanks,
J Kishore kumar.

pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

–
Regards,
-Jay Mahadeokar

Kishore_Kumar · June 11, 2011, 10:17am

Hi,

Sorry to respond late. Non-scheduled trips are trips that have no
strict timetable. For eg, In India, most of the bus transport
authorities follow no timetable for trips. The time at which a bus
starts is at the sole discretion of the Driver but the Agency dictates
how many trips have to made per day. Thus, we have frequency of trips
but no absolute arrival and departure times.

Peak and Slack frequency affects all routes similarly. So, I guess
peak hour routing and slack hour routing will return almost the same
result. Though, it is possible to accept a time window as a parameter
and give the route corresponding to appropriate frequency. But, that
is more complex.

Thanks & Regards,
J Kishore kumar.

On Wed, Jun 8, 2011 at 7:15 PM, Stephen Woodbridge
<woodbri@swoodbridge.com> wrote:

Hi Kishore,

What is a real life example of a non-scheduled trips?
Is this for example bus lines that run on a schedule, but you are trying to
compute the best route and change overs independent of knowing a start time?

If so, would you want to deal with the fact the day time schedules might run
at a higher frequency the say night time schedules or say rush hour
schedules. I this case you might want more information about when this
non-scheduled trip will take place so you can use the correct frequency
tables.

-Steve

On 6/8/2011 9:19 AM, Kishore Kumar wrote:

Jay,
Thanks for commenting on this issue. Inline response below:

On Wed, Jun 8, 2011 at 3:58 PM, Jay Mahadeokar<jai.mahadeokar@gmail.com>
wrote:

Ex - If frequency of edge u-> v is 2 hours. One departure was at 12:00
PM.
Then if you arrive at u at 12:30 PM, then your changeover time at u will
be
1:30 mins. Since, next departure is only possible at 2 pm.

Not really. The data we are dealing with is non-scheduled trips data.
So, we cannot say the changeover time as 1:30 mins. Instead, consider
the service follows a Uniform Distribution and get the mean waiting
time as half of the headway seconds(i.e, Changeover time penalty = 1
hour). I'm also solving scheduled routing problem(easier than this)
but that is 2 weeks later.

So, I think this can be again reduced to a variant time-dependent
dijkstra
algorithm.

The only difference being - your travel_time is now constant for each
edge,
But your change_over time depends on the time at which your dijkstra
search
starts. So, the algorithm should again take O(m + n log n) time. You need
to
modify dijkstra to suit your need.

Again, this algorithm is not time-dependent(or I don't see it). So, a
non-scheduled routing query returns the same result independent of
whether the time of the day is 5am or 9am.

Let me try implement the algorithm tonight and get back if I encounter
a deadend.

Thanks,
J Kishore kumar.
_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

Kishore_Kumar · June 11, 2011, 10:24am

Sorry I have calculated the complexity wrongly. And, since the
existing Dijkstra is sufficient, I linked to it here:
https://github.com/pgRouting/pgrouting/blob/gsoc-multimodal/extra/transit/sql/nonscheduled.sql#L42

There's a test case that passes the current implementation:
https://github.com/pgRouting/pgrouting/blob/gsoc-multimodal/tests/test_transit.py#L28

Thanks & Regards,
J Kishore kumar.

On Wed, Jun 8, 2011 at 9:46 PM, Jay Mahadeokar <jai.mahadeokar@gmail.com> wrote:

On Wed, Jun 8, 2011 at 6:49 PM, Kishore Kumar <justjkk@gmail.com> wrote:

Jay,
Thanks for commenting on this issue. Inline response below:

On Wed, Jun 8, 2011 at 3:58 PM, Jay Mahadeokar <jai.mahadeokar@gmail.com>
wrote:
>
> Ex - If frequency of edge u-> v is 2 hours. One departure was at 12:00
> PM.
> Then if you arrive at u at 12:30 PM, then your changeover time at u will
> be
> 1:30 mins. Since, next departure is only possible at 2 pm.

Not really. The data we are dealing with is non-scheduled trips data.
So, we cannot say the changeover time as 1:30 mins. Instead, consider
the service follows a Uniform Distribution and get the mean waiting
time as half of the headway seconds(i.e, Changeover time penalty = 1
hour). I'm also solving scheduled routing problem(easier than this)
but that is 2 weeks later.

Hi Kishore,

Seems that I am not able to understand the problem statement clearly. Can
you explain it in more detail? Maybe along with Steve's query.

I am quite curious since you mentioned the worst-case complexity as O(n!).
Almost all problems that exist today which are treated as hardest (NP-hard
problems) can be solved in exponential time worst case i.e O(c^n). (Sorry
for mentioning too much theory here )
Even this exponential bound is better than the factorial time bound! [1].
So, I have a feeling that the problem you are dealing with can be solved in
better worst case bound.

Maybe you can give one example of input and the desired output which can
make things clearer.

[1] http://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions

> So, I think this can be again reduced to a variant time-dependent
> dijkstra
> algorithm.
>
> The only difference being - your travel_time is now constant for each
> edge,
> But your change_over time depends on the time at which your dijkstra
> search
> starts. So, the algorithm should again take O(m + n log n) time. You
> need to
> modify dijkstra to suit your need.
>

Again, this algorithm is not time-dependent(or I don't see it). So, a
non-scheduled routing query returns the same result independent of
whether the time of the day is 5am or 9am.

Let me try implement the algorithm tonight and get back if I encounter
a deadend.

Thanks,
J Kishore kumar.
_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

--
Regards,
-Jay Mahadeokar

_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

Anton_Patrushev · June 12, 2011, 8:00am

Hi Kishore,

Ah, now I got it. Well, here in Russia some bus routes also don't have
fixed schedule and are regulated by number of buses/trips. Thus I
think the idea makes sense.

Anton.

Kishore_Kumar · June 13, 2011, 5:28pm

Hi,

I am facing the following issues in the prototype code[1]:

1. As I am internally calling Dijkstra function and the Dijkstra
implementation assumes that vertex id and edge id are integers but
according to GTFS, stop id and route id can be text also. This can be
solved by:
a. Generating a new table that maps Auto-generated integers to Stop
Ids(Map<Integer,Text>)
b. Modifying Dijkstra implementation to accept vertex id of type text

2. Since the non-scheduling algorithm is not directly doing any lower
level calculations, I implemented the prototype using PL/pgSQL
language. Is rewriting the same logic in c is required and does it
provide any significant performance boost? According to a post in
pgsql-performance list[2], "I wouldn't expect the queries called by
the pl/pgsql function to be much faster if called through SPI from C
instead.". Hence, my current thought is to give real-life data as
input and check for performance problems and if performance is bad,
write the code in c.

Thanks & Regards,
J Kishore kumar.

[1] - https://github.com/pgRouting/pgrouting/blob/gsoc-multimodal/extra/transit/sql/nonscheduled.sql
[2] - http://archives.postgresql.org/pgsql-performance/2010-05/msg00199.php

On Sun, Jun 12, 2011 at 1:30 PM, Anton Patrushev
<anton.patrushev@georepublic.de> wrote:

Hi Kishore,

Ah, now I got it. Well, here in Russia some bus routes also don't have
fixed schedule and are regulated by number of buses/trips. Thus I
think the idea makes sense.

Anton.
_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

Stephen_Woodbridge1 · June 13, 2011, 7:25pm

On 6/13/2011 1:28 PM, Kishore Kumar wrote:

Hi,

I am facing the following issues in the prototype code[1]:

1. As I am internally calling Dijkstra function and the Dijkstra
implementation assumes that vertex id and edge id are integers but
according to GTFS, stop id and route id can be text also. This can be
solved by:
a. Generating a new table that maps Auto-generated integers to Stop
Ids(Map<Integer,Text>)
b. Modifying Dijkstra implementation to accept vertex id of type text

I think I would do a. above. The reason is text id's are really for people and and integer id's computers like. And it is likely that doing b., you would need to remap the text labels to integer anyway for performance reasons.

2. Since the non-scheduling algorithm is not directly doing any lower
level calculations, I implemented the prototype using PL/pgSQL
language. Is rewriting the same logic in c is required and does it
provide any significant performance boost? According to a post in
pgsql-performance list[2], "I wouldn't expect the queries called by
the pl/pgsql function to be much faster if called through SPI from C
instead.". Hence, my current thought is to give real-life data as
input and check for performance problems and if performance is bad,
write the code in c.

I do not have direct test experience to back this up, but assuming pgsql as a language is implemented similar to Perl or python the standard is to compile the source to bytecode and then run a bytecode interpreter. This gives very good perfomance when 90% of the time is spend in library functions that really are compiled C/C++ and called from the interpreter. Where you start to run into performance problems is when you have a lot of code or a lot of looping and evaluating expressions. That said, the rule for optimizing code is to measure first and then remove the worse bottle neck, and then repeat until you are happy with the results. So I see no problem with a prototype in pgsql, and later it or part of it can be rewritten to C if there are performance issues.

Hope this helps,
-Steve

Thanks& Regards,
J Kishore kumar.

[1] - https://github.com/pgRouting/pgrouting/blob/gsoc-multimodal/extra/transit/sql/nonscheduled.sql
[2] - http://archives.postgresql.org/pgsql-performance/2010-05/msg00199.php

On Sun, Jun 12, 2011 at 1:30 PM, Anton Patrushev
<anton.patrushev@georepublic.de> wrote:

Hi Kishore,

Ah, now I got it. Well, here in Russia some bus routes also don't have
fixed schedule and are regulated by number of buses/trips. Thus I
think the idea makes sense.

Anton.
_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

Anton_Patrushev · June 14, 2011, 1:16am

Hi Kishore,

I agree with Steve - mapping text vertex labels to integers makes more sense.
Same for Pl/pgSQL performance - I think it is OK to have it done like
this. The problems with Pl/pgSQL is its language limitation which
limits us in programming techniques, but it is OK as long as you can
solve your problem with what you have.

Anton.

On 6/14/11, Stephen Woodbridge <woodbri@swoodbridge.com> wrote:

On 6/13/2011 1:28 PM, Kishore Kumar wrote:

Hi,

I am facing the following issues in the prototype code[1]:

1. As I am internally calling Dijkstra function and the Dijkstra
implementation assumes that vertex id and edge id are integers but
according to GTFS, stop id and route id can be text also. This can be
solved by:
  a. Generating a new table that maps Auto-generated integers to Stop
Ids(Map<Integer,Text>)
  b. Modifying Dijkstra implementation to accept vertex id of type text

I think I would do a. above. The reason is text id's are really for
people and and integer id's computers like. And it is likely that doing
b., you would need to remap the text labels to integer anyway for
performance reasons.

2. Since the non-scheduling algorithm is not directly doing any lower
level calculations, I implemented the prototype using PL/pgSQL
language. Is rewriting the same logic in c is required and does it
provide any significant performance boost? According to a post in
pgsql-performance list[2], "I wouldn't expect the queries called by
the pl/pgsql function to be much faster if called through SPI from C
instead.". Hence, my current thought is to give real-life data as
input and check for performance problems and if performance is bad,
write the code in c.

I do not have direct test experience to back this up, but assuming pgsql
as a language is implemented similar to Perl or python the standard is
to compile the source to bytecode and then run a bytecode interpreter.
This gives very good perfomance when 90% of the time is spend in library
functions that really are compiled C/C++ and called from the
interpreter. Where you start to run into performance problems is when
you have a lot of code or a lot of looping and evaluating expressions.
That said, the rule for optimizing code is to measure first and then
remove the worse bottle neck, and then repeat until you are happy with
the results. So I see no problem with a prototype in pgsql, and later it
or part of it can be rewritten to C if there are performance issues.

Hope this helps,
   -Steve

Thanks& Regards,
J Kishore kumar.

[1] -
https://github.com/pgRouting/pgrouting/blob/gsoc-multimodal/extra/transit/sql/nonscheduled.sql
[2] -
http://archives.postgresql.org/pgsql-performance/2010-05/msg00199.php

On Sun, Jun 12, 2011 at 1:30 PM, Anton Patrushev
<anton.patrushev@georepublic.de> wrote:

Hi Kishore,

Ah, now I got it. Well, here in Russia some bus routes also don't have
fixed schedule and are regulated by number of buses/trips. Thus I
think the idea makes sense.

Anton.
_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

--
Georepublic UG (haftungsbeschränkt)
Salzmannstraße 44,
81739 München, Germany

Anton Patrushev
CTO

eMail: anton.patrushev@georepublic.de
Web: http://georepublic.de

Tel: +49 (089) 420 959 519
Sip: 1959519@sipgate.de

Commercial register: Amtsgericht München, HRB 181428
CEO: Daniel Kastl

Stephen_Woodbridge1 · June 14, 2011, 3:17am

On 6/13/2011 9:16 PM, Anton Patrushev wrote:

Hi Kishore,

I agree with Steve - mapping text vertex labels to integers makes more sense.
Same for Pl/pgSQL performance - I think it is OK to have it done like
this. The problems with Pl/pgSQL is its language limitation which
limits us in programming techniques, but it is OK as long as you can
solve your problem with what you have.

Yeah, this is a good point, you are fairly limited in the ability to create rich structures unless you can present them as records. So building trees, and pointers to objects can be done, but it is not easy and this is where you might run into performance problems.

Since you have already implemented your prototype in pgsql, then it obviously has what you need. It will be interesting to see how it preforms when given a real world example.

-Steve

Anton.

On 6/14/11, Stephen Woodbridge<woodbri@swoodbridge.com> wrote:

On 6/13/2011 1:28 PM, Kishore Kumar wrote:

Hi,

I am facing the following issues in the prototype code[1]:

1. As I am internally calling Dijkstra function and the Dijkstra
implementation assumes that vertex id and edge id are integers but
according to GTFS, stop id and route id can be text also. This can be
solved by:
   a. Generating a new table that maps Auto-generated integers to Stop
Ids(Map<Integer,Text>)
   b. Modifying Dijkstra implementation to accept vertex id of type text

I think I would do a. above. The reason is text id's are really for
people and and integer id's computers like. And it is likely that doing
b., you would need to remap the text labels to integer anyway for
performance reasons.

2. Since the non-scheduling algorithm is not directly doing any lower
level calculations, I implemented the prototype using PL/pgSQL
language. Is rewriting the same logic in c is required and does it
provide any significant performance boost? According to a post in
pgsql-performance list[2], "I wouldn't expect the queries called by
the pl/pgsql function to be much faster if called through SPI from C
instead.". Hence, my current thought is to give real-life data as
input and check for performance problems and if performance is bad,
write the code in c.

I do not have direct test experience to back this up, but assuming pgsql
as a language is implemented similar to Perl or python the standard is
to compile the source to bytecode and then run a bytecode interpreter.
This gives very good perfomance when 90% of the time is spend in library
functions that really are compiled C/C++ and called from the
interpreter. Where you start to run into performance problems is when
you have a lot of code or a lot of looping and evaluating expressions.
That said, the rule for optimizing code is to measure first and then
remove the worse bottle neck, and then repeat until you are happy with
the results. So I see no problem with a prototype in pgsql, and later it
or part of it can be rewritten to C if there are performance issues.

Hope this helps,
    -Steve

Thanks& Regards,
J Kishore kumar.

[1] -
https://github.com/pgRouting/pgrouting/blob/gsoc-multimodal/extra/transit/sql/nonscheduled.sql
[2] -
http://archives.postgresql.org/pgsql-performance/2010-05/msg00199.php

On Sun, Jun 12, 2011 at 1:30 PM, Anton Patrushev
<anton.patrushev@georepublic.de> wrote:

Hi Kishore,

Ah, now I got it. Well, here in Russia some bus routes also don't have
fixed schedule and are regulated by number of buses/trips. Thus I
think the idea makes sense.

Anton.
_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

_______________________________________________
pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev