[pgrouting-dev] GSoC partitioning project

Stephen_Woodbridge1 · July 5, 2013, 2:19pm

Mukul asks:

can you help me coming up with basic class skeletons for retrieving
partitions using an index key and for keeping a track of loaded
partitions as well .

The problem:

* we have an incomplete graph to solve
* we need to find a node in the graph but it is not in the graph
* we call a method graph::LoadPartition(partition_id) to load the partition with that node
* if this fails then the route fails and we clean up, and throw an error
* otherwise we continue solving the graph until we need to load another partition

This assumes it is cheap to find a node or to find that it is missing.
But the question is how do we define "missing".

A node has two pieces of information, the node_id and the partition_id. We can can make some assumptions here that if both ends of an edge are in the same partition then we will never find it in the graph unless the partition has been loaded. So edges that cross partition boundaries are the ones that concern us because the adjacent partition may or may not have been loaded.

So we might define an edge like:

eid -- edge id
sid -- source node id
spid -- source partition id
tid -- target node id
tpid -- target partition id
cost
rcost
...

We could set both spid and tpid to zero if the edge is not a boundary edge. So a simple check if either of these is zero then we know both are loaded. If they are not zero then we need to check if the partition is loaded and if not call graph::LoadPartiton(partition_id).

I would recommend maintaining a bit array where the partition_id >0 is the index into the bit array.

How does graph::LoadPartiton(partition_id) work?

After we partition the data we have a tables like:

CREATE TABLE grid_vertices
(
   id integer NOT NULL, -- node id
   pid integer, -- partition id
   the_geom geometry,
   quadkey text,
   CONSTRAINT grid_vertices_pkey PRIMARY KEY (id)
);

The following is the edge table and it can have other columns as need or it can be a view with the source, target, sgid, and tgid columns joined from another table.

CREATE TABLE grid
(
   gid serial NOT NULL,
   source integer, -- source node_id
   target integer, -- target node_id
   cost double precision,
   sgid integer, -- source grid_id (partition_id)
   tgid integer, -- target grid_id (partition_id)
   CONSTRAINT grid_pkey PRIMARY KEY (gid)
);

So the LoadPartition() method will call a C function that will run a query using the postgresql SPI functions, build a struct array and return the array pointer to the method that will load the array into the graph and update the bit array that tracks the loaded partitions. Then free the struct array and return ok or fail.

The C code will execute a query to select all the edges that have at least one end in that partition:

    select * from grid where sgid=<pid>
    union
    select * from grid where tgid=<pid>

both sgid and tgid columns should be indexed on the table.

The struct will be something like:

typedef struct partition_t {
   integer gid;
   integer source;
   integer spart;
   integer target;
   integer tpart
   double precision cost;
   double precision rcost;
   ...
}

You will need to return the edge count for the partition_t array or add a dummy record with gid = -1 to signal the end of the array.

When the method loads the edges, if will zero the partition_ids if both are the same. It will need to check if they are different, and check if the other partition is already load if that edge as already been loaded into the graph to avoid duplication of the edges for these boundary edges.

Does this cover everything?
Do you need some help with the C code? and processing the SQL query?

-Steve

Mukul_priya · July 5, 2013, 5:00pm

The C code will execute a query to select all the edges that have at least one end in that partition:

do we need to retrieve all the edges that are lying totally or partially in that partition or even if we retrieve only those edges that are directly connected to the source node that is load only those edges that have one of its node as source node.

···

On Fri, Jul 5, 2013 at 7:49 PM, Stephen Woodbridge <woodbri@swoodbridge.com> wrote:

Mukul asks:

can you help me coming up with basic class skeletons for retrieving
partitions using an index key and for keeping a track of loaded
partitions as well .

The problem:

we have an incomplete graph to solve

we need to find a node in the graph but it is not in the graph

we call a method graph::LoadPartition(partition_id) to load the partition with that node

if this fails then the route fails and we clean up, and throw an error

otherwise we continue solving the graph until we need to load another partition

This assumes it is cheap to find a node or to find that it is missing.
But the question is how do we define “missing”.

A node has two pieces of information, the node_id and the partition_id. We can can make some assumptions here that if both ends of an edge are in the same partition then we will never find it in the graph unless the partition has been loaded. So edges that cross partition boundaries are the ones that concern us because the adjacent partition may or may not have been loaded.

So we might define an edge like:

eid – edge id
sid – source node id
spid – source partition id
tid – target node id
tpid – target partition id
cost
rcost
…

We could set both spid and tpid to zero if the edge is not a boundary edge. So a simple check if either of these is zero then we know both are loaded. If they are not zero then we need to check if the partition is loaded and if not call graph::LoadPartiton(partition_id).

I would recommend maintaining a bit array where the partition_id >0 is the index into the bit array.

How does graph::LoadPartiton(partition_id) work?

After we partition the data we have a tables like:

CREATE TABLE grid_vertices
(
id integer NOT NULL, – node id
pid integer, – partition id
the_geom geometry,
quadkey text,
CONSTRAINT grid_vertices_pkey PRIMARY KEY (id)
);

The following is the edge table and it can have other columns as need or it can be a view with the source, target, sgid, and tgid columns joined from another table.

CREATE TABLE grid
(
gid serial NOT NULL,
source integer, – source node_id
target integer, – target node_id
cost double precision,
sgid integer, – source grid_id (partition_id)
tgid integer, – target grid_id (partition_id)
CONSTRAINT grid_pkey PRIMARY KEY (gid)
);

So the LoadPartition() method will call a C function that will run a query using the postgresql SPI functions, build a struct array and return the array pointer to the method that will load the array into the graph and update the bit array that tracks the loaded partitions. Then free the struct array and return ok or fail.

The C code will execute a query to select all the edges that have at least one end in that partition:

select * from grid where sgid=
union
select * from grid where tgid=

both sgid and tgid columns should be indexed on the table.

The struct will be something like:

typedef struct partition_t {
integer gid;
integer source;
integer spart;
integer target;
integer tpart
double precision cost;
double precision rcost;
…
}

You will need to return the edge count for the partition_t array or add a dummy record with gid = -1 to signal the end of the array.

When the method loads the edges, if will zero the partition_ids if both are the same. It will need to check if they are different, and check if the other partition is already load if that edge as already been loaded into the graph to avoid duplication of the edges for these boundary edges.

Does this cover everything?
Do you need some help with the C code? and processing the SQL query?

-Steve

pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev

Mukul_priya · July 5, 2013, 5:11pm

we have the partition_id and the node_id (of source node) so we can retrieve only those edges that have one of their nodes as source node. So two conditions will arise here

That edge might completely lie inside that partition ( as u sadi above )
Its a boundary edge , in that case if the algorithm follws this edge then we again have to check whether the partition( using target_partiton_id) in which the target_node of this edge is lying is loaded or not.

Stephen_Woodbridge1 · July 5, 2013, 5:29pm

On 7/5/2013 1:00 PM, Mukul priya wrote:

do we need to retrieve all the edges that are lying totally or
partially in that partition or even if we retrieve only those edges
that are directly connected to the source node that is load only
those edges that have one of its node as source node.

When we load a partition we load all of the edges that have at least one
node in that partition. That means if both nodes are in the partition we
load it, and if only the target or the source node are in the partition
we load it. Once the partition is load we make it as loaded in the bit
array and we never load it again.

On 7/5/2013 1:11 PM, Mukul priya wrote:

we have the partition_id and the node_id (of source node) so we can
retrieve only those edges that have one of their nodes as source
node. So two conditions will arise here 1) That edge might completely
lie inside that partition ( as u sadi above ) 2) Its a boundary edge
, in that case if the algorithm follws this edge then we again have
to check whether the partition( using target_partiton_id) in which
the target_node of this edge is lying is loaded or not.

So when you are solving the graph and we are at some node A and it
connects via edges to nodes B, C, D.

You will need to check if A.partition != 0 or B.partition !=0 then check if the partition is loaded or not and load it if needed.

-Steve

Mukul_priya · July 5, 2013, 5:44pm

Ok. got it now.

I think i am clear with the basic skeleton , will try to code it this weekend , will post doubts directly on this list regarding c or c++ issues .

thanks.

···

On Fri, Jul 5, 2013 at 10:59 PM, Stephen Woodbridge <woodbri@swoodbridge.com> wrote:

On 7/5/2013 1:00 PM, Mukul priya wrote:

do we need to retrieve all the edges that are lying totally or

partially in that partition or even if we retrieve only those edges
that are directly connected to the source node that is load only

those edges that have one of its node as source node.

When we load a partition we load all of the edges that have at least one
node in that partition. That means if both nodes are in the partition we
load it, and if only the target or the source node are in the partition
we load it. Once the partition is load we make it as loaded in the bit
array and we never load it again.

On 7/5/2013 1:11 PM, Mukul priya wrote:

we have the partition_id and the node_id (of source node) so we can
retrieve only those edges that have one of their nodes as source
node. So two conditions will arise here 1) That edge might completely
lie inside that partition ( as u sadi above ) 2) Its a boundary edge
, in that case if the algorithm follws this edge then we again have
to check whether the partition( using target_partiton_id) in which
the target_node of this edge is lying is loaded or not.

So when you are solving the graph and we are at some node A and it
connects via edges to nodes B, C, D.

You will need to check if A.partition != 0 or B.partition !=0 then check if the partition is loaded or not and load it if needed.

-Steve

pgrouting-dev mailing list
pgrouting-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/pgrouting-dev