Cost concern for Make biconnected planar and Make maximal planar

Hi @cvvergara , @robe

Following our community bonding discussion about the cost concern, I did a deep audit of the codebase. Here are my findings and I’d appreciate your guidance on following:
1] pgr_contraction vs planar augmentation
In our meeting you mentioned pgr_contraction as a reference . I studied it carefully and found a key difference:
pgr_contraction creates shortcut edges that replace existing paths. For example, when path u → v → w is contracted, the shortcut u → w gets cost = cost(u→v) + cost(v→w). This is mathematically exact : pgr_dijkstra on the contracted graph produces the same shortest-path costs as on the original graph. That’s why contraction needs and has meaningful cost values.

pgr_makeBiconnectedPlanar and pgr_makeMaximalPlanar are fundamentally different — they add structural edges to improve graph properties (biconnectivity, triangulation), not to replace or shortcut any existing path. There is no existing path being shortcutted, so there is no cost to derive.

Both serves different purpose than what contraction does :

  • Planar augmentation → prepares graphs for graph drawing and structural algorithms for example : chrobak_payne_straight_line_drawing() — requires maximal planar input canonical_ordering() — requires maximal planar input

  • Mesh generation / triangulation for computational geometry

  • Network redundancy analysis (biconnected = no single point of failure)

None of these downstream algorithms use edge costs. They operate purely on graph topology (which vertices are connected to which).

2. pgr_makeConnected

I verified pgr_makeConnected across all layers (SQL → C SRF → C++ driver → algorithm → Boost call) and ran both the official docquery tests . Everything passes:

pgr_makeConnected returns (seq, start_vid, end_vid) without a cost column, and it works correctly. The documentation states: “The algorithm does not consider traversal costs in the calculations.” So I think we can follow the exact pattern for makeBiconnected and makeMaximal .

Since pgRouting functions never modify the input, the pipeline works as: call makeConnected → user INSERTs returned edges (with their chosen cost) → call makeBiconnectedPlanar on the updated table → user INSERTs → call makeMaximalPlanar. Same pattern as pgr_contraction’s documentation tutorial.

Regarding the cost i want your guidance what we should prefer .
Given that these are structural/topological operations (not routing operations), I see two clean approaches:

Option 1: Match pgr_makeConnected : no cost column

sql :

pgr_makeBiconnectedPlanar(TEXT,
OUT seq BIGINT,
OUT start_vid BIGINT,
OUT end_vid BIGINT)

then user adds edges with their own cost when inserting :
INSERT INTO edges(source, target, cost, reverse_cost)
SELECT start_vid, end_vid, 1.0, 1.0 – user chooses cost
FROM pgr_makeBiconnectedPlanar(‘…’);

Option 2: Include cost column with a default value

pgr_makeBiconnectedPlanar(TEXT,
OUT seq BIGINT,
OUT start_vid BIGINT,
OUT end_vid BIGINT,
OUT cost FLOAT)     – default 0 or 1
  • Output can be piped directly into INSERT

  • More convenient for the user

If we go with Option 2, we could also apply the same change to pgr_makeConnected for consistency across the pipeline.

Which approach does the community prefer? I’m ready to proceed with implementation either way. I lean toward Option 1 (matching pgr_makeConnected) since these are graph structure operations, but I’m happy to go with Option 2 if that’s preferred.

Thanks!

Mohit

I’d go with option 1. In my mind there is no point in providing a most often incorrect cost to a user. By not specifying one, you force the user to think about one.

1 Like

The edges sql include cost (and reverse_cost)
First, I forget if you are talking about directed only, undirected only or both graphs in your function

(you just reminded me that its only undirected)

so both of them are undirected.

G 1 1 2 2 1--2 3 3 2--3 3--1

for the graph above:

  • the cost from 1 to 2 was given in the data
  • the cost from 2 to 3 was given in the data
  • you are adding edge 1 to 3

The question can be rephrased:
Can we set a cost?

To keep triangle inequality

if 1 to 2 to 3 is a straight line the cost of 1 to 3 is C(1,3) = C(1,2) + C(2,3)

maybe C(1,2) > C(2,3) and v3 lies on the line 1 to 3
Then C(1,3) = C(1,2) - C(2,3)

maybe C(2,3) > C(1,2)
c(1,3) = C(2,3) - (C2,1)

rewriting the to maybes

C(1,3) = graph G {

1 – 2 --3;
3 --1 [color=green]
}

so to keep triangle inequality:

C(1,3) must be between: C(largest) - C(smallest), C(1,2) + C(2,3)

Maybe we can not give an exact cost, but we could give (if what you are making are triangularities) the min and max values of the costs to keep triangle inequality

Hi @cvvergara ,

Thank you for the detailed feedback on the triangle inequality idea. I did a thorough analysis and here are my findings:

First, just for info i want to explain about cost importance in my algorithms :

1] Boost BGL has zero cost awareness for these algorithms

Looking at the actual Boost function signatures:

make_biconnected_planar:

template <typename Graph, typename PlanarEmbedding, typename EdgeIndexMap, typename AddEdgeVisitor>

void make_biconnected_planar(Graph& g, PlanarEmbedding embedding, EdgeIndexMap em, AddEdgeVisitor& vis);

make_maximal_planar:

template <typename Graph, typename PlanarEmbedding, typename VertexIndexMap, typename EdgeIndexMap, typename AddEdgeVisitor>

void make_maximal_planar(Graph& g, PlanarEmbedding embedding, VertexIndexMap vm, EdgeIndexMap em, AddEdgeVisitor vis);

Neither function accepts an edge weight/cost parameter. There is no edge_weight_t property map involved at any level.

Even Boost’s own straight_line_drawing.cpp example uses:(which is continuation of planar pipeline)

typedef adjacency_list<vecS, vecS, undirectedS, property<vertex_index_t, int>> graph;

No edge_weight_t at all — the entire pipeline (make_connected → make_biconnected_planar → make_maximal_planar → canonical_ordering → chrobak_payne_straight_line_drawing) works with zero cost/weight properties.

2] All downstream algorithms ignore costs

Every algorithm that consumes the output of these functions operates purely on topology:

  1. planar_canonical_ordering
  2. chrobak_payne_straight_line_drawing
  3. planar_face_traversal
  4. is_straight_line_drawing

Now about the triangulation you suggested for cost calculations:

3] The triangle inequality : where it works and where it breaks

Your 1–2–3 example is correct for the trivial case:

G 1 1 2 2 1--2 cost=5 3 3 1--3 added range=[2,8] 2--3 cost=3

Triangle inequality gives: |5 - 3| ≤ C(1,3) ≤ 5 + 3 → range [2, 8] (which we can decide as per your example if its straight or not getting max(8) or minimum(2) as per the graph.
But even here which value do we actually return? 2? 5? 8? Any choice is arbitrary.

Now lets consider what the makebiconnected algorithm actually does on a slightly larger graph.
(Note: the pipeline works like first makebiconnected will run because we cant run maximalplanar algorithm on a Non-biconnected graph i.e the graph should be biconnected before maximal planar runs so we will consider makebiconnected example first)

before Before: path 1–2–3–4 Articulation points: 2, 3 1 1 2 2 1--2 cost=10 3 3 2--3 cost=1 4 4 3--4 cost=100

make_biconnected_planar adds edge (1, 4) to form a cycle and eliminate all articulation points:

after After: edge 1–4 added by algorithm 1 1 2 2 1--2 cost=10 4 4 1--4 added cost=??? 3 3 2--3 cost=1 3--4 cost=100

Now for cost of edge 1→4:

  • Path via 1→2→3→4 has cost = 10 + 1 + 100 = 111
  • Now the question arises :Triangle inequality on which triangle? There are no triangles here the result is a cycle (1–2–3–4–1), not a triangulation. The triangle inequality concept doesn’t even apply to make_biconnected_planar , it adds edges to form cycles, not triangles.

Even if we said “use shortest path cost” = 111, the only path between 1 and 4 in the original graph is 1→2→3→4. But there’s no guarantee there is even a path — consider when make_biconnected_planar operates after make_connected has added hypothetical edges. Some parts of the “path” may already contain edges with no real cost.

Note: I mentioned about makeConnected here because before we put input graph for makeBiconnected algorithm , it is essential for the graph to be Connected first. So either give connected graph as input OR run makeConnected first then run makeBiconnected.

4] More complex case : multiple edges added

Consider the Boost documentation’s own example graph (from make_biconnected_planar.cpp)

boostExample Boost example: 11 vertices, multiple articulation points 0 0 1 1 0--1 10 10 0--10 2 2 3 3 2--3 3--0 4 4 3--4 5 5 4--5 5--3 6 6 5--6 7 7 6--7 8 8 7--8 8--5 9 9 8--9

This graph has multiple biconnected components and the algorithm adds several edges. If we tried to compute cost for each new edge:

  • Each added edge changes the graph structure
  • The cost bounds for the next edge depend on the cost we assigned to the previous edge
  • This creates a cascading dependency — there’s no correct answer.

5] Why this problem is fundamentally hard

The problem of “adding minimum-weight edges to make a graph biconnected” has been proven NP-hard in:

  • Eswaran & Tarjan (1976), “Augmentation Problems”, SIAM Journal on ComputingDOI: 10.1137/0205044
    • Proved that weighted biconnectivity augmentation is NP-complete
    • While unweighted augmentation is solvable in polynomial time, introducing weights makes it NP-hard

Even computing meaningful cost bounds would require running shortest-path algorithms (Dijkstra) for each added edge, turning our currently O(n) algorithm into O(k × (V log V + E)), where k is the number of added edges.

6] Recommendation: follow pgr_makeConnected pattern

pgr_makeConnected (currently experimental) already returns just (seq, start_vid, end_vid) without a cost column, and its documentation states: “The algorithm does not consider traversal costs in the calculations.”

I propose the same signature for both functions:

pgr_makeBiconnectedPlanar(TEXT, OUT seq BIGINT, OUT start_vid BIGINT, OUT end_vid BIGINT)

pgr_makeMaximalPlanar(TEXT, OUT seq BIGINT, OUT start_vid BIGINT, OUT end_vid BIGINT)

The user can then add their edges with whatever cost makes sense for their use case:

sql

INSERT INTO edges(source, target, cost, reverse_cost)

SELECT start_vid, end_vid, user_chosen_cost, user_chosen_cost

FROM pgr_makeBiconnectedPlanar('...');

Returning a fixed cost (0, 1, or -1) would be misleading it implies the algorithm computed something meaningful, when in reality no cost computation occurred. By not providing a cost column, we make it clear that the user must decide, which is the honest and correct approach.
Looking forward for your guidance.
Thanks! Mohit.

This is an excellent architectural breakdown by Mohit. From a production database and high-performance pipeline perspective, I completely support the proposal to keep pgr_makeBiconnectedPlanar and pgr_makeMaximalPlanar purely topological, without forcing an artificial cost column.

Here is why forcing cost calculations inside these specific functions is a dangerous anti-pattern for relational spatial databases:

  1. Separation of Concerns (Topology vs. Metric Space): Planarization and biconnectivity are strictly structural (topological) operations. Edge costs belong to the metric and business-logic layer. Forcing an arbitrary or expensive heuristic calculation inside a low-level topological reconstruction function violates basic database design principles.

  2. The Performance & Scaling Wall: As Mohit rightly pointed out, introducing weighted graph augmentation transforms a clean polynomial step into an NP-hard bottleneck. If a user runs this on a massive dataset (like an entire OpenStreetMap road network), forcing cascading shortest-path calculations (Dijkstra/A*) for every single hypothetical edge will utterly destroy pgRouting’s performance queries.

  3. The SQL-Native Approach is Always Better: In real-world GIS applications, when new edges are artificially introduced to make a graph biconnected or planar, the “cost” of these edges heavily depends on the business logic. One user might want them to equal the geographic length (ST_Length), another might want to inherit the average cost of adjacent edges, and a third might want to flag them with an infinite/negative penalty cost.

Leaving the cost assignment out of the core function and letting the user handle it via a clean, native SQL INSERT/SELECT or UPDATE block is the most honest, scalable, and transparent approach for pgRouting.

Backing up Mohit’s clean signature proposal (seq, start_vid, end_vid). It aligns perfectly with how modern, high-throughput data pipelines separate raw geometry/topology transformations from spatial optimization.

If the core development team ever needs assistance setting up robust automated benchmarking pipelines, containerized test suites for pgRouting clusters, or optimization workflows for heavy spatial databases, feel free to reach out. I explicitly specialize in high-performance backends, database automation, and DevOps layouts.

You can review my code quality, automated scripts, and architecture portfolio directly on my GitHub (where you can also find my current availability for freelance and contract setups): Bohdan Betskov Portfolio on GitHub.

Keep up the solid analytical work, Mohit!

ok.

seq, source, target
it is, the user takes care of the cost

Thank you for the update, Vicky! That is a very clean and powerful decision for the pgRouting architecture. It keeps the core topological library focused on performance and predictability, while giving users absolute flexibility over their metric business logic via standard SQL layers.

Looking forward to seeing these clean signatures implemented in the upcoming releases. It was great participating in this architectural discussion!

As mentioned, if the core team ever needs an extra pair of hands for performance benchmarking, containerization, or pipeline automation workflows, feel free to ping me here or open an issue in my repository: Bohdan Betskov Portfolio on GitHub.

Best of luck with the upcoming pgRouting deployment updates!