Graph in memory

kikislater · November 29, 2024, 2:45pm

Hello,

I followed PostGIS day 2024 and there was this interesting remark made by Paul Ramsey in this video:

“I’m going to ask the pgRouting developers why there isn’t some sort of caching mode that keeps the whole graph in memory.”

There was also this issue on GitHub:

github.com/pgRouting/pgrouting

Keep graph in Ram

opened 08:51AM - 28 Jul 23 UTC

esdras-kid

Functionality Request Enhancement

To optimize performance for large graphs, I have some ideas that could potentially improve the process. Currently, with every call to pgRouting, there is a time overhead for loading the graph from the PostgreSQL database and building the Boost C++ graph structure internally before running the algorithm. My idea is to explore two possible approaches:

1. Keep the Graph in RAM: It is possible to load the entire graph from the PostgreSQL database into RAM and keep it there during the lifetime of the application or API. By doing so, subsequent calls to the routing algorithm can directly access the graph data in memory, significantly reducing the overhead of loading the graph from the database each time.
2. Generate Boost Graph Structure in RAM: If keeping the entire graph in RAM is not feasible due to size constraints, another approach is to generate the Boost C++ graph structure directly in memory and cache it for subsequent algorithm calls.

As I haven’t seen a question on this platform, I’ll take the opportunity to ask it.

robe · December 7, 2024, 3:32am

@kikislater
There are a couple of things we changed in recent versions to reduce the impact of graph building and to keep a graph in memory for more than one route.

In pgRouting versions before 3.7, the edge data was essentially stored twice before the Boost graph was built: it was read into C arrays and then copied into C++ containers. @cvvergara can discuss this a bit more. She revised the code so the data is read once, without the copy.
Functions have also been added (I forget the exact version, I think from pgRouting 3.4 on): you will find many variants of functions such as Dijkstra that accept multiple starts and multiple stops.
So the graph is built once and used to compute several route requests.
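
To illustrate why the multi-start, multi-stop variants help, here is a minimal Python sketch of the idea. It is not pgRouting's code: the function names and the `(source, target, cost)` row shape are assumptions made for the example. The point is only that the graph is built a single time and then reused for every start vertex.

```python
import heapq
from collections import defaultdict


def build_graph(edges):
    """Build an adjacency list once from (source, target, cost) rows."""
    graph = defaultdict(list)
    for source, target, cost in edges:
        graph[source].append((target, cost))
    return graph


def dijkstra(graph, start):
    """Return the cheapest cost from start to every reachable vertex."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for v, cost in graph.get(u, ()):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def many_to_many(edges, starts, ends):
    """Answer every (start, end) pair while building the graph only once."""
    graph = build_graph(edges)  # the expensive step happens a single time
    result = {}
    for s in starts:
        dist = dijkstra(graph, s)
        for e in ends:
            if e in dist:
                result[(s, e)] = dist[e]
    return result
```

Calling `many_to_many(edges, [1, 2], [3, 4])` pays the build cost once, whereas four separate single-pair calls would each load and build the graph again.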

I discuss this a bit in my video on PostGIS Day 2024

and Vicky gives many examples of this in her state of pgRouting 2023 FOSS4G talk

So those are all going kind of in that direction.

I don’t think it’s realistic to keep a graph in RAM for multiple function calls because first of all, if you have a large dataset, you are probably using different parts of the graph depending on queries you are doing, so trying to load the whole graph in memory will easily eat up your RAM.

Secondly, the way PostgreSQL is built, you can’t just create a graph in memory once and somehow allow other connections to use it. In theory you could maybe trick shared buffers into doing it.

I think the best bet might be to create some kind of custom type that can save the boost graph in a table in a format more usable, but I think even then you run into issues with toast / detoasting that may just make being able to pull that graph from PostgreSQL storage not worthwhile.

@pramsey has more knowledge of the inner workings of PostgreSQL, so he might have some ideas on how he envisions this would work.

pramsey · December 9, 2024, 8:48pm

Secondly, the way PostgreSQL is built, you can’t just create a graph in memory once and somehow allow other connections to use it. In theory you could maybe trick shared buffers into doing it.

It would be quite a bit of a research project. In order for backends to share the graph, the graph would have to be in shared memory, which implies every allocation done in building the graph also allocating within shmem, which could be tricksy, to say the least. Another approach would be to have the graph allocated by a bgworker in its own space and then pass the queries from the backend to the bgworker via shmem. This might be a better approach. Could start with a bgworker per connection, so each system is isolated, but multiple routes on the same graph don’t have a reload/rebuild phase.

P.
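
The worker idea above can be sketched in Python as a rough analogy. This is not PostgreSQL code: a thread stands in for the bgworker, `queue.Queue` stands in for the shared-memory message channel, and every name here is hypothetical. The worker builds the graph once in its own space and then answers route requests passed to it.

```python
import heapq
import queue
import threading


def shortest_cost(graph, start, end):
    """Dijkstra from start to end; returns None when end is unreachable."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == end:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph.get(u, ()):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None


def graph_worker(edges, requests):
    """Long-lived worker: builds the graph once, then serves requests."""
    graph = {}
    for source, target, cost in edges:
        graph.setdefault(source, []).append((target, cost))
    while True:
        item = requests.get()
        if item is None:  # shutdown signal
            break
        start, end, reply = item
        reply.put(shortest_cost(graph, start, end))


def start_worker(edges):
    """Start the worker and return its request queue and thread."""
    requests = queue.Queue()
    thread = threading.Thread(
        target=graph_worker, args=(edges, requests), daemon=True
    )
    thread.start()
    return requests, thread


def route(requests, start, end):
    """What a backend would do: send a request and wait for the answer."""
    reply = queue.Queue(maxsize=1)
    requests.put((start, end, reply))
    return reply.get()
```

Each `route` call reuses the graph held by the worker, so there is no reload or rebuild phase between routes. Sending `None` on the request queue stops the worker.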

robe · December 9, 2024, 9:47pm

Hadn’t thought about using background workers. We can maybe add that on as a Google Summer of Code (GSOC) project which is coming up soon…

I also want to explore, on bigger graphs, the performance impact of the change in the pgRouting 3.7.0 release that got rid of the redundant copy of the data held in both C and C++, discussed in:

github.com/pgRouting/pgrouting

Read postgresql data on C++

pgRouting:develop ← cvvergara:read-data-on-cpp-try3

opened 10:25PM - 24 Jan 24 UTC

cvvergara

+4770 -6656

Moving the input queries from the C code into the C++ code.

## Before this change

**On the C file:**

* Includes what we call a driver [header](https://github.com/pgRouting/pgrouting/blob/v3.6.1/include/drivers/dijkstra/dijkstra_driver.h) is linked as C
* The driver headers are very similar to each other
* the static `process` within each C file [example](https://github.com/pgRouting/pgrouting/blob/v3.6.1/src/dijkstra/dijkstra.c#L54) are very similar
* opens the connection with PostgreSQL
* reads the data, (edges sql, restrictions_sql, etc and any array that is on the query)
* all of these reading is done on the C code which interacts with PostgreSQL, (which is written in C.)
* Suppose, for simplifying this description, that the data takes `x` MB in memory
* calls the `pgr_do_function` defined on the driver
* gets the results
* closes the connection with PostgreSQL

**On the driver C++ file:**

* [code](https://github.com/pgRouting/pgrouting/blob/v3.6.1/src/dijkstra/dijkstra_driver.cpp) is written in C++
* Drivers are very similar within each other.
* Convert C arrays to C++ containers
* Now memory size is 2x + container overhead
* Builds the boost graph
* Now memory size is 3x + container overhead
* **None of this code interacts with PostgreSQL**

The ideal situation:

* Build the boost graph as the data is read.
* Have C++ templates on the drivers
* Being so similar that can be done

General steps to reach the ideal situation

1) Be able to read the PostgreSQL data on the C++ code
2) Create the templates
3) Build the boost graphs based on the templates needs

## This PR is step 1

In rough terms, moving the reading of the data to C++.
The sketch of the C & C++ driver files for the first step:

**On the C file:**

* Includes what we call a driver `header` linked as C
* The driver headers are very similar to each other
* the static `process` within each C file will still be very similar
* opens the connection with PostgreSQL
* calls the `pgr_do_function` defined on the driver
* gets the results
* closes the connection with PostgreSQL

**On the driver C++ file:**

* [code](https://github.com/pgRouting/pgrouting/blob/v3.6.1/src/dijkstra/dijkstra_driver.cpp) is written in C++
* Drivers will still be very similar within each other.
* **C++ code will interact with PostgreSQL**
* Read the data directly into C++ containers
* Two kinds of data
  * Data that comes from the inner SQL queries
  * Data that comes from arrays
* All of these reading will be do with C++ code
* Now the reading into C++ container takes `x` MB in memory
* Builds the boost graph
* Now memory size is 2x + container overhead

## tasks

- [x] Copy the files `pgdata_getters` and `pgdata_fetchers` to `trsp_pgget` & `trsp_pgfetch`
- [x] Verify that the only function that is including the `trsp_pgget` is the deprecated `pgr_trsp` function
- [x] Remove the unused code on `trsp_pgget` & `trsp_pgfetch`
- [x] Create workaround to postgres defines of functions that exist on the standard
- [x] Adjust `pgdata_getters` and `pgdata_fetchers` to be used directly from C++
- [x] parameter is a string containing the SQL query
- [x] return a structure instead of a pointer
- [x] return a C++ container instead of a pointer
- [x] Create new template get_data that works with the changes on `pgdata_getters` and `pgdata_fetchers`
- [x] Do general adjustments to several files
- [x] Per sub directory basis
- [x] Delete from C file code that reads the data
- [x] Add into C++ driver file(s) the code that reads the data
- [x] If necessary adjust the code that use a C array to use the C++ container
- [x] standardize the driver names to start with `pgr_do_`
- [x] Update release_notes & NEWS
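
As a worked version of the memory arithmetic in the PR description, here is a small Python sketch. It assumes, as the description does for simplicity, that each representation of the data costs about `x` MB. The 500 MB figure and the function name are made up for illustration, and container overhead is left at zero.

```python
def peak_memory_mb(x_mb: float, copies: int, container_overhead_mb: float = 0.0) -> float:
    """Peak memory when `copies` representations of the data coexist."""
    return copies * x_mb + container_overhead_mb


X_MB = 500.0  # hypothetical size of the edge data, not a measured value

# Before the PR: C arrays + C++ containers + Boost graph.
before = peak_memory_mb(X_MB, copies=3)
# After the PR: C++ containers + Boost graph.
after = peak_memory_mb(X_MB, copies=2)

# Fraction of peak memory saved under these simplifying assumptions.
saved_fraction = (before - after) / before
```

Under these assumptions the peak drops from 3x to 2x, roughly a third less memory. Real numbers depend on the container overhead and on how Boost stores the graph.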

I expect not much impact with smaller graphs, but with much bigger graphs we should see lower memory utilization as well as faster build.

kikislater · December 10, 2024, 6:16pm

Thank you for taking the time to respond via this platform. I don’t know whether to discuss this here or on GitHub. It’s interesting to see that even though pgRouting performs well from my point of view, there’s an interest in improving performance.
Keeping the graph in memory may be a disadvantage on certain large networks that need to be managed. For other uses, though, and apart from the optimisations suggested by Paul, which seem completely legitimate, would it be possible to load the graph in memory through an option rather than at system level? I don’t know if that is possible.

robe · December 12, 2024, 8:00am

Well probably this discussion belongs on the pgrouting-dev category and not users. I prefer discussion here and once we actually have some code written then the more technical details can happen on a pull request.

Just moved this to the pgrouting-dev category. pgrouting-dev - OSGeo Discourse

robe · December 13, 2024, 5:35am

Load just for the current session?
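
One way to picture a session-only load is a cache that lives as long as the session and is keyed by the edges query text. The Python sketch below is purely hypothetical: nothing in this thread says pgRouting offers such an option, and `SessionGraphCache` and `load_edges` are invented names.

```python
class SessionGraphCache:
    """Hypothetical per-session cache of built graphs, keyed by edges SQL text."""

    def __init__(self, load_edges):
        # load_edges: callable taking the SQL text and returning
        # an iterable of (source, target, cost) rows.
        self._load_edges = load_edges
        self._graphs = {}
        self.builds = 0  # how many times a graph was actually built

    def get(self, edges_sql):
        """Return the graph for this query, building it only on first use."""
        graph = self._graphs.get(edges_sql)
        if graph is None:
            graph = {}
            for source, target, cost in self._load_edges(edges_sql):
                graph.setdefault(source, []).append((target, cost))
            self._graphs[edges_sql] = graph
            self.builds += 1
        return graph

    def invalidate(self, edges_sql=None):
        """Drop one cached graph, or all of them when no query is given."""
        if edges_sql is None:
            self._graphs.clear()
        else:
            self._graphs.pop(edges_sql, None)
```

A cache like this would go stale as soon as the edge table changes, so an explicit `invalidate` call (or some change detection) would be needed. Memory use also grows with every distinct query text kept in the session.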