robe
December 9, 2024, 9:47pm
4
pramsey:
It would be quite a bit of a research project. In order for backends to share the graph, the graph would have to be in shared memory, which implies every allocation done in building the graph would also have to happen within shmem, which could be tricksy, to say the least. Another approach would be to have the graph allocated by a bgworker in its own space and then pass the queries from the backend to the bgworker via shmem. This might be a better approach. Could start with a bgworker per connection, so each system is isolated, but multiple routes on the same graph don’t have a reload/rebuild phase.
P.
Hadn’t thought about using background workers. We can maybe add that on as a Google Summer of Code (GSOC) project which is coming up soon…
I also want to explore the performance impact, on bigger graphs, of the change in Release of pgRouting 3.7.0
that gets rid of the redundant copies of the graph data held in both C and C++, discussed in
pgRouting:develop
← cvvergara:read-data-on-cpp-try3
opened 10:25PM - 24 Jan 24 UTC
Moving the input queries from the C code into the C++ code.
## Before this change
**On the C file:**
* Includes what we call a driver [header](https://github.com/pgRouting/pgrouting/blob/v3.6.1/include/drivers/dijkstra/dijkstra_driver.h), which is linked as C
* The driver headers are very similar to each other
* The static `process` functions within each C file ([example](https://github.com/pgRouting/pgrouting/blob/v3.6.1/src/dijkstra/dijkstra.c#L54)) are very similar
* opens the connection with PostgreSQL
* reads the data (edges sql, restrictions_sql, etc., and any array that is in the query)
* all of this reading is done in the C code, which interacts with PostgreSQL (which is itself written in C)
* Suppose, for simplifying this description, that the data takes `x` MB in memory
* calls the `pgr_do_function` defined on the driver
* gets the results
* closes the connection with PostgreSQL
**On the driver C++ file:**
* [code](https://github.com/pgRouting/pgrouting/blob/v3.6.1/src/dijkstra/dijkstra_driver.cpp) is written in C++
* Drivers are very similar to each other.
* Convert C arrays to C++ containers
* Now memory size is 2x + container overhead
* Builds the boost graph
* Now memory size is 3x + container overhead
* **None of this code interacts with PostgreSQL**
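The `x`, `2x`, `3x` progression above can be sketched as follows. This is only an illustration, not pgRouting code: `Edge_t`, `Graph`, `to_container` and `build_graph` are hypothetical stand-ins, and a plain adjacency list is used in place of a boost graph so the sketch has no dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the C struct filled by the C reading code.
struct Edge_t {
    int64_t source;
    int64_t target;
    double cost;
};

// Hypothetical stand-in for a boost graph: a plain adjacency list.
struct Graph {
    struct Arc {
        std::size_t target;
        double cost;
    };
    std::vector<std::vector<Arc>> adjacency;
    std::size_t arc_count = 0;
};

// Second copy: C array -> C++ container.
// The C array (first copy, `x`) is still alive in the caller.
std::vector<Edge_t> to_container(const Edge_t* edges, std::size_t count) {
    assert(edges != nullptr || count == 0);
    return std::vector<Edge_t>(edges, edges + count);
}

// Third copy: C++ container -> graph.
// The container (second copy) is still alive in the caller.
Graph build_graph(const std::vector<Edge_t>& edges, std::size_t vertex_count) {
    Graph g;
    g.adjacency.resize(vertex_count);
    for (const auto& e : edges) {
        assert(e.source >= 0 && static_cast<std::size_t>(e.source) < vertex_count);
        assert(e.target >= 0 && static_cast<std::size_t>(e.target) < vertex_count);
        g.adjacency[static_cast<std::size_t>(e.source)].push_back(
            {static_cast<std::size_t>(e.target), e.cost});
        ++g.arc_count;
    }
    return g;
}
```

Calling `to_container(raw, n)` and then `build_graph(copy, vertex_count)` leaves the C array, the container and the graph all in memory at the same time, which is where the roughly `3x` figure comes from.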
The ideal situation:
* Build the boost graph as the data is read.
* Have C++ templates on the drivers
* The drivers are so similar that this can be done
General steps to reach the ideal situation
1) Be able to read the PostgreSQL data on the C++ code
2) Create the templates
3) Build the boost graphs based on the templates' needs
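A minimal sketch of what the ideal situation could look like, assuming a row source that hands rows to a callback one at a time. Everything here is hypothetical (`Edge_row`, `Adjacency_graph`, `Chain_row_source`, `build_while_reading`); a real row source would iterate over a PostgreSQL result instead of generating rows, and the graph would be a boost graph.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row type, as one row of an edges query might look.
struct Edge_row {
    int64_t source;
    int64_t target;
    double cost;
};

// Hypothetical graph type standing in for a boost graph.
struct Adjacency_graph {
    struct Arc {
        std::size_t target;
        double cost;
    };
    std::vector<std::vector<Arc>> adjacency;

    // Grows the vertex set on demand and stores the arc immediately.
    void add_edge(const Edge_row& row) {
        assert(row.source >= 0 && row.target >= 0);
        const std::size_t s = static_cast<std::size_t>(row.source);
        const std::size_t t = static_cast<std::size_t>(row.target);
        const std::size_t needed = (s > t ? s : t) + 1;
        if (adjacency.size() < needed) adjacency.resize(needed);
        adjacency[s].push_back({t, row.cost});
    }
};

// Hypothetical row source: generates the chain 0->1->2->... on the fly,
// so no intermediate array or container of rows ever exists.
struct Chain_row_source {
    std::size_t edge_count;

    template <typename Sink>
    void for_each_row(Sink&& sink) const {
        for (std::size_t i = 0; i < edge_count; ++i) {
            sink(Edge_row{static_cast<int64_t>(i), static_cast<int64_t>(i + 1), 1.0});
        }
    }
};

// Templated "driver": any graph with add_edge() and any source with
// for_each_row() can be combined; each row goes straight into the graph.
template <typename Graph, typename Row_source>
Graph build_while_reading(const Row_source& source) {
    Graph g;
    source.for_each_row([&g](const Edge_row& row) { g.add_edge(row); });
    return g;
}
```

With `build_while_reading<Adjacency_graph>(Chain_row_source{4})` the only structure that holds the data is the graph itself, which is the point of building the graph as the data is read.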
## This PR is step 1
In rough terms, moving the reading of the data to C++.
The sketch of the C & C++ driver files for the first step:
**On the C file:**
* Includes what we call a driver `header` linked as C
* The driver headers are very similar to each other
* the static `process` within each C file will still be very similar
* opens the connection with PostgreSQL
* calls the `pgr_do_function` defined on the driver
* gets the results
* closes the connection with PostgreSQL
**On the driver C++ file:**
* [code](https://github.com/pgRouting/pgrouting/blob/v3.6.1/src/dijkstra/dijkstra_driver.cpp) is written in C++
* Drivers will still be very similar to each other.
* **C++ code will interact with PostgreSQL**
* Read the data directly into C++ containers
* Two kinds of data
* Data that comes from the inner SQL queries
* Data that comes from arrays
* All of this reading will be done in C++ code
* Now reading into the C++ container takes `x` MB in memory
* Builds the boost graph
* Now memory size is 2x + container overhead
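To make the difference in the reading interface concrete, here is a rough sketch contrasting a pointer-style reader with a container-style one. The function names and the fixed rows are hypothetical and are not the actual `pgdata_getters` signatures; the SQL text is ignored and a small fixed result stands in for PostgreSQL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

struct Edge_t {
    int64_t source;
    int64_t target;
    double cost;
};

// Hypothetical stand-in for the result of running a query in PostgreSQL.
static const Edge_t kFakeRows[] = {{1, 2, 1.0}, {2, 3, 2.5}, {1, 3, 4.0}};
static const std::size_t kFakeRowCount = 3;

// Pointer style: the caller receives a malloc'ed C array plus a count,
// must free it, and typically copies it into a container afterwards.
void fetch_edges_c_style(const char* sql, Edge_t** edges, std::size_t* total) {
    assert(sql != nullptr && edges != nullptr && total != nullptr);
    *edges = static_cast<Edge_t*>(std::malloc(kFakeRowCount * sizeof(Edge_t)));
    assert(*edges != nullptr);
    for (std::size_t i = 0; i < kFakeRowCount; ++i) (*edges)[i] = kFakeRows[i];
    *total = kFakeRowCount;
}

// Container style: SQL text in, C++ container out.
// Rows land directly in the container, so there is no C array to copy from.
std::vector<Edge_t> fetch_edges(const std::string& sql) {
    assert(!sql.empty());
    std::vector<Edge_t> edges;
    edges.reserve(kFakeRowCount);
    for (std::size_t i = 0; i < kFakeRowCount; ++i) edges.push_back(kFakeRows[i]);
    return edges;
}
```

With the container-style reader, only the container (`x`) and the graph built from it exist at the same time, which matches the `2x` figure above.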
## tasks
- [x] Copy the files `pgdata_getters` and `pgdata_fetchers` to `trsp_pgget` & `trsp_pgfetch`
- [x] Verify that the only function that is including the `trsp_pgget` is the deprecated `pgr_trsp` function
- [x] Remove the unused code on `trsp_pgget` & `trsp_pgfetch`
- [x] Create a workaround for PostgreSQL defines of functions that exist in the standard
- [x] Adjust `pgdata_getters` and `pgdata_fetchers` to be used directly from C++
- [x] parameter is a string containing the SQL query
- [x] return a structure instead of a pointer
- [x] return a C++ container instead of a pointer
- [x] Create new template get_data that works with the changes on `pgdata_getters` and `pgdata_fetchers`
- [x] Do general adjustments to several files
- [x] Per sub directory basis
- [x] Delete from C file code that reads the data
- [x] Add into C++ driver file(s) the code that reads the data
- [x] If necessary adjust the code that use a C array to use the C++ container
- [x] standardize the driver names to start with `pgr_do_`
- [x] Update release_notes & NEWS
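On the task about PostgreSQL defines: the general problem is that a C header can `#define` the name of a standard function, which then breaks qualified calls such as `std::snprintf` in C++ code that includes it. The sketch below simulates that with a hypothetical `fake_pg_snprintf` macro and shows one common workaround (`#undef` after the include). Which names were affected and which workaround the PR actually used is not stated here, so treat this as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// --- Simulated C header (hypothetical) ---------------------------------
// A C header may redirect a standard function to its own implementation.
extern "C" int fake_pg_snprintf(char* str, std::size_t count, const char* fmt, ...);
#define snprintf fake_pg_snprintf
// --- End of simulated header -------------------------------------------

// Without a workaround, `std::snprintf` below would be rewritten by the
// preprocessor to `std::fake_pg_snprintf`, which does not exist.
// Workaround sketched here: drop the macro before writing C++ code.
#ifdef snprintf
#undef snprintf
#endif

// Plain C++ code that relies on the standard function.
std::string format_cost(double cost) {
    char buffer[32];
    const int written = std::snprintf(buffer, sizeof(buffer), "%.2f", cost);
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof(buffer));
    return std::string(buffer);
}
```

The trade-off of `#undef` is that code after it no longer gets the redirected implementation, so it is best confined to the C++ translation units that need the standard names.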
I don't expect much impact with smaller graphs, but with much bigger graphs we should see lower memory utilization as well as a faster graph build.