I think 1GB is a hard limit in PostgreSQL .
See for example this note
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/utils/memutils.h;h=21640d62a647d23a4bfd3f12922b615d9cc03e97;hb=HEAD#l31
and discussed here - https://postgrespro.com/list/thread-id/1549721
Instead of using pgr_createTopology, have you tried using pgr_extractVertices which will eventually replace pgr_createTopology? https://docs.pgrouting.org/latest/en/pgr_extractVertices.html
That said I think internally (the pgRouting project) we'll need to rethink how we implement some functions like (pgr_components) to avoid these issues.
In meantime, I think your best bet is to segmentize your data into groups and reconnect them by overlapping node ids.
For this I'm using this function for breaking apart https://postgis.net/docs/ST_SquareGrid.html
(alternatively - https://postgis.net/docs/ST_HexagonGrid.html ) or
(if you have areas - https://postgis.net/docs/ST_Subdivide.html --
Or some sort of known boundaries like country, city, states, provinces etc.
-- I'm going to assume your data is in 4326 (you'll need to change size if it's in another special ref)
I haven't thoroughly tested this so might need changing and you will need to change to accommodate your field names. Note the 2 in the ST_SquareGrid is 2 degrees.
DO language plpgsql $do$
DECLARE var_sql text; i integer; r record; last_component bigint;
BEGIN
-- create table of divisions that breaks into square grid regions
DROP TABLE IF EXISTS tmp_divisions;
CREATE TEMP TABLE tmp_divisions AS
SELECT row_number() OVER(ORDER BY sg.i, sg.j) AS rn, sg.geom
FROM ST_SquareGrid(2, (SELECT ST_SetSRID(ST_Extent(the_geom), 4326) FROM edges) ) AS sg
WHERE EXISTS (SELECT 1 FROM edges WHERE edges.the_geom && sg.geom) ;
i = 0; last_component = 0;
-- create subtables which will hold components of each sub region and will set components of overlapping nodes to the component with lowest number
FOR r IN(SELECT rn, geom FROM tmp_divisions ORDER BY rn) LOOP
i = i + 1;
var_sql = 'DROP TABLE IF EXISTS tmp_connectedComponents_' || r.rn::text || '; CREATE TEMP TABLE tmp_connectedComponents_' || r.rn::text || ' AS
SELECT (component + ' || last_component::text || ') AS component, node, NULL::bigint AS new_component FROM pgr_connectedComponents( ' ||
quote_literal('SELECT id, source, target, cost, cost AS reverse_cost FROM edges
WHERE ST_Intersects(the_geom,' || quote_literal(r.geom::text) || '::geometry' || ') ' ) || ')';
RAISE NOTICE '%', var_sql;
EXECUTE var_sql;
var_sql = 'ALTER TABLE tmp_connectedComponents_' || r.rn::text || ' ADD CONSTRAINT pk_tmp_connectedComponents_' || r.rn::text || ' PRIMARY KEY(node);';
EXECUTE var_sql;
var_sql = 'CREATE INDEX ix_tmp_connectedComponents_' || r.rn::text || '_component ON tmp_connectedComponents_' || r.rn::text || '(component);';
EXECUTE var_sql;
RAISE NOTICE '%', var_sql;
COMMIT;
EXECUTE 'SELECT MAX(component) FROM tmp_connectedComponents_' || r.rn::text INTO last_component;
IF i > 1 THEN
var_sql = 'UPDATE tmp_connectedComponents_' || r.rn::text || ' AS c
SET new_component = COALESCE(p.new_component, p.component)
FROM tmp_connectedComponents_' || (i - 1) || ' AS p WHERE c.node = p.node;';
RAISE NOTICE '%', var_sql;
EXECUTE var_sql;
COMMIT;
var_sql = 'UPDATE tmp_connectedComponents_' || r.rn::text || ' AS c
SET new_component = pp.new_component
FROM (SELECT min(COALESCE(p.new_component, p.component)) AS new_component, component
FROM tmp_connectedComponents_' || r.rn::text || ' AS p GROUP BY component) AS pp
WHERE c.new_component IS NULL AND pp.component = c.component;';
RAISE NOTICE '%', var_sql;
EXECUTE var_sql;
COMMIT;
END IF;
END LOOP;
-- after we are all done, we consolidate all into a singled final_connectedComponents table
-- and insure a node is represented only once.
IF i > 0 THEN
DROP TABLE IF EXISTS final_connectedComponents;
var_sql = 'CREATE TABLE final_connectedComponents AS ' ||
string_agg('SELECT new_component AS component, component AS old_component, node
FROM tmp_connectedComponents_' || gi::text, ' UNION ') FROM generate_series(1, i) AS gi;
RAISE NOTICE '%' , var_sql;
EXECUTE var_sql;
var_sql = 'UPDATE final_connectedComponents SET component = old_component
WHERE component IS NULL AND
old_component IN(SELECT o.component FROM final_connectedComponents AS o WHERE o.component IS NOT NULL)';
RAISE NOTICE '%', var_sql;
EXECUTE var_sql;
-- delete dupes keeping the one that already has a component assigned
var_sql = 'DELETE FROM final_connectedComponents WHERE component IS NULL
AND node IN(SELECT o.node FROM final_connectedComponents AS o WHERE o.component IS NOT NULL)';
RAISE NOTICE '%, %', var_sql, clock_timestamp();
EXECUTE var_sql;
-- fill in still missing components
var_sql = 'UPDATE final_connectedComponents SET component = old_component WHERE component IS NULL;';
RAISE NOTICE '%, %', var_sql, clock_timestamp();
EXECUTE var_sql;
CREATE INDEX ix_final_connectedComponents_component ON final_connectedComponents(component);
COMMIT;
UPDATE final_connectedComponents AS c
SET component = pp.new_component
FROM (SELECT min(least(p.component, p.old_component)) AS new_component, old_component
FROM final_connectedComponents AS p GROUP BY old_component ) AS pp
WHERE c.old_component = pp.old_component AND c.component <> pp.new_component;
-- delete dupes
var_sql = 'DELETE FROM final_connectedComponents AS n
USING (SELECT o.node, MIN(ctid) AS min_ctid FROM final_connectedComponents AS o GROUP BY o.node HAVING COUNT(*) > 1) AS o
WHERE n.node = o.node AND n.ctid <> o.min_ctid;';
RAISE NOTICE '%, %', var_sql, clock_timestamp();
EXECUTE var_sql;
ALTER TABLE final_connectedComponents ADD CONSTRAINT pk_final_connetectedComponents PRIMARY KEY(node);
END IF;
END;
$do$;
Hope that helps,
Regina
-----Original Message-----
From: Pgrouting-users [mailto:pgrouting-users-bounces@lists.osgeo.org] On
Behalf Of Phyks
Sent: Saturday, January 21, 2023 4:13 PM
To: pgrouting-users@lists.osgeo.org
Subject: Re: [pgrouting-users] "invalid memory alloc request size" on large
graphs
Hi,
> 1.) set the highest "work_mem" for the session and close all other
> running programs ..
>
> SET work_mem = '8GB';
> SELECT pgr_createTopology( ... )
> -- set back for the default value;
> SET work_mem = '1GB';
Same issue :
=# SET work_mem = '8GB';
SET
=# (SELECT component, node FROM pgr_connectedComponents(
'SELECT id, source, target, cost, cost AS reverse_cost FROM edges'
));
ERREUR: invalid memory alloc request size 1080000000 CONTEXTE : fonction
SQL « pgr_connectedcomponents », instruction 1
> 2.) Linux: Enable Swap space ( ~ 20GB or 2x of your RAM )
> https://www.digitalocean.com/community/tutorials/how-to-add-swap-
space
> -on-ubuntu-20-04
>
> 3.) optimize table variables storage spaces ( to less space )
> - change your bigint variables to integer
> for (about 40M edges) an integer [ -2147483648 to +2147483647 ]
> should enought
> https://www.postgresql.org/docs/current/datatype-numeric.html
As far as I understand, `pgr_connectedComponents` as run in the example
above should only query the `edges` table. I already have `source` and `target`
set as regular `integer` there.
I have the impression the malloc is failing for another reason that OOM:
1/ Malloc size should be about 1GB as per the error message. This is far less
than my free RAM and seems to be a fixed buffer size somewhere.
2/ I have only 2GB RAM used on my system (hence 10GB free RAM) and do not
see any increase of RAM usage between the start of the
`pgr_connectedComponents` query and the error output.
For these reasons, I do not expect swap / RAM changes to have any effect
here, but instead a configuration option in PostgreSQL / pgRouting to allow
allocations of more than 1GB?
It seems I'm hitting https://postgrespro.com/list/thread-id/1549721
which references https://github.com/pgRouting/pgrouting/issues/291 but I
don't see much solutions there apart from using bounding boxes (which is
easily done for routing, much less for contraction / connected components
analysis).
Thanks!
Phyks
> Phyks <phyks+pgrouting@phyks.me> ezt írta (időpont: 2023. jan. 21.,
> Szo,
> 14:06):
>
>> Hi,
>>
>> I'm trying to use pgRouting on a quite large graph (about 40M edges).
>> My graph structure is expected to be quite simple and not very
>> connected (being mostly a simple tree structure).
>>
>> I came across the discussion on speeding up `pgr_createTopology` for
>> large networks (
>> https://github.com/pgRouting/pgrouting/discussions/2255#discussioncom
>> ment-4739376)
>>
>> and am trying to put it in practice for my use case. Moving from
>> `pgr_createTopology` to `pgr_extractVertices` gives a huge increase
>> in processing time and memory requirements.
>>
>> However, when trying to run `pgr_*` (`pgr_dijkstra`,
>> `pgr_contraction` or `pgr_connectedComponents` for example) on the
>> subsequent graph, I am facing a malloc issue : invalid memory alloc request
size 1080000000.
>>
>> This is easily worked around with `pgr_dijkstra` since routing is
>> mainly local and the edges table can easily be filtered with a bounding box.
>> This is not as easy for `pgr_contraction` or
>> `pgr_connectedComponents` where tiling the computation (through bbox)
>> requires some extra post-processing.
>>
>> I'm not sure about the units of the `malloc` error message, but it
>> seems to me this is about 1GB allocation which should be no issue on
>> the machine and setup I am running. For the record, my postgresql is
>> having either a basic configuration (out of the box install) or a
>> pgtune-d one, same issue in both cases. My machine has a bit more than
10GB of free RAM.
>>
>> Would anyone have more insights on this issue which could help me
>> better understand and work around it? This is particularly
>> frustrating since computations on a quarter of the table (about 10M
>> edges) work perfectly fine and in a very reasonable amount of time.
>>
>> Thank you in advance,
>> Best,
>> _______________________________________________
>> Pgrouting-users mailing list
>> Pgrouting-users@lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/pgrouting-users
>>
>
>
> _______________________________________________
> Pgrouting-users mailing list
> Pgrouting-users@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/pgrouting-users
_______________________________________________
Pgrouting-users mailing list
Pgrouting-users@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/pgrouting-users