[GRASS-dev] Re: grass-dev Digest, Vol 43, Issue 8

Hi Markus,

How much memory was available on the machine? If the machine had more than 512MB RAM, it is not fair to run terracost with mem=400MB and compare it with an algorithm that can use more memory.

However, I am surprised that with numtiles=1, it was slower than r.cost. That's something I'd like to look into. Would you mind sharing the raster with me, and sending me the exact commands that you ran?

A grid with 29M points is pretty small for today's machines. I suggest running on something 10 times larger. And use a lot of sources; that makes the data access pattern less local.

-Laura

----------------------------------------------------------------------

Message: 1
Date: Fri, 06 Nov 2009 09:50:02 +0100
From: Markus Metz <markus.metz.giswork@googlemail.com>
Subject: [GRASS-dev] comparing r.cost and r.terracost
To: GRASS developers list <grass-dev@lists.osgeo.org>
Message-ID: <4AF3E33A.1090208@googlemail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Same test region as before

North Carolina sample dataset

g.region rast=elev_state_500m@PERMANENT res=100
# gives about 28 million cells
v.to.rast nc_state@PERMANENT use=val val=500.0 out=cost --o
v.to.rast urbanarea@PERMANENT use=val val=1 out=urbanarea --o
r.cost in=cost start_rast=urbanarea out=dist_urban percent_memory=20 --o
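
The timings below are the output of the shell's time builtin wrapped
around each command, as in the later tests in this thread; a sketch:

time r.cost in=cost start_rast=urbanarea out=dist_urban percent_memory=20 --o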

grass7 r.cost
real 0m55.349s
user 0m53.360s
sys 0m1.797s

grass65 r.cost
real 26m35.166s
user 2m55.612s
sys 23m37.921s

r.terracost: check optimal tile size for 400MB memory (default setting;
r.cost in grass7 used 135MB with 20% of maps in memory)
r.terracost in=cost start_rast=urbanarea out=dist_urban_terracost -i
[...]
TILESIZE: nc_spm_08 N=28064550 elements, M=419430400 bytes, optimal
numtiles=1870
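
M here is simply the default 400MB memory limit expressed in bytes; a
quick check with shell arithmetic:

echo $(( 400 * 1024 * 1024 ))   # 419430400, matching M above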

r.terracost numtiles=1870, intermediate data are stored on disk as
r.cost does
real 25m13.593s
user 22m46.978s
sys 1m8.059s

r.terracost numtiles=1, all in memory (just fits into 400MB)
real 0m17.969s
user 0m17.276s
sys 0m0.500s

According to Laura Toma, when comparing r.cost with r.terracost,
numtiles must be >1 for r.terracost in order to compare disk I/O
algorithms [1]

With these test settings that intentionally reduced memory consumption
in order to test disk I/O performance, r.terracost is not really faster
than r.cost in grass65 and much slower than r.cost in grass7.

Markus M

[1] http://lists.osgeo.org/pipermail/grass-dev/2009-July/045157.html

------------------------------

_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

End of grass-dev Digest, Vol 43, Issue 8
****************************************

Hi Laura,

Laura Toma wrote:

Hi Markus,

How much memory was available on the machine?

8 GB

If the machine had more than 512MB RAM, it is not fair to run
terracost with mem=400MB and compare it with an algorithm that can use
more memory.

I don't understand; both modules can use more than 400MB of memory. I
set r.terracost to use 400MB max and r.cost to use 135MB max. If
anything, this is not fair for r.cost and gives an advantage to
r.terracost. I did not mention that I gave r.terracost another advantage
by assigning the temporary directories to a folder on a separate, very
fast hard drive that had nothing else to do but manage the temp files of
r.terracost. The temp files of r.cost are in the standard grass
.tmp/$HOST directory; in my case that (slower) hard drive also had other
things to do than just manage r.cost's temp files. I really tried to
give r.terracost a head start ;-)

However, I am surprised that with numtiles=1, it was slower than
r.cost.

???
cut'n paste from below
r.terracost numtiles=1
real 0m17.969s
user 0m17.276s
sys 0m0.500s

r.cost in grass7, percent_memory=20% # that's not a percentage of
available memory but of the region to keep in memory; for the test
region this was max 135 MB
real 0m55.349s
user 0m53.360s
sys 0m1.797s

-> r.terracost with numtiles=1 is much faster than r.cost

That's something I'd like to look into. Would you mind sharing the
raster with me, and sending me the exact commands that you ran?

It is in the message you replied to, see below.
cut'n paste from below

North Carolina sample dataset

g.region rast=elev_state_500m@PERMANENT res=100
# gives about 28 million cells
v.to.rast nc_state@PERMANENT use=val val=500.0 out=cost --o
v.to.rast urbanarea@PERMANENT use=val val=1 out=urbanarea --o
r.cost in=cost start_rast=urbanarea out=dist_urban percent_memory=20 --o

r.terracost in=cost start_rast=urbanarea out=dist_urban_terracost -i
gives
[...]
TILESIZE: nc_spm_08 N=28064550 elements, M=419430400 bytes, optimal
numtiles=1870

# test command for r.terracost, same as above, now with optimal
numtiles=1870
r.terracost in=cost start_rast=urbanarea out=dist_urban_terracost
numtiles=1870

A grid with 29M points is pretty small for today's machines. I
suggest running on something 10 times larger.

Will do.

And use a lot of sources; that makes the data access pattern less
local.

Yes, I thought about that too; best would be a lot of evenly distributed
start points, which should force r.cost to do highly random and
scattered disk access. My theory, however, predicts that the current
r.cost will always be faster than the current r.terracost if r.terracost
is run with numtiles>1. We will see.

BTW, I took the liberty of fixing r.terracost; it now works with
numtiles>1. See the changelog for r39684:
https://trac.osgeo.org/grass/changeset/39684

Markus M


Hi Markus,

How much memory was available on the machine?

8 GB

If the machine had more than 512MB RAM, it is not fair to run
terracost with mem=400MB and compare it with an algorithm that can use
more memory.

I don't understand; both modules can use more than 400MB of memory. I
set r.terracost to use 400MB max and r.cost to use 135MB max. If
anything, this is not fair for r.cost and gives an advantage to
r.terracost. I did not mention that I gave r.terracost another advantage
by assigning the temporary directories to a folder on a separate, very
fast hard drive that had nothing else to do but manage the temp files of
r.terracost. The temp files of r.cost are in the standard grass
.tmp/$HOST directory; in my case that (slower) hard drive also had other
things to do than just manage r.cost's temp files. I really tried to
give r.terracost a head start ;-)

My experience is that, if you want to see how an application would behave with 500 MB of RAM, you have to physically reboot the machine with 500 MB of RAM (it's very easy to do this on a Mac, and relatively easy on Linux; on Windows, I don't know).

If the machine has more than 500MB RAM, even if you restrict the application to use less, the system gives it all it can. In your setup, it is almost as if r.cost ran fully in memory, because even if it places the segments on disk, the system file cache fits all segments in memory. The same is true for terracost: its streams fit in memory. But using tiles has a big CPU overhead, which is why it is slower.

When I did some preliminary testing, I rebooted the machine with 512MB RAM and ran r.cost on grids of 50M-100M cells. It was slow, completely I/O bound, and took several hours or more. Or, if you use 1GB of RAM, you may need to go to larger grids.
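
A minimal sketch of one way to do this on Linux without opening the box
(assuming a GRUB bootloader): pass a mem= limit on the kernel command
line, reboot, and check how much RAM the system actually sees:

# added to the kernel line at the boot prompt (not a shell command):
#   mem=512M
# after reboot:
free -m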

However, I am surprised that with numtiles=1, it was slower than
r.cost.

It looks like I misread your numbers.

BTW, I took the liberty of fixing r.terracost; it now works with
numtiles>1. See the changelog for r39684:
https://trac.osgeo.org/grass/changeset/39684

Great, thanks! Let's see if terracost is worthwhile :-)

-Laura

Hi Laura,

Laura Toma wrote:

My experience is that, if you want to see how an application would behave with 500 MB of RAM, you have to physically reboot the machine with 500 MB of RAM (it's very easy to do this on a Mac, and relatively easy on Linux; on Windows, I don't know).

If the machine has more than 500MB RAM, even if you restrict the application to use less, the system gives it all it can. In your setup, it is almost as if r.cost ran fully in memory, because even if it places the segments on disk, the system file cache fits all segments in memory. The same is true for terracost: its streams fit in memory. But using tiles has a big CPU overhead, which is why it is slower.

I haven't rebooted my Linux box with less RAM, but I set up a test region with about 312 million cells (details below); I think we can agree that, by current standards, this is a pretty large region, maybe not in the future. Your argument still holds true that r.cost may have some advantage because its temp files are much smaller than those of r.terracost, and therefore a larger proportion can be cached by the system (beyond the control of the module). I did, however, see a lot of disk I/O with both modules.

For 312 million cells, r.cost needed 51 min and r.terracost needed 24 h 22 min; both got 2GB of memory.

Now that sounds like really bad news for r.terracost. But it is not the whole truth. First, I had to tweak r.cost a little in order to be that fast, and I still have to come up with a solution to do that tweaking inside the module. Second, r.cost may suffer more than r.terracost from a reduction of its memory setting (as opposed to a reduction of physical RAM): reducing the percent_memory option already slows the module down considerably. But that is also true for r.terracost, where the bottleneck seems to be the INTERTILE DIJKSTRA step, which took well over 12 hours with heavy disk I/O and full memory consumption. Third, r.cost performs better with fewer start points, keeping the region settings constant. I'm not sure whether this applies to r.terracost as well.

In summary, I think that on even larger regions, say >1 billion cells, with many small separate start points (>100,000), r.terracost should outperform r.cost, but I would not bet on it ;-) For what I guess is current everyday use (<100 million cells), r.cost in grass7 might most of the time outperform r.terracost with numtiles>1, sometimes considerably, as in my tests. The speed of r.cost is variable and depends on the combination of region size, number and distribution of start points, and the amount of memory it is allowed to use. There may still be some scope for improvement in r.cost; I just did a quick job there, no in-depth code analysis (yet). The extraordinarily large temp files of r.terracost (64GB in total, the largest single file about 56GB, no typo) could be a handicap when processing such large regions. Finally, the results of these tests are valid for my test system only; they will be different on other systems.

When I did some preliminary testing, I rebooted the machine with 512MB RAM and ran r.cost on grids of 50M-100M cells. It was slow, completely I/O bound, and took several hours or more. Or, if you use 1GB of RAM, you may need to go to larger grids.

Please test r.cost in grass7 yourself, and maybe share your test commands so that others can run the tests too and compare.

Here is my test region:

The 312 million cells test region was created in the North Carolina sample dataset with
g.region rast=elev_state_500m@PERMANENT res=40
Then I created a cost layer with
r.mapcalc "cost = 1"
You wanted many start points, so I generated 10000 start points with
v.random output=start_points_10000 n=10000
and converted this vector to raster with
v.to.rast start_points_10000 use=val val=1 out=start_points_10000 --o

The test command for r.cost was
time r.cost input=cost start_rast=start_points_10000 output=dist_random_10000 percent_memory=40 --o
This setting was equivalent to 2 GB of memory.
time:
real 51m18.172s
user 34m4.067s
sys 0m45.100s

For r.terracost, I again used as temp dir a directory on a separate hard drive, faster than the one that r.cost used; let's say
tmpdir="/path/to/some/fast/dir"
and the test command for r.terracost was
time r.terracost in=cost start_rast=start_points_10000 out=dist_random_10000_terracost STEAM_DIR=$tmpdir VTMPDIR=$tmpdir memory=2000 numtiles=20788 --o
numtiles=20788 was obtained with r.terracost -i
time:
real 1453m37.022s
user 513m56.549s
sys 43m38.519s
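
The temporary disk usage mentioned above (about 64GB for r.terracost)
can be checked while the modules run; a sketch, with $tmpdir as set
above and the r.cost segment files in the mapset's .tmp/<hostname>
directory mentioned earlier (adjust the path):

watch -n 60 du -sh "$tmpdir"       # r.terracost stream/temp files
du -sh /path/to/mapset/.tmp/*      # r.cost segment files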

Sorry for that long post!

Markus M

Hi Markus,

Processing a grid of 312 M cells takes about 8 x 312M = 2GB of RAM, so on a machine with 8GB of RAM it will not use virtual memory at all, irrespective of how you tweak it.

With 8GB of RAM, the correct comparison is between r.cost and r.terracost with numtiles=1 (do you have timings for this case?).

In other words, if you tweak r.cost, you also need to tweak r.terracost, which means you run with numtiles=1 for as long as data fits in real memory.

If you want any real numbers on how r.cost behaves with low memory you need to reboot the machine with 1GB or better 512MB of RAM. There is no way around it. Just try it, it is easy to do. I run experiments like this all the time.

-Laura


Hi Laura,

Laura Toma wrote:

Hi Markus,

Processing a grid of 312 M cells takes about 8 x 312M = 2GB of RAM,

That is only true for r.terracost with numtiles=1, because r.terracost stores costs as float. Is it possible that there is a bug in r.terracost when using numtiles > 1? Creating 64GB of temporary files seems a bit inordinate for 2GB of data. And if r.terracost used double for costs, it would be about 130GB of temporary files? OK, disk space is nearly free nowadays.
r.cost stores costs as double, so the size of its temporary files is about 4GB. Additionally, 2GB were used for processing, i.e. at least 6GB of system RAM are required to also keep the cached files in RAM.
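
For reference, a back-of-envelope with shell arithmetic for a single
layer of the 312 million cell grid (4 bytes per float cell, 8 bytes per
double cell; this says nothing about what either module writes on top of
that):

echo $(( 312000000 * 4 / 1024 / 1024 ))   # ~1190 MB for one float layer
echo $(( 312000000 * 8 / 1024 / 1024 ))   # ~2380 MB for one double layer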

so on a machine with 8GB of RAM it will not use virtual memory at all, irrespective of how you tweak it.

Right, but it still uses the disk IO algorithm and reads from/writes to disk.

With 8GB of RAM, the correct comparison is between r.cost and r.terracost with numtiles=1

I don't think so because r.cost still uses its disk IO algorithm while r.terracost doesn't. That's like comparing r.watershed in ram mode with r.terraflow. A module not using a disk IO algorithm should always be faster than a corresponding module using a disk IO algorithm, as long as intermediate data fit into RAM.

In other words, if you tweak r.cost, you also need to tweak r.terracost, which means you run with numtiles=1 for as long as data fits in real memory.

I tweaked the disk IO algorithm to be faster, not to use less disk space. I can also do serious tweaking and write a true all-in-memory version of r.cost and compare that to r.terracost numtiles=1, but I'm interested in the performance of r.cost with the disk IO algorithm and thus compare it to r.terracost with its disk IO algorithm (requires numtiles > 1).

If you want any real numbers on how r.cost behaves with low memory you need to reboot the machine with 1GB or better 512MB of RAM. There is no way around it. Just try it, it is easy to do. I run experiments like this all the time.

OK, would you mind running experiments with r.cost in grass7 and r.terracost numtiles>1 so you can see for yourself?

I rebooted with 2500MB of RAM in order to run the same test command as before on the 312 million cells region, giving about 2000MB of RAM to r.cost, same as before. I used the same region and start points as before because I think these settings are challenging for r.cost. My test system went into swap space, all memory was used up (the system file cache was in swap anyway, and the OS needs some RAM too), and r.cost took, as expected, longer, namely 4 hours 10 min.

Still much less than the 24 hours 22 min of r.terracost with memory=2000 and 8GB of system RAM...

The latest version of r.cost (r39749) needs 2 hours 30 min with 2500MB of RAM and 2000MB of RAM assigned to it, remainder used by OS.

From a user's perspective, one advantage of modules with disk I/O algorithms is IMO that you do not need to use up all available system memory and can still do other things in parallel; so I would always assign at most 75% of RAM to these modules and do other work alongside, even though that may keep the system from caching the modules' files.

BTW, there was a typo in my g.region command, must be res=30 in order to get 312 million cells, sorry!

Markus M


Hi Markus,

Your conclusions are based on the hypothesis that you can model the performance of r.cost in the presence of low memory by tweaking the memory limit in the code and using a machine with a large physical memory. I don't think that this hypothesis is true, and here is the evidence so far:

r.cost on a machine with 8GB physical memory: 1h
r.cost on a machine with 2.5GB physical memory: 4h

If you reboot the machine with 1GB RAM, you will see the running time go up (by a lot). Afterwards, try rebooting with 512MB RAM. I have run similar tests in the past, and r.cost did not finish in 40 hours. It may be better now, and you are the best person to re-try these tests as you know how to tweak it.

I'll get back about what terracost is doing and why it has such large files after we see these new numbers.

-Laura


Hi Laura,

Laura Toma wrote:

Hi Markus,

Your conclusions are based on the hypothesis that you can model the performance of r.cost in the presence of low memory by tweaking the memory limit in the code

I think I improved the disk I/O; AFAICT the memory limits in r.cost are handled as before through the percent_memory option. BTW, I added an -i option to r.cost, reporting estimated memory and disk space usage for the given percent_memory option, inspired by r.terracost.
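
A hypothetical invocation on the 312 million cell test case (the output
format of the new flag is not shown in this thread):

r.cost -i input=cost start_rast=start_points_10000 output=dist_random_10000 percent_memory=40 --o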

and using a machine with a large physical memory. I don't think that this hypothesis is true, and here is the evidence so far:

r.cost on a machine with 8GB physical memory: 1h
r.cost on a machine with 2.5GB physical memory: 4h

r.terracost memory=2000 on a machine with 8GB physical memory: 24h

r.terracost with numtiles=1 would need 2380MB + x MB for the Dijkstra search here, but there are only 2000MB free.
r.terracost on a machine with 2.5GB physical memory: >24h

If you reboot the machine with 1GB RAM, you will see the running time go up (by a lot).

Please test whether the running time of r.cost reaches the running time of r.terracost.

I have run similar tests in the past, and r.cost did not finish in 40 hours. It may be better now

It may be so. You're welcome to test it yourself. Unfortunately, I can't afford (time constraints...) to do further testing myself, but I will continue as soon as I have some spare time. r.cost seems particularly challenging
with regard to external memory, and I like challenges ;-)

Markus M
