Hello all
You might find it easier to read the following text in a gist
Introduction
I was trying to write a decorator/contextmanager that would temporary change the
computational region, but while using it I noticed that there was some overhead the root
of which seemed to be the usage of the GRASS modules. So in order to quantify this
I wrote a small benchmark that tries to measure the overhead of calling a GRASS Module.
This is what I found.
tl;dr
Calling a GRASS module incurs a constant but measurable overhead which, in certain
cases, e.g. when writing a module that uses a lot of the other modules, can quickly
add up to a significant quantity.
Disclaimer
If you try to run the benchmark on your own PC, the actual timings you will get will
probably be different. The differences might be caused by having:
- a stronger/weaker CPU
- faster/slower hard disk.
- using different Python version
- using different compilation flags
Still, I think that some of the findings are reproducible.
For reference, I used:
- OS: Linux 5.0.9
- CPU: Intel i7 6700HQ
- Disk type: SSD
- Python: 3.7
CFLAGS='-O2 -fPIC -march=native -std=gnu99'
Demonstration
The easiest way to demonstrate the performance difference between using a GRASS module
vs using the GRASS API is to run the following snippets.
Both of them do exactly the same thing, i.e. retrieve the current region settings 10
times in a row. The performance difference is huge though. On my laptop, the first one
needs 0.36 seconds while the second one needs just 0.00038 seconds. That’s almost
a 1000x difference…
import time
import grass.script as gscript
start = time.time()
for i in range(10):
region = gscript.parse_command("g.region", flags="g")
end = time.time()
total = end - start
print("Total time: %s" % total)
vs
import time
from grass.pygrass.gis.region import Region
start = time.time()
for i in range(10):
region = Region()
end = time.time()
total = end - start
print("Total time: %s" % total)
How much is the overhead exactly?
In order to measure the actual overhead of calling a GRASS module, I created two new
GRASS modules that all they do is parse the command line arguments and measured how much
time is needed for their execution. The first module is r.simple
and is
implemented in Python while the other one is r.simple.c
and is implemented in C.
The timings are in msec and the benchmark was executed using Python 3.7
call method | r.simple | r.simple.c |
---|---|---|
pygrass.module.Module | 85.9 | 66.5 |
pygrass.module.Module.shortcut | 85.5 | 66.9 |
grass.script.run_command | 41.3 | 30.5 |
subprocess.check_call | 41.8 | 30.3 |
As we can see, gsrcipt.run_command
and subprocess
give more or less a identical
results, which is to be expected since run_command
+ friends are just a thin wrapper
around subprocess
. Similarly shortcuts
has the same overhead as “vanila”
pygrass.Module
. Nevertheless, it is obvious that pygrass
is roughly 2x times slower
than grass.script
(but more about that later).
As far as C vs Python goes, on my computer modules implemented in C seem to be 25% faster than their
Python counterparts. Nevertheless, a 40 msec startup time doesn’t seem extraordinary
for a Python script, while 30 msec feels rather large for a CLI application implemented
in C.
Where is all that time being spent?
C Modules
Unfortunately, I am not familiar enough with the GRASS internals to easily check what is
going on and I didn’t have the time to try to profile the code. I suspect that
G_gisinit
or something similar is causing the overhead, but someone more familiar with
the C API should be able to enlighten us.
Python Modules
In order to gain a better understanding of the overhead we have when calling python
modules, we also need to measure the following quantities:
- The time that python needs to spawn a new process
- The startup time for the python interpreter
- The time that python needs to
import grass
These are the results:
msec | |
---|---|
subprocess spawn | 1.2 |
python 2 startup | 9.0 |
python 2 + import grass | 24.5 |
python 3 startup | 18.2 |
python 3 + import grass | 39.3 |
As we can see:
- the overhead of spawning a new process from within python is more or less negligible
(at least compared to the other quantities). - The overhead of spawning a python 3 interpreter is 2x bigger than Python 2; i.e. the
transition to Python 3 will have some performance impact, no matter what. - The overhead of spawning a python 3 interpreter accounts for roughly 50% of the total
overhead (18 msec out of 41 msec). - The other 50% of the overhead is pretty much caused just by importing the
grass
library.
Why is Pygrass 2x times slower?
I haven’t carefully looked into this (i.e. I haven’t profiled the code), but it seems
that the culprit is this
line.
In other words, pygrass
calls a module twice, once to get the interface’s description
and once to actually run the module. That’s why the overhead is double both for
C modules and Python ones.
Is this truly a problem?
Well, it depends
The most important factor is probably the size of the computational region. If you are
dealing with really large regions, then the overhead is probably miniscule. If you are
dealing with much smaller ones though, then it can be significant.
That being said, there are at least two cases where I think that this overhead is
important:
- interactive sessions
- running tests
Just to give an example, i.pansharpen
calls ~50 other GRASS modules. On my laptop, the
overhead for these calls is almost 2 seconds. Is this a lot? Well, if you are
pansharpening Landsat Tiles, it probably is not that much, but you will have this
overhead even if you are pansharpening a 4 pixel map (e.g. when running tests).
I am pretty sure that there are numerous other modules that are just like that, too.
What is to be done?
I think that there are two different areas where work can be done:
- Reduce the actual overhead; i.e. make calling GRASS modules faster.
- Convert calls to GRASS modules with API calls which, as shown earlier, are orders of
magnitude faster.
Reduce the overhead
The overhead of Python and C modules has different causes.
As I said, I haven’t looked into what causes C modules to take so long so i can’t make
any suggestions. This should be further looked into though. The reason is that modules
like e.g. g.region
are being used practically everywhere, so speeding these up should
give a measureable performance improvement.
As far as Python modules go, unfortunately, the overhead seems to be relatively
inelastic. Speeding up the startup time of the Python interpreter is not something that
GRASS can do or rely upon. So, the only thing that can be actually be done is to try to
reduce the time needed for importing the grass
library. That being said, this should
probably not skim more than a few ms at best, but at least the gains should be there for
all python modules.
Luckily there is at least one relatively low hanging fruit. Speeding pygrass should not
be that difficult. I haven’t looked into this but pre-generating the modules’ XML
descriptions seems feasible and it should remove the need to call each module twice,
thus making pygrass performance comparable to grass.script.run_command
Convert module calls to API calls
Converting the module calls to API calls is what can potentially give the bigger
benefits, but at the same time is what needs the most work.
There are some low hanging fruits here too. E.g. functions like use_temp_region()
and
raster_info()
can probably be refactored to use the pygrass API, while calls to e.g.
g.region
can probably be replaced with pygrass.gis.Region
objects etc.
Nevertheless, this does not really touch the root of the problem which IMHV is the tight
coupling between the GRASS modules functionality and the CLI. In layman terms, at the
moment, if you want to use module A from module B you are forced to spawn a new process
and suffer the overhead that this entails.
TBH, I am not sure if this a problem that can even be tackled at this stage, but if each
module had one or more functions that could be imported/called by other modules,
everything would be much easier and performance would be significantly better.
How to run the benchmark?
If you want to run this benchmark on your own you need to:
git remote add panos [https://github.com/pmav99/grass-ci.git
](https://github.com/pmav99/grass-ci.git`)git fetch --all
git checkout overhead
cd scripts/r.simple && make && cd ../../
cd raster/r.simple.c && make && cd ../../
- Start a grass session
python benchmark.py
I haven’t run the benchmark under Python 2, but even if there are incompatibilities,
they should be trivial to fix.
I will be happy to hear any remarks
with kind regards,
Panos