[GRASS-dev] [GRASS GIS] #2033: Moving g.pnmcomp to lib/display to improve render performance of wxGUI

#2033: Moving g.pnmcomp to lib/display to improve render performance of wxGUI
----------------------------------------------+-----------------------------
  Reporter:  huhabla                           |      Owner:  grass-dev@…
      Type:  enhancement                       |     Status:  new
  Priority:  major                             |  Milestone:  7.0.0
 Component:  wxGUI                             |    Version:  svn-trunk
  Keywords:  display, Python, multiprocessing  |   Platform:  All
       Cpu:  All                               |
----------------------------------------------+-----------------------------
I would like to move the code of g.pnmcomp into the display library, so
we can call it directly as a C function from the wxGUI to avoid file I/O
and to speed up the rendering process.

I will use the Python multiprocessing module to keep the GUI from crashing
in case of a segmentation fault, or in case of an exit call when a fatal
error occurs. Hence, to call the C library function via ctypes or PyGRASS,
a new process will be spawned. All data will be exchanged between the
wxGUI and its child processes using Python objects and multiprocessing
queues. This can also be used to run several processes in parallel, as
g.gui.animation already does.
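
A minimal sketch of this isolation pattern (composite() is a hypothetical
stand-in for the future display-library call; the result travels back to
the parent through a multiprocessing queue):

{{{
# -*- coding: utf-8 -*-
from multiprocessing import Process, Queue

def composite(width, height):
    """Hypothetical stand-in for the C library call (via ctypes/PyGRASS)."""
    return b"\x00" * (width * height * 4)

def worker(queue, width, height):
    # A segmentation fault or exit() here kills only this child process,
    # not the wxGUI parent.
    queue.put(bytearray(composite(width, height)))

queue = Queue()
child = Process(target=worker, args=(queue, 1024, 1024))
child.start()
image = queue.get()   # the GUI side receives the image as a Python object
child.join()
}}}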

I have implemented a prototype that makes use of this concept. The code is
attached as a diff against the current grass7 svn-trunk. It can be seen as
a proof of concept that shows how this might work; it also shows that it
is still possible to call g.pnmcomp as usual.

This concept may also lead to a new implementation guideline: to use more
C library functions in the wxGUI to speed up the visualization.

My question is whether this is also possible with d.rast, d.vect and
other display modules, i.e. moving the code from these modules into the
display library and calling these functions from dedicated wxGUI sub-
processes to speed up the rendering?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2033>
GRASS GIS <http://grass.osgeo.org>

Comment(by glynn):

Replying to [ticket:2033 huhabla]:

> My question is whether this is also possible with d.rast, d.vect and
> other display modules, i.e. moving the code from these modules into the
> display library and calling these functions from dedicated wxGUI sub-
> processes to speed up the rendering?

Possible? Probably. Sane? No.

Moving the guts of d.rast/d.vect/etc around won't make it run any faster.
If the issue is with the communication of the raster data, there are
faster methods than reading and writing PNM files.

Both the PNG and cairo drivers support reading and writing 32-bpp BMP
files where the raster data is correctly aligned for memory mapping.
Setting GRASS_PNGFILE to a filename with a .bmp suffix selects this
format, and setting GRASS_PNG_MAPPED=TRUE causes the drivers to mmap() the
file rather than using read() and write().
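
For illustration, selecting this path from Python might look like the
following (a sketch; it assumes a running GRASS session with the usual
North Carolina elevation raster):

{{{
import os
import subprocess

env = os.environ.copy()
env["GRASS_RENDER_IMMEDIATE"] = "png"  # or "cairo"
env["GRASS_PNGFILE"] = "layer.bmp"     # .bmp suffix selects 32-bpp BMP
env["GRASS_PNG_MAPPED"] = "TRUE"       # mmap() instead of read()/write()

subprocess.check_call(["d.rast", "map=elevation"], env=env)
}}}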

Once you have d.* commands generating BMP files, it shouldn't be necessary
to add any binary blobs to wxGUI. Compositing should be perfectly viable
within Python using either numpy, PIL or wxPython (having wxPython perform
the compositing during rendering may be able to take advantage of video
hardware).
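
To illustrate the numpy route, a sketch of plain "over" compositing for
non-premultiplied RGBA arrays (an illustration, not the actual wxGUI
code):

{{{
import numpy as np

def over(top, bottom):
    """Composite RGBA image `top` over `bottom` (uint8, straight alpha)."""
    alpha = top[..., 3:4].astype(np.float32) / 255.0
    rgb = top[..., :3] * alpha + bottom[..., :3] * (1.0 - alpha)
    return rgb.astype(np.uint8)

# Example: a half-transparent red layer over an opaque white background.
top = np.zeros((4, 4, 4), np.uint8); top[..., 0] = 255; top[..., 3] = 128
bottom = np.ones((4, 4, 4), np.uint8) * 255
print(over(top, bottom)[0, 0])   # -> [255 127 127]
}}}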

Additionally, on X11 (and provided that the cairo library supports it),
the cairo driver supports rendering directly into an X pixmap which is
retained in the server (typically in video memory) after the d.* program
terminates. This has the added advantage that rendering will be performed
using the video hardware.

Setting GRASS_PNGFILE to a filename ending in ".xid" selects this option;
the XID of the pixmap will be written to that file as a hexadecimal value.
The g.cairocomp module can composite these pixmaps without the image data
ever leaving video memory ("g.cairocomp -d ..." can be used to delete the
pixmaps from the server).
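
Reading the XID back from Python is then trivial (a sketch; the file holds
a single hexadecimal value, as described above):

{{{
with open("layer.xid") as f:
    xid = int(f.read().strip(), 16)  # ID of the pixmap kept in the server
print("pixmap XID: 0x%x" % xid)
}}}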

The only missing piece of the puzzle is a way to get wxPython to use an
existing pixmap (ideally without pulling it into client memory then
pushing it back out to the server). The cleanest approach would be via
pycairo and wx.lib.wxcairo, which would also allow g.cairocomp to be
eliminated, but that's yet another dependency.
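
A sketch of what that route could look like (assuming pycairo and
wx.lib.wxcairo are available; untested against actual driver output):

{{{
import wx
import wx.lib.wxcairo as wxcairo
import cairo

app = wx.App(False)
surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, 256, 256)
ctx = cairo.Context(surface)
ctx.set_source_rgb(0.2, 0.4, 0.8)
ctx.paint()
bitmap = wxcairo.BitmapFromImageSurface(surface)  # hand the surface to wx
}}}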

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2033#comment:1>
GRASS GIS <http://grass.osgeo.org>

Comment(by huhabla):

Replying to [comment:1 glynn]:
> Replying to [ticket:2033 huhabla]:
>
> > My question is whether this is also possible with d.rast, d.vect and
> > other display modules, i.e. moving the code from these modules into
> > the display library and calling these functions from dedicated wxGUI
> > sub-processes to speed up the rendering?
>
> Possible? Probably. Sane? No.
>
> Moving the guts of d.rast/d.vect/etc around won't make it run any
> faster. If the issue is with the communication of the raster data,
> there are faster methods than reading and writing PNM files.

My hope is to speed up the composition by avoiding disc I/O.

> Both the PNG and cairo drivers support reading and writing 32-bpp BMP
> files where the raster data is correctly aligned for memory mapping.
> Setting GRASS_PNGFILE to a filename with a .bmp suffix selects this
> format, and setting GRASS_PNG_MAPPED=TRUE causes the drivers to mmap()
> the file rather than using read() and write().

As far as I understand mmap(), it is file backed and reads/writes the data
from the file on demand into shared memory? An exception is anonymous
mapping, but is that also supported on Windows? And how can we access an
anonymous mmap() from wxPython?

> Once you have d.* commands generating BMP files, it shouldn't be
> necessary to add any binary blobs to wxGUI. Compositing should be
> perfectly viable within Python using either numpy, PIL or wxPython
> (having wxPython perform the compositing during rendering may be able
> to take advantage of video hardware).

What do you mean by binary blobs? Binary large objects? Well, as far as I
can see from the wx documentation, there is no way around blobs, since
even numpy arrays must be converted into a bytearray or similar to create
a wx image. Does wxPython take advantage of the video hardware? IMHO we
could also implement an OpenCL version of the PNM image composition. In
this case it would be a large advantage to have the images created by
d.rast and d.vect in a shared memory area as well, to avoid disk I/O.
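
To make the OpenCL idea concrete, a rough sketch with pyopencl (purely
hypothetical; no such code exists in GRASS, and the buffer layout is made
up):

{{{
import numpy as np
import pyopencl as cl

KERNEL = """
__kernel void over(__global const uchar4 *top, __global uchar4 *dst)
{
    int i = get_global_id(0);
    float4 t = convert_float4(top[i]);
    float4 b = convert_float4(dst[i]);
    float a = t.w / 255.0f;
    float4 r = t * a + b * (1.0f - a);
    r.w = 255.0f;
    dst[i] = convert_uchar4(r);
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prg = cl.Program(ctx, KERNEL).build()

n = 1024 * 1024                          # pixels per layer
top = np.random.randint(0, 256, n * 4).astype(np.uint8)
dst = np.ones(n * 4, np.uint8) * 255     # opaque background

mf = cl.mem_flags
top_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=top)
dst_g = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=dst)
prg.over(queue, (n,), None, top_g, dst_g)  # one work-item per pixel
cl.enqueue_copy(queue, dst, dst_g)         # the read-back is the costly part
}}}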

> Additionally, on X11 (and provided that the cairo library supports it),
> the cairo driver supports rendering directly into an X pixmap which is
> retained in the server (typically in video memory) after the d.* program
> terminates. This has the added advantage that rendering will be
> performed using the video hardware.
>
> Setting GRASS_PNGFILE to a filename ending in ".xid" selects this
> option; the XID of the pixmap will be written to that file as a
> hexadecimal value. The g.cairocomp module can composite these pixmaps
> without the image data ever leaving video memory ("g.cairocomp -d ..."
> can be used to delete the pixmaps from the server).
>
> The only missing piece of the puzzle is a way to get wxPython to use an
> existing pixmap (ideally without pulling it into client memory then
> pushing it back out to the server). The cleanest approach would be via
> pycairo and wx.lib.wxcairo, which would also allow g.cairocomp to be
> eliminated, but that's yet another dependency.

It still puzzles me how to create a shared memory buffer using
multiprocessing.sharedctypes.Array and use it in the C-function calls. In
the current approach I have to use a queue object to transfer the image
data from the child process to its parent, and therefore have to transform
the image buffer into a Python bytearray. How to access video memory is
another question. Are pipes or similar techniques available for this kind
of operation? Should we wait for hardware that has no distinction between
video and main memory? Using pycairo.BitmapFromImageSurface() seems to be
a good approach?
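
One way the shared buffer itself could look (a sketch; a C function could
be handed the same address via ctypes):

{{{
import ctypes
import numpy as np
from multiprocessing import Process
from multiprocessing.sharedctypes import RawArray

width, height = 1024, 1024
shared = RawArray(ctypes.c_ubyte, width * height * 4)  # one RGBA buffer

def render(buf, w, h):
    # The child (or a C function via ctypes) writes straight into the
    # shared memory; no queue transfer and no bytearray copy is needed.
    image = np.frombuffer(buf, dtype=np.uint8).reshape(h, w, 4)
    image[..., 3] = 255   # e.g. mark all pixels opaque

child = Process(target=render, args=(shared, width, height))
child.start()
child.join()
parent_view = np.frombuffer(shared, dtype=np.uint8)    # same bytes, no copy
}}}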

However, we should focus on approaches that work on Linux/Unix/Mac and
Windows. Using X11-specific features is not meaningful in my humble
opinion. Besides that, the cairo driver does not work with the Windows
GRASS installer yet (missing dependencies, with the exception of Markus
Metz' local installation).

I don't think that calling the d.vect and d.rast functionality as library
functions is insane. :-)
Using library functions will allow using the same image buffers across
rendering and composition, which can be passed to the wxGUI parent process
using the multiprocessing queue. This will not increase the actual
rendering speed, but it will avoid several I/O operations and allow
I/O-independent parallel rendering in case of multi-map visualization. The
mmap() approach is not needed in this case either.

Well, the massive number of d.vect and d.rast options will make it
difficult to design a convenient C-function interface ... but this can be
solved.

In the long term, the current command interface to access the wx monitors
is a bit ... let's say ... error prone. It would be an advantage to have
the d.* modules as Python modules that are able to talk to the monitors
using socket connections or other cross-platform IPC methods, sending
serialized objects that describe the call of the new (d.vect) vector
rendering or (d.rast) raster rendering functions in the display library.
In addition, these modules could call the display library functions
themselves for image rendering without monitors.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2033#comment:2>
GRASS GIS <http://grass.osgeo.org>

Comment(by glynn):

Replying to [comment:2 huhabla]:

> My hope is to speed up the composition by avoiding disc I/O.

If one process writes a file and another immediately reads it, it doesn't
necessarily involve "disc" I/O.

The OS caches disc blocks in RAM. write() completes as soon as the data
has been copied to the cache (the kernel will copy it to disc on its own
schedule), and read() reads the data from the cache (and only requires
disc access for data which isn't already in the cache).

The kernel will use all "free" memory for the disc cache. So unless memory
pressure is high, the files written by the display driver will remain in
the cache for quite a while.

> As far as I understand mmap(), it is file backed and reads/writes the
> data from the file on demand into shared memory? An exception is
> anonymous mapping, but is that also supported on Windows? And how can
> we access an anonymous mmap() from wxPython?

Anonymous mmap() isn't relevant here. mmap() is file backed, but this
doesn't affect the time required to read and write the file unless memory
pressure is so high that the size of the file exceeds the amount of free
memory. In the event that sufficient free memory is available, neither
writing nor reading will block waiting for disc I/O.
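
For completeness, the read side in Python (a sketch; it assumes the common
54-byte BMP header, and the pages come straight from the page cache when
the writer has just finished):

{{{
import mmap
import numpy as np

with open("layer.bmp", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    data = np.frombuffer(mm, dtype=np.uint8)  # no copy; backed by the mapping
    pixels = data[54:]                        # skip the common 54-byte header
    # ... composite from `pixels` here ...
}}}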

> > Once you have d.* commands generating BMP files, it shouldn't be
> > necessary to add any binary blobs to wxGUI. Compositing should be
> > perfectly viable within Python using either numpy, PIL or wxPython
> > (having wxPython perform the compositing during rendering may be able
> > to take advantage of video hardware).
>
> What do you mean by binary blobs? Binary large objects?

Machine code.

IOW, it shouldn't be necessary to move g.pnmcomp into a library (DLL/DSO)
which is accessed from the wxGUI process. The replacement can just be
written in Python, using existing Python modules (numpy, PIL or wxPython)
to get reasonable performance.

> Does wxPython take advantage of the video hardware?

wxWidgets is a cross-platform wrapper around existing toolkits: Windows
GDI, GTK/GDK, etc. The underlying toolkit will use the video hardware, but
wxWidgets may insist upon inserting itself between the data and the
hardware.

> IMHO we could also implement an OpenCL version of the PNM image
> composition.

This won't help much unless you can persuade wxWidgets/wxPython to use the
composed image directly. If it insists upon pulling the data from video
memory so that it can pass it to a function which just pushes it straight
back again, it would probably be quicker to perform the compositing on the
CPU.

> It still puzzles me how to create a shared memory buffer using
> multiprocessing.sharedctypes.Array and use it in the C-function calls.

I'm not sufficiently familiar with the multiprocessing module to answer
this question. However, if it turns out to be desirable (and I don't
actually think it will), it wouldn't be that hard to modify the PNG/cairo
drivers to write into a SysV-IPC shared memory segment (shmat() etc).

But I don't think that will offer any advantages over mmap()d files, and
it's certainly at a disadvantage compared to GPU rendering into shared
video memory.

> Should we wait for hardware that has no distinction between video and
> main memory?

X11 will always make a distinction between server memory and client
memory, as those may be on different physical systems.

> Using pycairo.BitmapFromImageSurface() seems to be a good approach?

It may be the best that you're going to get. GDK can create a
GdkPixmap from an XID (gdk_pixmap_foreign_new), and this functionality is
exposed by PyGTK. But the higher level libraries all seem to insist upon
creating the pixmap themselves from data which is in client memory. Or at
least, if the functionality is available, it doesn't seem to be
documented.

> I don't think that calling the d.vect and d.rast functionality as
> library functions is insane. :-)

Eliminating the process boundary for no reason other than to avoid having
to figure out inter-process communication is not sane.

> Using library functions will allow using the same image buffers across
> rendering and composition, which can be passed to the wxGUI parent
> process using the multiprocessing queue.

Using files will allow using the same "image buffers" (i.e. the kernel's
disc cache).

> Well, the massive number of d.vect and d.rast options will make it
> difficult to design a convenient C-function interface ... but this can
> be solved.

Solving inter-process communication is likely to be a lot simpler, and the
end result will be nicer.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2033#comment:3>
GRASS GIS <http://grass.osgeo.org>

Comment(by huhabla):

I am not fully convinced by the mmap() approach for IPC. It is not
guaranteed that mmap() is faster than the usual file I/O.[1]

I have written a small Python script to test the performance of d.rast and
d.vect using the North Carolina sample dataset. The script is attached. I
am using the PNG and Cairo drivers, switching mmap on and off for
different window sizes. In addition, I call the d.rast (1x) and d.vect
(2x) modules in parallel to measure the rendering speed gain. The
composition is still missing, but I will add the PIL approach available in
the GRASS wiki, with Python mmap support. Here is the script:

{{{
# -*- coding: utf-8 -*-
from grass.pygrass import modules
import os
import time

parallel = [True, False]
drivers = ["png", "cairo"]
mmap_modes = ["FALSE", "TRUE"]
sizes = [1024, 4096]

def render_image(module, driver="png", pngfile="test.png",
                  size=4096, mapped="TRUE"):
     os.putenv("GRASS_RENDER_IMMEDIATE", "%s"%driver)
     os.putenv("GRASS_PNGFILE", "%s"%pngfile)
     os.putenv("GRASS_WIDTH", "%i"%size)
     os.putenv("GRASS_HEIGHT", "%i"%size)
     os.putenv("GRASS_PNG_MAPPED", "%s"%mapped)
     module.run()

def composite_images(files):
     pass

def main():
     # Set the region
     modules.Module("g.region", rast="elevation", flags="p")

     for finish in parallel:
         if finish:
             print("*** Serial runs")
         else:
             print("*** Parallel runs")

         # Setup the modules
         rast = modules.Module("d.rast", map="elevation", run_=False,
                                 quiet=True, finish_=False)
         vectB = modules.Module("d.vect", map="streams", width=1,
                                 color="blue",
                                 fcolor="aqua", type=["area","line"],
                                 run_=False, quiet=True, finish_=finish)
         vectA = modules.Module("d.vect", map="roadsmajor", width=2,
                                run_=False, quiet=True, finish_=finish)

         count = 0
         for driver in drivers:
             for mode in mmap_modes:
                 for size in sizes:
                     start = time.time()
                     count += 1
                     files = []
                     if mode == "TRUE":
                         rast_file = "rast.bmp"
                         vectA_file="vectA.bmp"
                         vectB_file="vectB.bmp"
                     else:
                         rast_file = "rast.png"
                         vectA_file="vectA.png"
                         vectB_file="vectB.png"

                     render_image(rast, driver=driver, pngfile=rast_file,
                                  size=size, mapped=mode)
                     render_image(vectA, driver=driver, pngfile=vectA_file,
                                  size=size, mapped=mode)
                     render_image(vectB, driver=driver, pngfile=vectB_file,
                                  size=size, mapped=mode)

                     files.append(rast_file)
                     files.append(vectA_file)
                     files.append(vectB_file)

                     # Wait for processes
                     rast.popen.wait()
                     vectA.popen.wait()
                     vectB.popen.wait()

                     # Composite the images
                     composite_images(files)

                     for file in files:
                         os.remove(file)

                     elapsed = (time.time() - start)
                     print("*** Run %i Driver=%s mmap=%s Size=%i
time=%f"%(count,
driver,
mode,
size,
elapsed))

main()
}}}

The result of the benchmark:

{{{
GRASS 7.0.svn (nc_spm_08_grass7):~/src > python display_bench.py
projection: 99 (Lambert Conformal Conic)
zone: 0
datum: nad83
ellipsoid: a=6378137 es=0.006694380022900787
north: 228500
south: 215000
west: 630000
east: 645000
nsres: 10
ewres: 10
rows: 1350
cols: 1500
cells: 2025000
*** Serial runs
*** Run 1 Driver=png mmap=FALSE Size=1024 time=0.796055
*** Run 2 Driver=png mmap=FALSE Size=4096 time=3.389201
*** Run 3 Driver=png mmap=TRUE Size=1024 time=0.449877
*** Run 4 Driver=png mmap=TRUE Size=4096 time=3.723065
*** Run 5 Driver=cairo mmap=FALSE Size=1024 time=0.824797
*** Run 6 Driver=cairo mmap=FALSE Size=4096 time=2.632125
*** Run 7 Driver=cairo mmap=TRUE Size=1024 time=0.542321
*** Run 8 Driver=cairo mmap=TRUE Size=4096 time=2.276822
*** Parallel runs
*** Run 1 Driver=png mmap=FALSE Size=1024 time=0.756147
*** Run 2 Driver=png mmap=FALSE Size=4096 time=3.113990
*** Run 3 Driver=png mmap=TRUE Size=1024 time=0.530959
*** Run 4 Driver=png mmap=TRUE Size=4096 time=3.355732
*** Run 5 Driver=cairo mmap=FALSE Size=1024 time=0.865963
*** Run 6 Driver=cairo mmap=FALSE Size=4096 time=2.358270
*** Run 7 Driver=cairo mmap=TRUE Size=1024 time=0.566976
*** Run 8 Driver=cairo mmap=TRUE Size=4096 time=1.934245
}}}

There seems to be no mmap() speed improvement for the PNG driver at large
window sizes? I will investigate this further.

[1] http://lists.freebsd.org/pipermail/freebsd-questions/2004-June/050371.html

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2033#comment:4>
GRASS GIS <http://grass.osgeo.org>

Comment(by glynn):

Replying to [comment:4 huhabla]:

Your tests are comparing raw BMP with PNG (which uses zlib compression).
If you aren't seeing a significant difference between those two, then the
I/O overhead is negligible and performance is dictated by rendering speed.

Regardless of whether I/O uses mmap() or write() and read(), disk transfer
doesn't get involved unless memory pressure is so high that the data gets
discarded from the cache before it is read. And if memory pressure is that
high, disk transfer will get involved anyhow when the "memory" buffers are
swapped out (and if memory pressure is high and you don't have swap, then
you'll just get an out-of-memory failure).

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2033#comment:5>
GRASS GIS <http://grass.osgeo.org>

Comment(by huhabla):

I have improved the benchmark script and implemented PIL-based and
g.pnmcomp image composition (without transparency). Now PNG, BMP and PPM
images are created, and mmap is enabled for the BMP images. Time is
measured for the whole rendering/composition process and separately for
the composition.
My test system is a Core i5 2410M with 8GB RAM and a 320GB hard disk
running Ubuntu 12.04 LTS.

It seems to me that creating raw BMP images without mmap enabled shows the
best performance for both the PNG and Cairo drivers. Maybe I did something
wrong, but the use of mmap shows no obvious benefit?
The PNG compression slows down the rendering significantly and is IMHO not
well suited as an image exchange format in the rendering/composition
process.

Running the render processes in parallel shows a significant benefit only
for the 4096x4096 pixel images.

Any suggestions to improve the benchmark? Does my setup produce reasonable
results?

Here is the script:

{{{
# -*- coding: utf-8 -*-
import os
import time
import Image
import wx
from grass.pygrass import modules

parallel = [True, False]
drivers = ["png", "cairo"]
bitmaps = ["png", "bmp", "ppm"]
mmap_modes = ["FALSE", "TRUE"]
sizes = [1024, 4096]

############################################################################

def render_image(module, driver="png", pngfile="test.png",
                  size=4096, mapped="TRUE"):
     os.putenv("GRASS_RENDER_IMMEDIATE", "%s"%driver)
     os.putenv("GRASS_PNGFILE", "%s"%pngfile)
     os.putenv("GRASS_WIDTH", "%i"%size)
     os.putenv("GRASS_HEIGHT", "%i"%size)
     os.putenv("GRASS_PNG_MAPPED", "%s"%mapped)
     module.run()

############################################################################

def composite_images(files, bitmap, mode, size):
     start = time.time()
     if bitmap == "ppm":
         filename = "output"
         filename += ".ppm"
         modules.Module("g.pnmcomp", input=files, width=size, height=size,
                        output=filename)
         # Load the image as wx image for visualization
         img = wx.Image(filename, wx.BITMAP_TYPE_ANY)
         os.remove(filename)
     else:
         images = []
         size = None
         for m in files:
             im = Image.open(m)
             images.append(im)
             size = im.size
         comp = Image.new('RGB', size)
         for im in images:
             comp.paste(im)
         wxImage = wx.EmptyImage(*comp.size)
         wxImage.SetData(comp.convert('RGB').tostring())

     return (time.time() - start)

############################################################################

def main():
     # Set the region
     modules.Module("g.region", rast="elevation", flags="p")

     for finish in parallel:
         if finish:
             print("*** Serial runs")
         else:
             print("*** Parallel runs")

         print("Run\tSize\tDriver\tBitmap\tmmap\trender\tcomposite")

         # Setup the modules
         rast = modules.Module("d.rast", map="elevation", run_=False,
                                 quiet=True, finish_=False)
         vectB = modules.Module("d.vect", map="streams", width=1,
                                 color="blue",
                                 fcolor="aqua", type=["area","line"],
                                 run_=False, quiet=True, finish_=finish)
         vectA = modules.Module("d.vect", map="roadsmajor", width=2,
                                run_=False, quiet=True, finish_=finish)

         count = 0
         for size in sizes:
             for driver in drivers:
                 for bitmap in bitmaps:
                     for mode in mmap_modes:
                         # Skip mmap for non-bmp files
                         if mode == "TRUE" and bitmap != "bmp":
                             continue

                         start = time.time()
                         count += 1
                          files = []

                         rast_file = "rast.%s"%(bitmap)
                         vectA_file="vectA.%s"%(bitmap)
                         vectB_file="vectB.%s"%(bitmap)

                         files.append(rast_file)
                         files.append(vectA_file)
                         files.append(vectB_file)

                         render_image(rast, driver=driver,
                                      pngfile=rast_file,
                                      size=size, mapped=mode)

                         render_image(vectA, driver=driver,
                                      pngfile=vectA_file,
                                      size=size, mapped=mode)

                         render_image(vectB, driver=driver,
                                      pngfile=vectB_file,
                                      size=size, mapped=mode)

                         # Wait for processes
                         rast.popen.wait()
                         vectA.popen.wait()
                         vectB.popen.wait()

                         # Composite the images
                         comptime = composite_images(files, bitmap, mode,
                                                     size)

                         for file in files:
                             os.remove(file)

                         elapsed = (time.time() - start)
                         print("%i\t%i\t%s\t%s\t%s\t%.2f\t%.2f"%(count,
size,
                                                                 driver,
bitmap,
                                                                 mode,
elapsed,
                                                                 comptime))

############################################################################

main()
}}}

Here are the benchmark results:

{{{
GRASS 7.0.svn (nc_spm_08_grass7):~/src > python display_bench.py
projection: 99 (Lambert Conformal Conic)
zone: 0
datum: nad83
ellipsoid: a=6378137 es=0.006694380022900787
north: 228500
south: 215000
west: 630000
east: 645000
nsres: 10
ewres: 10
rows: 1350
cols: 1500
cells: 2025000
*** Serial runs
Run Size Driver Bitmap mmap render composite
1 1024 png png FALSE 0.87 0.11
2 1024 png bmp FALSE 0.45 0.03
3 1024 png bmp TRUE 0.48 0.03
4 1024 png ppm FALSE 0.47 0.07
5 1024 cairo png FALSE 0.93 0.09
6 1024 cairo bmp FALSE 0.52 0.03
7 1024 cairo bmp TRUE 0.56 0.03
8 1024 cairo ppm FALSE 0.61 0.06
9 4096 png png FALSE 4.74 1.29
10 4096 png bmp FALSE 3.43 0.38
11 4096 png bmp TRUE 4.15 0.38
12 4096 png ppm FALSE 3.04 0.55
13 4096 cairo png FALSE 3.68 0.99
14 4096 cairo bmp FALSE 1.95 0.37
15 4096 cairo bmp TRUE 2.65 0.37
16 4096 cairo ppm FALSE 3.44 0.55
*** Parallel runs
Run Size Driver Bitmap mmap render composite
1 1024 png png FALSE 0.92 0.11
2 1024 png bmp FALSE 0.50 0.03
3 1024 png bmp TRUE 0.48 0.03
4 1024 png ppm FALSE 0.51 0.07
5 1024 cairo png FALSE 0.98 0.08
6 1024 cairo bmp FALSE 0.53 0.03
7 1024 cairo bmp TRUE 0.60 0.03
8 1024 cairo ppm FALSE 0.67 0.07
9 4096 png png FALSE 4.77 1.33
10 4096 png bmp FALSE 3.08 0.37
11 4096 png bmp TRUE 3.74 0.38
12 4096 png ppm FALSE 2.84 0.55
13 4096 cairo png FALSE 3.38 1.01
14 4096 cairo bmp FALSE 1.82 0.37
15 4096 cairo bmp TRUE 2.44 0.37
16 4096 cairo ppm FALSE 2.93 0.55
}}}

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2033#comment:6>
GRASS GIS <http://grass.osgeo.org>

Comment(by glynn):

Replying to [comment:6 huhabla]:
> It seems to me that creating raw BMP images without mmap enabled shows
> the best performance for both the PNG and Cairo drivers. Maybe I did
> something wrong, but the use of mmap shows no obvious benefit?

Using mmap() in the driver is probably not that significant in this
context.

It's more useful when GRASS_PNG_READ=TRUE, the resolution is high, and the
rendering is simple and/or limited to a portion of the image. In that
situation, mmap() eliminates the read() as well as the write(), and only
the modified portion needs to be read and written.

Another area where it matters is with e.g. wximgview.py (and its
predecessors), as it's safe to read a BMP image which is being modified
using mmap(), whereas doing the same thing to a file which is being
written out with write() runs the risk of reading a truncated file.

Other than that, the performance difference between using mmap() and
read() on the read side boils down to mmap() avoiding a memcpy(). The
extent to which that matters depends upon what else you're doing with the
data. For wxGUI, it's probably a drop in the ocean.

> Any suggestions to improve the benchmark? Does my setup produce
> reasonable results?

There isn't anything I'd particularly take issue with. However:

1. With the cairo driver, BMP files use pre-multiplied alpha (because
that's what cairo uses internally), whereas PPM/PGM output includes an un-
multiplication step. So depending upon your perspective, the cairodriver
benchmarks are rigged against PPM or in favour of BMP.

2. Producing separate results for PPM with g.pnmcomp and PPM with PIL
would provide a clearer comparison between the two compositing options and
the various formats.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2033#comment:7>
GRASS GIS <http://grass.osgeo.org>

Comment(by huhabla):

I have updated the display benchmark script to compare the PPM performance
of PIL and g.pnmcomp. System: Ubuntu 12.04 LTS, AMD Phenom(tm) II X6 1090T
processor, 16GB RAM, 1TB hard disk. Please make sure that you have the
latest grass7 svn version when reproducing the benchmark results, since
there was a bug in the pygrass Module run() function that did not allow
parallel process runs.

{{{
GRASS 7.0.svn (nc_spm_08_grass7):~/Downloads > python display_bench.py
projection: 99 (Lambert Conformal Conic)
zone: 0
datum: nad83
ellipsoid: a=6378137 es=0.006694380022900787
north: 228500
south: 215000
west: 630000
east: 645000
nsres: 10
ewres: 10
rows: 1350
cols: 1500
cells: 2025000
*** Serial runs
Run Size Driver Bitmap mmap render composite
1 1024 png png FALSE 0.859 0.135 PIL
2 1024 png bmp FALSE 0.447 0.044 PIL
3 1024 png bmp TRUE 0.446 0.044 PIL
4 1024 png ppm FALSE 0.430 0.046 PIL
5 1024 png ppm FALSE 0.461 0.066 g.pnmcomp
6 1024 cairo png FALSE 0.900 0.102 PIL
7 1024 cairo bmp FALSE 0.535 0.055 PIL
8 1024 cairo bmp TRUE 0.527 0.045 PIL
9 1024 cairo ppm FALSE 0.579 0.050 PIL
10 1024 cairo ppm FALSE 0.579 0.051 g.pnmcomp
11 4096 png png FALSE 5.106 1.513 PIL
12 4096 png bmp FALSE 2.728 0.602 PIL
13 4096 png bmp TRUE 2.724 0.596 PIL
14 4096 png ppm FALSE 2.402 0.604 PIL
15 4096 png ppm FALSE 2.129 0.306 g.pnmcomp
16 4096 cairo png FALSE 4.011 1.236 PIL
17 4096 cairo bmp FALSE 1.273 0.633 PIL
18 4096 cairo bmp TRUE 1.281 0.599 PIL
19 4096 cairo ppm FALSE 2.510 0.606 PIL
20 4096 cairo ppm FALSE 2.230 0.311 g.pnmcomp
*** Parallel runs
Run Size Driver Bitmap mmap render composite
1 1024 png png FALSE 0.856 0.127 PIL
2 1024 png bmp FALSE 0.456 0.052 PIL
3 1024 png bmp TRUE 0.457 0.044 PIL
4 1024 png ppm FALSE 0.442 0.048 PIL
5 1024 png ppm FALSE 0.447 0.059 g.pnmcomp
6 1024 cairo png FALSE 0.902 0.100 PIL
7 1024 cairo bmp FALSE 0.535 0.049 PIL
8 1024 cairo bmp TRUE 0.528 0.042 PIL
9 1024 cairo ppm FALSE 0.586 0.046 PIL
10 1024 cairo ppm FALSE 0.595 0.063 g.pnmcomp
11 4096 png png FALSE 4.481 1.535 PIL
12 4096 png bmp FALSE 2.331 0.608 PIL
13 4096 png bmp TRUE 2.344 0.595 PIL
14 4096 png ppm FALSE 2.139 0.603 PIL
15 4096 png ppm FALSE 1.808 0.294 g.pnmcomp
16 4096 cairo png FALSE 3.374 1.226 PIL
17 4096 cairo bmp FALSE 1.269 0.619 PIL
18 4096 cairo bmp TRUE 1.283 0.586 PIL
19 4096 cairo ppm FALSE 2.117 0.598 PIL
20 4096 cairo ppm FALSE 1.790 0.486 g.pnmcomp
}}}

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2033#comment:8>
GRASS GIS <http://grass.osgeo.org>

Comment(by glynn):

Replying to [comment:8 huhabla]:

What I take away from this:

  * PNG has noticeable overhead even for reading, and substantial overhead
for writing.

  * BMP versus PPM makes no difference in terms of I/O.

  * When using the cairo driver, un-multiplying the alpha for PPM has a
noticeable overhead. As the script doesn't handle cairodriver's BMP files
correctly, the figures for cairo/BMP aren't meaningful.

  * g.pnmcomp has higher throughput but also a higher constant overhead, so
it's faster than PIL for larger images and slower for smaller images. And
the PIL version ignores the PGM file containing the alpha channel.

  * The amount of noise in the timings is noticeable but not all that
significant. In theory, the composite timings for a given size and format
shouldn't depend upon pngdriver versus cairodriver or mmap'd BMP versus
non-mmap'd BMP (although the difference between pngdriver and cairodriver
PNGs may indicate differences in options, e.g. compression).

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2033#comment:9>
GRASS GIS <http://grass.osgeo.org>

#2033: Moving g.pnmcomp to lib/display to improve render performance of wxGUI
--------------------------+-------------------------------------------------
  Reporter:  huhabla      |       Owner:  grass-dev@…
      Type:  enhancement  |      Status:  closed
  Priority:  major        |   Milestone:  7.0.0
 Component:  wxGUI        |     Version:  svn-trunk
Resolution:  worksforme   |    Keywords:  display, Python, multiprocessing
  Platform:  All          |         Cpu:  All
--------------------------+-------------------------------------------------
Changes (by huhabla):

  * status: new => closed
  * resolution: => worksforme

Comment:

My conclusion:

  * Moving the code of g.pnmcomp, d.vect, d.rast ... d.* into the display
library for speedup reasons is not meaningful. It might be meaningful if
we decide to implement the d.* modules as Python modules that communicate
with the wx display using sockets instead of files to call the rendering
backend. Well, the same can be achieved by implementing Python wrapper
modules around the display modules ... so it might not be meaningful at
all.

  * The current wxGUI rendering approach using PPM and g.pnmcomp seems to
be the most efficient, considering the fact that the cairo driver is not
yet available in the Windows version of grass7. It seems to me that using
PIL will not provide a large speedup benefit over g.pnmcomp, especially
for large images.

  * A small speedup can be achieved by calling the d.* modules in parallel
in the GUI, especially when several maps need to be re-rendered.

  * IMHO the only way to speed up the rendering is to make d.rast and
d.vect faster.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2033#comment:10>
GRASS GIS <http://grass.osgeo.org>

Comment(by martinl):

Replying to [comment:10 huhabla]:
> * The current wxGUI rendering approach using PPM and g.pnmcomp seems to
> be the most efficient, considering the fact that the cairo driver is
> not yet available in the Windows version of grass7. It seems to me that
> using PIL will not provide a large speedup benefit over g.pnmcomp,
> especially for large images.

Small update: since r57542, GRASS 7 is built with cairo support on Windows
as well.

Martin

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2033#comment:11>
GRASS GIS <http://grass.osgeo.org>

(back on this in the list rather than cluttering the ticket)

On Wed, Aug 7, 2013 at 9:42 PM, GRASS GIS <trac@osgeo.org> wrote:

#2033: Moving g.pnmcomp to lib/display to improve render performance of wxGUI

...

I still hope that the command line wx0 monitor can be brought up to speed
as x0 is for GRASS 6.

Some questions wrt the comments sent by Soeren:

>  * Moving the code of g.pnmcomp, d.vect, d.rast ... d.* into the display
>    library for speedup reasons is not meaningful. It might be meaningful
>    if we decide to implement the d.* modules as Python modules that
>    communicate with the wx display using sockets instead of files to
>    call the rendering backend.

What amount of work would be needed to implement a prototype for this?

>    Well, the same can be achieved by implementing Python wrapper modules
>    around the display modules ... so it might not be meaningful at all.

>  * The current wxGUI rendering approach using PPM and g.pnmcomp seems to
>    be the most efficient, considering the fact that the cairo driver is
>    not yet available in the Windows version of grass7.

Meanwhile this has been fixed (see r57542). Is g.pnmcomp still the
only choice? To me it looks slow.

>    It seems to me that using PIL will not provide a large speedup
>    benefit over g.pnmcomp, especially for large images.

(the latter happens on large monitors)

>  * A small speedup can be achieved by calling the d.* modules in
>    parallel in the GUI, especially when several maps need to be
>    re-rendered.

>  * IMHO the only way to speed up the rendering is to make d.rast and
>    d.vect faster.

Would also a quite different approach be possible? I somehow have the
idea that generating tmp files on disk is slower than writing into the
graphics card's memory :-)

thanks
Markus

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2033#comment:10>
GRASS GIS <http://grass.osgeo.org>


Markus Neteler wrote:

> Would also a quite different approach be possible? I somehow have the
> idea that generating tmp files on disk is slower than writing into the
> graphics card's memory :-)

The fastest solution is what you get if you use the cairo driver and
$GRASS_PNGFILE has a ".xid" suffix. The driver creates an X11 or
X11+XRender surface (which is in video memory). Rendering operations
use X11 (XRender if available, core protocol otherwise) which will use
the video hardware where possible.

The disadvantages are that it only works with X11, and some GUI
toolkits (including wxPython) are unable to make use of the resulting
image (other than by pulling the data back into client memory so that
it can be used as a source for their naive "image" class). Even if a
particular toolkit has the ability to make use of an image already in
video memory, the mechanism will be non-portable and typically either
poorly documented or undocumented.

Any other method will end up rendering the image in software to client
memory. It may also write the image to a file, but the actual disk I/O is
performed by the kernel in the background. The expensive part is when the
driver performs heavy processing (e.g. compression) on the data along the
way.

The biggest obstacle is probably the use of wxWidgets, as its
rendering model is stuck in the early 1990s (the API is essentially
Windows 3.1 GDI with some minor modifications).

--
Glynn Clements <glynn@gclements.plus.com>

Hi,

2013/11/26 Glynn Clements <glynn@gclements.plus.com>:

> The fastest solution is what you get if you use the cairo driver and
> $GRASS_PNGFILE has a ".xid" suffix. The driver creates an X11 or

it's not really related to this issue. Anyway, it would probably be a
good idea to rename all environment variables related to rendering,
e.g. GRASS_WIDTH to GRASS_DISPLAY_WIDTH, GRASS_DISP_WIDTH, or
GRASS_RENDER_WIDTH and so on.

The variable GRASS_PNGFILE is also a very misleading name; it's not used
only for producing PNG files. The Cairo driver supports many formats,
which are given by the filename extension, so probably GRASS_DISPLAY_FILE
would be better.

What do you think?

Martin

On Tue, Nov 26, 2013 at 4:51 PM, Martin Landa <landa.martin@gmail.com> wrote:

> Hi,
>
> 2013/11/26 Glynn Clements <glynn@gclements.plus.com>:
>
> > The fastest solution is what you get if you use the cairo driver and
> > $GRASS_PNGFILE has a ".xid" suffix. The driver creates an X11 or
>
> it's not really related to this issue. Anyway, it would probably be a
> good idea to rename all environment variables related to rendering,
> e.g. GRASS_WIDTH to GRASS_DISPLAY_WIDTH, GRASS_DISP_WIDTH, or
> GRASS_RENDER_WIDTH and so on.
>
> The variable GRASS_PNGFILE is also a very misleading name; it's not
> used only for producing PNG files. The Cairo driver supports many
> formats, which are given by the filename extension, so probably
> GRASS_DISPLAY_FILE would be better.
>
> What do you think?

I think it would be great. It is part of the (programming) interface, so
it deserves good, properly considered and descriptive names.

I'm against shortened names, so GRASS_DISP_WIDTH is not my choice. If I
were designing this from scratch, I would choose GRASS_RENDER_WIDTH
because it describes what is happening. However, we have the display
driver and we use the term display architecture, so GRASS_DISPLAY_WIDTH
seems to be the right choice now, unless we want to rename display things
to render things.

Note that renaming is not a completely unrealistic option, because in the
GUI we use "display" for windows and "render" for the rendering of images,
so in fact we are already using the render-oriented terminology; the only
problem is the display commands, which relate to both render and display
things.

Vaclav


Hi,

2013/11/27 Vaclav Petras <wenzeslaus@gmail.com>:

> > it's not really related to this issue. Anyway, it would probably be a
> > good idea to rename all environment variables related to rendering,
> > e.g. GRASS_WIDTH to GRASS_DISPLAY_WIDTH, GRASS_DISP_WIDTH, or
> > GRASS_RENDER_WIDTH and so on.
> >
> > The variable GRASS_PNGFILE is also a very misleading name; it's not
> > used only for producing PNG files. The Cairo driver supports many
> > formats, which are given by the filename extension, so probably
> > GRASS_DISPLAY_FILE would be better.

we already have the GRASS_RENDER_IMMEDIATE variable, so probably the
GRASS_RENDER prefix would be the best choice.

Martin