[GRASS-dev] GRASS and GPGPU

Has anyone considered the possibility of doing stream-based calculations on
the GPU [1] for raster operations on large datasets?

1. http://en.wikipedia.org/wiki/GPGPU

It appears that this method works best on highly vectorized operations, often
in two "dimensions"-- which makes it appropriate for matrix/grid computations.

Just a thought.

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

Yes, that is an excellent thought. I was considering this possibility for speeding up common interpolation, image transformation and
resampling, as well.

Kriging in real time, anyone? ;-)

Of course, this would have to be implemented in a GPU-specific library.

NVidia has the CUDA SDK, which is not open source at the moment, but
the pressure is on:

http://forums.nvidia.com/lofiversion/index.php?t28458.html

ATI already seems to have their stuff (CTM) opened up.

Maybe a proof-of-concept GPGPU module would make a nice little SoC
project?

Benjamin

--
Benjamin Ducke, M.A.
Archäoinformatik
(Archaeoinformation Science)
Institut für Ur- und Frühgeschichte
(Inst. of Prehistoric and Historic Archaeology)
Christian-Albrechts-Universität zu Kiel
Johanna-Mestorf-Straße 2-6
D 24098 Kiel
Germany

Tel.: ++49 (0)431 880-3378 / -3379
Fax : ++49 (0)431 880-7300
www.uni-kiel.de/ufg

On Saturday 01 March 2008, Benjamin Ducke wrote:

Maybe a proof-of-concept GPGPU module would make a nice little SoC
project?

Exactly-- that seems like an ideal candidate: cutting edge, exciting, and
very fundable.

Good idea!

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

Another popular class of tasks that has been implemented on the GPU is
insolation, visibility and the like, so r.sun and r.los would be other
candidate modules that could use a faster implementation.

Helena

On Saturday 01 March 2008, Helena Mitasova wrote:

Another popular class of tasks that has been implemented on the GPU is
insolation, visibility and the like, so r.sun and r.los would be other
candidate modules that could use a faster implementation.

Helena

I like the sound of r.sun done on the GPU-- that is a very time-intensive
operation which could use a boost.

Some starters:

# overview presentation:
http://www.cs.unm.edu/~angel/CS534/LECTURES/GPGPU2.pdf

# some discussion
1. http://www.gpgpu.org/forums/index.php?sid=27fd91350f750179126570bde633f792

# open source tools
2. http://sourceforge.net/projects/gpgpu

# an extension to C for stream processing: Brook
3. http://www.gpgpu.org/w/index.php/Brook
4. http://graphics.stanford.edu/projects/brookgpu/lang.html

With some possible drawbacks, from [3]:

" However this discounts the important part of transferring the data to be
processed to and from the GPU. With a PCI Express 1.0 x8 interface, the
memory of an ATI HD 2900 XT can be written to at about 730Mb/sec and read
from at about 311Mb/sec which is significantly slower than normal PC memory.
For large datasets, this can greatly diminish the speed increase of using a
GPU over a well tuned CPU implementation. Of course, as GPU's become faster
far more quickly than CPU's and the PCI Express interface improves, it will
make more sense to offload large processing to GPU's. "
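
As a rough back-of-envelope check (taking the quoted figures as megabytes per
second): a 5000x5000 single-precision raster is 5000 * 5000 * 4 bytes, i.e.
about 100 MB, so roughly 100/730 = 0.14 s to upload and 100/311 = 0.32 s to
read back-- nearly half a second of pure transfer overhead before any
computation happens. A per-cell operation that already finishes in well under
a second on the CPU would gain little; the win, if any, would be in expensive
kernels such as interpolation or r.sun, where compute time dwarfs the
transfer cost.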

# another library: libSh -- appears to be a well-supported C++ library, and
metalanguage:
5. http://libsh.org/
6. http://libsh.org/wiki/index.php/Main_Page
7. http://libsh.org/wiki/index.php/Metaprogramming_GPUs_with_Sh
8. http://libsh.org/wiki/index.php/Sample_Code

Dylan Beaudette wrote:

Some starters:

Also look at Glynn's comments from not so long ago:

http://www.nabble.com/forum/ViewPost.jtp?post=15573024&framed=y

Maciek

On Sunday 02 March 2008 02:41:07 am Maciej Sieczka wrote:

Dylan Beaudette wrote:
> Some starters:

Also look at Glynn's comments from not so long ago:

http://www.nabble.com/forum/ViewPost.jtp?post=15573024&framed=y

Maciek

Thanks for the reminder Maciek.

However (although I am not a GPU programmer), it looks like the concept of
stream programming might fit the GRASS approach of favoring simple
formulations over optimized ones-- and here is why:

Many raster operations use constructs like the following:

for x in rows:
  for y in cols:
    value = get_cell_value(x, y)
    do_something(x, y, value)

Based on my reading of the GPGPU documents, it appears that it is possible to
give the (libSh/Brook) library the do_something(x, y, value) function (called
a kernel in stream processing), and the GPU will implicitly perform the loop
in parallel.

If the functionality used in the inner-most loop can be reduced to simple
operations, it shouldn't be too hard to "drop-in" GPGPU extensions.
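
Purely as an illustration (this is not GRASS code, and it is not tied to libSh
or Brook), here is roughly how the loop above could map onto a per-cell kernel
in CUDA, which Benjamin mentioned earlier; map_cells and do_something are
made-up names, and the raster is assumed to already sit in device memory:

#include <cuda_runtime.h>

/* hypothetical per-cell operation -- stands in for whatever the inner
   loop of a raster module would compute */
__device__ float do_something(int x, int y, float value)
{
    return value * 2.0f;   /* placeholder */
}

/* one thread per cell: the explicit row/column loops disappear and the
   hardware schedules the cells in parallel */
__global__ void map_cells(const float *in, float *out, int nrows, int ncols)
{
    int y = blockIdx.x * blockDim.x + threadIdx.x;   /* column */
    int x = blockIdx.y * blockDim.y + threadIdx.y;   /* row    */

    if (x < nrows && y < ncols) {
        int i = x * ncols + y;
        out[i] = do_something(x, y, in[i]);
    }
}

/* host-side launch, assuming d_in and d_out already hold the raster on
   the device (cudaMemcpy calls not shown):

       dim3 block(16, 16);
       dim3 grid((ncols + block.x - 1) / block.x,
                 (nrows + block.y - 1) / block.y);
       map_cells<<<grid, block>>>(d_in, d_out, nrows, ncols);
*/

The same pattern is essentially what the stream languages call applying a
kernel to a stream, so the GRASS-side refactoring (isolating the per-cell
function from the I/O loop) should look much the same whichever backend is
used.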

That said, I am not at all experienced with programming the GPU-- however,
after several years of using R the notion of being able to use vectorized
functions sounds nice-- especially if the GPU can do those operations 10x
faster than the CPU.

I think that we would need a simple proof of concept raster module to see how
hard it would be. I have heard that Manifold GIS has something like this for
raster operations.

Dylan

On Sunday 02 March 2008 09:55:17 am Dylan Beaudette wrote:

If the functionality used in the inner-most loop can be reduced to simple
operations, it shouldn't be too hard to "drop-in" GPGPU extensions.

Some more findings:

1. the Brook stream language relies on NVIDIA's Cg Compiler, which may not
work with anything but the latest video cards / drivers.

2. NVIDIA's CUDA SDK: this relies on very recent cards / drivers, and the
license agreement is a little vague. Also, code running on GPUs that are used
for display can only run for 5 seconds, so real code running on the GPU would
require a second, dedicated GPU that is not being used to display anything.

3. libSh: this appears to be the most flexible/open implementation of GPGPU,
in that many NVIDIA/ATI cards are supported. It is a C++ extension, so there
is a deviation from the ANSI C standard used in GRASS.

I was able to compile libSh from its SVN source, after installing the
'freeglut3-dev' package (Debian/Unstable). The example files seem to compile
fine, although I cannot test them until I am back in my office. libSh and the
examples *seem* to be able to use my 4-year-old video card (GeForce FX
5600XT)-- so many of the cards in circulation should be supported by now.

The examples distributed with libSh are written around standard GPU concepts
-- streams of particles, etc. There is a good page [1] which describes the
more general concept of implementing stream processing with libSh.

1. http://libsh.org/wiki/index.php/How_Sh_Works

I am getting to the limit of my C++ knowledge, so I have not been able to
create a simple test case yet. There appears to be extensive documentation on
the available matrix/geometry/trig methods, but it comes in such small
snippets that it would take someone with more experience to stitch them into
a complete program.

I wonder if Soren's groundwater flow routines could benefit from what libSh
has to offer?

Dylan

On Saturday 01 March 2008, Helena Mitasova wrote:

Another popular class of tasks that has been implemented on the GPU is
insolation, visibility and the like, so r.sun and r.los would be other
candidate modules that could use a faster implementation.

Helena

One more post on this today, and then I am done looking into GPGPU computation
for a while.

Some initial tests with the libSh code gave odd results: the CPU version was
an order of magnitude faster than the GPU version. It could be that I didn't
have something set up correctly...

It also looks like libSh is not being actively maintained, with more of their
efforts going toward a commercial product.

Finally, this page [1] has enough details to keep anyone interested reading
for a while. However, it suggests that GPGPU operations are limited (for the
most part) to 32-bit floating point that is not always fully IEEE-compliant.
That might be a show-stopper for working with anything but FCELL maps...

1. http://www.gpgpu.org/w/index.php/FAQ
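
For reference (this is about the GRASS side, not from the FAQ): CELL maps are
C integers, FCELL maps are single-precision floats and DCELL maps are
double-precision floats. A float-only GPU path could therefore at best match
FCELL precision-- a 32-bit float has a 24-bit mantissa, i.e. roughly 7
significant decimal digits, so even integer (CELL) values above about 16.7
million could not be represented exactly, and DCELL maps would lose precision
on the round trip.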

Dylan

Dylan Beaudette wrote:

Of course, as GPU's become faster
far more quickly than CPU's and the PCI Express interface improves, it will
make more sense to offload large processing to GPU's. "

This presumes that you actually *have* a GPU.

Many servers only have very basic graphics hardware. Even with desktop
systems, there's a huge performance difference between budget systems
(with e.g. integrated graphics or a £20 card) and "gaming" systems
with a £300 card.

Also, the differences between various GPUs (even different models from
the same vendor) tend to be quite significant, and not easily hidden
by the compiler. You could realistically find that you need to write
half a dozen radically different versions of the same function just
for the most popular GPUs, and also need to re-write it regularly, as
GPU architecture tends to change quite rapidly.

--
Glynn Clements <glynn@gclements.plus.com>

On Tuesday 11 March 2008, Glynn Clements wrote:

Dylan Beaudette wrote:
> Of course, as GPU's become faster
> far more quickly than CPU's and the PCI Express interface improves, it
> will make more sense to offload large processing to GPU's. "

This presumes that you actually *have* a GPU.

Of course. However, most desktop machines come standard with some sort of
accelerated video hardware.

Many servers only have very basic graphics hardware. Even with desktop
systems, there's a huge performance difference between budget systems
(with e.g. integrated graphics or a £20 card) and "gaming" systems
with a £300 card.

According to some of the documentation on the libSh and Brook sites, you
don't need a top-of-the-line card to notice a performance boost.

Also, the differences between various GPUs (even different models from
the same vendor) tend to be quite significant, and not easily hidden
by the compiler. You could realistically find that you need to write
half a dozen radically different versions of the same function just
for the most popular GPUs, and also need to re-write it regularly, as
GPU architecture tends to change quite rapidly.

This is probably one of the biggest reasons not to try. The libSh approach
looked appealing, as it is very generalized and appears to work with a wide
range of hardware. However, it is no longer actively maintained.

The above point, coupled with the single-precision floating point limitation
and the reliance on a commercial compiler (NVIDIA only), might make this
entire thread moot.

I would like to be proved wrong and see a GPU-accelerated GRASS module though!

Cheers,

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

The above point, coupled with the single-precision floating point limitation and the reliance on a commercial compiler (NVIDIA only), might make this entire thread moot.

Maybe at this point, but the developments need to be monitored.
When the first accelerated graphics cards for PCs came out, even the
basic 2D stuff needed chip-specific drivers to do anything useful. Now
you have VESA, OpenGL, ...

I don't think it is unreasonable to assume that GPU-based processing will
consolidate and produce generic open source tools at some point in the
future.

I would like to be proved wrong and see a GPU-accelerated GRASS module though!

You are not alone!

Cheers,

--
Benjamin Ducke
Senior Applications Support and Development Officer

Oxford Archaeological Unit Limited
Janus House
Osney Mead
OX2 0ES
Oxford, U.K.

Tel.: ++44 (0)1865 263 800
benjamin.ducke@oxfordarch.co.uk
