[GRASS-dev] OpenCL Parallelization

OpenCL can use CPUs and GPUs for parallel processing.
For all those operations that can be done more efficiently
on a GPU, there are potentially enormous performance gains.
Since OpenCL is like a reduced C dialect and completely
abstracted from the hardware, all an OpenCL programmer
needs to do is implement the algorithm and then it's up
to the graphics chip manufacturers to release drivers
for their hardware. NVIDIA and ATI have already done that.
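
To make the "implement the algorithm once" idea concrete, here is a minimal conceptual sketch of the OpenCL execution model, emulated in plain Python. It is not OpenCL code: real kernels are written in OpenCL C and obtain their index with get_global_id(). The names scale_kernel and run_kernel and the sample values are hypothetical, chosen only for illustration.

```python
# Conceptual emulation of the OpenCL work-item model in plain Python.
# Assumption: in real OpenCL the kernel body is OpenCL C and the index
# comes from get_global_id(0); here it is simply passed in as `gid`.

def scale_kernel(gid, src, dst, factor):
    """Body of a hypothetical kernel: one work-item handles one cell."""
    dst[gid] = src[gid] * factor

def run_kernel(kernel, global_size, *args):
    """Stand-in for the runtime: call the kernel once per work-item.
    A real device would execute these invocations in parallel."""
    for gid in range(global_size):
        kernel(gid, *args)

elevations = [10.0, 12.5, 11.0, 9.5]   # made-up sample data
scaled = [0.0] * len(elevations)
run_kernel(scale_kernel, len(elevations), elevations, scaled, 2.0)
print(scaled)
```

The point of the sketch is that the kernel only describes what happens to one element; how the invocations are distributed over the hardware is left to the driver.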

I for one would love to see v.vol.rst run in parallel
on a bunch of cheap GPUs, outperforming a mainframe...

Ben

Hamish wrote:

Jeshua wrote:

I was curious if anyone has looked at implementing OpenCL
for some of the processor intensive highly parallel
modules.

<http://www.khronos.org/opencl/>

It seems to me that some of the more processor intensive
modules could gain significant performance by utilizing
OpenCL with fairly modest development effort.

I am not familiar with the code for v.surf.rst, however I
might be interested in making an attempt if it can be
parallelized. Or can anyone suggest what processor intensive
modules might be good candidates for parallelization?

see http://grass.osgeo.org/wiki/OpenMP

what advantages would OpenCL have over OpenMP and pthreads?

OpenMP and pthreads are somewhat different beasts suiting two somewhat
different niches. Where would OpenCL overlap? Would it be a better
or worse fit than the others for different scenarios?

Probably supporting three different MP schemes is too crowded.

Hamish

_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

--
Benjamin Ducke
Senior Applications Support and Development Officer

Oxford Archaeological Unit Limited
Janus House
Osney Mead
OX2 0ES
Oxford, U.K.

Tel: +44 (0)1865 263 800 (switchboard)
Tel: +44 (0)1865 980 758 (direct)
Fax :+44 (0)1865 793 496
benjamin.ducke@oxfordarch.co.uk


On Sep 12, 2009, at 12:21 PM, Benjamin Ducke wrote:

OpenCL can use CPUs and GPUs for parallel processing.
For all those operations that can be done more efficiently
on a GPU, there are potentially enormous performance gains.

Indeed. Precisely the reason why I think it is so compelling.

There are about as many transistors in my GPU as in my CPU. Most of the time, most of the transistors in my GPU are just sitting there idling (99% of the time I do not do any intensive 3D tasks). It sure would be nice to be able to put them to use.

And you can easily add more GPUs to a machine, which is not the case with CPUs....

Best,

Jeshua Lacock, Owner
<http://OpenOSX.com>
phone: 208.462.4171

Jeshua,

I will be happy to assist with v.surf.rst, v.vol.rst (I even have v.volt.rst - that is 3D+time).
It has been parallelized a couple of times in the past - the parallel version never survives more than
one release cycle because the tools for parallelization or the architecture change
and there is nobody to update it. I may have one more recent version that was done for a Beowulf
cluster. Let me know if you would like to look at it. That was done at the segment
level, but running it with just a parallelized lineq solver should help too because that would allow
larger segments and larger overlaps.

Helena

Helena Mitasova
Associate Professor
Department of Marine, Earth and Atmospheric Sciences
North Carolina State University
1125 Jordan Hall
NCSU Box 8208
Raleigh, NC 27695-8208
http://skagit.meas.ncsu.edu/~helena/

email: hmitaso@unity.ncsu.edu
ph: 919-513-1327 (no voicemail)
fax 919 515-7802

On Sep 14, 2009, at 12:08 PM, Helena Mitasova wrote:

I will be happy to assist with v.surf.rst, v.vol.rst (I even have v.volt.rst - that is 3D+time).

Cool!

It has been parallelized a couple of times in the past - the parallel version never survives more than
one release cycle because the tools for parallelization or the architecture change
and there is nobody to update it.

Do you think it might be different with the advent of OpenCL?

I may have one more recent version that was done for a Beowulf
cluster. Let me know if you would like to look at it.

Please feel free to email me off list.

That was done at the segment
level, but running it with just a parallelized lineq solver should help too because that would allow
larger segments and larger overlaps.

I may even have a customer that would help fund development for v.surf.rst.

v.surf.idw is already pretty fast (in comparison to RST), but do you think it could benefit?

Best,

Jeshua Lacock, Owner
<http://OpenOSX.com>
phone: 208.462.4171

fyi,

bare OpenCL links added to the OpenMP page in the grass wiki.

Hamish

Sorry to you all for not being active; I would love to help in this effort. From September 28 to the end of December, there is only GRASS coding on the menu for me.

Please do pass me any work in that time frame, OpenCL or CUDA. I will be happy to oblige.

Also, I think there is scope for using the GPU as a coprocessor and splitting work between the different processors on the same machine so that they coexist.

FYI, I am much more at ease with CUDA (NVIDIA GPU programming) than with any other form of parallelization. It is my field of research.

On Tue, Sep 15, 2009 at 3:00 PM, Hamish <hamish_b@yahoo.com> wrote:

fyi,

bare OpenCL links added to the OpenMP page in the grass wiki.

Hamish




JYOTHISH SOMAN
MS-2008-CS
Center for Security, Theory And Algorithmic Research (CSTAR)
International Institute of Information Technology
Hyderabad
India
Phone:+91-9966222626
http://www.iiit.ac.in/

The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
George Bernard Shaw

On Sep 15, 2009, at 10:29 PM, Helena Mitasova wrote:

Do you think it might be different with the advent of OpenCL?

it should not be a separate version, it should be one module that optionally takes advantage
of multiple processors. One thing to try out would be to parallelize the linear equation solver -
that would benefit not only v.surf.rst but also v.vol.rst and other code that calls it.
Soeren may have some suggestions (he has just added some solvers to GRASS and I think he
did some parallel computations as well).
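
As an illustration of why a linear equation solver is a natural target for parallelization, here is a minimal sketch of a Jacobi iteration in Python. This is an assumption made purely for illustration: Jacobi is chosen because every row update within one sweep is independent of the others, not because it is the solver that v.surf.rst or the GRASS libraries actually use. The function name jacobi_solve and the small test system are hypothetical.

```python
def jacobi_solve(a, b, iterations=50):
    """Solve a*x = b by Jacobi iteration (a must be diagonally dominant
    for this simple sketch to converge)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x_new = [0.0] * n
        # Each row i reads only the previous iterate x, so all n row
        # updates could run at the same time (one per thread or work-item).
        for i in range(n):
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x_new[i] = (b[i] - s) / a[i][i]
        x = x_new
    return x

# Small made-up, diagonally dominant system with exact solution [1, 2, 3].
a = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [6.0, 12.0, 14.0]
x = jacobi_solve(a, b)
print(x)
```

Because the parallelism lives inside the solver, any module that calls it would benefit without needing a separate parallel version of the module itself.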

Sounds good. In what file may I find the source for the linear equation solver?

I may have one more recent version that was done for a Beowulf
cluster. Let me know if you would like to look at it.

Please feel free to email me off list.

here it is - you can see that at this point it is quite obsolete - I am not sure
that anybody has ever actually used it
http://skagit.meas.ncsu.edu/~helena/grasswork/grasscontrib/rst/rstmods2fixed.tar.gz

That was done at the segment
level, but running it with just a parallelized lineq solver should help too because that would allow
larger segments and larger overlaps.

I may even have a customer that would help fund development for v.surf.rst.

that would be great - we use it a lot with lidar data - millions of points so it would be great help.

My client is an archeologist who takes magnetometer samples in the field and also has millions of points (a large site might be up to 10 million samples).

He said he was definitely interested in helping to pay for this to be developed. He mentioned that "Manifold GIS" is already taking advantage of OpenCL and said the performance increase was nothing short of amazing. He said some very intensive operations are now nearly realtime. He is trying to switch entirely over to GRASS for all of his GIS work.

I am sure this is a difficult question to answer as it depends on a lot of variables, but can anyone offer a ballpark estimate of how much work it would take to parallelize the linear equation solver?

Best,

Jeshua Lacock, Owner
<http://OpenOSX.com>
phone: 208.462.4171

On Sep 15, 2009, at 7:52 AM, Jyothish Soman wrote:

Sorry to you all for not being active; I would love to help in this effort. From September 28 to the end of December, there is only GRASS coding on the menu for me.

Please do pass me any work in that time frame, OpenCL or CUDA. I will be happy to oblige.

Also, I think there is scope for using the GPU as a coprocessor and splitting work between the different processors on the same machine so that they coexist.

FYI, I am much more at ease with CUDA (NVIDIA GPU programming) than with any other form of parallelization. It is my field of research.

Greetings,

Here is a video of a rather impressive Manifold GIS CUDA demo performing a raster operation:

http://www.manifold.net/video/nvidia_cuda_demo.wmv

The operation is reduced from 60 seconds to 2 using 1 GPU - imagine if they had 4 GPUs! It would go from 60 seconds to nearly realtime....
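
The arithmetic behind that estimate can be sketched as follows. The 60 s and 2 s figures are the ones quoted from the video; the assumption that work divides perfectly across 4 GPUs is a best case, and the serial fraction of 0.25 used in the last line is a made-up value to show how a non-parallel part (data transfer, setup) limits the gain.

```python
def speedup(t_before, t_after):
    """Ratio of run times."""
    return t_before / t_after

def ideal_time(t_single, n_devices):
    """Best case: the work splits perfectly with no overhead."""
    return t_single / n_devices

def amdahl_time(t_single, serial_fraction, n_devices):
    """Run time when only the parallel part scales with device count."""
    parallel_fraction = 1.0 - serial_fraction
    return t_single * (serial_fraction + parallel_fraction / n_devices)

one_gpu_speedup = speedup(60.0, 2.0)        # 30.0
four_gpu_best = ideal_time(2.0, 4)          # 0.5 s
four_gpu_hedged = amdahl_time(2.0, 0.25, 4) # hypothetical serial share
print(one_gpu_speedup, four_gpu_best, four_gpu_hedged)
```

So the jump from one GPU to four is at most another factor of four on top of the 30x already shown, and less than that whenever part of the operation cannot be split.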

Best,

Jeshua Lacock, Owner
<http://OpenOSX.com>
phone: 208.462.4171

On Wednesday 23 September 2009, Jeshua Lacock wrote:

On Sep 15, 2009, at 7:52 AM, Jyothish Soman wrote:
> Sorry to you all for not being active; I would love to help in this
> effort. From September 28 to the end of December, there is only GRASS
> coding on the menu for me.
>
> Please do pass me any work in that time frame, OpenCL or CUDA. I
> will be happy to oblige.
>
> Also, I think there is scope for using the GPU as a coprocessor and
> splitting work between the different processors on the same machine
> so that they coexist.
>
> FYI, I am much more at ease with CUDA (NVIDIA GPU programming) than
> with any other form of parallelization. It is my field of research.

Greetings,

Here is a video of a rather impressive Manifold GIS CUDA demo
performing a raster operation:

http://www.manifold.net/video/nvidia_cuda_demo.wmv

The operation is reduced from 60 seconds to 2 using 1 GPU - imagine if
they had 4 GPUs! It would go from 60 seconds to nearly realtime....

Best,

Interesting... But I wonder about a couple things.

1. why would it take 60 seconds to compute a slope map from a 1400x1400 cell
DEM?

On a 3.2 GHz Xeon (5 year old machine), using a 2710x3306 cell raster,
r.slope.aspect takes:

real 0m7.637s
user 0m6.984s
sys 0m0.508s

2. It looks like the map in the demo is Int16... Does CUDA-based math support
double precision floating point calculations? Last time I checked it didn't.
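
For point 1 above, a rough throughput comparison can be sketched using only the figures quoted in this thread. It is indicative at best: the hardware, data types and the exact operation differ between the demo and the r.slope.aspect run, and the helper name cells_per_second is hypothetical.

```python
def cells_per_second(rows, cols, seconds):
    """Raster cells processed per second of wall-clock time."""
    return rows * cols / seconds

# r.slope.aspect on the 3.2 GHz Xeon, "real" time from the timing above
xeon = cells_per_second(2710, 3306, 7.637)

# Demo figures as quoted: 1400x1400 cells, 60 s before and 2 s after
demo_before = cells_per_second(1400, 1400, 60.0)
demo_after = cells_per_second(1400, 1400, 2.0)

print(f"Xeon r.slope.aspect: {xeon:,.0f} cells/s")
print(f"Demo before:         {demo_before:,.0f} cells/s")
print(f"Demo after (1 GPU):  {demo_after:,.0f} cells/s")
```

On these numbers the GPU result in the demo is in the same range as the CPU-only GRASS run, which suggests the 60 second baseline is what needs explaining.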

Other than those 2 points, I would love to see GPU-based acceleration in
GRASS.

A thread from last year on this topic:
http://www.mail-archive.com/grass-dev@lists.osgeo.org/msg01925.html

Hopefully things have improved since then!

Cheers,
Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.74.7341

On Sep 23, 2009, at 2:03 PM, Dylan Beaudette wrote:

Other than those 2 points, I would love to see GPU-based acceleration in
GRASS.

Hi Dylan,

I can't comment on your two very valid observations; I don't know anything more about it than what was presented in the video.

A thread from last year on this topic:
http://www.mail-archive.com/grass-dev@lists.osgeo.org/msg01925.html

Hopefully things have improved since then!

It looks that way to me. I think OpenCL improves the situation, and GPUs have indeed gotten faster.

Imagine what this 2.72 teraflops card could do with OpenCL enabled:

http://www.electronista.com/articles/09/09/23/ati.radeon.hd.5800.official/

<grin>

It has GDDR5 memory with about 150 GB/s of bandwidth!

Best,

Jeshua Lacock, Owner
<http://OpenOSX.com>
phone: 208.462.4171