[GRASS-dev] grass 7 and pixel random access

Hello list,

I would like to know the planned changes for the raster library,
especially the random access of pixels in the raster.

I wanted to work on it some months back, but my day job got more intense.
In the near future, we will need easy access to any row for
parallel processing.

Thanks,
Yann

--
Yann Chemin
International Rice Research Institute
Office: http://www.irri.org/gis
Perso: http://www.freewebs.com/ychemin

On Mon, Apr 28, 2008 at 3:47 AM, Yann Chemin <yann.chemin@gmail.com> wrote:

> Hello list,
>
> I would like to know the planned changes for the raster library,
> especially the random access of pixels in the raster.

Not sure if all of those are actually planned, but here is a list:
http://grass.osgeo.org/wiki/GRASS_7_ideas_collection#Raster

> I wanted to work on it some months back, but my daily job got more intense.
> In the coming future, we will need to access easily any row for
> parallel processing.

+1

Markus

Yann Chemin wrote:
> > I would like to know the planned changes for the raster library,
> > especially the random access of pixels in the raster.

Markus:
> Not sure if all of those are actually planned, but here is a list:
> http://grass.osgeo.org/wiki/GRASS_7_ideas_collection#Raster
>
> > I wanted to work on it some months back, but my daily job got more
> > intense.
> > In the coming future, we will need to access easily any row for
> > parallel processing.

One thing I wonder about for parallel processing of (serial) raster
modules: do we really need random read access to send each individual row
into a separate thread? The overhead with that seems counter-productive.
Couldn't we read some GRASS_NPROC environment variable, split the
overall number of rows by that number, and create a small number of
threads, i.e. matching the system?

like: ceil(rows/nproc)
nproc=4
rows=1035
chunk size=259

the last thread reads/processes/writes fewer rows than the others as it
finds the EOF.
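
The split described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: GRASS_NPROC is the variable proposed in this message, not an existing setting, and `nproc_from_env()` and `chunk_bounds()` are hypothetical helper names.

```c
#include <stdlib.h>

/* Hypothetical helper: read the proposed GRASS_NPROC variable,
 * falling back to 1 when it is unset or not a positive number. */
int nproc_from_env(void)
{
    const char *s = getenv("GRASS_NPROC");
    int n = s ? atoi(s) : 1;

    return n > 0 ? n : 1;
}

/* Hypothetical helper: compute the half-open row range [*start, *end)
 * handled by thread `id` out of `nproc`, using chunks of
 * ceil(rows / nproc) rows. The last chunk may be shorter (or empty). */
void chunk_bounds(int rows, int nproc, int id, int *start, int *end)
{
    int chunk = (rows + nproc - 1) / nproc;   /* integer ceil */

    *start = id * chunk;
    if (*start > rows)
        *start = rows;
    *end = *start + chunk;
    if (*end > rows)
        *end = rows;
}
```

With rows=1035 and nproc=4 the chunk size is 259: threads 0 to 2 get 259 rows each, and thread 3 gets the remaining 258 (rows 777 to 1034).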

Another thing I still wonder about (see thread from a month or so back)
is where to start. Modify the libs to support the concept, then tackle
each module on its own? I.e. concentrate on the modules that are not
I/O-limited, where we can't do much except throw more processors at the
problem, and leave the non-number-crunching modules alone? Concentrate on
areas where we'll get the most bang for the buck / pick off low-hanging
fruit / etc.?

Hamish
(showing off his multi-proc naiveté)


Hi,

2008/4/28 Markus Neteler <neteler@osgeo.org>:

> On Mon, Apr 28, 2008 at 3:47 AM, Yann Chemin <yann.chemin@gmail.com> wrote:
>
> Not sure if all of those are actually planned, but here is a list:
> http://grass.osgeo.org/wiki/GRASS_7_ideas_collection#Raster

BTW, is there a plan to bind together the 2D and 3D libraries? Maybe I am
just not reading the wiki pages carefully...

Martin

--
Martin Landa <landa.martin gmail.com> * http://gama.fsv.cvut.cz/~landa *

2008/4/28 Yann Chemin <yann.chemin@gmail.com>:

We have experimented a bit with parallel processing, and for a given
processor power, there is a minimum number of operations per pixel
that must be done before parallelization brings any time benefit.
I would also expect multi-core parallelization (OpenMP: easy to code
in most cases) to pay off sooner than multi-CPU parallelization
(MPICH: requires more preparation in the code), for the physical
reason of the transport distance of bits.

I honestly would not know at which "level" we should integrate
parallelization capabilities. It would indeed be great to have the
library recognize a direct row loop and split the loop by
the number of cores/CPUs available, if we speak about direct raster
processing. This would mean that potentially every direct raster
processing module could benefit, if it actually needs it (which is
the main question).
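
As a rough illustration of splitting a direct row loop across cores with OpenMP, here is a minimal sketch. It assumes the raster is already held in memory as a plain array of doubles and does not use the GRASS raster library; `process_rows()` and the pixel operation inside it are placeholders made up for this example.

```c
#include <stddef.h>

/* Apply a per-pixel operation to a raster held in memory as
 * rows * cols doubles. Each row is independent, so the row loop
 * can be split across cores. When the compiler is run without
 * OpenMP support the pragma is ignored and the loop runs serially. */
void process_rows(const double *in, double *out, int rows, int cols)
{
    int row;

#pragma omp parallel for schedule(static)
    for (row = 0; row < rows; row++) {
        int col;

        for (col = 0; col < cols; col++) {
            size_t i = (size_t)row * cols + col;

            out[i] = 2.0 * in[i] + 1.0;   /* placeholder pixel operation */
        }
    }
}
```

Whether this pays off depends on how much work is done per pixel, as noted above: for an operation as cheap as this placeholder, the thread overhead may well outweigh the gain.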

Many raster (and vector) processing tasks may now be parallelized
straightforwardly. The main issue, though, is that we often face complex
algorithms that connect pixels together from place to place
in the map (e.g. interpolation); those require tailor-made
parallelization. Here we would need tools to ease the task of
parallelizing those complex systems.

Finally, the codes I am working on are mostly pixel-based: temporal
signal processing, energy balance, data assimilation of 1-D
vertical models. Some take minutes, some hours or days, and some may go
up to a month (I don't run those last ones; I am waiting for CPU
improvements). So I would benefit directly from anything, even just
running half of the raster on each core of my dual-core laptop/desktop.

Well, I am sure there are people on this list with more experience in
parallelization, so I am waiting for comments.

Yann


--
Yann Chemin
International Rice Research Institute
Office: http://www.irri.org/gis
Perso: http://www.freewebs.com/ychemin


Hamish wrote:

> > > I would like to know the planned changes for the raster library,
> > > especially the random access of pixels in the raster.
> Markus:
> > Not sure if all of those are actually planned, but here is a list:
> > http://grass.osgeo.org/wiki/GRASS_7_ideas_collection#Raster
> >
> > > I wanted to work on it some months back, but my daily job got more
> > > intense.
> > > In the coming future, we will need to access easily any row for
> > > parallel processing.
>
> One thing I wonder about for parallel processing of (serial) raster
> modules- do we really need random read access to send each individual row
> into a separate thread? The overhead with that seems counter-productive.
> Couldn't we read some GRASS_NPROC envrio variable and then split the
> overall number of rows by that number and create a small number of
> threads, ie matching the system.

If you just want to speed up top-to-bottom processing, that doesn't
require random access, just a scrolling window (which several modules
already use, either via rowio or with their own cache).

For random access, the main issue is that you want to avoid performing
the decompression, format conversion and resampling steps more than
once. In practice, this means making a temporary "raw" copy of the
data, and then caching it.

Exactly how you cache it depends upon your expected access pattern.
For truly random access, you probably want to cache it in rows. Where
there is some degree of locality, tiles will tend to produce better
results.
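
To make the row-caching idea concrete, here is a minimal sketch of a direct-mapped row cache in C. It is an illustration only, not existing library code: `struct row_cache`, `row_cache_get()` and `demo_loader()` are hypothetical names, and the loader stands in for reading one row from the temporary raw copy. A tile cache would look the same but be keyed on (row / tile_height, col / tile_width) instead of the row number.

```c
#include <stdlib.h>

#define NSLOTS 4    /* number of rows kept in memory (illustrative) */

/* Callback that fills buf with the raw values of one row. */
typedef void (*row_loader)(int row, double *buf, int cols, void *ctx);

struct row_cache {
    int cols;
    int row_in_slot[NSLOTS];    /* row held by each slot, -1 if empty */
    double *buf[NSLOTS];
    row_loader load;
    void *ctx;
    long misses;                /* number of times a row had to be loaded */
};

void row_cache_free(struct row_cache *c)
{
    int i;

    for (i = 0; i < NSLOTS; i++) {
        free(c->buf[i]);
        c->buf[i] = NULL;
    }
}

int row_cache_init(struct row_cache *c, int cols, row_loader load, void *ctx)
{
    int i;

    c->cols = cols;
    c->load = load;
    c->ctx = ctx;
    c->misses = 0;
    for (i = 0; i < NSLOTS; i++) {
        c->row_in_slot[i] = -1;
        c->buf[i] = NULL;
    }
    for (i = 0; i < NSLOTS; i++) {
        c->buf[i] = malloc(sizeof(double) * (size_t)cols);
        if (!c->buf[i]) {
            row_cache_free(c);
            return -1;
        }
    }
    return 0;
}

/* Return the value at (row, col), loading the row only on a miss.
 * row must be non-negative. */
double row_cache_get(struct row_cache *c, int row, int col)
{
    int slot = row % NSLOTS;    /* direct-mapped: one possible slot per row */

    if (c->row_in_slot[slot] != row) {
        c->load(row, c->buf[slot], c->cols, c->ctx);
        c->row_in_slot[slot] = row;
        c->misses++;
    }
    return c->buf[slot][col];
}

/* Stand-in loader for the demo; a real one would read the row
 * from the temporary raw copy of the map. */
void demo_loader(int row, double *buf, int cols, void *ctx)
{
    int col;

    (void)ctx;
    for (col = 0; col < cols; col++)
        buf[col] = row * 1000.0 + col;
}
```

Repeated reads from the same row cost one load; two rows that map to the same slot (e.g. rows 5 and 9 with four slots) evict each other, which is where a smarter replacement policy or tiling would come in.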

> another thing I still wonder about (see thread from a month or so back)
> is where to start? Modify the libs to support the concept, then tackle
> each module on their own? ie concentrate on the non-I/O limited and
> can't-do- much-about-it but throw more processor at the problem modules,
> and leave non-number crunching modules alone? -- concentrate on areas
> where we'll get the most bang for the buck / pick off low hanging fruit /
> etc?

It depends upon whether we want to make the raster I/O operations
thread-safe. If we do, that could involve a significant amount of
work, particularly if we don't want to reduce efficiency.

One efficiency issue is that the library keeps a decompressed copy of
the last row which was read. This means that if you're up-sampling the
data (the current region has finer resolution than the raster),
adjacent rows which correspond to the same source row don't require
reading and de-compressing the data.

[However, the re-sampling and the conversion to the requested type
(CELL/FCELL/DCELL) are repeated for each row. Even though it's almost
inevitable, it isn't actually guaranteed that you'll request the same
format or the same resolution for each row.]

If you are trying to parallelise a top-to-bottom module, and one
thread requests a row that is in the middle of being read by another
thread, should it perform a redundant read, or simply wait for the
original thread to de-compress the row?

Also, the approach of having a single "slot" for the most recent row
won't extend to multiple threads. E.g. if you have 10 threads and
you're up-scaling the data 2:1, you would need 5 slots (each source
row will be consumed by two threads).
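
The slot arithmetic can be written out as a small sketch. It assumes the threads are working on a block of consecutive output rows and that the up-scaling factor is an integer; `source_row()` and `slots_needed()` are hypothetical names used only for this illustration.

```c
/* Source row that output row `out_row` is read from when each source
 * row covers `scale` output rows (scale = 2 for 2:1 up-scaling). */
int source_row(int out_row, int scale)
{
    return out_row / scale;
}

/* Number of distinct source rows touched by `nthreads` threads that
 * work on consecutive output rows starting at `first_out_row`. */
int slots_needed(int first_out_row, int nthreads, int scale)
{
    int first = source_row(first_out_row, scale);
    int last = source_row(first_out_row + nthreads - 1, scale);

    return last - first + 1;
}
```

For 10 threads at 2:1 starting on output row 0 this gives 5 slots. If the block of rows is not aligned to a source row (e.g. it starts on output row 1), the same 10 threads touch 6 source rows.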

Parallelising the output is simpler. However, if you want to support
compressed files, there would need to be a critical section so that
each thread can reliably determine the offset at which its data is
written. Regardless of whether you want compressed files, if you don't
have pwrite(), you would need to make lseek() + write() into a
critical section.
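
A minimal POSIX sketch of the two write paths, plus one possible way of handing out offsets for variable-length (compressed) rows. The function names and the single global lock are assumptions made for this example and are not part of the existing library.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* make pwrite() visible under strict C modes */
#endif
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;
static off_t next_offset = 0;   /* next free byte in the output file */

/* With pwrite() the offset is passed explicitly, so the shared file
 * position is never used and no lock is needed for the write itself. */
ssize_t write_row_pwrite(int fd, const void *buf, size_t len, off_t offset)
{
    return pwrite(fd, buf, len, offset);
}

/* Without pwrite(), the seek and the write must form one critical
 * section, or another thread could move the file position in between. */
ssize_t write_row_locked(int fd, const void *buf, size_t len, off_t offset)
{
    ssize_t n = -1;

    pthread_mutex_lock(&io_lock);
    if (lseek(fd, offset, SEEK_SET) != (off_t)-1)
        n = write(fd, buf, len);
    pthread_mutex_unlock(&io_lock);
    return n;
}

/* For compressed rows the length is only known after compression, so
 * each thread reserves its byte range under the lock and then writes
 * to the returned offset. */
off_t reserve_offset(size_t len)
{
    off_t off;

    pthread_mutex_lock(&io_lock);
    off = next_offset;
    next_offset += (off_t)len;
    pthread_mutex_unlock(&io_lock);
    return off;
}
```

A thread writing a compressed row would call `reserve_offset()` with the compressed length, record the returned offset for that row, and then write the data there.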

If you have pwrite() and don't need compressed files, there are no
inherent concurrency issues. There might be issues with the existing
code using pre-allocated buffers, but those can be fixed.

BTW, for 7.x, can we assume that alloca() is available? It would make
it much easier to write re-entrant code by avoiding the need to
pre-allocate buffers (the alternative is lots of calls to malloc/free,
which could be a significant performance hit).

--
Glynn Clements <glynn@gclements.plus.com>