[GRASS-dev] Re: No reply from developer mailing list

Hello Jyotish,

Yes, on this list no answer usually means "go ahead, this sounds good to us".

GRASS is mostly a row-processing engine when it works on raster data.
That imposes a restriction on parallel processing: rows are compressed
on writing and therefore have to be uncompressed on reading, i.e. rows
cannot be distributed directly because there is no random access.

Best for raster parallelisation (by blocks or rows) would be Random
Access. That would certainly mean uncompressed raster data, which is
an issue on many levels, some philosophical, some practical.

Right now a direct MPI or OpenMP implementation for rasters distributes
the pixels of a given row: once that row has been read and uncompressed,
any pixel slot can be accessed in parallel. Ideally, we would like the
same access to several rows or groups of rows (blocks) at the same time,
for more elegant parallelization.
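
As a rough illustration of the pixel-level distribution described above,
here is a minimal C sketch. It is not taken from any GRASS module: the
names scale_pixel, process_row and process_raster are hypothetical, and
the raster is assumed to be already uncompressed in memory as doubles.
Rows are visited one after another, and only the columns of the current
row are split across OpenMP threads.

```c
#include <stddef.h>

/* Hypothetical per-pixel operation; stands in for a module's real computation. */
static double scale_pixel(double v)
{
    return 2.0 * v + 1.0;
}

/* Process one row that has already been read and uncompressed.
 * Each column is independent, so the loop can be shared among threads.
 * If the compiler is not run with OpenMP enabled, the pragma is ignored
 * and the loop simply runs serially. */
void process_row(const double *in, double *out, int ncols)
{
    int col;

#pragma omp parallel for
    for (col = 0; col < ncols; col++)
        out[col] = scale_pixel(in[col]);
}

/* Rows are still handled sequentially, mirroring sequential row I/O. */
void process_raster(const double *in, double *out, int nrows, int ncols)
{
    int row;

    for (row = 0; row < nrows; row++)
        process_row(in + (size_t)row * ncols,
                    out + (size_t)row * ncols, ncols);
}
```

With GCC this would typically be built with -fopenmp. The only
OpenMP-specific line is the pragma, which is why such a change can stay
small.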

Now, this may differ from module to module: some (hydrological?)
modules load full images or large portions of them into memory, and
there may be a good parallelization option in those cases. It has to be
judged case by case.

Parallelization is excellent with sorting algorithms. Have you thought
maybe about basic GIS features like cleaning vectors? I know that
large vectors get checked for hours on import (v.in.ogr); see
http://biogeo.berkeley.edu/gadm/data/gadm_v0dot9_shp.zip, the GADM
shapefile, for example. I am not sure how it works, but halving that
time would directly save hours on large vectors.

I believe you could open a sort of online poll to ask users and
developers which functions/modules take more than half an hour to run
(especially on large datasets), and look at those for a start.

Many good wishes, and if I can help with anything I will try my best,
Yann

2009/3/24 Markus Neteler <neteler@osgeo.org>

On Mon, Mar 23, 2009 at 7:14 PM, Jyothish Soman
<jyothish.soman@gmail.com> wrote:
> Till now, there has not been any reply from the developer mailing list.

Well, you got one from me :)
http://lists.osgeo.org/pipermail/grass-dev/2009-March/043000.html

> Without expert help, I might take up a problem that is either too small or
> too big, because of restricted time.

Neither nor - we are just heavily overloaded (or, say, too few to
reply to more complex questions in a short time).

> Should I wait for some more time, or
> should I start looking at v.build and r.terraflow as candidate functions,
> and start writing my application around it.
>
> Both are, I assume, large enough and slow enough.

v.build is AFAIK heavily based on the vectorlib, which is under change
in GRASS 7 at this time.
I think that r.mapcalc might be a good candidate too, as then many
users will benefit (r.terraflow might be used by only a few people).

We could ask on grass-user what people would like to see
accelerated.

On grass-dev, it is common to get no answer, which means "good,
please go ahead". I certainly understand that your proposal deserves
answers! And we'll give them.

Best
Markus

PS: I attach main.c from i.atcorr which was openMP'ed by Yann Chemin
recently (have added him in CC). I should have tested it already but
didn't have the time yet... It's amazing - only a few lines of changes
to get OpenMP enabled!

--
Yann Chemin

Yann Chemin wrote:

> GRASS is mostly a row-processing engine when it works on raster data.
> That imposes a restriction on parallel processing: rows are compressed
> on writing and therefore have to be uncompressed on reading, i.e. rows
> cannot be distributed directly because there is no random access.
>
> Best for raster parallelisation (by blocks or rows) would be Random
> Access. That would certainly mean uncompressed raster data, which is
> an issue on many levels, some philosophical, some practical.

GRASS supports random reads whether the data is compressed or
uncompressed. It used to support random writes provided that the data
is uncompressed, but that was removed in 7.0 because nothing was using
it. It could be re-instated if there was a use for it.

Even without random writes, there's no reason why a multi-threaded
application can't buffer writes and have a worker thread write them
out in order.
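
One way to picture this is the minimal C sketch below. It is an
illustration under stated assumptions, not GRASS code: row_sink,
sink_put_row, compute_row and write_rows_in_order are hypothetical
names, and the sink only stands in for a writer that must receive rows
strictly in order. Instead of a dedicated writer thread, the sketch
swaps in OpenMP's "ordered" construct: each thread computes its row
into a private buffer concurrently, and only the hand-off to the sink
is serialised in row order.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a sequential row writer:
 * rows must arrive in the order 0, 1, 2, ... */
typedef struct {
    double *data;   /* destination storage, nrows * ncols values */
    int ncols;      /* number of columns per row */
    int next_row;   /* row the sink expects next */
    int errors;     /* number of out-of-order writes observed */
} row_sink;

static void sink_put_row(row_sink *s, int row, const double *buf)
{
    if (row != s->next_row)
        s->errors++;
    memcpy(s->data + (size_t)row * s->ncols, buf,
           (size_t)s->ncols * sizeof(double));
    s->next_row = row + 1;
}

/* Hypothetical computation: fills the row with its linear cell index. */
static void compute_row(int row, double *buf, int ncols)
{
    int col;

    for (col = 0; col < ncols; col++)
        buf[col] = (double)(row * ncols + col);
}

/* Compute rows concurrently, hand them to the sink in row order.
 * Returns 0 on success, -1 if a row buffer could not be allocated. */
int write_rows_in_order(row_sink *s, int nrows)
{
    int row;
    int failed = 0;

#pragma omp parallel for ordered schedule(static, 1)
    for (row = 0; row < nrows; row++) {
        /* Private per-row buffer, so computation needs no locking. */
        double *buf = malloc((size_t)s->ncols * sizeof(double));

        if (buf)
            compute_row(row, buf, s->ncols);

        /* Only this block is executed in loop order, one row at a time. */
#pragma omp ordered
        {
            if (buf)
                sink_put_row(s, row, buf);
            else
                failed = 1;
        }
        free(buf);
    }
    return failed ? -1 : 0;
}
```

Compiled without OpenMP support the pragmas are ignored and the loop
runs serially with the same result. A real writer thread with a queue
would let computation run further ahead of the output, at the cost of
more bookkeeping.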

A bigger issue is that much of the code in the libraries isn't
thread-safe. It's better in 7.0, although you still can't have multiple
threads performing concurrent I/O on a single raster.

--
Glynn Clements <glynn@gclements.plus.com>