[GRASSLIST:1701] GRASS on clusters

Hi List,
I'd like to reopen Dr. Rich Shepard's year-old message (see below) to this
list, regarding using GRASS with a cluster. I have a big project coming up
and am hoping to set up a cluster to help speed up the workload. Have any of
you experimented further with GRASS and clusters? Messages to the listserv
indicate that it is indeed possible, but I didn't see whether anyone had
actually tried it. If anyone has, would they mind giving me a clue as to how
it might turn out? Does it speed things up, or is it more hassle than it's
worth? I am using Linux as my OS and will be using Beowulf as my clustering
software, and have six Pentiums to start with.

Thanks for any input, feedback, and
comments!

Lars

Lars Bromley
IT Guy
American Association for
  the Advancement of Science

PS Someone on the ERDAS list got flamed today for referring to GIS folks as
'toadies'; he said it was cuz that's what he learned to call them in the
military. Does anyone on this list know the appropriate military response
for Image Analysts? :wink:

    To: grasslist@baylor.edu
    Subject: SMP/Cluster-GRASS: closure
    From: Rich Shepard <rshepard@appl-ecosys.com>
    Date: Sat, 22 Apr 2000 09:46:16 -0700 (PDT)
    Sender: owner-GRASSLIST@baylor.edu

      I've dug more deeply into requirements for running GRASS on a
    multiprocessor or cluster system (such as a beowulf cluster). What I've
    learned:

      1) It can be done.

      2) It is too expensive to be practical for us.

      Apparently, The Portland Group is one of a few compiler/tool developers
    out there (NAG being another one). A 2-developer license for their cluster
    development kit (CDK) costs $2,500 plus $500/year for minor upgrades and
    technical assistance. Since I'm the only developer here, that's much too
    much money to spend on a single project for a single user.

      Ergo, while it was a nice concept, it's not for us. For a team using the
    tools for more than one project, or with more than one developer, it may
    be a very practical approach.

    'Nuff said,

    Rich

    Dr. Richard B. Shepard, President

                       Applied Ecosystem Services, Inc. (TM)
              Making environmentally-responsible mining happen. (SM)
                       --------------------------------
            2404 SW 22nd Street | Troutdale, OR 97060-1247 | U.S.A.
    + 1 503-667-4517 (voice) | + 1 503-667-8863 (fax) | rshepard@appl-ecosys.com

On Wed, 4 Apr 2001, Lars Bromley wrote:

> I'd like to reopen Dr. Rich Shepard's year-old message (see below) to this
> list, regarding using GRASS with a cluster. I have a big project coming
> up and am hoping to set up a cluster to help speed up the workload. Have
> any of you experimented further with GRASS and clusters? Messages to the
> listserv indicate that it is indeed possible, but I didn't see if anyone
> had actually tried it. If anyone has, would they mind giving me a clue as
> to how it might turn out? Does it speed things up, or is it more hassle
> than it's worth? I am using linux as my OS and will be using Beowulf as my
> clustering software, and have six pentiums to start with.

Lars,

  Some day, I'd like to be able to afford the port. The parallelizing
compiler is very expensive (at least on my budget), but it is do-able. Right
now, I have higher priorities on my GRASS time, and I've not read anything
from anyone else about putting together such a port.

  There would be a large time investment, too, to reorganize the code so it
could be parallelized. Candidly, I've no clue what that would require.

Rich

Dr. Richard B. Shepard, President

                       Applied Ecosystem Services, Inc. (TM)
            2404 SW 22nd Street | Troutdale, OR 97060-1247 | U.S.A.
+ 1 503-667-4517 (voice) | + 1 503-667-8863 (fax) | rshepard@appl-ecosys.com

On Wed, Apr 04, 2001 at 07:01:50PM -0700, Rich Shepard wrote:

> Some day, I'd like to be able to afford the port. The parallelizing
> compiler is very expensive (at least on my budget), but it is do-able. Right
> now, I have higher priorities on my GRASS time, and I've not read anything
> from anyone else about putting together such a port.
>
> There would be a large time investment, too, to reorganize the code so it
> could be parallelized. Candidly, I've no clue what that would require.
>
> Rich

Excuse me for intruding here... Just a hint:
GRASS 5 contains the "gmath" library, which is a wrapper for LAPACK/BLAS.
LAPACK/BLAS itself is a sophisticated mathematical library which exists
in parallelized form (as far as I know). See:
http://www.netlib.org/lapack/

That way, CPU-intensive parts of the GRASS code could be rewritten to use
the "gmath" lib (src/libes/gmath/).

Maybe that's a starting point?

Regards

Markus Neteler

PS: Some "gmath" details can be found in the "GRASS 5 programmer's manual":
    http://www.geog.uni-hannover.de/grass/grassdevel.html#prog

Looks like I may have missed an important detail out there: GRASS would need to
be recompiled in parallelized form, and this requires an expensive compiler or
some heavy programming? Darn details and their devils...

I was hoping that Beowulf would analyze the processes and distribute the
computing tasks, am I misunderstanding how such software works?

Thanks for your time,
Lars

Markus Neteler wrote:

> GRASS 5 contains the "gmath" library which is a wrapper to LAPACK/BLAS.
> LAPACK/BLAS itself is a sophisticated mathematical library which exists
> in parallelized form (as far as I know). See:
> http://www.netlib.org/lapack/
> [...]
> PS: Some "gmath" details can be found in the "GRASS 5 programmer's manual":
>     http://www.geog.uni-hannover.de/grass/grassdevel.html#prog

On Thu, Apr 05, 2001 at 09:03:26AM -0400, Lars Bromley wrote:

> Looks like I may have missed an important detail out there: GRASS would need to
> be recompiled in parallelized form, and this requires an expensive compiler or
> some heavy programming? Darn details and their devils...
>
> I was hoping that Beowulf would analyze the processes and distribute the
> computing tasks, am I misunderstanding how such software works?
>
> Thanks for your time,

Hi Lars,

I am not a specialist... As far as I know, it's not sufficient to simply
use a parallelizing compiler if the code is written sequentially. I am quite
sure that the algorithms need to be modified. We would need someone to search
for time-consuming functions in the GRASS code and modify those.

Perhaps you know more than me!
Just guessing,

Markus
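
A rough way to see why only the time-consuming functions matter is Amdahl's
law: if a fraction p of the run time can be spread over n processors and the
rest stays serial, the best possible speedup is 1 / ((1 - p) + p / n). The
sketch below evaluates that bound for a six-machine cluster like the one Lars
describes. The fractions are illustrative assumptions, not measurements of
GRASS, and communication overhead between nodes is ignored.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on speedup when only part of the run time is parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


if __name__ == "__main__":
    # Illustrative fractions only; they are not measured from GRASS.
    for p in (0.5, 0.9, 0.99):
        print(f"p={p:.2f}: at most {amdahl_speedup(p, 6):.2f}x on 6 nodes")
```

Under these assumptions, a module whose run time is 90% parallelizable could
gain at most a factor of 4 on six nodes, which is why finding the hot spots
first is worth the effort.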

PS: From the beowulf FAQ (www.beowulf.org):
"3. Can I take my software and run it on a Beowulf and have it go faster?
[1999-05-13]

Maybe, if you put some work into it. You need to split it into
parallel tasks that communicate using MPI or PVM or network sockets or
SysV IPC. Then you need to recompile it.

Or, as Greg Lindahl points out, if you just want to run the same
program a few thousand times with different input files, a shell script
will suffice.

As Christopher Bohn points out, even multi-threaded software won't
automatically get a speedup; multi-threaded software assumes
shared-memory. There are some distributed shared memory packages under
development (DIPC, Mosix, ...), but the memory access patterns in
software written for an SMP machine could potentially result in a
*loss* of performance on a DSM machine."
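
The FAQ's second case, running the same program many times on different
inputs, needs no changes to the program itself. A minimal sketch of that
pattern follows, in Python rather than a shell script. The command is a
hypothetical stand-in (a one-liner that squares its argument), not a GRASS
module, and this version only keeps several runs going on one machine; on a
cluster each run would have to be started on a different node.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for "the same program": a tiny Python one-liner that
# squares its argument. In practice this would be the real command line.
COMMAND = [sys.executable, "-c", "import sys; print(int(sys.argv[1]) ** 2)"]


def run_one(arg):
    """Launch one independent run of the program and return its output."""
    done = subprocess.run(COMMAND + [str(arg)], capture_output=True,
                          text=True, check=True)
    return done.stdout.strip()


def run_all(args, workers=6):
    """Keep up to `workers` runs going at once; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, args))


if __name__ == "__main__":
    print(run_all([1, 2, 3, 4]))
```

The threads here only wait on child processes, so the actual work happens in
separate operating-system processes. This works because the runs share no
state, which is exactly the condition the FAQ describes.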

--
Markus Neteler * University of Hannover
Institute of Physical Geography and Landscape Ecology
Schneiderberg 50 * D-30167 Hannover * Germany
Tel: ++49-(0)511-762-4494 Fax: -3984

On Thu 05 Apr 01 11:41, Markus Neteler wrote:

> GRASS 5 contains the "gmath" library which is a wrapper to LAPACK/BLAS.
> LAPACK/BLAS itself is a sophisticated mathematical library which exists
> in parallelized form (as far as I know). See:
> http://www.netlib.org/lapack/

LAPACK and BLAS do have parallelised libraries PLAPACK and PBLAS, which work
with PVM on clusters. This is probably the best route to follow, and could
speed up the numerically intensive stuff. I'd like to keep in touch on this
process, as I will hopefully be parallelising some of my own geophysical
algorithms in the near future.

Cheers

Henk

--
Henk Coetzee
Council for Geoscience
Pretoria, South Africa
email: henkc@geoscience.org.za
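
To make the idea of a parallel linear algebra library more concrete, here is
a minimal sketch of the row-band decomposition that such libraries build on:
the matrix is cut into bands of rows, each band is multiplied by the vector
without reference to any other band, and the pieces are joined in order. It
is plain Python with no LAPACK, BLAS, or PVM calls, the function names are
hypothetical, and the bands are processed one after another here; on a
cluster each band would be handed to a different node.

```python
def split_rows(matrix, parts):
    """Cut a list-of-rows matrix into at most `parts` contiguous bands."""
    size, extra = divmod(len(matrix), parts)
    bands, start = [], 0
    for i in range(parts):
        stop = start + size + (1 if i < extra else 0)
        if stop > start:
            bands.append(matrix[start:stop])
        start = stop
    return bands


def band_times_vector(band, vector):
    """Multiply one band of rows by the vector; needs no other band."""
    return [sum(a * x for a, x in zip(row, vector)) for row in band]


def matvec_by_bands(matrix, vector, parts=6):
    """Compute matrix times vector band by band, joining pieces in order."""
    result = []
    for band in split_rows(matrix, parts):  # each iteration is independent
        result.extend(band_times_vector(band, vector))
    return result


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    print(matvec_by_bands(a, [1, 1], parts=2))
```

Because no band depends on another, the only communication needed is sending
the vector out and collecting the partial results, which is what makes this
kind of numerical kernel a good candidate for a cluster.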