S-plus and Grass

My experience with splus is limited, but I thought you might
be vaguely interested in some of my thoughts.

1. Interface with Grass.
I found with my datasets (path lengths from v.line/r.drain/r.cost),
it was easier to slam-dunk the data through a nawk shell script
than to try to convince splus that my column headings were important.
I currently use a series of scripts to strip off any unnecessary muck
from the v.line output, place tabs between all columns, and create
headers (the names files for splus). It works pretty slick and
seems to give me a lot more maneuverability than dealing with sites files.
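
For readers without nawk handy, the cleanup step described above might look roughly like this in Python. This is only a sketch: the sample report lines, the column names, and the function name to_tab_delimited are all made up for illustration, and real v.line output will need its own filtering rule.

```python
def to_tab_delimited(raw_lines, column_names):
    """Return tab-delimited rows, with a header, from messy report lines.

    A line is kept only if it has exactly one numeric field per column;
    banners, blank lines, and summary text are dropped.
    """
    rows = ["\t".join(column_names)]
    for line in raw_lines:
        fields = line.split()
        if len(fields) != len(column_names):
            continue  # wrong number of fields: banner, blank, or comment
        try:
            for field in fields:
                float(field)
        except ValueError:
            continue  # right number of fields, but not all numeric
        rows.append("\t".join(fields))
    return rows


# Hypothetical report text standing in for the real GRASS output.
sample = [
    "# hypothetical path-length report",
    "",
    "1   120.5   3",
    "2    98.0   7",
    "total 2 paths",
]
table = to_tab_delimited(sample, ["path_id", "length", "cost"])
print("\n".join(table))
```

The header row plays the part of the names file, so the statistics package gets the column headings and the data in one pass.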

2. Spatial Statistics
Splus is sorely lacking in both spatial statistics (Moran's I, kriging, etc.)
and in cluster analysis (WPGMA, UPGMA, etc.). The cluster diagrams are
pretty poor compared to SYSTAT and I don't know how to get the cophenetic
matrix out of splus (might be missing something obvious...). I also find
the discussion of autocorrelation to be limited (actually they mainly discuss
serial correlation in the sense of time series and really don't address
spatial autocorrelation at all).

3. Parametric statistics
Regression analysis appears to be stubborn. I have not yet had any
luck with changing the order in which terms go into a multiple regression
(akin to forward/backward stepwise). Again, I have bailed to running
a shell script to rearrange my database, then bringing the new database in
and trying again.

4. Odd-ball problems
Splus has an unpleasant habit of saving all sorts of tidbits in various
hidden locations. The number of tmp files created can be overwhelming--although
fixable with a function (the documentation of the function "on.exit" is
horrible). I also find it annoying that some terms that splus happily uses
I can't locate in any of my statistics books.

In sum, I am stuck with splus because it is the only package on our
system. I have used SYSTAT, SAS, BMDP, JMP, and various other packages
and am afraid to say that splus ranks towards the bottom. For example, I
output data, after testing it in splus (and getting all the sexy graphics),
to a pc with SYSTAT to run Moran's I or to get a lollipop diagram of
cluster data (and the cophenetic matrix). To test the spatial autocorrelation
of the database, it is currently easier to telnet to a system with SAS on it
and have someone run the data than to deal with splus.

As someone in our office exclaimed, laughing: "This fancy (expensive) program
on our UNIX net was just beat out by a MAC (a LC40 mac even)???" The sad truth
is that splus does a nifty job on graphics, but as an interface to spatial
databases, I cannot for the life of me figure out why they don't cover
spatial statistics, autocorrelation, Moran's Index, kriging, anisotropy,
etc.etc.etc.etc...

I would be most interested in any summary as splus is an ongoing headache for
me!

Gillian Bowser
Resource Management Specialist
National Park Service-Rocky Mountain Support Office
Denver, Colorado

Gillian Bowser (gillian@rmro.nps.gov) writes on 17 July 1995:
[...]
>databases, I cannot for the life of me, figure out why they don't cover
>spatial statistics, autocorrelation, Moran's Index, kriging, anisotropy,
>etc.etc.etc.etc...

Because it's not a *spatial* statistics package...

FWIW, you can calculate Moran's I and the Geary Ratio (plus their
standard errors) using v.autocorr (on a triangulation of sites from
s.geom). It will also spit out a W (connectivity) matrix.
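
For anyone who wants to see what the two statistics actually compute, here is a minimal Python sketch using the textbook formulas. It is not the v.autocorr source: the function name moran_geary is made up, the W matrix is typed in by hand rather than derived from a triangulation, and standard errors are left out.

```python
def moran_geary(values, w):
    """Return (Moran's I, Geary's c) for site values and a connectivity matrix.

    values: one number per site.
    w:      square list-of-lists; w[i][j] > 0 if sites i and j are neighbours.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)          # sum of all weights
    ss = sum(d * d for d in dev)             # sum of squared deviations
    cross = sum(w[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    sqdiff = sum(w[i][j] * (values[i] - values[j]) ** 2
                 for i in range(n) for j in range(n))
    moran = (n / s0) * cross / ss
    geary = ((n - 1) / (2.0 * s0)) * sqdiff / ss
    return moran, geary


# Hypothetical data: four sites in a row, each joined to its neighbours.
z = [1.0, 2.0, 3.0, 4.0]
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
I, c = moran_geary(z, W)
print("Moran's I = %.4f, Geary's c = %.4f" % (I, c))
```

With a steady trend like this one, I comes out positive and c falls below 1, which is the usual signature of positive spatial autocorrelation.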

Quadrat count statistics can be calculated using s.qcount,
including:
  Fisher et al. (1922) Relative Variance,
  David & Moore (1954) Index of Cluster Size,
  Douglas (1975) Index of Cluster Frequency,
  Lloyd (1967) "mean crowding",
  Lloyd (1967) Index of patchiness, and
  Morisita's (1959) I (variability b/n patches)
(see Cressie, chapter 9)
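
As a rough guide to what those six indices are, here is a Python sketch based on the usual textbook definitions. Treat it as an assumption-laden illustration rather than a description of s.qcount itself: the function name quadrat_indices and the counts are invented, the variance uses the n-1 divisor, and the input must have at least two points in total and a relative variance different from 1.

```python
def quadrat_indices(counts):
    """Compute common quadrat-count indices from a list of per-quadrat counts."""
    q = len(counts)                      # number of quadrats
    n = sum(counts)                      # total number of points
    mean = n / q
    var = sum((x - mean) ** 2 for x in counts) / (q - 1)  # sample variance
    rel_var = var / mean                 # relative variance
    ics = rel_var - 1.0                  # index of cluster size
    icf = mean / ics                     # index of cluster frequency
    crowding = mean + ics                # "mean crowding"
    patchiness = crowding / mean         # index of patchiness
    morisita = q * sum(x * (x - 1) for x in counts) / (n * (n - 1))
    return {
        "relative_variance": rel_var,
        "cluster_size": ics,
        "cluster_frequency": icf,
        "mean_crowding": crowding,
        "patchiness": patchiness,
        "morisita": morisita,
    }


# Hypothetical counts: most points fall in a single quadrat.
stats = quadrat_indices([0, 0, 1, 7])
for name, value in sorted(stats.items()):
    print("%-18s %.4f" % (name, value))
```

Under complete spatial randomness the relative variance, patchiness, and Morisita's I all sit near 1, so values well above 1, as in this example, point to clustering.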

Semivariogram modeling is possible with s.sv, m.svfit, and g.gnuplot
(no nested structures, sorry).
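
The first step of that workflow, the sample semivariogram, is simple enough to sketch in Python. This uses the classical estimator (half the mean squared difference between pairs in each distance bin). The function name empirical_semivariogram, the binning scheme, and the data are assumptions for illustration, not the s.sv implementation, and model fitting is not covered.

```python
import math


def empirical_semivariogram(points, lag_width, n_lags):
    """Return [(lag_center, gamma, pair_count), ...] for (x, y, z) points.

    Pairs are binned by separation distance into n_lags bins of width
    lag_width; bins that receive no pairs are omitted from the result.
    """
    sums = [0.0] * n_lags
    pairs = [0] * n_lags
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            k = int(h // lag_width)      # index of the distance bin
            if k < n_lags:
                sums[k] += (zi - zj) ** 2
                pairs[k] += 1
    result = []
    for k in range(n_lags):
        if pairs[k]:
            center = (k + 0.5) * lag_width
            result.append((center, sums[k] / (2.0 * pairs[k]), pairs[k]))
    return result


# Hypothetical sites along a transect: (x, y, z).
sites = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0), (3.0, 0.0, 7.0)]
sv = empirical_semivariogram(sites, lag_width=1.0, n_lags=3)
for center, gamma, count in sv:
    print("h = %.1f  gamma = %.4f  pairs = %d" % (center, gamma, count))
```

The resulting (lag, gamma) pairs are what one would then plot and fit a model to.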

Plus there's some other statistical software for sites available:
s.univar, s.probplt, and s.normal (which includes a dozen or so
different tests of (log)normality and other alternatives).

Most of these programs are available at:
  ftp://pasture.ecn.purdue.edu/pub/mccauley/grass
See also the tutorials
  ftp://pasture.ecn.purdue.edu/pub/mccauley/grass/tutorials
and
  http://soils.ecn.purdue.edu/~mccauley/cdhc/

I disagree with the philosophy of one- and two-way links to stat
packages. These types of functions should be available *in* GIS
packages, not in some third party program (free or otherwise). It's
great that S-plus may become linked to GRASS (as it is with ARC/INFO),
but I see these types of solutions as quick fixes. Why can't we
include this functionality *in* GRASS? Sure keeps the price
down... :-)

So, with all of that said, what other types of programs would you like
to see? (I'm not committing myself but just getting ideas for my
rainy-day hobby list :-) ) If you can include a pointer to a public
domain source for this, please do. It significantly speeds up development.

Make sure that the methods suggested make sense for spatial
data. Remember Tobler's first law ("everything is related to everything
else, but near things are more related than distant things") and the
assumption of independence made by most traditional (non-spatial) methods.

Also FWIW, here's an interesting article (which I don't fully
agree with, but still worth mentioning):

@Article{ anselin93,
  author = "Luc Anselin and Rustin F. Dodson and Sheri Hudak",
  title = "Linking {GIS} and Spatial Data Analysis in Practice",
  journal = "Geographic Systems",
  year = "1993",
  volume = "1",
  number = "1",
  pages = "3-23"
}

If you need spatial statistics ASAP and can't wait for recreational
programmers, you might check into SPACESTAT. It apparently has links to
ARC/INFO, IDRISI, OSU-MAP, and generic raster files (does this mean
GRASS?). See http://www.ncgia.ucsb.edu/pubs/software.html

Regards,
Darrell
--
James Darrell McCauley, PhD http://soils.ecn.purdue.edu/~mccauley/
Agricultural & Biological Engineering mccauley@ecn.purdue.edu
Purdue University tel: 317.494.1198 fax: 317.496.1115

StatSci has just announced a new "add on" to Splus 3.3 that includes
a number of spatial stats. StatSci has had Noel Cressie working with them
on these. Check them out...

W. Fredrick Limp, Director and Professor FAX: (501) 575-5218
CAST, Center for Advanced Spatial Technologies TEL: (501) 575-6159
12 Ozark Hall, U of Ark., Fayetteville AR 72701 fred@cast.uark.edu
Opinions expressed here are mine, at least I think they are.