It came up on the other list about calculating s.univar stats
for sites in a cell. I've been going back and forth with Gordon Keith
on the other list and privately, about a modified s.univar.
Anyway, I've got most of a version but I hit a snag when the count
for a cell is one. Some of the stats come up NaN. At the moment,
I've hooked in modifications so the offending stats don't divide
by zero (which was the problem). Now, if there's just one data point,
what are appropriate values for:
standard deviation (0.0 ?)
skewness (0.0 ?)
kurtosis (-3.0?)
coefficient of variation (0.0?)
The results above are what I get when I guard for the divide by zero
cases (n = 1). Zero seems right for most, but I don't know about
kurtosis. It has a -3.0 bias added to it at the end.
Once I get this resolved, I just have to hook in a proper "main",
and I'll put it up for testing...
--
Eric G. Miller <egm2@jps.net>
On Wed, 18 Sep 2002, Eric G. Miller wrote:
> It came up on the other list about calculating s.univar stats
> for sites in a cell. I've been going back and forth with Gordon Keith
> on the other list and privately, about a modified s.univar.
>
> Anyway, I've got most of a version but I hit a snag when the count
> for a cell is one. Some of the stats come up NaN. At the moment,
> I've hooked in modifications so the offending stats don't divide
> by zero (which was the problem). Now, if there's just one data point,
> what are appropriate values for:
>
> standard deviation (0.0 ?)
> skewness (0.0 ?)
> kurtosis (-3.0?)
> coefficient of variation (0.0?)
From R

> x <- 2
> mean(x)
[1] 2
> sd(x)
[1] NA
> library(e1071)
> kurtosis(x)
[1] NA
> skewness(x)
[1] NA
where NA is not available, here not defined. Zero is wrong by definition,
but Inf is a possibility - at least for variance or standard deviation. Do
we have a general NULL that could be used?
> The results above are what I get when I guard for the divide by zero
> cases (n = 1). Zero seems right for most, but I don't know about
> kurtosis. It has a -3.0 bias added to it at the end.
>
> Once I get this resolved, I just have to hook in a proper "main",
> and I'll put it up for testing...
Roger
--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand@nhh.no
and: Department of Geography and Regional Development, University of
Gdansk, al. Mar. J. Pilsudskiego 46, PL-81 378 Gdynia, Poland.
On Wed, Sep 18, 2002 at 09:49:37AM +0200, Roger Bivand wrote:
> On Wed, 18 Sep 2002, Eric G. Miller wrote:
>
> > It came up on the other list about calculating s.univar stats
> > for sites in a cell. I've been going back and forth with Gordon Keith
> > on the other list and privately, about a modified s.univar.
> >
> > Anyway, I've got most of a version but I hit a snag when the count
> > for a cell is one. Some of the stats come up NaN. At the moment,
> > I've hooked in modifications so the offending stats don't divide
> > by zero (which was the problem). Now, if there's just one data point,
> > what are appropriate values for:
> >
> > standard deviation (0.0 ?)
> > skewness (0.0 ?)
> > kurtosis (-3.0?)
> > coefficient of variation (0.0?)
>
> From R
>
> > x <- 2
> > mean(x)
> [1] 2
> > sd(x)
> [1] NA
> > library(e1071)
> > kurtosis(x)
> [1] NA
> > skewness(x)
> [1] NA
>
> where NA is not available, here not defined. Zero is wrong by definition,
> but Inf is a possibility - at least for variance or standard deviation. Do
> we have a general NULL that could be used?
There isn't a way to store NULL in sites files (NaN would be appropriate,
but the library doesn't support it). Maybe zero will have to do. A
sample size of one is about as useful as an anecdote anyway...
--
Eric G. Miller <egm2@jps.net>
Eric,

zero may be very misleading, so the program should give a clear warning
that there are cells with a count of one and that the zeroes in the output
can actually mean NaN. There is no point in spending too much time on this,
as the sites should be moved to the new vector format, where NaN is
supported, as soon as it is stable enough (Radim, please keep us posted).

Helena
_______________________________________________
grass5 mailing list
grass5@grass.itc.it
http://grass.itc.it/mailman/listinfo/grass5
On Wed, Sep 18, 2002 at 08:24:00AM -0400, Helena Mitasova wrote:
> Eric,
>
> zero may be very misleading, so the program should give clear warning that
> there are cells with count one and the zeroes in the output can actually
> mean NaN. There is no point to spend too much time on this as the sites
> should be moved to the new vector format as soon as it is stable enough
> (Radim, please keep us posted).
I haven't spent much time on it; half the needed code was already written.
I guess I'll just have it zero those stats in the n=1 case. Also, I
think I'll add an option for specifying a lower bound on the number of
samples per cell. Hmm, I wonder if anyone would find it interesting to
look at the variance of the means calculated for each cell? Guess I'll
leave that to a stats package...
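A rough sketch of how such a lower bound could be combined with the warning Helena asked for. The function name, its parameters and the message wording are all hypothetical and do not come from the module:

```c
#include <stdio.h>
#include <stddef.h>

/* counts[i] holds the number of sites that fell into cell i.
 * Counts the non-empty cells with fewer than min_count samples and,
 * if there are any, prints one warning to stderr.
 * Returns the number of such cells. */
size_t warn_small_cells(const size_t *counts, size_t ncells, size_t min_count)
{
    size_t i, too_small = 0;

    for (i = 0; i < ncells; i++)
        if (counts[i] > 0 && counts[i] < min_count)
            too_small++;

    if (too_small > 0)
        fprintf(stderr,
                "WARNING: %lu cell(s) have fewer than %lu samples; "
                "zeroed statistics there are not meaningful\n",
                (unsigned long)too_small, (unsigned long)min_count);

    return too_small;
}
```

Empty cells are left out of the count on purpose, since they produce no statistics to misread.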
--
Eric G. Miller <egm2@jps.net>
On Thu, 19 Sep 2002 11:49, Eric G. Miller wrote:
> Hmm, I wonder if anyone would find it interesting to look at the
> variance of the means calculated for each cell?
That's pretty close to what I actually want it for.
The original sites are depth fixes from a swath echosounder.
I want the average value for each cell to create a DTM (median is
actually better than mean for my application).
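A minimal sketch of a per-cell median, assuming the depth fixes for one cell have already been gathered into an array. The names are hypothetical and this is not taken from any existing module:

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparison callback for doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n values. Sorts a private copy so the input is untouched.
 * Returns NAN if n is 0 or the copy cannot be allocated. */
double cell_median(const double *v, size_t n)
{
    double *tmp, med;

    if (n == 0)
        return NAN;
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return NAN;

    memcpy(tmp, v, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_double);
    med = (n % 2) ? tmp[n / 2]
                  : 0.5 * (tmp[n / 2 - 1] + tmp[n / 2]);
    free(tmp);
    return med;
}
```

Unlike the standard deviation, the median stays well defined for a cell holding a single fix: it is simply that value.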
I'm then defining training regions over areas of the DTM and calculating
a variety of statistics for all the cells in each region, (mean, std
dev, skew, kurt) x (depth, slope, aspect, profile, tangent), and seeing
if any of these can be used in any way as an indicator of habitat.
Regards
Gordon
--
Gordon Keith
Programmer/Data Analyst
Marine Acoustics
CSIRO Marine Research
http://www.marine.csiro.au
"640K ought to be enough for anybody." Bill Gates, 1981
On Wednesday 18 September 2002 02:24 pm, Helena Mitasova wrote:
> Eric,
>
> zero may be very misleading, so the program should give clear warning that
> there are cells with count one and the zeroes in the output can actually
> mean NaN. There is no point to spend too much time on this as the sites
> should be moved to the new vector format, where NaN is supported, as soon
> as it is stable enough (Radim, please keep us posted).
>
> Helena
The new vector format (not so important) / API (more important) is not yet
stable enough to upgrade all v.*/s.* modules now, but it is already stable
enough for some tests with modules like s.surf.rst. Such modules require
large data sets (> 10^6 points), which is not typical for vectors
(10^5 elements). Some problems may appear here (the speed of reading data
from the database), and they may influence the new vectors before the
format is stable.
I would appreciate it if somebody who knows something about s.surf.rst could
try to upgrade it to g51 (Vect_select_lines_by_box should be used instead of
src/libes/rst_gmsl/tree?); I am ready to help.
Radim