[GRASS-dev] [GRASS GIS] #873: r.univar: allow percentile= to be doubles

#873: r.univar: allow percentile= to be doubles
-----------------------------+----------------------------------------------
Reporter: hamish | Owner: grass-dev@lists.osgeo.org
     Type: enhancement | Status: new
Priority: major | Milestone: 6.5.0
Component: Raster | Version: svn-develbranch6
Keywords: r.univar, stats | Platform: All
      Cpu: All |
-----------------------------+----------------------------------------------
Hi,

for `r.univar` it is useful to have sub-integer percentile output, e.g.
for calculating confidence intervals/sigma at 68.2689%, 95.45%, and
99.73%. Currently `r.univar` only accepts whole integers for the
percentile=. Actually it's worse: it accepts floats on the command line
but they quietly get truncated into integers.(!)

see http://article.gmane.org/gmane.comp.gis.grass.user/20744

currently `r.colors.stddev` is hit by this and probably anyone doing stats
on map values.

the attached patch (vs. 6.5svn) attempts to fix this. please review.
R-bridge users please compare with R's answers.

{{{
#spearfish
g.region rast=elevation.10m
r.univar elevation.10m perc=68,68.2689,95,95.45,99,99.73 -e
}}}

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/873&gt;
GRASS GIS <http://grass.osgeo.org>

#873: r.univar: allow percentile= to be doubles
--------------------------+-------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: enhancement | Status: new
  Priority: major | Milestone: 6.5.0
Component: Raster | Version: svn-develbranch6
Resolution: | Keywords: r.univar, stats
  Platform: All | Cpu: All
--------------------------+-------------------------------------------------
Comment (by glynn):

Replying to [ticket:873 hamish]:

> for `r.univar` it is useful to have sub-integer percentile output, e.g.
for calculating confidence intervals/sigma at 68.2689%, 95.45%, and
99.73%. Currently `r.univar` only accepts whole integers for the
percentile=. Actually it's worse: it accepts floats on the command line
but they quietly get truncated into integers.(!)

> See [http://article.gmane.org/gmane.comp.gis.grass.user/20744\]

> Currently `r.colors.stddev` is hit by this and probably anyone doing
stats on map values.

Note that `r.quantile` supports non-integer percentiles, works with large
maps, and is significantly more efficient.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/873#comment:1&gt;
GRASS GIS <http://grass.osgeo.org>

#873: r.univar: allow percentile= to be doubles
--------------------------+-------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: enhancement | Status: new
  Priority: major | Milestone: 6.5.0
Component: Raster | Version: svn-develbranch6
Resolution: | Keywords: r.univar, stats
  Platform: All | Cpu: All
--------------------------+-------------------------------------------------
Comment (by mmetz):

Replying to [comment:1 glynn]:
> Note that `r.quantile` supports non-integer percentiles, works with
large maps, and is significantly more efficient.

IMHO, it would still be nice if `r.univar` supports floating point
percentiles. The `r.univar` manual could mention that `r.quantile` is the
better choice if only quantiles and/or percentiles are requested, e.g.
under NOTES where it already says that `r.univar` can use large amounts of
system memory for extended statistics. I would suggest to also add
`r.quantile` to the SEE ALSO section.

Markus M

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/873#comment:2&gt;
GRASS GIS <http://grass.osgeo.org>

#873: r.univar: allow percentile= to be doubles
--------------------------+-------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: enhancement | Status: new
  Priority: major | Milestone: 6.5.0
Component: Raster | Version: svn-develbranch6
Resolution: | Keywords: r.univar, stats
  Platform: All | Cpu: All
--------------------------+-------------------------------------------------
Comment (by hamish):

patch committed to 6.5svn with r40939. (will port to trunk soon)

> R-bridge users please compare with R's answers.

I'd still like verification before considering if this should go in 6.4 or
not. I'd settle for the list of commands to test it myself.

mmetz:
> The r.univar manual could mention that r.quantile is the better
> choice if only quantiles and/or percentiles are requested

added in the patch.

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/873#comment:3&gt;
GRASS GIS <http://grass.osgeo.org>

#873: r.univar: allow percentile= to be doubles
--------------------------+-------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: enhancement | Status: closed
  Priority: major | Milestone: 6.5.0
Component: Raster | Version: svn-develbranch6
Resolution: fixed | Keywords: r.univar, stats
  Platform: All | Cpu: All
--------------------------+-------------------------------------------------
Changes (by hamish):

  * status: new => closed
  * resolution: => fixed

Comment:

backported to relbr64 with r41377.

Tested in 6.5svn against R's quantile(). mean residuals less than 3mm for
Spearfish's elevation.10m.

test method follows.

GRASS 6.5svn:
{{{
#spearfish
g.region rast=elevation.10m
Q=`seq -s, 0 5 100`

r.univar -e elevation.10m percentile=$Q -g
n=2654802
null_cells=0
cells=2654802
min=1061.064087
max=1846.743408
range=785.679321
mean=1348.37144603811
mean_of_abs=1348.37144603811
stddev=175.494309916948
variance=30798.2528132259
coeff_var=13.0152793158443
sum=3579659211.6848597527
first_quartile=1196.8
median=1309.37
third_quartile=1480.29
percentile_0=1061.06
percentile_5=1123.09
percentile_10=1153.04
percentile_15=1171.19
percentile_20=1184.32
percentile_25=1196.8
percentile_30=1210.82
percentile_35=1229.45
percentile_40=1251.94
percentile_45=1280.39
percentile_50=1309.37
percentile_55=1346.4
percentile_60=1378.71
percentile_65=1410.04
percentile_70=1440.23
percentile_75=1480.29
percentile_80=1525.85
percentile_85=1568.51
percentile_90=1613.6
percentile_95=1671.23
percentile_100=1846.74

Q=`seq -s, 0 0.1 1`
r.univar -e elevation.10m percentile=$Q -g
n=2654802
null_cells=0
cells=2654802
min=1061.064087
max=1846.743408
range=785.679321
mean=1348.37144603811
mean_of_abs=1348.37144603811
stddev=175.494309916948
variance=30798.2528132259
coeff_var=13.0152793158443
sum=3579659211.6848597527
first_quartile=1196.8
median=1309.37
third_quartile=1480.29
percentile_0= 1061.06
percentile_0_1=1073.82
percentile_0_2=1078.85
percentile_0_3=1083.27
percentile_0_4=1085.54
percentile_0_5=1087.8
percentile_0_6=1090.37
percentile_0_7=1092.09
percentile_0_8=1093.77
percentile_0_9=1095.2
percentile_1= 1096.48
}}}

R 2.7.1:
{{{
library(spgrass6)
G <- gmeta6()
spear <- rast.get6("elevation.10m", cat = FALSE, ignore.stderr = TRUE)

summary(spear)

Q5 <- quantile(spear$elevation.10m, probs=seq(0,1,0.05), na.rm=TRUE,
names=TRUE, type=8)
cat(Q5, sep="\n")
1061.064
1123.092
1153.037
1171.192
1184.320
1196.800
1210.821
1229.449
1251.937
1280.389
1309.368
1346.401
1378.709
1410.040
1440.232
1480.286
1525.850
1568.510
1613.603
1671.230
1846.743

Q1 <- quantile(spear$elevation.10m, probs=seq(0,0.01,0.001), na.rm=TRUE,
names=TRUE, type=8)
cat(Q1, sep="\n")
1061.064
1073.824
1078.849
1083.274
1085.538
1087.803
1090.373
1092.091
1093.775
1095.205
1096.479
}}}

Octave/Matlab:
{{{
Q5_G = [ ...
1061.06
1123.09
1153.04
1171.19
1184.32
1196.8
1210.82
1229.45
1251.94
1280.39
1309.37
1346.4
1378.71
1410.04
1440.23
1480.29
1525.85
1568.51
1613.6
1671.23
1846.74 ];

Q5_R = [ ...
1061.064
1123.092
1153.037
1171.192
1184.320
1196.800
1210.821
1229.449
1251.937
1280.389
1309.368
1346.401
1378.709
1410.040
1440.232
1480.286
1525.850
1568.510
1613.603
1671.230
1846.743 ];

Q1_G = [ ...
1061.06
1073.82
1078.85
1083.27
1085.54
1087.8
1090.37
1092.09
1093.77
1095.2
1096.48 ];

Q1_R = [ ...
1061.064
1073.824
1078.849
1083.274
1085.538
1087.803
1090.373
1092.091
1093.775
1095.205
1096.479 ];

%summarize residuals
>> mean ( abs(Q5_G - Q5_R) )
ans = 0.001 meter
>> std( abs(Q5_G - Q5_R) )
ans = 0.0014

>> mean ( abs(Q1_G - Q1_R) )
ans = 0.003 meter
>> std( abs(Q1_G - Q1_R) )
ans = 0.0015
}}}

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/873#comment:4&gt;
GRASS GIS <http://grass.osgeo.org>