[GRASS-dev] Some comments on the v.surf.icw add-on

I have just tested v.surf.icw in some detail.
In general, it is a very nice module that produces
far more realistic results than the plain Euclidean v.surf.idw
in the right circumstances.

I made some observations and have some questions/comments I'd
like to share:

Line 150:
This produces an error message from the shell if any of the
env vars does not exist (cosmetics, really):

  if [ $GIS_FLAG_V -eq 1 ] || [ "$GRASS_VERBOSE" -gt 1 ] ; then

Line 234:

  r.cost $VERBOSE -k in=tmp_icw_area_$$ out=cost_site.$NUM coordinate=$EASTING,$NORTHING

should be changed to:

  r.cost $VERBOSE -k in=tmp_icw_area_$$ output=cost_site.$NUM coordinate=$EASTING,$NORTHING

Otherwise, it will no longer work with newer versions of r.cost that
have the "outdir=" option: with two options starting in "out", the
abbreviated "out=" becomes ambiguous.

Line 239:

This changes the original input data without any warning or
documentation about it:

  # so the divisor exists and the weighting is huge at the exact sample spots
  # more efficient to reclass to 1?
  r.mapcalc "cost_site.$NUM = if(cost_site.$NUM == 0, 0.1, cost_site.$NUM)"

Apparently, this is done to avoid a division by zero in the standard
IDW formula (line 246):

  EXPRESSION="1.0 / pow(cost_site.$NUM $DIVISOR, $FRICTION )"

If so, it needs to be documented. I actually used costs
normalized to [0,1] and ran into trouble here.
I realize now that wasn't a good idea, because a movement
cost of "0" between two spatially distinct locations is physically
implausible. However, this behaviour still needs to be documented,
perhaps advising users to use a minimum cost of "1"?
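
To make the failure mode concrete, here is a minimal demonstration (my
own example, not taken from the script) of what happens when the cost
reaches zero:

  # illustration only: r.mapcalc maps a division by zero to NULL,
  # not to infinity, so a zero cost distance punches a NULL hole
  # into the interpolated surface
  r.mapcalc "demo = 1.0 / pow(0.0, 2.0)"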

Line 246:

The variable $DIVISOR is never initialized (empty) but still
used in the expression:

  EXPRESSION="1.0 / pow(cost_site.$NUM $DIVISOR, $FRICTION )"

In addition, I would like to know where the second
IDW formula comes from (-r flag). Any literature
references? When would this be the preferable formula?

Thanks,

Ben


Ben wrote:

> I have just tested v.surf.icw in some detail.

[ http://grass.osgeo.org/wiki/GRASS_AddOns#v.surf.icw ]

> In general, it is a very nice module that produces
> far more realistic results than the plain Euclidean v.surf.idw

thanks.

> in the right circumstances.

exactly.

> I made some observations and have some questions/comments
> I'd like to share:

> Line 150:
> This produces an error message from the shell if any of the
> env vars does not exist (cosmetics, really):
>
>   if [ $GIS_FLAG_V -eq 1 ] || [ "$GRASS_VERBOSE" -gt 1 ] ; then

right, there was a code comment on the line above saying that it
would happen. now fixed in addons-svn.
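
Presumably the fix looks something like quoting the flag variable and
giving both variables a default (my guess; I have not checked the svn
diff):

  if [ "${GIS_FLAG_V:-0}" -eq 1 ] || [ "${GRASS_VERBOSE:-0}" -gt 1 ] ; then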

> Line 234:
>
>   r.cost $VERBOSE -k in=tmp_icw_area_$$ out=cost_site.$NUM coordinate=$EASTING,$NORTHING
>
> should be changed to:
>
>   r.cost $VERBOSE -k in=tmp_icw_area_$$ output=cost_site.$NUM coordinate=$EASTING,$NORTHING
>
> Otherwise, it will no longer work with newer versions of
> r.cost that have the "outdir=" option.

right, fixed in svn.

> Line 239:
>
> This changes the original input data without any warning or
> documentation about it:

only in what I consider to be a corner case. The intention is to do
just the opposite:

>   # so the divisor exists and the weighting is huge at the exact sample spots
>   # more efficient to reclass to 1?
>   r.mapcalc "cost_site.$NUM = if(cost_site.$NUM == 0, 0.1, cost_site.$NUM)"

> Apparently, this is done to avoid a division by zero in the
> standard IDW formula (line 246):
>
>   EXPRESSION="1.0 / pow(cost_site.$NUM $DIVISOR, $FRICTION )"

yes, it is to avoid the DIV0, but more fundamentally it is there to
ensure that if a starting data point falls within a cell, then that
data point will hold (just less than) infinitely more weight than the
other starting points around it.

when a starting point falls inside a cell, the cost to get to that cell
is 0, which makes the inverse distance weight infinite, which makes the
resulting output cell NULL rather than equal to the starting point's
value. So your final output map would have little holes everywhere you
had a starting point. I suppose v.to.rast + r.patch is another way to
solve that.
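
That alternative might look something like this (untested sketch; the
map and column names are placeholders, not the module's):

  # burn the known sample values into a raster...
  v.to.rast input=samples output=sample_vals use=attr column=value
  # ...then lay them over the interpolated surface, so the NULL holes
  # at the sample cells get the true values back
  r.patch input=sample_vals,icw_surface output=icw_filled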

I had not considered the case where the input cost map has real zero-
friction areas in it.

> If so, it needs to be documented. I actually used costs
> normalized to [0,1] and ran into trouble here.

I'd need to know a bit more about what you are doing with it to get my
head around a more general solution. (offlist is fine)

0-1 normalized data is not exotic IMO, so I'll entertain thoughts on
how to make this work.

I'm using it with a flat cost map of 1-everywhere to calculate true
distance on a horizontal plane around obstacles, so I had never
considered that zero cost might happen beyond the starting cells.
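
That setup is trivial to reproduce (the map name is a placeholder):

  # a uniform friction surface: every cell costs 1 to cross, so the
  # accumulated cost distance tracks true distance around any NULL
  # (obstacle) cells
  r.mapcalc "flat_cost = 1"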

> I realize now that wasn't a good idea, because a movement
> cost of "0" between two spatially distinct locations is physically
> implausible.

it depends on your medium. perhaps physically implausible but why limit
ourselves to physical applications? or with hi-temp superconductors
coming online.. ..shrug

> However, this behaviour still needs to be documented,
> perhaps advising users to use a minimum cost of "1"?

or just recommend not to set costs of zero.

that 0.1 could be set to, e.g., an order of magnitude smaller than the
cost map's minimum value instead of being hardcoded. In my case the
distances are in thousands of meters, so 0.1 was small enough.
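
Something along these lines could derive that epsilon automatically
(untested sketch; assumes the input cost map's minimum is positive):

  # take the cost map's minimum and go one order of magnitude below it
  MIN=`r.info -r tmp_icw_area_$$ | grep '^min=' | cut -d'=' -f2`
  EPSILON=`echo "$MIN" | awk '{print $1 / 10}'`
  r.mapcalc "cost_site.$NUM = if(cost_site.$NUM == 0, $EPSILON, cost_site.$NUM)"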

> Line 246:
>
> The variable $DIVISOR is never initialized (empty) but
> still used in the expression:
>
>   EXPRESSION="1.0 / pow(cost_site.$NUM $DIVISOR, $FRICTION )"

that's harmless/obscure/as designed. in bash, if a variable is unset
and you then use it, it just expands to a blank, which is what is
intended here. to make it more readable I've explicitly set it to ""
now in svn.
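
i.e., presumably just this near the top of the script:

  DIVISOR=""   # intentionally empty: pow(cost_site.$NUM $DIVISOR, ...)
               # then reads as pow(cost_site.$NUM, ...)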

> In addition, I would like to know where the second
> IDW formula comes from (-r flag). Any literature
> references?

"Moving Vessel ADCP Measurement of Tidal Streamfunction using Radial
Basis Functions" - Vennell & Beatson, Journal of Physical Oceanography
2006-111.

see section 2 on RBFs and thin plate splines.

@article{vennell2006moving,
  title     = {Moving vessel acoustic {Doppler} current profiler measurement of tidal stream function using radial basis functions},
  author    = {Vennell, R. and Beatson, R.},
  journal   = {Journal of Geophysical Research: Oceans},
  volume    = {111},
  year      = {2006},
  publisher = {American Geophysical Union}
}

there is some nice thin plate spline interpolation method + application
in there.
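
For context, the thin plate spline used there is a standard radial
basis function surface; in general form (textbook RBF background, not
anything specific to this module):

  s(x)   = sum_i lambda_i * phi( |x - x_i| )  +  p(x)
  phi(r) = r^2 * ln(r)    (the thin plate spline basis)

where p(x) is a low-order polynomial and the lambda_i are fitted to
the data points.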

> When would this be the preferable formula?

to be honest I just threw it in to see how it did. Interpolation is the
art of making stuff up in a plausible way; there is no one correct
method to use for all cases. the best you can do is to choose a method
that minimizes damage.

What is appropriate or preferable will be case dependent. I suggest you
make a slope map of your output map & view it in nviz + r.profile to get
a handle on how your method is behaving. It's not always obvious by
looking at a colorized map.
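
Concretely, that inspection might look like this (sketch; map names
and the profile coordinates are placeholders):

  # slope of the interpolated surface exposes kinks and blow-ups that
  # a colorized map hides
  r.slope.aspect elevation=icw_surface slope=icw_slope
  # a transect between two points shows how the surface behaves
  # between stations (substitute real coordinates)
  r.profile input=icw_surface output=- profile=0,0,1000,1000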

If you look at a slope map or profile of the output you'll see that 1/d^2
makes for perfectly smooth transitions but also tends to go off wildly at
unconstrained boundaries. 1/d^3 does better at keeping the edges under
control, but the transitions are not as smooth. You could run it twice
and average those two cases with r.series; I'm not sure how a single
1/d^2.5 run would compare.
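
e.g. (sketch; the map names are placeholders for the two runs):

  # average the friction=2 and friction=3 results cell-by-cell
  r.series input=icw_d2,icw_d3 output=icw_avg method=average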

To maintain confidence in the output map you might also take care to make
another distance-from-all-starting-points cost map and set anything more
than the average distance between sampling stations to be NULL. (That's
why the post_mask= option is there)
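
Something like this (untested sketch; "samples" and $CUTOFF are
placeholders):

  # cost distance from all stations at once
  r.cost -k input=tmp_icw_area_$$ output=dist_all start_points=samples
  # NULL out everything farther away than the chosen cutoff
  r.mapcalc "conf_mask = if(dist_all > $CUTOFF, null(), 1)"
  # a map like this could then be given to the post_mask= option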

IIRC I chose to use ln() instead of log10() for -r because that made
it act in a way more dissimilar to 1/d^n (so, more signal to explore),
and it seemed, well, for lack of a better term, more natural. In the
r.volcano addon module I try a number of other decay functions which
could just as easily be dropped in here.

If this all seems a bit loose, it is, but that is the nature and
interesting part of the art of gap-filling IMO. One thing I like about
the above paper is that it looks at how you can decide how much field
sampling you have to do before you can be confident that the
interpolation will be realistic and not just line-between-points voodoo.

Hamish