[GRASSLIST:1574] Re: interpolate nominal values

I want to interpolate nominal data (soil types as point information).
Which possibilities do I have, what's the best method? -Any
experiences, suggestions?

They're hard to interpolate. Nearest neighbour (forming Thiessen polygons) seems
a useful approach; indicator simulation is another. Most interpolation methods
are based on weighted averages, and are therefore only suitable for
interval/ratio data. Don't use them.
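
A minimal sketch of the nearest-neighbour idea, assuming made-up sample coordinates and soil names (`samples`, `nearest_label` and `classify_grid` are illustrative names, not part of any GRASS module). Each grid cell simply takes the category of the closest sample, so no arithmetic is ever done on the soil types themselves:

```python
import math

# Hypothetical sample points: (x, y, soil_type). All values are invented.
samples = [
    (1.0, 1.0, "loam"),
    (4.0, 1.0, "clay"),
    (2.5, 4.0, "sand"),
]

def nearest_label(x, y, points):
    """Return the soil type of the sample closest to (x, y)."""
    return min(points, key=lambda p: math.hypot(p[0] - x, p[1] - y))[2]

def classify_grid(points, nx, ny, cell=1.0):
    """Label every cell centre of an nx-by-ny grid with its nearest sample's type."""
    return [
        [nearest_label((i + 0.5) * cell, (j + 0.5) * cell, points) for i in range(nx)]
        for j in range(ny)
    ]

grid = classify_grid(samples, nx=5, ny=5)
for row in reversed(grid):  # print with y increasing upwards
    print(" ".join(label[0] for label in row))
```

The resulting raster is the rasterised equivalent of a Thiessen polygon map: the boundaries depend only on distances between samples, never on a mean of category values.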
--
Edzer

At 16:29 08.03.01 +0100, Edzer J. Pebesma wrote:

> I want to interpolate nominal data (soil types as point information).
> Which possibilities do I have, what's the best method? -Any
> experiences, suggestions?

> They're hard to interpolate. Nearest neighbour (forming Thiessen polygons) seems
> a useful approach; indicator simulation is another. Most interpolation methods
> are based on weighted averages, and are therefore only suitable for
> interval/ratio

But with Thiessen polygons you run into the trouble of producing 'plateaus' and 'canyons' between observed points, which are hard to explain in the 'real' world. Thiessen polygons seem to be a starting point for understanding spatial interpolation, but they're not the end of the story.

According to my experience you can use other methods (inverse distance weighting works well for surprisingly many applications), but you have to understand the mechanism of the interpolation model. Kriging is difficult to handle, because of the variograms one has to estimate before running an interpolation. For some applications regularised splines with tension outperform both kriging and IDW, but it really depends on the situation. (I only have experience in precipitation interpolation, not with soils...)

Bernhard
--
CDE, Centre for Development and Environment, University of Berne

..................................
the.sign
Bernhard Sturm
WebDesign & Software
Steinauweg 22
CH - 3007 Bern
fon: ++41 (0)31 371 83 21
fax: ++41 (0)31 371 83 21
mobil: ++41 (0)78 775 99 30
www.the-sign.ch

On Thu, 8 Mar 2001, Bernhard Sturm wrote:

> But with Thiessen polygons you run into the trouble of producing
> 'plateaus' and 'canyons' between observed points, which are hard to
> explain in the 'real' world. Thiessen polygons seem to be a starting point for
> understanding spatial interpolation, but they're not the end of the story.

  Hold on folks! Before we get too far down this side path, pay attention,
please, to the data type: Nominal.

  I know that you all know this, but I'll provide a quick review from basic
statistics on data types.

  Nominal: Qualitative rather than quantitative values, and therefore
incapable of any arithmetic operations (including interpolation to a 3D
surface). Names of soil types cannot be manipulated as numbers.

  Ordinal: Quantifies by ordering on a linear scale, but not by
magnitude. An example might be soil color, ranging from the blue-grey gleyed
hydric soils through yellows, reds and browns to black, organic soils. Each
color can be assigned a number along that scale, but the numbers can be used
only to determine which ones are greater than others.

  Interval: Quantifies by defining relative position on an interval
scale, but without reference to a true zero point. Temperature measurements are
a good example. 20 C is numerically twice 10 C, but that doesn't mean it's
twice as hot.

  Ratio: Measurements on a scale with a true zero and equal
intervals. Precipitation is a good example. There can be 0 cm of rain, and
10 cm of rain is twice the amount of 5 cm of rain.
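
  As a rough illustration of the four scales (all values and variable names below are invented for this sketch), the following shows which summary is defensible at each level: counting for nominal data, ordering for ordinal data, differences for interval data, and ratios only for ratio data.

```python
import statistics

# Hypothetical observations, one list per measurement scale.
soil_types = ["loam", "clay", "loam", "sand"]  # nominal
colour_rank = [1, 3, 2, 5, 4]                  # ordinal: rank along a colour scale
temperature_c = [10.0, 20.0]                   # interval: degrees Celsius
rain_cm = [5.0, 10.0]                          # ratio

# Nominal: only counting is meaningful, so the mode is the usable summary.
most_common = statistics.mode(soil_types)

# Ordinal: order is meaningful, so the median is usable (a mean is not).
middle_rank = statistics.median(colour_rank)

# Interval: differences are meaningful, ratios are not. The Celsius ratio
# is 2, but on the Kelvin scale (which has a true zero) it is about 1.035.
celsius_ratio = temperature_c[1] / temperature_c[0]
kelvin_ratio = (temperature_c[1] + 273.15) / (temperature_c[0] + 273.15)

# Ratio: a true zero exists, so "twice as much rain" is a valid statement.
rain_ratio = rain_cm[1] / rain_cm[0]

print(most_common, middle_rank, celsius_ratio, round(kelvin_ratio, 3), rain_ratio)
```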

  The only way you can interpolate nominal data to a surface is if your
unit of measurement is furlongs per fortnight. :-)

Rich

Dr. Richard B. Shepard, President

                       Applied Ecosystem Services, Inc. (TM)
              Making environmentally-responsible mining happen. (SM)
                       --------------------------------
            2404 SW 22nd Street | Troutdale, OR 97060-1247 | U.S.A.
+ 1 503-667-4517 (voice) | + 1 503-667-8863 (fax) | rshepard@appl-ecosys.com

It's getting OT, but I think it's worth the discussion:

>   Hold on folks! Before we get too far down this side path, pay attention,
> please, to the data type: Nominal.

You are right. I was completely absorbed by my precipitation data... But then again, you could in fact use Thiessen polygons to interpolate nominal data, as there are no weights on the interpolated values. You just draw boundaries between type 1 and type 2. (The border line between the two types is only a function of the distance between the type 1 and type 2 sample points, and is not weighted by any calculated mean value, hence it must be possible to draw such a map.)

And to take it one step further: there is one way you could treat your nominal data as ratio data. You assign the number 1 to soil type 1, the number 2 to soil type 2, and so on. Then you treat these numbers as ratio values and interpolate them to a surface (with IDW, for instance). You will now get a surface with a lot of fractional values between 1 and 2. If you re-class the surface so that everything <1.5 is assigned to soil type 1 and everything >=1.5 to soil type 2, you could extract the isolines showing the boundary between the two soil types. I am aware that this is by no means a proper application, but I could imagine that one would then be able to draw a soil type map (it all depends on the number of sample points; if you have observed enough points you should end up with quite a reliable map).
Does anybody know if this method is completely wrong, or could it be used to some extent?
I believe I will have to re-read the first chapters of "Applied Geostatistics"...
*smirk*
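
One way to probe this question is a tiny experiment, sketched here under stated assumptions: a one-dimensional transect, an invented helper `idw`, and arbitrary codes 1 and 3 for the two soil types that were actually observed. Halfway between the two samples the interpolated code lands on 2, so re-classing would map a soil type that was never sampled there, and the outcome changes if the types are numbered differently.

```python
def idw(x, samples, power=2.0):
    """Inverse-distance-weighted estimate at x from (position, value) pairs."""
    num = den = 0.0
    for xi, vi in samples:
        d = abs(x - xi)
        if d == 0.0:
            return float(vi)  # exactly on a sample point
        w = 1.0 / d ** power
        num += w * vi
        den += w
    return num / den

# Hypothetical transect: soil type 1 observed at x = 0, soil type 3 at x = 10.
# Soil type 2 was not observed anywhere.
coded = [(0.0, 1), (10.0, 3)]

midpoint = idw(5.0, coded)        # equal weights, so (1 + 3) / 2
reclassed = int(midpoint + 0.5)   # re-class to the nearest integer code

print(midpoint, reclassed)
```

With only two types coded 1 and 2 the 1.5 threshold does give a usable boundary, but as soon as a third category is involved the averaged codes pass through categories that have no business being there.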

cheers and good night
Bernhard

On Thu, 8 Mar 2001, Bernhard Sturm wrote:

> You are right. I was completely absorbed by my precipitation data... But
> then again, you could in fact use Thiessen polygons to interpolate
> nominal data, as there are no weights on the interpolated values. You just
> draw boundaries between type 1 and type 2. (The border line between the two
> types is only a function of the distance between the type 1 and type 2 sample
> points, and is not weighted by any calculated mean value, hence it must
> be possible to draw such a map.)

Bernhard,

  While this might be computationally feasible, I suggest that it is
meaningless in the real world.

  Soil maps (and soil taxonomy in general) are best represented either by
vector polygons (the way they're drawn on paper maps) or raster regions that
more accurately reflect the transition zone between one soil type and the
adjacent one.

  Now, if you had ratio data representing, for example, samples of plant
species' densities and height (just to pick some arbitrary parameters for
the sake of this discussion) you could interpolate those data and drape the
soils layer over them to see if there were meaningful patterns in the
relationship.

  Perhaps my point will be better understood if I present an absurd example.
(Don't try this at home! It can be done only by experts on a closed course.)
Suppose you had point (sites) data of fast-food restaurants (i.e., junk food
vendors) from a typical American medium-sized city. The nominal data
categories include "McDonald's", "Burger King", "Taco John", etc. Now,
regardless of whether you use IDW, Voronoi diagrams/Thiessen polygons, or
kriging, you interpolate a 3D surface of these points. How do you interpret
the results? What will it tell you (or me, for that matter)?

Rich


Rich Shepard wrote:

>   While this might be computationally feasible, I suggest that it is
> meaningless in the real world.

Archaeologists use this sort of info all the time, although it could be argued
that we don't live in the real world either ;>)

> Suppose you had point (sites) data of fast-food restaurants (i.e., junk food
> vendors) from a typical American medium-sized city. The nominal data
> categories include "McDonald's", "Burger King", "Taco John", etc. Now,
> regardless of whether you use IDW, Voronoi diagrams/Thiessen polygons, or
> kriging, you interpolate a 3D surface of these points. How do you interpret
> the results? What will it tell you (or me, for that matter)?

Isn't that exactly the kind of info that those fast-food mega-corps used in the
first place to locate their gut grenade establishments? Population dynamics,
transportation nodes, economic bases, etc. I've been waiting for an opportunity
to use point data of that sort to relate population to water powered mills from
maps. Taking a county in the 19th century from a typical map which shows mill
symbols, one assumes (I know the acronym) all the mills are shown. Raw material
has to get to market by either water or road, so one is dealing with either a
custom or merchant mill. Using densities of mills, one can theoretically work
out population densities and correlate them with standing historic structures.
This leads to holes in the data where service areas are far too large, hence we go
look for more mills.

17th century potteries are a similar situation. One finds pottery, but no kiln.
Using the finds locations, one should be able to discern movement and the
central point of highest density which should be the kiln.

Lyle

On Thu, 8 Mar 2001, Lyle E. Browning wrote:

> Archaeologists use this sort of info all the time, although it could be argued
> that we don't live in the real world either ;>)

Lyle,

  I won't touch that one, thank you!

> Isn't that exactly the kind of info that those fast-food mega-corps used in the
> first place to locate their gut grenade establishments? Population dynamics,
> transportation nodes, economic bases, etc.

  This is flour from a different mill. Performing a nearest-neighbor
analysis by fitting Voronoi polygons around each fast-food outlet will show
all the points that are closer to one burger than another. However, while
this is interpolation (finding the midline between all adjacent pairs of
points) you could not proceed to build an elevation model from it and have
the resulting surface mean anything.

  With the soils data that initiated this thread, the point represents an
area. Voronoi polygons don't mean anything here, and producing a 3D surface
map is equally meaningless. And that's where the discussion went: how do you
interpolate a surface from the point data?

  Perhaps you build 3D continuous surfaces from pottery shards or flour mill
locations, and it means something within the field of archaeology. But, as
a quantitative ecologist, I'd map the soils data and all I could say is
that's the soil type at this point. Period.

  One of my undergraduate chemistry profs (from whom I took inorganic and
physical chemistry) told me he couldn't understand how anyone could be a
field biologist. "After all," he said, "everything is a variable. Nothing is
a constant. How can you make sense of it?" "Statistics," I replied. "And
mumbling with big words when that doesn't help." :-)

  Actually, I think this thread is quite appropriate for the GRASSlist.
Knowing the mechanics of running the software does not qualify the operator
as a spatial analyst. As we are all well aware, teaching someone how to
manipulate a word processor does not make him a good writer. I like this
discussion.

Rich


Rich Shepard wrote:

>   This is flour from a different mill. Performing a nearest-neighbor
> analysis by fitting Voronoi polygons around each fast-food outlet will show
> all the points that are closer to one burger than another. However, while
> this is interpolation (finding the midline between all adjacent pairs of
> points) you could not proceed to build an elevation model from it and have
> the resulting surface mean anything.

Not having done any of that, I couldn't begin to argue otherwise. But, ;>), in
looking at the patterns generated in relation to the surrounding areas, we'd end up
with a good idea of where the prehistoric McDonald's were built and extrapolate to
locate others. We might think of them as temples. Areas for seating of the
worshippers, priestly areas wherein burnt offerings are prepared, all the trappings
of ritual behavior previously observed. Personally, I cannot imagine an
archaeologist 200 years down the road digging one of them, but then again, they may
by then be of such rare status that they will achieve importance beyond their mere
presence.

>   With the soils data that initiated this thread, the point represents an
> area. Voronoi polygons don't mean anything here, and producing a 3D surface
> map is equally meaningless. And that's where the discussion went: how do you
> interpolate a surface from the point data?

Not having been there, and not having done that, I can't argue again. My own
experience is that archaeology is still in relative infancy and just the exercise of
obtaining points, putting them through their paces and identifying data gaps is
extremely useful. Of course, after the gaps are filled, it still remains to be seen
what it all means. I'm still in the data acquisition mode rather than the synthesis
mode.

>   Perhaps you build 3D continuous surfaces from pottery shards or flour mill
> locations, and it means something within the field of archaeology. But, as
> a quantitative ecologist, I'd map the soils data and all I could say is
> that's the soil type at this point. Period.

We'd look at soils and examine site locations. Certain soil types attract
prehistoric and sometimes historic occupants. The inverse is also true. We have
looked at very good productive soils and found that the prehistoric types who were
incipient horticulturists avoided them and camped on the adjacent types. That's a
data gap we have and need to fill. We're still far too much into the individual site
and what it means relative to other excavated sites versus using a GIS to synthesize
the spatial aspects of what we already know and see where it leads.

We typically look at pottery distributions from a site and see where activities took
place and who did what. Assuming that the info is there to work with and it's not
all uniform, the distribution of different pottery types can tell the socio/economic
status of the inhabitants, big house versus worker activity locations, craft
locations, etc.

The Park Service examined the Little Big Horn battlefield, located and mapped
bullets and from the known info on what caliber guns the 7th Cavalry had versus what
else was found, it was shown that the Sioux had better guns and all sorts of
interesting interpretation was possible from those points. Not elevation models
there, granted, but at basis, the first level analysis was fascinating. That's
primarily where we're going at a snail's pace in my profession.

"everything is a variable. Nothing is
a constant. How can you make sense of it?" "Statistics,"

Lies, damned lies and statistics. Now add the never ending problem that in
archaeology you're not dealing with the full universe of what was on the site, but
only that which survives. And then you attempt to make some form of reality out of
it. Constant reiterations are required. New info comes in, out goes the old model
after much argument.

> Knowing the mechanics of running the software does not qualify the operator
> as a spatial analyst. As we are all well aware, teaching someone how to
> manipulate a word processor does not make him a good writer. I like this
> discussion.

Proof's in the pudding, as always.

Rich Shepard wrote:

>   Soil maps (and soil taxonomy in general) are best represented either by
> vector polygons (the way they're drawn on paper maps) or raster regions that
> more accurately reflect the transition zone between one soil type and the
> adjacent one.

I disagree here. It depends on how the data were obtained: were they
mapped in the field as polygons (in which case vector seems most appropriate)
or were they interpolated (estimated) from point samples (e.g. in the case
of chemical composition)? In the latter case, raster representation seems
the obvious way to go -- most interpolation programs output raster maps.

>   Now, if you had ratio data representing, for example, samples of plant
> species' densities and height (just to pick some arbitrary parameters for
> the sake of this discussion) you could interpolate those data and drape the
> soils layer over them to see if there were meaningful patterns in the
> relationship.
>
>   Perhaps my point will be better understood if I present an absurd example.
> (Don't try this at home! It can be done only by experts on a closed course.)
> Suppose you had point (sites) data of fast-food restaurants (i.e., junk food
> vendors) from a typical American medium-sized city. The nominal data
> categories include "McDonald's", "Burger King", "Taco John", etc. Now,

Like the confusion about "nominal" vs. "interval/ratio" data, you're mixing
two spatial data types here -- geostatistical data vs. point pattern data
(using Cressie's "Statistics for spatial data" terminology). Geostatistical
data are obtained (measured) at a limited number of locations, but take on
a value (could be measured) at any location in the region of interest. Point
pattern data are patterns of occurrences of specific items (cities, fast food
stores, trees,...) in a region. The latter can be "transformed" to the former
by using the concept of local density.
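
A minimal sketch of what "local density" can mean here, assuming invented event coordinates and a simple fixed-radius count (the helper `local_density` and the radius are illustrative choices, not a method prescribed in this thread). The point pattern becomes a quantity that has a value at any location, which is what makes it geostatistical data:

```python
import math

# Hypothetical event locations (e.g. mills or restaurants); coordinates are made up.
events = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8), (8.0, 8.0)]

def local_density(x, y, points, radius):
    """Number of events per unit area within `radius` of (x, y)."""
    count = sum(1 for px, py in points if math.hypot(px - x, py - y) <= radius)
    return count / (math.pi * radius ** 2)

dense = local_density(1.5, 1.0, events, radius=1.0)   # inside the cluster
sparse = local_density(8.0, 8.0, events, radius=1.0)  # at the isolated event
empty = local_density(5.0, 5.0, events, radius=1.0)   # no events nearby

print(dense, sparse, empty)
```

The density is a ratio-scale variable, so it can be interpolated or mapped as a surface even though the individual events are just named occurrences.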

One typical problem for geostatistical data is interpolation. E.g., soil type
is measured at a limited number of sites, and now we want to make a soil map
from it. For nominal data, the way to go is to define an indicator variable
for each soil type, giving it a 1 where that soil type is present and a 0 otherwise.
Interpolated values may be interpreted as _estimated_ probabilities of
occurrence for that soil type. (Note that they are not real probabilities).
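
The indicator recipe can be sketched as follows. This is only an illustration under stated assumptions: the sample points are invented, and plain inverse distance weighting (the helper `idw`) stands in for indicator kriging to keep the example short. One 0/1 indicator is built per soil type, each is interpolated separately, and the type with the largest estimate is mapped.

```python
import math

# Hypothetical sample points: (x, y, soil_type). All values are invented.
samples = [
    (0.0, 0.0, "loam"),
    (1.0, 0.0, "loam"),
    (4.0, 0.0, "clay"),
    (4.0, 3.0, "sand"),
]

def idw(x, y, points, power=2.0):
    """Inverse-distance-weighted mean of (x, y, value) triples at (x, y)."""
    num = den = 0.0
    for px, py, value in points:
        d = math.hypot(px - x, py - y)
        if d == 0.0:
            return float(value)  # exactly on a sample point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def indicator_estimates(x, y, points):
    """Interpolate one 0/1 indicator per soil type; return {type: estimate}."""
    types = sorted({t for _, _, t in points})
    return {
        t: idw(x, y, [(px, py, 1 if pt == t else 0) for px, py, pt in points])
        for t in types
    }

est = indicator_estimates(0.5, 0.5, samples)
best = max(est, key=est.get)  # most likely soil type at this location
print(est, best)
```

With IDW the weights are positive and sum to one, so the estimates stay within [0, 1] and add up to 1. Kriging weights can be negative, so indicator kriging estimates may fall outside that range, which is one reason to treat them as estimates rather than true probabilities.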

For further clues, look into the geostats literature for indicator kriging,
indicator cokriging and indicator simulation. There's a lot written about this.
--
Edzer

On Fri, 9 Mar 2001, Edzer J. Pebesma wrote:

> I disagree here. It depends on how the data were obtained: were they
> mapped in the field as polygons (in which case vector seems most appropriate)
> or were they interpolated (estimated) from point samples (e.g. in the case
> of chemical composition)? In the latter case, raster representation seems
> the obvious way to go -- most interpolation programs output raster maps.

Edzer,

  Actually, I think that we agree. I've no idea how soil taxonomists do
their field mapping in any other country, but in this one they take samples
and extrapolate from those. The presumed boundaries (often inferred from
vegetation, topography and other indicators) are drawn as vectors on
georeferenced aerial photographs (the forerunners of today's digital
orthophoto quad, or DOQ). There is no other way the old Soil Conservation
Service (now with the snappy new name of Natural Resources Conservation
Service) could map soils for agricultural and other purposes
county-by-county in all the states. It is a very coarse estimate, and the
descriptive text notes the assumed percentages of different types within a
map unit.

> Like the confusion about "nominal" vs. "interval/ratio" data, you're mixing
> two spatial data types here -- geostatistical data vs. point pattern data
> (using Cressie's "Statistics for spatial data" terminology). Geostatistical
> data are obtained (measured) at a limited number of locations, but take on
> a value (could be measured) at any location in the region of interest. Point
> pattern data are patterns of occurrences of specific items (cities, fast food
> stores, trees,...) in a region. The latter can be "transformed" to the former
> by using the concept of local density.

  Now here we disagree. Perhaps in some limited cases this may work, but not
in many real world situations. Consider this: you have recorded the
locations of bunny rabbits (using a GPS receiver) throughout a large area
(say, for example, 200 hectares). Can you use spatial statistics to
determine the home range of each bunny? I suggest that you cannot validly do
this.

> One typical problem for geostatistical data is interpolation. E.g., soil type
> is measured at a limited number of sites, and now we want to make a soil map
> from it. For nominal data, the way to go is to define an indicator variable
> for each soil type, giving it a 1 where that soil type is present and a 0 otherwise.
> Interpolated values may be interpreted as _estimated_ probabilities of
> occurrence for that soil type. (Note that they are not real probabilities).

  If you take into account factors such as topography and vegetation, you
can estimate the bounds of a specific soil type by kriging or drawing
Voronoi polygons. But, ... that's not the implication of the original
question nor the first batch of responses.

  The assumption of the initial responders was that a 3D surface could be
generated from the point data. My response was that you cannot interpolate
nominal data (whether '1'/'0' or a textual name) into a meaningful surface.
I still stick with this opinion.

Rich


Hello everyone,

Chris Duke of South Hampton kindly told me of the command above, which
allows you to import RTF version2, level5 files from the Ordnance Survey
of the British Isles. This is available for GRASS 5.0. Has anyone here
modified it so that it can be used with GRASS 4.3? I need to use GRASS 4.3
so that I can in turn use Mark Lake's program for cumulative viewshed
analysis.

Anyone?

best wishes,
Gail.

Gail Higginbottom,
Centre for European Studies and General Linguistics,
University of Adelaide,
Adelaide, South Australia,
Australia. 5005.
and
Department of Physics and Mathematical Physics

ghigginb@physics.adelaide.edu.au

http://www.physics.adelaide.edu.au/~ghigginb/

(08) 8303:6440