[GRASS5] [bug #859] (grass) raster data at lower resolution: no resampling...

this bug's URL: http://intevation.de/rt/webrt?serial_num=859
-------------------------------------------------------------------------

Hi again,

(sorry to be so chatty)

from the GRASS prog's manual I understand that raster data
are resampled on the fly when looking at them in lower resolution.
I have tested this and found, that the maximum value is used
instead... that's wrong in my opinion.

An example

#use g.region to select a 6x6 subregion, say, at 30m resolution
g.region
#calculate a test map
r.mapcalc test="row() + col()"
#look at it:
d.mon x0
d.rast test; d.rast.num test
r.out.ascii in=test out=-

#switch to half resolution
g.region res=60
d.mon x1
d.rast test; d.rast.num test
r.out.ascii in=test out=-

# compare.

The last years I thought that the programmer's manual is right.
Obviously not (or not any more).

Looks like another bug,

Markus

--- Headers Follow ---

From neteler@itc.it Mon Nov 26 23:50:11 2001

Return-Path: <neteler@itc.it>
Delivered-To: grass-bugs@lists.intevation.de
Received: from mail.intevation.de (aktaia.intevation.org [212.95.126.10])
  by lists.intevation.de (Postfix) with ESMTP id 292F8139D0
  for <grass-bugs@lists.intevation.de>; Mon, 26 Nov 2001 23:50:11 +0100 (CET)
Received: from camelot.itc.it (camelot.itc.it [195.223.171.5])
  by mail.intevation.de (Postfix) with ESMTP id 681161C786
  for <grass-bugs@intevation.de>; Mon, 26 Nov 2001 23:50:10 +0100 (CET)
Received: from artemide.itc.it (artemide [10.0.10.10])
  by camelot.itc.it (8.11.3/8.11.3) with ESMTP id fAQMo9d24259
  for <grass-bugs@intevation.de>; Mon, 26 Nov 2001 23:50:09 +0100 (MET)
Received: from thuille.itc.it. (thuille [10.40.0.110])
  by artemide.itc.it (8.11.3/8.11.3) with ESMTP id fAQMo7712635
  for <grass-bugs@intevation.de>; Mon, 26 Nov 2001 23:50:08 +0100 (MET)
Received: (from neteler@localhost)
  by thuille.itc.it. (8.11.6/8.11.2) id fAQMo7d30738
  for grass-bugs@intevation.de; Mon, 26 Nov 2001 23:50:07 +0100
Date: Mon, 26 Nov 2001 23:50:07 +0100
From: Markus Neteler <neteler@itc.it>
To: grass-bugs@intevation.de
Subject: raster data at lower resolution: no resampling...
Message-ID: <20011126235007.A30734@itc.it>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i

-------------------------------------------- Managed by Request Tracker

Request Tracker wrote:

(sorry to be so chatty)

from the GRASS prog's manual I understand that raster data
are resampled on the fly when looking at them in lower resolution.

Yes. Note: "resampled" does not imply "interpolated"; you can't
meaningfully interpolate category values.

I have tested this and found, that the maximum value is used
instead...

Incorrect. When rescaling 2:1, the lower-right cell of each 2x2 block
will be used.

that's wrong in my opinion.

An example

#use g.region to select a 6x6 subregion, say, at 30m resolution
g.region
#calculate a test map
r.mapcalc test="row() + col()"

Try a different example:

  r.mapcalc test="14 - row() - col()"

This "flips" the output diagonally, so the values decrease from
top-left to bottom-right. This time, the resampled version gets the
minimum value.

The last years I thought that the programmer's manual is right.
Obviously not (or not any more).

To which part of the manual are you referring?

--
Glynn Clements <glynn.clements@virgin.net>

Glynn Clements wrote:

Request Tracker wrote:

> (sorry to be so chatty)
>
> from the GRASS prog's manual I understand that raster data
> are resampled on the fly when looking at them in lower resolution.

Yes. Note: "resampled" does not imply "interpolated"; you can't
meaningfully interpolate category values.

I have just send an email to Markus explaining just that. It was too
long for
the list. If raster represents a continuous field, one has to
reinterpolate,
if it is an area or line map with categories (what GRASS originally was
designed for) the resampling was acceptable.

Helena

Thanks Helena and Glynn,

On Mon, Nov 26, 2001 at 11:56:31PM +0000, Glynn Clements wrote:

Request Tracker wrote:

> (sorry to be so chatty)
>
> from the GRASS prog's manual I understand that raster data
> are resampled on the fly when looking at them in lower resolution.

Yes. Note: "resampled" does not imply "interpolated"; you can't
meaningfully interpolate category values.

yes, I agree in general. But..

> I have tested this and found, that the maximum value is used
> instead...

Incorrect. When rescaling 2:1, the lower-right cell of each 2x2 block
will be used.

yes, that's what I also see.

that's wrong in my opinion.

I agree - using the lower-right cell is as wrong as using any other
cell falling into the 2x2. That's why I expected an average on the fly.
O.K., I did not test for FP maps.

>
> An example
>
> #use g.region to select a 6x6 subregion, say, at 30m resolution
> g.region
> #calculate a test map
> r.mapcalc test="row() + col()"

Try a different example:

  r.mapcalc test="14 - row() - col()"

This "flips" the output diagonally, so the values decrease from
top-left to bottom-right. This time, the resampled version gets the
minimum value.

Mhh, here I get the middle value (either upper right or lower left).

So, what's the right representation for this case?

> The last years I thought that the programmer's manual is right.
> Obviously not (or not any more).

To which part of the manual are you referring?

I found it on page 61 (the printed page number) of the current version.

"Users expect map layers to be resampled into the current region. This
implies that raster maps must be extended with no data for portions of the
region which do not cover the map layer, and that the raster map data be
resampled to the region resolution if the raster map resolution is
different. Users also expect new map layers to be created with exactly the
same boundaries and resolution as the current region.
"

Maybe I am on the wrong path, but I understand from above that both
FP and int maps are averaged at lower resolution. Some month ago
I did the same test with int maps, that time, if I recall correctly,
the values were averaged.

Best regards

Markus

On Mon, Nov 26, 2001 at 05:20:37PM -0600, Helena wrote:

Glynn Clements wrote:

> Request Tracker wrote:
>
> > (sorry to be so chatty)
> >
> > from the GRASS prog's manual I understand that raster data
> > are resampled on the fly when looking at them in lower resolution.
>
> Yes. Note: "resampled" does not imply "interpolated"; you can't
> meaningfully interpolate category values.
>

I have just send an email to Markus explaining just that. It was too long
for the list. If raster represents a continuous field, one has to
reinterpolate, if it is an area or line map with categories (what GRASS
originally was designed for) the resampling was acceptable.

... maybe I am too tired right now, but how shall GRASS detect if
the raster represents a continuous field (in INT) or a categories map?
Currently GRASS seems to silently do something.

Please be patient :slight_smile:

Markus

Markus Neteler wrote:

> that's wrong in my opinion.

I agree - using the lower-right cell is as wrong as using any other
cell falling into the 2x2.

Well, right or wrong, "resampling" involves "sampling" the data, i.e.
extracting "samples". In the absence of some mechanism to control the
sampling, you "get what you're given".

O.K., I did not test for FP maps.

Even for FP maps, it might not be desirable to interpolate.
Interpolation may "create" values which never occurred anywhere in the
original data.

If the original data was a sampling of a continuous "surface", then
interpolation is reasonable. But are all FP maps guaranteed to be
such? Or could they be tessellations (partitionings) where the
individal partitions have a constant FP value? If they could, then
interpolation would be the wrong thing to do here.

> > An example
> >
> > #use g.region to select a 6x6 subregion, say, at 30m resolution
> > g.region
> > #calculate a test map
> > r.mapcalc test="row() + col()"
>
> Try a different example:
>
> r.mapcalc test="14 - row() - col()"
>
> This "flips" the output diagonally, so the values decrease from
> top-left to bottom-right. This time, the resampled version gets the
> minimum value.

Mhh, here I get the middle value (either upper right or lower left).

I created an 6x6 X-Y location with a resolution of 1, then tested with
"g.region res=1" and "g.region res=2".

If you take a subregion of an existing region, the results might
depend on the alignment of the subregion within the region. Or, for a
geographically-correlated region, they might depend upon the region's
(or subregion's) geographic coordinates.

So, what's the right representation for this case?

> > The last years I thought that the programmer's manual is right.
> > Obviously not (or not any more).
>
> To which part of the manual are you referring?
I found it on page 61 (the printed page number) of the current version.

"Users expect map layers to be resampled into the current region. This
implies that raster maps must be extended with no data for portions of the
region which do not cover the map layer, and that the raster map data be
resampled to the region resolution if the raster map resolution is
different. Users also expect new map layers to be created with exactly the
same boundaries and resolution as the current region."

Maybe I am on the wrong path, but I understand from above that both
FP and int maps are averaged at lower resolution.

I don't see anything in the above paragraph to suggest that behaviour.
In particular, it doesn't say that the layers will be interpolated.
"resampled" is ambiguous; it could refer to extracting samples either
from the existing values, or from some surface passing through (or
near) the existing values.

--
Glynn Clements <glynn.clements@virgin.net>

Markus Neteler wrote:

> > > (sorry to be so chatty)
> > >
> > > from the GRASS prog's manual I understand that raster data
> > > are resampled on the fly when looking at them in lower resolution.
> >
> > Yes. Note: "resampled" does not imply "interpolated"; you can't
> > meaningfully interpolate category values.
> >
>
> I have just send an email to Markus explaining just that. It was too long
> for the list. If raster represents a continuous field, one has to
> reinterpolate, if it is an area or line map with categories (what GRASS
> originally was designed for) the resampling was acceptable.

... maybe I am too tired right now, but how shall GRASS detect if
the raster represents a continuous field (in INT) or a categories map?
Currently GRASS seems to silently do something.

I don't think that it can "detect" it. It needs to be told.

And if it's to interpolate, then how? There are lots of possible
interpolation techniques, and none of them is the "one true way".

Consider the following (one dimensional) example:

  x y
  0.0 0.0
  1.0 1.0
  2.0 1.0
  3.0 0.0

Many interpolation techniques (e.g. finding a polynomial which passes
through those four points) would produce a maximum at x=1.5, and the
maximum would be greater than 1. If the actual values are guaranteed
to lie in the [0,1] range, this is clearly wrong.

Please be patient :slight_smile:

Sure.

--
Glynn Clements <glynn.clements@virgin.net>

On Mon, Nov 26, 2001 at 06:37:26PM -0600, Helena wrote:

Markus Neteler wrote:

> > that's wrong in my opinion.
> I agree - using the lower-right cell is as wrong as using any other
> cell falling into the 2x2. That's why I expected an average on the fly.
> O.K., I did not test for FP maps.

If you do an average, for example when you have
e.g. categories 1 (say forest) and 9(corn) you get a category 5(grass)
which is not there at at all. Category and "field" maps are very different
in what you can do with them.

I agree - in this case we can get a pretty wrong result.
What about using the dominant value (mhhh, if there is non what then?).

> "Users expect map layers to be resampled into the current region. This
> implies that raster maps must be extended with no data for portions of the
> region which do not cover the map layer, and that the raster map data be
> resampled to the region resolution if the raster map resolution is
> different. Users also expect new map layers to be created with exactly the
> same boundaries and resolution as the current region.

this just says resampled - it does not say averaged or reinterpolated

O.k. - the problem is that I have no idea what "resampled" means in
this case. It seems that we have to study the related sources.I just want
to understand what's happening.

> Maybe I am on the wrong path, but I understand from above that both
> FP and int maps are averaged at lower resolution. Some month ago
> I did the same test with int maps, that time, if I recall correctly,
> the values were averaged.

resampled does not really means averaged and averaging is not a good
solution for every case of resampling (especially if you have categories
but it is not good for fields either). From what I remember the resampling
just assigns the same value which is within that cell to all the cells it
is split into, when the resolution is lower, then you probably have to
look at the source code what kind of rule

Yes, for the first case (higher resolution) the values are dublicated.
But for the lower resolution case I don't understand the method. I
try to find it in the code.

it is using. Somebody might have changed it but originally resampling did not
do any computations.

>... maybe I am too tired right now, but how shall GRASS detect if
>the raster represents a continuous field (in INT) or a categories map?

User should know it and handle it appropriately. It would be hard to cover
all possible cases (especially if you have large number of categories )
and you would have to run analysis for continuity - which can be done, but
even if you know it, it very much depends on the pnenomemon that you are
working with and on the application what you should use for resampling.

Does it make sense print a warning in this case? To make the user
aware that the resolution doesn't match?
At least when exporting data or using r.mapcalc this is quite important.

What is the correct way to do it for imagery?

>Currently GRASS seems to silently do something.

You can have g.region write out a warning that all maps with diferent
resolution will be resampled on the fly and users should check whether it
won't affect their application (and if yes, use appropriate GRASS module
to reinterpolate or resample). We were thinking (especially for 3D) to
give an option for the user to select which resampling method to use when
running g.region, but it got messy pretty quickly, as there were just too
many options and possibilities.

Helena

Markus

Helena wrote:

If raster represents a continuous field, one has to
reinterpolate,
if it is an area or line map with categories (what GRASS originally was
designed for) the resampling was acceptable.

Well, for continuous field, it depends what you are doing and what
represent your values : you may prefer to resample...

For categories, the right answer is probably to take the most
represented value...

BTW, Are the values representing an average on the pixel's surface,
or a sampled value somewhere inside the pixel (generaly the center
or the top left corner).
In Grass, where is the point representing the pixel ?

the second solution is very interesting when you resample by
an integer factor : you have just to take one line of n and in
each line one point of n, beginning first row, first column.

In the other case, a factor of 2 puts just the resampling point
at the corners of 4 pixels, unless you move the region for the
new raster a half pixel NW, in order to use the same algorithm
as above..

--
Michel WURTZ - DIG - Maison de la télédétection
               500, rue J.F. Breton
               34093 MONTPELLIER Cedex 5

Markus Neteler wrote:

O.k. - the problem is that I have no idea what "resampled" means in
this case. It seems that we have to study the related sources.I just want
to understand what's happening.

I guess you should really look at the source and then write what you have found
into the prog manual and users manual (probably for g.region entry).
Does anybody know what it is doing already, to save Markus some time?

Does it make sense print a warning in this case? To make the user
aware that the resolution doesn't match?
At least when exporting data or using r.mapcalc this is quite important.

I agree with that, but I am not sure how much work it would be.
It would certainly be beneficial if r.mapcalc let user know
e.g. when computing C=A+B+D it could write : Warning, resampling map A to match
the resolution.
When exporting, it would be useful too. In r.flow, where it really matters,
we just say that "your resolution does not match the current resolution" and exit
so the user has to fisrt change his region to that of the elevation input or
resample/reinterpolate
his elevation. So it is really application and data dependent.

Helena

On Tue, Nov 27, 2001 at 10:31:04AM -0500, Helena Mitasova wrote:

Markus Neteler wrote:

> O.k. - the problem is that I have no idea what "resampled" means in
> this case. It seems that we have to study the related sources.I just want
> to understand what's happening.

I guess you should really look at the source and then write what you have found
into the prog manual and users manual (probably for g.region entry).
Does anybody know what it is doing already, to save Markus some time?

Well, I need assistance since I don't understand the code :slight_smile:
It should be here:
src/libes/gis/get_row.c
[...]
* Step 2: Convert the data into a CPU readable format, and subsequently
* resample the data. the data is stored in a second intermediate
* buffer (the type of the data in this buffer is Y)....
[...]

Maybe someone professional looks with doxygen or whatever at the
dependencies. Is it done in XDR or somewhere else? And what is done?

> Does it make sense print a warning in this case? To make the user
> aware that the resolution doesn't match?
> At least when exporting data or using r.mapcalc this is quite important.

I agree with that, but I am not sure how much work it would be.
It would certainly be beneficial if r.mapcalc let user know
e.g. when computing C=A+B+D it could write : Warning, resampling map A to match
the resolution.
When exporting, it would be useful too. In r.flow, where it really matters,
we just say that "your resolution does not match the current resolution" and exit
so the user has to fisrt change his region to that of the elevation input or
resample/reinterpolate
his elevation. So it is really application and data dependent.

Helena

Thanks for your comments. I fear we have to dig into the code to
find out the current implementation. From that we may decide if and
where to implement user notifications.

Markus

Markus Neteler wrote:

> > O.k. - the problem is that I have no idea what "resampled" means in
> > this case. It seems that we have to study the related sources.I just want
> > to understand what's happening.
>
> I guess you should really look at the source and then write what you have found
> into the prog manual and users manual (probably for g.region entry).
> Does anybody know what it is doing already, to save Markus some time?

Well, I need assistance since I don't understand the code :slight_smile:
It should be here:
src/libes/gis/get_row.c
[...]
* Step 2: Convert the data into a CPU readable format, and subsequently
* resample the data. the data is stored in a second intermediate
* buffer (the type of the data in this buffer is Y)....
[...]

Maybe someone professional looks with doxygen or whatever at the
dependencies. Is it done in XDR or somewhere else? And what is done?

get_map_row_nomask() calls transfer_to_cell_XX(), either directly or
via one of the other transfer_to_cell_* functions.

transfer_to_cell_XX() calls one of cell_values_{int,float,double},
passing the mapping table FCB.col_map. This is generated by
G__create_window_mapping() in window_map.c.

--
Glynn Clements <glynn.clements@virgin.net>

As far as I can tell, in a nutshell:

The code attempts to map the cell row/col value that is most closely
aligned with the center of the region row/col. It builds an explicit
column mapping, and leaves the decision to which rows to use for
each row fetch operation.

It's difficult to follow a lot of that code since it uses globals
and additionally uses #define's to rename various parts of a global
data structure. Relevant files include: G.h, opencell.c, get_row.c,
window_map.c, and probably a few others (depending on what you think
is relevant).

All of the low-level byte encoding/compression at the file level is
not relevant to the mapping of cell values to match the region.

--
Eric G. Miller <egm2@jps.net>

On Tue, Nov 27, 2001 at 06:56:57PM -0800, Eric G. Miller wrote:

As far as I can tell, in a nutshell:

The code attempts to map the cell row/col value that is most closely
aligned with the center of the region row/col. It builds an explicit
column mapping, and leaves the decision to which rows to use for
each row fetch operation.

It's difficult to follow a lot of that code since it uses globals
and additionally uses #define's to rename various parts of a global
data structure. Relevant files include: G.h, opencell.c, get_row.c,
window_map.c, and probably a few others (depending on what you think
is relevant).

All of the low-level byte encoding/compression at the file level is
not relevant to the mapping of cell values to match the region.

--
Eric G. Miller <egm2@jps.net>

Thanks Eric and Glynn for looking at that.

A note for those not knowing Doxygen (www.doxygen.org):
I have added a Doxygen control file into the main directory,
to get it
cvs up Doxyfile

When running "doxygen" on GRASS or just on the src/libes directory
a nice bunch of HTML pages comes out with graphs for the dependencies.
This may be a way to get an overview of the code structure.

Shall I put these HTML pages online?

Eric, did you find somewhere the resampling rules?

Markus

On Wed, Nov 28, 2001 at 09:39:17AM +0100, Markus Neteler wrote:

A note for those not knowing Doxygen (www.doxygen.org):
I have added a Doxygen control file into the main directory,
to get it
cvs up Doxyfile

When running "doxygen" on GRASS or just on the src/libes directory
a nice bunch of HTML pages comes out with graphs for the dependencies.

Note that doxygen uses graphviz for creation of the graphs.
And graphviz _not_ Free Software, though the claim to be "open
source".

Though I recommend to _not_ use it.
  Bernhard

--
Professional Service around Free Software (intevation.net)
The FreeGIS Project (freegis.org)
Association for a Free Informational Infrastructure (ffii.org)
FSF Europe (fsfeurope.org)

"Eric G. Miller" wrote:

As far as I can tell, in a nutshell:

The code attempts to map the cell row/col value that is most closely
aligned with the center of the region row/col. It builds an explicit
column mapping, and leaves the decision to which rows to use for
each row fetch operation.

Markus, if I understand correctly what Eric writes here it appears
that when resampling to lower resolution, GRASS uses the value which
is in a high resolution cell located in the center (or close to the
center ) of the low resolution cell.
I think that this is the best thing it can do both for field and
category data when it
does not know what type of phenomenon the data represent.
I don't think that we have a problem here although some warning about
this behavior
would be appropriate and it definitely needs to be documented.

Helena

It's difficult to follow a lot of that code since it uses globals
and additionally uses #define's to rename various parts of a global
data structure. Relevant files include: G.h, opencell.c, get_row.c,
window_map.c, and probably a few others (depending on what you think
is relevant).

All of the low-level byte encoding/compression at the file level is
not relevant to the mapping of cell values to match the region.

--
Eric G. Miller <egm2@jps.net>
_______________________________________________
grass5 mailing list
grass5@grass.itc.it
http://grass.itc.it/mailman/listinfo/grass5