[GRASS-dev] Re: [GRASS-user] problems using r.proj with large data set

[CC'd to grass-dev]

Maciej Sieczka wrote:

> Is this a problem with large files that I will just have to work around or
> is it something to do with my setup?

Probably the same, very old issue:
http://intevation.de/rt/webrt?serial_num=241

I looked into this a while ago. Unfortunately, you can't use rowio (or
a home-grown equivalent), as libgis doesn't allow the projection to be
changed while maps are open. So, you have to read the entire input
map, close it, change the projection, then write the output map.

To get around the memory issues, you would first need to copy the
relevant portion of the input map to a temporary file, then use a
cache backed by that file.

The segment library would do the job, although it could add a
significant performance overhead.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

> Maciej Sieczka wrote:
>
> > Is this a problem with large files that I will just have to work around or
> > is it something to do with my setup?
>
> Probably the same, very old issue:
> http://intevation.de/rt/webrt?serial_num=241
>
> I looked into this a while ago. Unfortunately, you can't use rowio (or
> a home-grown equivalent), as libgis doesn't allow the projection to be
> changed while maps are open. So, you have to read the entire input
> map, close it, change the projection, then write the output map.
>
> To get around the memory issues, you would first need to copy the
> relevant portion of the input map to a temporary file, then use a
> cache backed by that file.
>
> The segment library would do the job, although it could add a
> significant performance overhead.

Are you sure it would help for all possible projections? The way r.proj works -- reverse-projecting cell-by-cell into the input map and looking up the value of the nearest neighbor cell (default method) -- it seems difficult to always have the right "relevant part" of the input map in memory. Between two cylindrical projections perhaps, but for other types wouldn't there be a lot of swapping of the "relevant parts" in and out of memory? Hmm... I guess this is what you mean by performance overhead.

Morten

Morten Hulden wrote:

>>> Is this a problem with large files that I will just have to work around or
>>> is it something to do with my setup?
>> Probably the same, very old issue:
>> http://intevation.de/rt/webrt?serial_num=241
>
> I looked into this a while ago. Unfortunately, you can't use rowio (or
> a home-grown equivalent), as libgis doesn't allow the projection to be
> changed while maps are open. So, you have to read the entire input
> map, close it, change the projection, then write the output map.
>
> To get around the memory issues, you would first need to copy the
> relevant portion of the input map to a temporary file, then use a
> cache backed by that file.
>
> The segment library would do the job, although it could add a
> significant performance overhead.

Are you sure it would help for all possible projections? The way r.proj
works -- reverse-projecting cell-by-cell into the input map and looking
up the value of the nearest neighbor cell (default method) -- it seems
difficult to always have the right "relevant part" of the input map in
memory. Between two cylindrical projections perhaps, but for other types
wouldn't there be a lot of swapping in and out of memory of the
"relevant parts".

A tile-based cache will work for all practical cases; you just need to
ensure that the cache is large enough. In practice, this means that
you need enough space for all of the tiles corresponding to a single
output row.

Even in the worst (practical) cases, you will only need memory
proportional to the size (width or height) of the source map, rather
than its area.

Hmm... I guess this is what you mean by performance
overhead.

No. I was referring specifically to the design of the segment library,
in particular:

1. Tiles can be any size. The size isn't fixed at compile time, and
isn't constrained to powers of two, so you have to perform two
division/remainder operations to convert an (x,y) pair to a segment
number and offset. If you use powers of two, you only need to use
shift/mask operations, which are faster; using a fixed power of two
would be faster still.

2. Each cell lookup requires a function call. It may be more efficient
to implement the "fast path" (where the requested cell is already in
memory) as a macro.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

> A tile-based cache will work for all practical cases; you just need to
> ensure that the cache is large enough. In practice, this means that
> you need enough space for all of the tiles corresponding to a single
> output row.
>
> Even in the worst (practical) cases, you will only need memory
> proportional to the size (width or height) of the source map, rather
> than its area.
>
> > Hmm... I guess this is what you mean by performance overhead.
>
> No. I was referring specifically to the design of the segment library,
> in particular:
>
> 1. Tiles can be any size. The size isn't fixed at compile time, and
> isn't constrained to powers of two, so you have to perform two
> division/remainder operations to convert an (x,y) pair to a segment
> number and offset. If you use powers of two, you only need to use
> shift/mask operations, which are faster; using a fixed power of two
> would be faster still.
>
> 2. Each cell lookup requires a function call. It may be more efficient
> to implement the "fast path" (where the requested cell is already in
> memory) as a macro.

What about splitting the output area into temporary smaller chunks and using r.proj's built-in cropping of the input map, then r.patching the chunks into a larger output map?

Of course, this can be done manually even now, but since most of the code is already in place, it could be a relatively easy solution.

Gerald Nelson wrote:
Is there any possibility that r.proj could become a front end for gdalwarp; essentially doing the steps I did manually behind the scenes?

I guess one reason people still want to use r.proj is because the output maps will be aligned and have exactly the same resolution as the current grid, and thus be available for further analysis together with other maps of the location with the tools that require all maps to have same grid.

The scopes of gdalwarp and r.proj are slightly different.

Morten

Morten Hulden wrote:

> A tile-based cache will work for all practical cases; you just need to
> ensure that the cache is large enough. In practice, this means that
> you need enough space for all of the tiles corresponding to a single
> output row.
>
> Even in the worst (practical) cases, you will only need memory
> proportional to the size (width or height) of the source map, rather
> than its area.
>
>> Hmm... I guess this is what you mean by performance
>> overhead.
>
> No. I was referring specifically to the design of the segment library,
> in particular:
>
> 1. Tiles can be any size. The size isn't fixed at compile time, and
> isn't constrained to powers of two, so you have to perform two
> division/remainder operations to convert an (x,y) pair to a segment
> number and offset. If you use powers of two, you only need to use
> shift/mask operations, which are faster; using a fixed power of two
> would be faster still.
>
> 2. Each cell lookup requires a function call. It may be more efficient
> to implement the "fast path" (where the requested cell is already in
> memory) as a macro.

What about splitting the output area into temporary smaller chunks and
using r.proj's built-in cropping of the input map, then r.patching the
chunks into a larger output map?

I would expect that to be less efficient than importing the input map
into a tile cache, as you will end up reading each portion of the
input map multiple times.

You can minimise that problem to some extent by segmenting the output
map vertically (i.e. into horizontal bands), but unless the projection
is [pseudo-]cylindrical, you will still have some overlap.

That still leaves the r.patch overhead.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

> > Is this a problem with large files that I will just have to work around or
> > is it something to do with my setup?
>
> Probably the same, very old issue:
> http://intevation.de/rt/webrt?serial_num=241

> I looked into this a while ago. Unfortunately, you can't use rowio (or
> a home-grown equivalent), as libgis doesn't allow the projection to be
> changed while maps are open. So, you have to read the entire input
> map, close it, change the projection, then write the output map.
>
> To get around the memory issues, you would first need to copy the
> relevant portion of the input map to a temporary file, then use a
> cache backed by that file.

I've added to CVS a modified version of r.proj named r.proj.seg. This
behaves identically to r.proj, but uses a tile cache rather than
reading the entire map into memory.

Currently, each tile is 64 * 64 cells, and the size of the cache is
fixed at 2 * (nx + ny) tiles (where nx * ny is the size of the source
region in tiles), which I would expect to suffice for any realistic
transformation.

[It will definitely suffice for any affine or "near"-affine
transformation. It would be useful to know what level of distortion
needs to be accommodated in practice.]

Cache replacement is random (i.e. when a new tile is loaded, the tile
which it replaces is selected at random). In practice, random
replacement tends to work almost as well as an "optimal" algorithm,
but without the overhead of maintaining usage statistics, as well as
not having any degenerate cases.

Suggestions for improvements and/or performance statistics for
"real-world" usage are welcome.

--
Glynn Clements <glynn@gclements.plus.com>

On Sunday 17 December 2006 17:14, Glynn Clements wrote:

Glynn Clements wrote:
> > > Is this a problem with large files that I will just have to work
> > > around or is it something to do with my setup?
> >
> > Probably the same, very old issue:
> > http://intevation.de/rt/webrt?serial_num=241
>
> I looked into this a while ago. Unfortunately, you can't use rowio (or
> a home-grown equivalent), as libgis doesn't allow the projection to be
> changed while maps are open. So, you have to read the entire input
> map, close it, change the projection, then write the output map.
>
> To get around the memory issues, you would first need to copy the
> relevant portion of the input map to a temporary file, then use a
> cache backed by that file.

> I've added to CVS a modified version of r.proj named r.proj.seg. This
> behaves identically to r.proj, but uses a tile cache rather than
> reading the entire map into memory.
>
> Currently, each tile is 64 * 64 cells, and the size of the cache is
> fixed at 2 * (nx + ny) tiles (where nx * ny is the size of the source
> region in tiles), which I would expect to suffice for any realistic
> transformation.
>
> [It will definitely suffice for any affine or "near"-affine
> transformation. It would be useful to know what level of distortion
> needs to be accommodated in practice.]
>
> Cache replacement is random (i.e. when a new tile is loaded, the tile
> which it replaces is selected at random). In practice, random
> replacement tends to work almost as well as an "optimal" algorithm,
> but without the overhead of maintaining usage statistics, as well as
> not having any degenerate cases.
>
> Suggestions for improvements and/or performance statistics for
> "real-world" usage are welcome.

Interesting work, Glynn. Not having tried your module yet, I am wondering: does
the segmenting result in any sort of artifacts in the output raster? E.g. I
have seen artifacts related to segmenting with the v.surf.rst module before,
when optimal settings are not used.

I'll give the new r.proj.seg a try later today.

cheers,

Dylan

--
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

Dylan Beaudette wrote:

> Interesting work, Glynn. Not having tried your module yet, I am wondering: does
> the segmenting result in any sort of artifacts in the output raster? E.g. I
> have seen artifacts related to segmenting with the v.surf.rst module before,
> when optimal settings are not used.

No. The projection algorithm is unchanged; the only difference is in
the mechanism used to retrieve the contents of a specific source cell.

The normal r.proj essentially does:

  FCELL val = ibuffer[row][col];

OTOH, with r.proj.seg the code is more like:

  typedef FCELL block[BDIM][BDIM];

  #define HI(i) ((i) / BDIM)
  #define LO(i) ((i) % BDIM)

  ...

  block *b = cache->blocks[HI(row)][HI(col)];

  if (!b) /* block not in memory */
    b = cache->blocks[HI(row)][HI(col)] = read_block(cache, row, col);

  FCELL val = b[LO(row)][LO(col)];

The above isn't actual code; for a start, BDIM is a power of 2, and
the HI/LO macros use shift/mask instead of division/remainder. Also,
the details are hidden inside macros to simplify the application code,
which actually looks like:

  FCELL *p = CPTR(cache,row,col);

The main point to note is that interpolation isn't affected by segment
boundaries, which I suspect is probably what was happening with
v.surf.rst.

Assuming that it works out okay, I'll probably end up merging
r.proj.seg into r.proj, with the macro handling the details of whether
to read from a tile cache (large maps) or from a single block of
memory (small maps).

Actually, there is one minor difference in the observable behaviour of
the two programs. r.proj.seg doesn't include the half-baked
null-handling hack used by r.proj for the bilinear and bicubic
methods; it just propagates the nulls.

Consequently, r.proj.seg will tend to enlarge null regions slightly
when using bilinear or bicubic interpolation, whereas r.proj tends to
fill them in (with values whose accuracy is rather dubious). If you
want null regions filled, run r.fillnulls first.

BTW, in the course of writing r.proj.seg, I discovered that
method=bilinear was seriously broken (it had the row/column swapped,
essentially flipping each source cell along the main diagonal). The
fact that no-one noticed until now suggests that either no-one used
method=bilinear, or they didn't pay much attention to the results.

--
Glynn Clements <glynn@gclements.plus.com>

On Monday 18 December 2006 13:07, Glynn Clements wrote:

> Dylan Beaudette wrote:
> > Interesting work, Glynn. Not having tried your module yet, I am wondering:
> > does the segmenting result in any sort of artifacts in the output raster?
> > E.g. I have seen artifacts related to segmenting with the v.surf.rst
> > module before, when optimal settings are not used.

> No. The projection algorithm is unchanged; the only difference is in
> the mechanism used to retrieve the contents of a specific source cell.

Great.

> The normal r.proj essentially does:
>
>   FCELL val = ibuffer[row][col];
>
> OTOH, with r.proj.seg the code is more like:
>
>   typedef FCELL block[BDIM][BDIM];
>
>   #define HI(i) ((i) / BDIM)
>   #define LO(i) ((i) % BDIM)
>
>   ...
>
>   block *b = cache->blocks[HI(row)][HI(col)];
>
>   if (!b) /* block not in memory */
>     b = cache->blocks[HI(row)][HI(col)] = read_block(cache, row, col);
>
>   FCELL val = b[LO(row)][LO(col)];
>
> The above isn't actual code; for a start, BDIM is a power of 2, and
> the HI/LO macros use shift/mask instead of division/remainder. Also,
> the details are hidden inside macros to simplify the application code,
> which actually looks like:
>
>   FCELL *p = CPTR(cache,row,col);
>
> The main point to note is that interpolation isn't affected by segment
> boundaries, which I suspect is probably what was happening with
> v.surf.rst.

Ok, this was my main concern (unfounded it seems).

> Assuming that it works out okay, I'll probably end up merging
> r.proj.seg into r.proj, with the macro handling the details of whether
> to read from a tile cache (large maps) or from a single block of
> memory (small maps).

This makes sense. For small maps is it much faster to not use the cache?

> Actually, there is one minor difference in the observable behaviour of
> the two programs. r.proj.seg doesn't include the half-baked
> null-handling hack used by r.proj for the bilinear and bicubic
> methods; it just propagates the nulls.

Interesting. I am rather new to r.proj, as I have used gdalwarp for everything
until now.

> Consequently, r.proj.seg will tend to enlarge null regions slightly
> when using bilinear or bicubic interpolation, whereas r.proj tends to
> fill them in (with values whose accuracy is rather dubious). If you
> want null regions filled, run r.fillnulls first.

Indeed. Was this a 'documented' feature?

> BTW, in the course of writing r.proj.seg, I discovered that
> method=bilinear was seriously broken (it had the row/column swapped,
> essentially flipping each source cell along the main diagonal). The
> fact that no-one noticed until now suggests that either no-one used
> method=bilinear, or they didn't pay much attention to the results.

Would this account for banding in the output from any method=bilinear warps? I
have noticed this before, but had been told that it was an artifact in the
source data.

Cheers,

--
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

Dylan Beaudette wrote:

> Assuming that it works out okay, I'll probably end up merging
> r.proj.seg into r.proj, with the macro handling the details of whether
> to read from a tile cache (large maps) or from a single block of
> memory (small maps).

> This makes sense. For small maps is it much faster to not use the cache?

I haven't performed timings; to be of much use, they would need to be
done with "real-world" cases.

As it stands, even if you made the cache large enough to hold the
entire map, it would currently get written out to a temporary file
then read back in on demand.

There's bound to be some overhead to using an extra level of
indirection, but it's probably trivial compared to the projection
calculations (and particularly compared to bicubic interpolation).

One of the next things which needs to be done to r.proj.seg is to add
an option to set the cache size. Once that's done, I'll look into
trapping the case where the cache can hold the entire map, and
eliminate the temporary file in that situation.

> Consequently, r.proj.seg will tend to enlarge null regions slightly
> when using bilinear or bicubic interpolation, whereas r.proj tends to
> fill them in (with values whose accuracy is rather dubious). If you
> want null regions filled, run r.fillnulls first.

> Indeed. Was this a 'documented' feature?

AFAICT, no. The word "null" doesn't appear in the r.proj manpage.

> BTW, in the course of writing r.proj.seg, I discovered that
> method=bilinear was seriously broken (it had the row/column swapped,
> essentially flipping each source cell along the main diagonal). The
> fact that no-one noticed until now suggests that either no-one used
> method=bilinear, or they didn't pay much attention to the results.

> Would this account for banding in the output from any method=bilinear warps? I
> have noticed this before, but had been told that it was an artifact in the
> source data.

Maybe. It's most noticeable if the source data is significantly coarser
than the destination region. E.g. re-projecting GTOPO30 DEMs (30
arc-seconds ~= 660m x 930m) to the default Spearfish region (30m)
makes it quite clear.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

> > BTW, in the course of writing r.proj.seg, I discovered that
> > method=bilinear was seriously broken (it had the row/column
> > swapped, essentially flipping each source cell along the main
> > diagonal). The fact that no-one noticed until now suggests that
> > either no-one used method=bilinear, or they didn't pay much
> > attention to the results.
>
> Would this account for banding in the output from any
> method=bilinear warps? I have noticed this before, but had been
> told that it was an artifact in the source data.

> Maybe. It's most noticeable if the source data is significantly coarser
> than the destination region. E.g. re-projecting GTOPO30 DEMs (30
> arc-seconds ~= 660m x 930m) to the default Spearfish region (30m)
> makes it quite clear.

(I tried bilinear last week and got banding, so went back to nearest)

I've done some trials:
  http://bambi.otago.ac.nz/hamish/grass/r.proj_methods_ORIG.png
  http://bambi.otago.ac.nz/hamish/grass/r.proj_methods_BCYR.png

these are the same rasters, the two images show different r.colors
rules.

WRT "banding". I suspect this is like the first image- it is just an
artifact of using category based color rules with a FCELL raster map.
Unknown values -> white, exact "plateaus"-> original color rule.

#ORIG rules:
% 0 255
0:255
1:0
2:200:0:255
3:241:185:255
4:60:190:217
5:197:235:244
6:98:171:116
7:246:200:110
8:185:150:83

changing the rules to FP ranges makes the banding disappear, but
introduces new color artifacts due to meaningless inter-cat values.

% 0 255
0:255 1:0
1:0 2:200:0:255
2:200:0:255 3:241:185:255
3:241:185:255 4:60:190:217
4:60:190:217 5:197:235:244
5:197:235:244 6:98:171:116
6:98:171:116 7:246:200:110
7:246:200:110 8:185:150:83

WRT the bug Glynn found: the above issue hides it, but in the second
false-color image you can see that the fixed bilinear method more
closely matches the cubic method. (This is much more obvious if viewed
with xganim.)

Any reason r.proj forces FCELL output?

Hamish

Hamish wrote:

> > > BTW, in the course of writing r.proj.seg, I discovered that
> > > method=bilinear was seriously broken (it had the row/column
> > > swapped, essentially flipping each source cell along the main
> > > diagonal). The fact that no-one noticed until now suggests that
> > > either no-one used method=bilinear, or they didn't pay much
> > > attention to the results.
> >
> > Would this accound for banding in the output from any
> > method=bilinear warps? I have noticed this before, but had been
> > told that it was an artifact in the source data.
>
> Maybe. It's most noticeable if the source data is significantly coarser
> than the destination region. E.g. re-projecting GTOPO30 DEMs (30
> arc-seconds ~= 660m x 930m) to the default Spearfish region (30m)
> makes it quite clear.

Results:

Before fix: http://www.zen87603.zen.co.uk/proj-old.png
After fix: http://www.zen87603.zen.co.uk/proj-new.png

> (I tried bilinear last week and got banding, so went back to nearest)
>
> I've done some trials:
>   http://bambi.otago.ac.nz/hamish/grass/r.proj_methods_ORIG.png
>   http://bambi.otago.ac.nz/hamish/grass/r.proj_methods_BCYR.png
>
> changing the rules to FP ranges makes the banding disappear, but
> introduces new color artifacts due to meaningless inter-cat values.

Interpolation (bilinear or bicubic) doesn't make sense for discrete
categories.

> Any reason r.proj forces FCELL output?

Probably to conserve memory.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

> > Assuming that it works out okay, I'll probably end up merging
> > r.proj.seg into r.proj, with the macro handling the details of whether
> > to read from a tile cache (large maps) or from a single block of
> > memory (small maps).
>
> This makes sense. For small maps is it much faster to not use the cache?

I haven't performed timings; to be of much use, they would need to be
done with "real-world" cases.

As it stands, even if you made the cache large enough to hold the
entire map, it would currently get written out to a temporary file
then read back in on demand.

There's bound to be some overhead to using an extra level of
indirection, but it's probably trivial compared to the projection
calculations (and particularly compared to bicubic interpolation).

One of the next things which needs to be done to r.proj.seg is to add
an option to set the cache size. Once that's done, I'll look into
trapping the case where the cache can hold the entire map, and
eliminate the temporary file in that situation.

I've done that; some (not exactly representative) results reprojecting
GTOPO30 to spearfish (30m, 634 cols x 477 rows):

$ r.proj input=w140n90 location=global mapset=glynn output=proj method=nearest

real 0m2.971s
user 0m2.922s
sys 0m0.023s
$ r.proj.seg memory=100 input=w140n90 location=global mapset=glynn output=proj method=nearest

real 0m2.929s
user 0m2.891s
sys 0m0.018s
$ r.proj input=w140n90 location=global mapset=glynn output=proj method=bilinear

real 0m3.419s
user 0m3.280s
sys 0m0.106s
$ r.proj.seg input=w140n90 location=global mapset=glynn output=proj method=bilinear memory=100

real 0m3.461s
user 0m3.346s
sys 0m0.088s
$ r.proj input=w140n90 location=global mapset=glynn output=proj method=cubic

real 0m3.877s
user 0m3.731s
sys 0m0.111s
$ r.proj.seg memory=100 input=w140n90 location=global mapset=glynn output=proj method=cubic

real 0m4.169s
user 0m4.058s
sys 0m0.093s

The speed of r.proj.seg relative to r.proj ranges from ~1.4% faster
(nearest) to ~7.5% slower (bicubic). If those results hold up for
representative data, r.proj.seg can replace r.proj.

--
Glynn Clements <glynn@gclements.plus.com>

On Tuesday 19 December 2006 05:19, Glynn Clements wrote:

Glynn Clements wrote:
> > > Assuming that it works out okay, I'll probably end up merging
> > > r.proj.seg into r.proj, with the macro handling the details of
> > > whether to read from a tile cache (large maps) or from a single block
> > > of memory (small maps).
> >
> > This makes sense. For small maps is it much faster to not use the
> > cache?
>
> I haven't performed timings; to be of much use, they would need to be
> done with "real-world" cases.
>
> As it stands, even if you made the cache large enough to hold the
> entire map, it would currently get written out to a temporary file
> then read back in on demand.
>
> There's bound to be some overhead to using an extra level of
> indirection, but it's probably trivial compared to the projection
> calculations (and particularly compared to bicubic interpolation).
>
> One of the next things which needs to be done to r.proj.seg is to add
> an option to set the cache size. Once that's done, I'll look into
> trapping the case where the cache can hold the entire map, and
> eliminate the temporary file in that situation.

> I've done that; some (not exactly representative) results reprojecting
> GTOPO30 to spearfish (30m, 634 cols x 477 rows):
>
> $ r.proj input=w140n90 location=global mapset=glynn output=proj method=nearest
>
> real 0m2.971s
> user 0m2.922s
> sys 0m0.023s
> $ r.proj.seg memory=100 input=w140n90 location=global mapset=glynn output=proj method=nearest
>
> real 0m2.929s
> user 0m2.891s
> sys 0m0.018s
> $ r.proj input=w140n90 location=global mapset=glynn output=proj method=bilinear
>
> real 0m3.419s
> user 0m3.280s
> sys 0m0.106s
> $ r.proj.seg input=w140n90 location=global mapset=glynn output=proj method=bilinear memory=100
>
> real 0m3.461s
> user 0m3.346s
> sys 0m0.088s
> $ r.proj input=w140n90 location=global mapset=glynn output=proj method=cubic
>
> real 0m3.877s
> user 0m3.731s
> sys 0m0.111s
> $ r.proj.seg memory=100 input=w140n90 location=global mapset=glynn output=proj method=cubic
>
> real 0m4.169s
> user 0m4.058s
> sys 0m0.093s
>
> The speed of r.proj.seg relative to r.proj ranges from ~1.4% faster
> (nearest) to ~7.5% slower (bicubic). If those results hold up for
> representative data, r.proj.seg can replace r.proj.

I had similar timing results for my data:

Input Projection Parameters: +proj=longlat +no_defs +a=6378137
+rf=298.257222101 +towgs84=0.000,0.000,0.000
Input Unit Factor: 1

Output Projection Parameters: +proj=aea +lat_1=20 +lat_2=60 +lat_0=40
+lon_0=-96 +x_0=0 +y_0=0 +no_defs +a=6378137 +rf=298.257222101
+towgs84=0.000,0.000,0.000
Output Unit Factor: 1
Input:
Cols: 3500 (3500)
Rows: 2500 (2500)
North: 40.000000 (40.000000)
South: 37.500000 (37.500000)
West: -123.000000 (-123.000000)
East: -119.500000 (-119.500000)
ew-res: 0.001000
ns-res: 0.001000

Output:
Cols: 3934 (7340)
Rows: 3994 (13894)
North: 306990.000000 (560610.000000)
South: -52470.000000 (-689850.000000)
West: -2223720.000000 (-2236860.000000)
East: -1869660.000000 (-1576260.000000)
ew-res: 90.000000
ns-res: 90.000000

---------------- results --------------------
r.proj:
real 0m32.558s
user 0m31.690s
sys 0m0.336s

r.proj.seg:
real 0m34.995s
user 0m32.978s
sys 0m0.952s

The outputs from r.proj and r.proj.seg are about 99.99% identical. I
have attached the output from r.report run on the difference between the two
maps.

It looks like a couple of us have confirmed these results. Replacing r.proj
with the new r.proj.seg seems like a good idea:

1. fixes the nasty bilinear bug (yikes)
2. handles very large files (great!)

Great job, Glynn!

Cheers,

--
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

(attachments)

report.txt (2.73 KB)

Dylan Beaudette wrote:

> The outputs from r.proj and r.proj.seg are about 99.99% identical. I
> have attached the output from r.report run on the difference between the two
> maps.

Odd. There shouldn't be *any* difference (other than the fact that
r.proj.seg will tend to produce some nulls where r.proj produces
non-null values). The difference map should have min=max=0, i.e.
wherever both programs produce non-null values, they should be
identical.

AFAICT, there shouldn't even be any rounding-error differences; the
actual interpolation expressions are essentially identical.

Can you figure out a way to reproduce this with readily-available data
(or even synthetic data)?

--
Glynn Clements <glynn@gclements.plus.com>

Jerry Nelson wrote:

> Here are the results of my attempt to use r.proj.seg. I'm using grass6.3cvs,
> which I updated yesterday (19 Dec). There are some other issues with the
> update (e.g. using the GUI to do d.vect generates a strange error message), so
> it is possible these results are not a problem with r.proj.seg. Jerry
>
> Allocating memory and reading input map...
> File size limit exceeded

No, this is an r.proj.seg issue.

> r.proj.seg input=AfricaHornElev location=world mapset=PERMANENT
> output=AfricaHornElev method=cubic --overwrite
>
> Input Projection Parameters: +proj=longlat +a=6378137 +rf=298.257223563
> +no_defs
> Input Unit Factor: 1
>
> Output Projection Parameters: +proj=utm +zone=37 +a=6378137
> +rf=298.257223563 +no_defs
> Output Unit Factor: 1
> Input:
> Cols: 22875 (25200)
> Rows: 27201 (27600)

22,875 * 27,201 = 622,222,875 cells * 4 bytes/cell = 2,488,891,500
bytes, which will exceed the file size limit unless LFS is enabled
(--enable-largefile) and working.

I did try to make the code work with LFS, but I may well have gotten
something wrong; I haven't tested that aspect yet (I don't have any
maps that large).

Is USE_LARGEFILES set in include/Make/Platform.make?

If it is, my first guess would be that this (in r.proj.seg/Makefile):

  ifneq ($(USE_LARGEFILES),)
    EXTRA_CFLAGS = -D_FILE_OFFSET_BITS=64
  endif

should have come *after* this:

  include $(MODULE_TOPDIR)/include/Make/Module.make

OTOH, even if you have LFS ...

> Output:
> Cols: 17067 (17067)
> Rows: 21048 (21048)

... 17067 * 21048 = 359,226,216 output cells is a rather large region
even for "normal" GRASS modules. I would expect that to take a while,
particularly for method=cubic, but it will certainly make a good test
case.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

> BTW, in the course of writing r.proj.seg, I discovered that
> method=bilinear was seriously broken (it had the row/column
> swapped, essentially flipping each source cell along the main
> diagonal).

[...]

> Results:
>
> Before fix: http://www.zen87603.zen.co.uk/proj-old.png
> After fix: http://www.zen87603.zen.co.uk/proj-new.png

backported to the 6.2 branch.

Hamish

I just had a reason to try this out (nearly forgot about it). I'm a little confused: you say here that the cache size is fixed, yet there is an option to set the cache size (in MB) mentioned in the docs. Maybe this was added later?

The docs should probably also be updated with info relevant to r.proj.seg, instead of being a straight copy of the r.proj description.

On to trying to run it: tempfile problems (I recall seeing something about tempfiles on the list).

In location saproj, mapset srtm. Projecting data from location world (LL proj), mapset sa, map sa (7.5GB):

> r.proj.seg loc=world map=sa in=sa out=srtm

Input Projection Parameters: blah blah
Output Projection Parameters: blah blah
Input: blah blah
Output: blah blah

ERROR: can't make mapset element .tmp/Sumomo.local
        (/Volumes/Guu/gis/grassdb/world/srtm/.tmp)

[it doesn't get to the "Allocating memory and reading input map..." message]

... how the heck is it getting world/srtm as the current mapset (for creating temp files in)!? It should be saproj/srtm. Somehow the location is getting mixed up.

> g.gisenv get=MAPSET
srtm
> g.gisenv get=LOCATION_NAME
saproj

return the correct values.

On Dec 17, 2006, at 7:14 PM, Glynn Clements wrote:

I've added to CVS a modified version of r.proj named r.proj.seg. This
behaves identically to r.proj, but uses a tile cache rather than
reading the entire map into memory.

Currently, each tile is 64 * 64 cells, and the size of the cache is
fixed at 2 * (nx + ny) tiles (where nx * ny is the size of the source
region in tiles), which I would expect to suffice for any realistic
transformation.

[It will definitely suffice for any affine or "near"-affine
transformation. It would be useful to know what level of distortion
needs to be accommodated in practice.]
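For concreteness, the sizing rule above can be sketched as follows (the 64x64 tile size and the 2*(nx+ny) formula come from the message; the rounding-up and the function names are my own assumptions, not the actual r.proj.seg code):

```c
#define TILE_W 64
#define TILE_H 64

/* Number of tiles needed to cover `cells` cells, rounding up. */
static int tiles(int cells, int tile)
{
    return (cells + tile - 1) / tile;
}

/* Default cache capacity: 2 * (nx + ny) tiles, where the source
 * region is nx * ny tiles. */
static int default_cache_tiles(int cols, int rows)
{
    int nx = tiles(cols, TILE_W);
    int ny = tiles(rows, TILE_H);
    return 2 * (nx + ny);
}
```

For the 22875 x 27201 source region discussed earlier this gives 2 * (358 + 426) = 1568 cached tiles.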

Cache replacement is random (i.e. when a new tile is loaded, the tile
which it replaces is selected at random). In practice, random
replacement tends to work almost as well as an "optimal" algorithm,
but without the overhead of maintaining usage statistics, as well as
not having any degenerate cases.
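The random-replacement idea can be sketched like this (hypothetical structure and function names; the real readcell.c differs):

```c
#include <stdlib.h>

#define CACHE_SLOTS 8

struct cache {
    int tile_id[CACHE_SLOTS];   /* which source tile each slot holds */
};

/* Pick a victim slot uniformly at random; no usage statistics
 * are kept, so no access pattern can defeat the policy. */
static int choose_victim(void)
{
    return rand() % CACHE_SLOTS;
}

/* Load a tile into the cache, evicting whatever occupied the
 * randomly chosen slot. */
static void load_tile(struct cache *c, int tile)
{
    int slot = choose_victim();
    c->tile_id[slot] = tile;
    /* ... read the 64x64 tile from the temp file into the slot ... */
}
```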

Suggestions for improvements and/or performance statistics for
"real-world" usage are welcome.

-----
William Kyngesburye <kyngchaos@kyngchaos.com>
http://www.kyngchaos.com/

"Oh, look, I seem to have fallen down a deep, dark hole. Now what does that remind me of? Ah, yes - life."

- Marvin

William Kyngesburye wrote:

I just had a reason to try this out (nearly forgot about it). I'm a
little confused - you say here that the cache size is fixed, yet
there is an option to set the cache size (in MB) mentioned in the
docs. Maybe this was added later?

Yes. The current version defaults to 2*(nx+ny) blocks, but can be
overridden via the memory= option.
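Presumably the memory= value maps to a block count along these lines (a sketch under my own assumptions: 64x64 tiles, worst-case 8-byte DCELL cells; the actual r.proj.seg code may differ):

```c
#include <stddef.h>

#define TILE_W 64
#define TILE_H 64

typedef double DCELL;   /* worst-case GRASS cell size */

/* Convert a memory= value in MB into a number of 64x64 tiles. */
static int blocks_from_memory(int memory_mb)
{
    size_t tile_bytes = (size_t)TILE_W * TILE_H * sizeof(DCELL);
    return (int)(((size_t)memory_mb << 20) / tile_bytes);
}
```

At 32 KiB per tile, each MB of memory= buys 32 cache blocks under these assumptions.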

The docs should probably also be updated with info relevant to
r.proj.seg, instead of being a straight copy of the r.proj description.

The only thing that differs between the two is the memory= option. I'm
not sure what needs to be said about that option beyond what's already
in its description string.

On to trying to run it - tempfile problems (I recall seeing something
about tempfiles on the list).

Right; it needs to create a temporary file.

In location saproj, mapset srtm. Projecting data from location world
(LL proj), mapset sa, map sa (7.5GB):

> r.proj.seg loc=world map=sa in=sa out=srtm

Input Projection Parameters: blah blah
Output Projection Parameters: blah blah
Input: blah blah
Output: blah blah

ERROR: can't make mapset element .tmp/Sumomo.local
        (/Volumes/Guu/gis/grassdb/world/srtm/.tmp)

[it doesn't get to the "Allocating memory and reading input map..."
message]

... how the heck is it getting world/srtm as the current mapset (for
creating temp files in)!? It should be saproj/srtm. Somehow the
location is getting mixed up.

Aha.

r.proj.seg (and r.proj) both work by switching environments between
the input and output locations. It appears to be creating the
temporary file while the input location is active, when it should be
using the output location (you may not have write permission for the
input location).

Thanks for discovering this; I'll apply the following as soon as I've
tested it:

--- raster/r.proj.seg/readcell.c 19 Dec 2006 13:24:54 -0000 1.3
+++ raster/r.proj.seg/readcell.c 8 Feb 2007 10:59:22 -0000
@@ -51,7 +51,10 @@

   if (nblocks < nx * ny)
   {
+    /* Temporary file must be created in output location */
+    G__switch_env();
     filename = G_tempfile();
+    G__switch_env();
     c->fd = open(filename, O_RDWR|O_CREAT|O_EXCL, 0600);
     if (c->fd < 0)
       G_fatal_error("Unable to open temporary file");

--
Glynn Clements <glynn@gclements.plus.com>

On Feb 8, 2007, at 5:12 AM, Glynn Clements wrote:

The docs should probably also be updated with info relevant to
r.proj.seg, instead of being a straight copy of the r.proj description.

The only thing that differs between the two is the memory= option. I'm
not sure what needs to be said about that option beyond what's already
in its description string.

Maybe mention it in the description? Other than the memory= option and the name, the two modules look identical from the docs.

In location saproj, mapset srtm. Projecting data from location world
(LL proj), mapset sa, map sa (7.5GB):

r.proj.seg loc=world map=sa in=sa out=srtm

Input Projection Parameters: blah blah
Output Projection Parameters: blah blah
Input: blah blah
Output: blah blah

ERROR: can't make mapset element .tmp/Sumomo.local
        (/Volumes/Guu/gis/grassdb/world/srtm/.tmp)

[it doesn't get to the "Allocating memory and reading input map..."
message]

... how the heck is it getting world/srtm as the current mapset (for
creating temp files in)!? It should be saproj/srtm. Somehow the
location is getting mixed up.

Aha.

r.proj.seg (and r.proj) both work by switching environments between
the input and output locations. It appears to be creating the
temporary file while the input location is active, when it should be
using the output location (you may not have write permission for the
input location).

That's the odd part - it's really a combination of the input *location* and the output (current) *mapset* name.

Thanks for discovering this; I'll apply the following as soon as I've
tested it:

--- raster/r.proj.seg/readcell.c 19 Dec 2006 13:24:54 -0000 1.3
+++ raster/r.proj.seg/readcell.c 8 Feb 2007 10:59:22 -0000
@@ -51,7 +51,10 @@

   if (nblocks < nx * ny)
   {
+    /* Temporary file must be created in output location */
+    G__switch_env();
     filename = G_tempfile();
+    G__switch_env();
     c->fd = open(filename, O_RDWR|O_CREAT|O_EXCL, 0600);
     if (c->fd < 0)
       G_fatal_error("Unable to open temporary file");

I didn't see the switch_env in main just before readcell is called (and might not have realized what it meant had I seen it).

I'll give this a try; it'll help project the second half of this project today if it works. Since you didn't get any response before, I'll try some timing tests comparing r.proj and r.proj.seg, and compare the data generated, some time in the next week or so.

-----
William Kyngesburye <kyngchaos@kyngchaos.com>
http://www.kyngchaos.com/

[Trillian] What are you supposed to do WITH a maniacally depressed robot?

[Marvin] You think you have problems? What are you supposed to do if you ARE a maniacally depressed robot? No, don't try and answer, I'm 50,000 times more intelligent than you and even I don't know the answer...

- HitchHiker's Guide to the Galaxy