[GRASS5] r.in.mat, r.out.mat for MATLAB I/O

[resent as yesterday's message never made it to the list..?]

Hi,

To make my life a bit easier, I was thinking out writing two new
modules: r.in.mat and r.out.mat. These would save and load Matlab
binary ".mat" files in a manner much like r.in.ascii & r.out.ascii.

The format is endian-aware, highly portable, and fairly simple.

Octave is a GPL'd program that works "not unlike" Matlab. It
could read/write these files as well.

Issues:

* There are two MAT-File formats, version 4 & 5. Version 4 is simpler
and >~5 years backwards compatible, but version 5 lets me put a map into
a nice structure instead of just dumping a series of variables and
arrays to the workspace. Version 5 also allows >2-D arrays, which would
allow someone to add support for 3D rasters at some point if they
wanted.

* Matlab provides headers and libraries for reading/writing MAT files
for both C and FORTRAN, but for both portability and copyrights reasons I
wouldn't use them. I might try and use some Octave code if it fits.

* I've got to make a choice whether to make [1,1] to be top-left or
bottom-left of the map. Matlab can deal with both ("axis xy" or
"axis ij"), but "xy" (bottom-left) is default. Which is better to use?

* The data storage format is (I think) in the FORTRAN style,

e.g. an array called "A":

A = [ 1 2 3
      4 5 6
      7 8 9 ];

is saved as A[9] = {1, 4, 7, 2, 5, 8, 3, 6, 9};
with rows x column data stored elsewhere.

I can see this being a headache.. Have to load the whole map instead of
just one row at a time. Have to use lots of memory or lots of slow loops..

So my current plan is to write from scratch as two GRASS modules
r.[in|out].mat that will write a structure as described below. Matlab
variables can't start with a number or include certain characters (e.g.
"-"), so the structure name would be a cleansed version of the map name.

cleansed_mapname {
  north: xxxxxx.xx
  south: xxxxxx.xx
  east: xxxxxx.xx
  west: xxxxxx.xx
  map_array: [m by n] (int|float|double)
  title: 'char array'
  meta: cell array containing "hist" file
}

Rows, cols, resolution, and type (int|float|double) are all built in
or discoverable from the map_array & NSEW. So wouldn't be included.

I'd follow r.[in|out].ascii's style of n,s,e,w variables being values at
the outer edge of cells, not the centroids of outside edge cells.
(This should be true for everything in GRASS, yes?)

I could, but probably wouldn't at this point, include support for 3D maps,
as well as category and color information. I would write the programs to
be extensible though (warning on unknown data type beyond mandatory ones).

Null is always "NaN".

Matlab supports the IEEE value of "inf" and "-inf", do I have to trap
that in r.in.mat and convert to null?

thanks for any comments & ideas,
Hamish

Hamish wrote:

To make my life a bit easier, I was thinking out writing two new
modules: r.in.mat and r.out.mat. These would save and load Matlab
binary ".mat" files in a manner much like r.in.ascii & r.out.ascii.

The format is endian-aware, highly portable, and fairly simple.

Octave is a GPL'd program that works "not unlike" Matlab. It
could read/write these files as well.

Issues:

* There are two MAT-File formats, version 4 & 5. Version 4 is simpler
and >~5 years backwards compatible, but version 5 lets me put a map into
a nice structure instead of just dumping a series of variables and
arrays to the workspace. Version 5 also allows >2-D arrays, which would
allow someone to add support for 3D rasters at some point if they
wanted.

* Matlab provides headers and libraries for reading/writing MAT files
for both C and FORTRAN, but for both portability and copyrights reasons I
wouldn't use them. I might try and use some Octave code if it fits.

* I've got to make a choice whether to make [1,1] to be top-left or
bottom-left of the map. Matlab can deal with both ("axis xy" or
"axis ij"), but "xy" (bottom-left) is default. Which is better to use?

In GRASS, row 0 is the North edge and column 0 is the West edge so, to
the extent that it makes a difference, top-left would seem more
logical.

What is the difference exactly? Just how arrays are displayed?

* The data storage format is (I think) in the FORTRAN style,

e.g. an array called "A":

A = [ 1 2 3
      4 5 6
      7 8 9 ];

is saved as A[9] = {1, 4, 7, 2, 5, 8, 3, 6, 9};
with rows x column data stored elsewhere.

I can see this being a headache.. Have to load the whole map instead of
just one row at a time. Have to use lots of memory or lots of slow loops..

Or just dump the data in the order that GRASS provides it, and
transpose it in Matlab.

So my current plan is to write from scratch as two GRASS modules
r.[in|out].mat that will write a structure as described below. Matlab
variables can't start with a number or include certain characters (e.g.
"-"), so the structure name would be a cleansed version of the map name.

cleansed_mapname {
  north: xxxxxx.xx
  south: xxxxxx.xx
  east: xxxxxx.xx
  west: xxxxxx.xx
  map_array: [m by n] (int|float|double)
  title: 'char array'
  meta: cell array containing "hist" file
}

Rows, cols, resolution, and type (int|float|double) are all built in
or discoverable from the map_array & NSEW. So wouldn't be included.

I'd follow r.[in|out].ascii's style of n,s,e,w variables being values at
the outer edge of cells, not the centroids of outside edge cells.
(This should be true for everything in GRASS, yes?)

Yes. The cellhd file, struct Cell_head etc all refer to corners, not
centres.

I could, but probably wouldn't at this point, include support for 3D maps,
as well as category and color information. I would write the programs to
be extensible though (warning on unknown data type beyond mandatory ones).

Null is always "NaN".

Matlab supports the IEEE value of "inf" and "-inf", do I have to trap
that in r.in.mat and convert to null?

I'm not sure. The handling of infinite values isn't clearly defined.

E.g. r.mapcalc's division operator specifically checks for a zero
denominator and returns null in that case, although that was probably
from blindly copying the old behaviour. OTOH, this contradicts the
r.mapcalc manual page, which says:

  Division by 0 and modulus by 0 are acceptable
  and give a 0 result

This was true in 4.3 (which didn't have null or FP), but not in the
old 5.0 r.mapcalc nor the new one (division by zero returns null for
both integer and FP in both versions).

AFAICT, there isn't any fundamental reason why infinite values
shouldn't be allowed in FP maps. OTOH, I wouldn't be surprised if
there's a lot of code which doesn't handle this case (e.g. calling
sprintf("%f") and expecting to find a "." in the resulting string).

--
Glynn Clements <glynn.clements@virgin.net>

> To make my life a bit easier, I was thinking out writing two new
> modules: r.in.mat and r.out.mat. These would save and load Matlab
> binary ".mat" files in a manner much like r.in.ascii & r.out.ascii.

...

> * I've got to make a choice whether to make [1,1] to be top-left or
> bottom-left of the map. Matlab can deal with both ("axis xy" or
> "axis ij"), but "xy" (bottom-left) is default. Which is better to
> use?

In GRASS, row 0 is the North edge and column 0 is the West edge so, to
the extent that it makes a difference, top-left would seem more
logical.

ok

What is the difference exactly? Just how arrays are displayed?

Little difference really, in matlab's engine arrays start at [1,1] but
otherwise are just mathematical constructs so it doesn't really
matter/is all in your mind.

If you display raw values to the console screen, [1,1] is top left.
However if you plot the array as a surface in the default "axis xy"
mode, [1,1] is the bottom left (well [0,0] really but it snaps to the
extent of the data) & the map is displayed upside-down. If you give it
the command "axis ij", it flips the x-axis in the plot window and the
map displays as you'd expect.

> * The data storage format is (I think) in the FORTRAN style,
>
> e.g. an array called "A":
>
> A = [ 1 2 3
> 4 5 6
> 7 8 9 ];
>
> is saved as A[9] = {1, 4, 7, 2, 5, 8, 3, 6, 9};
> with rows x column data stored elsewhere.
>
> I can see this being a headache.. Have to load the whole map instead
> of just one row at a time. Have to use lots of memory or lots of
> slow loops..

Or just dump the data in the order that GRASS provides it, and
transpose it in Matlab.

That's *really* easy to do:
map = map';

with perhaps a flipud() or fliplr() if needed.

but that seems like a half measure to me.
Probably what I'll try first in the prototype version though.

> Matlab supports the IEEE values of "inf" and "-inf", do I have to
> trap that in r.in.mat and convert to null?

I'm not sure. The handling of infinite values isn't clearly defined.

Until someone has need for it (I don't), I think I'll just tag them as
null. (slope?)

regards,
Hamish