[GRASS-dev] Parsing output of r.category which includes labels

If I am not wrong, none of the uses of `read_command()` [0, 1]
in (at least) the grass-addons repository handle output
from `r.category` that includes labels.

[0] https://grass.osgeo.org/grass74/manuals/libpython/script.html?highlight=read_command#script.core.read_command
[1] https://grass.osgeo.org/grass75/manuals/libpython/script.html?highlight=read_command#script.core.read_command

I am working on such a case, where category numbers come along with label strings.
To read category numbers, I came up with:

import grass.script as grass
# one line per category; the last element after split('\n') is empty
categories = grass.read_command('r.category', map=base).split('\n')[:-1]

for category in categories:
    category = category.split('\t')[0]

Is there any other command that will do this better? Would you consider
adding one?

Nikos

Hi Nikos,

You could use NumPy's genfromtxt() to parse the output string...
genfromtxt() requires a StringIO object (or a file), and StringIO (from io) requires unicode()...

So you could do:

from io import StringIO
import numpy as np
# unicode() exists only in Python 2; io.StringIO expects unicode text
output = np.genfromtxt(StringIO(unicode(grass.read_command('r.category', map=base))), delimiter='\t', dtype=None, names=['cat', 'label'])

That does add some overhead, however [1], so whether it makes sense depends on what you want to do with the data further along the processing chain...

Cheers
Stefan

1: https://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html

On 19/08/18 22:19, Nikos Alexandris wrote:

Is there any other command that will do this better? Would you consider
adding one?

If all the modules are trying to do is get a list of category values, your approach seems the right one to me, but a simple list comprehension should do the trick in one line:

cats = [int(x[0]) for x in [x.split('\t') for x in g.read_command('r.category', map='RasterMap').splitlines()]]

This will work whether there are labels or not. IMHO, there is no need to use anything more sophisticated.

Especially since a

grep -R "r.category" * | grep read_command

only gives 4 hits:

imagery/i.segment.uspo/i.segment.uspo.py: numsegments = len(gscript.read_command('r.category',
raster/r.geomorphon/testsuite/test_r_geom.py: category = read_command('r.category', map=self.outele)
raster/r.geomorphon/testsuite/test_r_geom.py: category = read_command('r.category', map=self.outsint)
raster/r.neighborhoodmatrix/r.neighborhoodmatrix.py: numneighbors = len(gscript.read_command('r.category',

The first and last only read the length (number) of categories, so this isn't an issue.

Have you come across other instances?

Moritz
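
As a minimal sketch of the list comprehension above, the parsing step can be tried on plain strings, without a GRASS session. The sample text is made up; it only assumes what this thread describes, namely that `r.category` prints one category per line with an optional tab-separated label. The helper name `category_numbers` is hypothetical.

```python
def category_numbers(output):
    """Return the integer category numbers from r.category-style text.

    Each non-empty line is expected to hold a category number, optionally
    followed by a TAB and a label.
    """
    return [int(line.split('\t')[0])
            for line in output.splitlines() if line.strip()]


# Made-up sample output, with and without labels
with_labels = "1\tforest\n2\twater\n3\turban area\n"
without_labels = "1\n2\n3\n"

print(category_numbers(with_labels))
print(category_numbers(without_labels))
```

In a real script the string would come from `read_command('r.category', map=...)`. Using `splitlines()` avoids having to drop the trailing empty element that `split('\n')` leaves behind.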

* Moritz Lennert <mlennert@club.worldonline.be> [2018-08-20 13:40:36 +0200]:

If all the modules are trying to do is get a list of category values, your approach seems the right one to me, but a simple list comprehension should do the trick in one line:

cats = [int(x[0]) for x in [x.split('\t') for x in g.read_command('r.category', map='RasterMap').splitlines()]]

Great. I love comprehensions (and generators).
It's one of my favourite Python exercises.

Have you come across other instances?

No. Still, my skepticism is the following:

The argument you present, if I understand it right, is
"no need to bother", since there are only a few potential use cases.

What about better integration and more joyful scripting? `r.category`
handles both values and labels. And there is currently no
`grass.script` helper function that handles both values and labels out
of the box.

For example, a parser helper that returns a dictionary.
Is this "too much" here?

Thanks Moritz,
Nikos

Thanks for the idea, Stefan.
This is really a small (in size) task.
I don't think it's worth using NumPy for just a few lines.

Wouldn't the major reason to use NumPy be computational speed?

Nikos

Hi Nikos,

If you are interested in dictionary output you could do:

category_labels = grass.parse_command('r.category', map='yourmap', delimiter='\t')

Cheers
Stefan

Right, nice solution!

For those not very familiar with the scripting library: the "delimiter" option here is an option to parse_command, not to r.category. It determines which delimiter parse_command uses to split each line into a key and a value.

Moritz
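
To illustrate what splitting each line into a key and a value at a delimiter looks like, here is a plain-Python stand-in that runs without GRASS. It is a sketch for illustration only, not the library's code; the function name `parse_pairs` and the sample text are hypothetical.

```python
def parse_pairs(output, sep='\t'):
    """Split each non-empty line at the first `sep` into a key and a value.

    Lines without `sep` get None as their value.
    """
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        key, found, value = line.partition(sep)
        result[key.strip()] = value.strip() if found else None
    return result


# Made-up r.category-style text: two labelled categories, one unlabelled
sample = "1\tforest\n2\twater\n3\n"
labels = parse_pairs(sample)
print(labels)
```

In this sketch the keys are strings, so a lookup is `labels['1']` rather than `labels[1]`; convert with `int()` if numeric keys are needed.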

Fantastic. I read that 'delimiter' somewhere... :-)

Nikos

Stefan,

a question somewhat unrelated to the original subject:

do you think the NumPy way is worthwhile for collating a series of `r.stats`
outputs?

Imagine administrative boundaries and one `r.stats` call for each. There
may be tens, hundreds, or thousands of them, as the script is meant to cover
Europe-wide extents.

Or should I just work this out using native Python?

Thank you for any thoughts,
Nikos

Dear Nikos,

Can you give us a bit more context?
What is it you want to achieve? How are you using r.stats and what is it you want to do with the output?

Personally, I am not too familiar with performance implications of NumPy vs. plain Python, but rather use NumPy for convenience in matrix/table operations (avoiding pandas)...

Cheers
Stefan

for category in categories:
    statistics_filename = prefix + '_' + category
    r.stats(input=(base,reclassified),
            output=statistics_filename,
            flags='ncapl',
            separator=',',
            quiet=True)

Instead, I want to (modify the above so as to) collect/direct all
iterations in one output file.

The question, finally, seems to be: use Python's 'csv' library or prefer
NumPy? The number of records might reach tens of thousands (or more).

Nikos
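
For the plain-Python option, a minimal sketch with the standard `csv` module might look like the following. It assumes the per-category outputs are available as comma-separated text; the rows below are made up and are not real `r.stats` output, and the helper name `collect_rows` is hypothetical.

```python
import csv
import io


def collect_rows(outputs, writer):
    """Write the rows of several comma-separated outputs through one writer.

    `outputs` maps a category name to the text produced for it. The category
    is prepended to every row so the origin of each record is kept.
    Returns the number of rows written.
    """
    count = 0
    for category, text in outputs.items():
        for row in csv.reader(io.StringIO(text)):
            if row:  # skip blank lines
                writer.writerow([category] + row)
                count += 1
    return count


# Made-up stand-ins for per-category, comma-separated output
outputs = {
    'A': "1,forest,1200.0,12\n2,water,300.0,3\n",
    'B': "1,forest,500.0,5\n",
}

merged = io.StringIO()  # an open file object would work the same way
n = collect_rows(outputs, csv.writer(merged, lineterminator='\n'))
print(n)
print(merged.getvalue())
```

With tens of thousands of short records, row-by-row writing like this is usually sufficient; NumPy would mainly help if numeric operations on the collected table follow.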

You can check the r.neighborhoodmatrix addon for one solution which I shamelessly took from a SE answer:

https://trac.osgeo.org/grass/browser/grass-addons/grass7/raster/r.neighborhoodmatrix/r.neighborhoodmatrix.py#L152

The code takes a list of filenames and then merges these files.
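
The general idea of merging a list of files can be sketched as follows. This is not the addon's actual code, just one common way to do it with the standard library; the function name `merge_files` and the file names are hypothetical.

```python
import os
import shutil
import tempfile


def merge_files(filenames, merged_name):
    """Concatenate the given files, in order, into `merged_name`."""
    with open(merged_name, 'w') as merged:
        for name in filenames:
            with open(name) as part:
                shutil.copyfileobj(part, merged)


# Small demonstration in a temporary directory
workdir = tempfile.mkdtemp()
parts = []
for i, text in enumerate(["1,forest\n", "2,water\n"]):
    path = os.path.join(workdir, 'part_%d.csv' % i)
    with open(path, 'w') as f:
        f.write(text)
    parts.append(path)

merged_path = os.path.join(workdir, 'merged.csv')
merge_files(parts, merged_path)
with open(merged_path) as f:
    merged_text = f.read()
print(merged_text)
shutil.rmtree(workdir)
```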

I've been confronted with a similar problem using v.db.select these days, and I've been thinking about adding a flag / parameter to the relevant modules that would allow appending to an existing file instead of overwriting it. It should just be a case of using mode "a" instead of "w", so it shouldn't be too complicated.

If you want to, try it with r.stats, by applying this change:

Index: raster/r.stats/main.c

--- raster/r.stats/main.c (revision 72717)
+++ raster/r.stats/main.c (working copy)
@@ -223,7 +223,7 @@

     name = option.output->answer;
     if (name != NULL && strcmp(name, "-") != 0) {
-        if (NULL == freopen(name, "w", stdout)) {
+        if (NULL == freopen(name, "a", stdout)) {
             G_fatal_error(_("Unable to open file <%s> for writing"), name);
         }
     }

and report back if it works as expected...

Moritz

I have been peeking into your scripts a lot lately. I just didn't see this
one, nor did my greps/ags hit it.

The modification to r.stats works fine. So it's a matter of an extra
flag, then. Still, I guess this needs to be documented with a proper warning (?).

Thank you Moritz, Nikos

ps- Some of my e-mails fail to deliver to your mailbox.
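
One reason a warning may be warranted (an assumption, not something spelled out in this thread): in append mode, a file left over from an earlier run is not truncated, so old records silently remain in the output. The sketch below shows the effect in plain Python; the helper name `append_lines` and the file name are hypothetical.

```python
import os
import tempfile


def append_lines(path, lines):
    """Append lines to `path`, creating the file if it does not exist."""
    with open(path, 'a') as f:
        for line in lines:
            f.write(line + '\n')


path = os.path.join(tempfile.mkdtemp(), 'stats.csv')

# First run: two iterations append to the same file
append_lines(path, ['A,1'])
append_lines(path, ['B,2'])

# Second run without cleaning up: the old records are still there
append_lines(path, ['A,1'])
with open(path) as f:
    stale = f.read().splitlines()

# Removing the file first gives a clean result
os.remove(path)
append_lines(path, ['A,1'])
with open(path) as f:
    fresh = f.read().splitlines()

print(stale)
print(fresh)
```

A script relying on an append flag would therefore probably want to remove or truncate the target file once, before the loop starts.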