[GRASS-dev] how to work around "Argument list too long" error in a GRASS python script ?

Hello,

In a python script I have the following call:

grass.run_command('r.series',
                    input = rate_maps,
                    output = sum_rates,
                    method = 'sum',
                    overwrite = True,
                    quiet=True)

rate_maps is a list which in one instance contains 8559 map names, leading to an "OSError: [Errno 7] Argument list too long".

I know that in the shell I could use xargs to work around such a problem. But how to do this in python ?

I could obviously loop through all maps and thus sum them individually, but this just seems horribly inefficient.

Does anyone have a better solution ?

Moritz

* Moritz Lennert <mlennert@club.worldonline.be> [2015-06-11 18:37:36 +0200]:

rate_maps is a list which in one instance contains 8559 map names,
leading to an "OSError: [Errno 7] Argument list too long".

I know that in the shell I could use xargs to work around such a
problem. But how to do this in python ?

What is the OS limit for it?

I could obviously loop through all maps and thus sum them individually,
but this just seems horribly inefficient.

Does anyone have a better solution ?

- Maybe split in two or three sessions (instead of looping over all)?
- Have you seen this
<https://gist.github.com/max-nova/487a82de00651a33f2c2#file-grass_demo-py-L28>

Nikos
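
A minimal sketch of the splitting idea, assuming the maps are summed in batches and the partial sums are then summed once more. The helper names, the `_part_N` map naming and the `run_series` callable are hypothetical; `run_series(inputs, output)` stands in for a wrapper around the `grass.run_command('r.series', ...)` call shown in the question. This decomposition holds for `method='sum'` but would not hold for methods such as the median.

```python
def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sum_in_batches(rate_maps, output, run_series, batch_size=500):
    """Sum many maps by summing batches first, then the partial sums.

    `run_series(inputs, output)` is a caller-supplied (hypothetical) callable
    expected to run r.series with method='sum' on `inputs`.
    """
    partials = []
    for i, batch in enumerate(chunked(rate_maps, batch_size)):
        partial = "%s_part_%d" % (output, i)  # hypothetical intermediate map name
        run_series(batch, partial)
        partials.append(partial)
    # The sum of the partial sums equals the sum over all input maps.
    run_series(partials, output)
    return partials
```

The intermediate maps returned by `sum_in_batches` would still have to be removed afterwards, and the batch size has to be small enough that each call stays below the argument limit.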

On Thu, Jun 11, 2015 at 12:37 PM, Moritz Lennert <
mlennert@club.worldonline.be> wrote:

Does anyone have a better solution ?

Moritz

It seems we are hitting these issues quite often, so we should consider
adding an input file option, as many temporal modules already have. But
splitting as Nikos suggests has the advantage that you can parallelize the
computation.

Anna
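
The parallelization mentioned here could look roughly like the following sketch. It assumes the map list has already been split into batches and that a caller-supplied function `sum_batch(index, batch)` (a hypothetical name) starts one r.series process per batch and returns the name of the map it wrote. Threads are used only to wait on those child processes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batches_in_parallel(batches, sum_batch, workers=4):
    """Run `sum_batch(index, batch)` for every batch concurrently.

    `sum_batch` is a hypothetical caller-supplied function; in a real script
    it would launch r.series for one batch and return its output map name.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(sum_batch, i, b) for i, b in enumerate(batches)]
        # Collect results in submission order; .result() re-raises any failure.
        return [f.result() for f in futures]
```

Whether several r.series processes can safely run at once depends on the session setup, so the number of workers is best kept modest.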

_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

+1 for adding an input file as an option

On 11/06/15 19:02, Nikos Alexandris wrote:

What is the OS limit for it?

I suppose this is ARG_MAX ?

getconf ARG_MAX
2097152

A text file with all file names only uses 144551 bytes.

Or is there another limit I should look at ?

- Maybe split in two or three sessions (instead of looping over all)?

Yes, thanks, I can do that. I'll also try the file option mentioned by Anna (r.series actually has one). Didn't think of that.

Thanks to both of you !

Moritz

On Thu, Jun 11, 2015 at 1:46 PM, Moritz Lennert <
mlennert@club.worldonline.be> wrote:

Yes, thanks, I can do that. I'll also try the file option mentioned by
Anna (r.series actually has one). Didn't think of that.

Oh, I didn't know that it already has it. Good to know!

On 11/06/15 20:05, Anna Petrášová wrote:

Oh, I didn't know that it already has it. Good to know!

And, just for info, it solves my problem beautifully.

Moritz
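
For readers who want to try the file option, here is a minimal sketch under stated assumptions: the map names are written one per line to a temporary file, and that file is passed to r.series through its `file` parameter instead of `input`. The helper names are hypothetical and this is not the script from the thread. The `grass.script` import only works inside a GRASS session, so it is deferred to the function that needs it.

```python
import os
import tempfile


def write_map_list(map_names):
    """Write one map name per line to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(prefix="rate_maps_", suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        for name in map_names:
            handle.write(name + "\n")
    return path


def sum_maps_from_file(map_names, output):
    """Sum maps with r.series, passing the names via a file (a sketch)."""
    import grass.script as grass  # only importable inside a GRASS session

    list_file = write_map_list(map_names)
    try:
        grass.run_command('r.series',
                          file=list_file,   # replaces input=rate_maps
                          output=output,
                          method='sum',
                          overwrite=True,
                          quiet=True)
    finally:
        os.remove(list_file)  # clean up the temporary list
```

With this approach the command line stays short regardless of how many maps are listed, because only the file name is passed as an argument.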

Moritz Lennert wrote:

I suppose this is ARG_MAX ?

getconf ARG_MAX
2097152

A text file with all file names only uses 144551 bytes.

Or is there another limit I should look at ?

Apparently there's a limit on the maximum length of a single argument,
which appears to be fixed at 128 KiB:

  > subprocess.call(["echo", "x"*131071],stdout=devnull,stderr=devnull)
  0
  > subprocess.call(["echo", "x"*131072],stdout=devnull,stderr=devnull)
  Traceback (most recent call last):
[snipped]
  OSError: [Errno 7] Argument list too long

This specific issue could be worked around by having
grass.make_command() use multiple arguments when the value is a list,
e.g. "r.series input=map1 input=map2 ...".

But such cases are likely to be problematic on other platforms where
ARG_MAX may be much lower. POSIX only requires that it is at least
4096, and it wasn't so long ago that such a value was commonplace. And
that value includes the environment as well as the command line.

Modules which commonly use many input maps should have the option to
read map names from a file. This mostly affects r.series, which
already does so.

If this issue was more common, we could consider extending G_parser()
to allow transparently reading any option value from a file, using
e.g. input=@filename. But that may just cause similar issues with
--interface-description, G_recreate_command(), etc.

--
Glynn Clements <glynn@gclements.plus.com>
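
The two limits discussed above can be checked before launching a command. The sketch below assumes the 128 KiB per-argument limit observed in the test above (it may differ on other systems) and assumes a list value is joined with commas into one `key=value` argument. The function names are hypothetical and `repeated_options` only illustrates the `input=map1 input=map2` form; it is not the actual `grass.make_command()` code.

```python
import os

# Per-argument limit observed in the thread (128 KiB); treat as an assumption,
# since other platforms may use a different value.
MAX_SINGLE_ARG = 131072


def joined_option(key, values):
    """Build the single 'key=v1,v2,...' argument used for a list value."""
    return "%s=%s" % (key, ",".join(values))


def fits_single_argument(key, values, limit=MAX_SINGLE_ARG):
    """Return True if the joined argument plus its NUL terminator fits."""
    return len(joined_option(key, values).encode()) + 1 <= limit


def repeated_options(key, values):
    """Alternative form: one short 'key=value' argument per list element."""
    return ["%s=%s" % (key, v) for v in values]


def arg_max():
    """Total limit for arguments plus environment, or None if unknown."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return None
```

This is consistent with the numbers in the thread: a name list of 144551 bytes is far below an ARG_MAX of 2097152, yet joined into a single argument it exceeds 131072 bytes.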

On 15/06/15 13:41, Glynn Clements wrote:

Modules which commonly use many input maps should have the option to
read map names from a file. This mostly affects r.series, which
already does so.

If this issue was more common, we could consider extending G_parser()
to allow transparently reading any option value from a file, using
e.g. input=@filename. But that may just cause similar issues with
--interface-description, G_recreate_command(), etc.

I think that the file option solution is the most adequate. r./v.patch might be potential candidates for that as well.

For the other modules that have such input lists, I doubt that anyone will ever use these with such long lists of maps.

Moritz

On Mon, Jun 15, 2015 at 7:41 AM, Glynn Clements <glynn@gclements.plus.com>
wrote:

Modules which commonly use many input maps should have the option to
read map names from a file. This mostly affects r.series, which
already does so.

If this issue was more common, we could consider extending G_parser()
to allow transparently reading any option value from a file, using
e.g. input=@filename. But that may just cause similar issues with
--interface-description, G_recreate_command(), etc.

Good idea, although I'm afraid of potential problems with the interface.
A similar solution, with similar potential issues, is to automatically create
`input_file` when `input` is multiple or when specified in code. Moving
things to the parser is definitely more advantageous for things like
overwrite, but it may cause some code and reading duplication for Python
scripts.