[GRASS-dev] r.series with file option - too many files open

Hi devs,

I am trying to run r.series with a large number of input maps (5000). Given the large number of input maps, I am using the ‘file’ option.

GRASS 7.5.svn (latlon):~ > r.series output=speciescount method=sum file=test.txt

I get the warning:

WARNING: G__open(read): Unable to open
         '/home/paulo/Data/HASdata/latlon/redlist/cellhd/Eulemur_fulvus':
         Too many open files
ERROR: Error reading reclass file for raster map <Eulemur_fulvus> 

Eulemur_fulvus is the 511th file on the list. From the help file, I thought that using the file option will prevent one from hitting open files limit and the size limit of command line arguments. Am I misunderstanding the r.series helpfile?

You should probably use -z flag (do not keep files open), but I can
see it's not mentioned in the manual.

Anna

_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-dev

the file option prevents too long command lines: instead of providing hundreds of map names as input, only one file with the map names is provided

the manual is wrong:
Use the file option to analyze large amount of raster maps without hitting open files limit and the size limit of command line arguments.

must be
Use the -z flag to analyze large amount of raster maps without hitting open files limit and the file option to avoid hitting the size limit of command line arguments.
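
As a minimal sketch of the corrected wording, the command from the original post could be rerun with both the file option and the -z flag. It assumes an active GRASS session and the same test.txt map list; outside such a session it only prints the command instead of running it.

```shell
# Same call as in the original post, with -z added so that input maps
# are not kept open (assumes test.txt lists one raster map name per line).
cmd="r.series -z file=test.txt output=speciescount method=sum"

if command -v r.series >/dev/null 2>&1 && [ -f test.txt ]; then
    # Inside a GRASS session with the list file present: run it.
    $cmd
else
    # Not inside a GRASS session: just show what would be run.
    echo "Would run: $cmd"
fi
```

The file option keeps the command line short, while -z is what avoids the open files limit, at the cost of slower processing.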

Markus M


Ah, yes, I totally forgot about that flag. Thanks (also Anna and Veronica). Perhaps somebody with write permission can correct this in the manual page?

Done in r72081: please verify. I think it could still be better.
https://trac.osgeo.org/grass/changeset/72081

Backport: Once we agree on the wording it should go into the release
branches as well.

markusN

Hi all,

AFAIU, the file option does not necessarily make the process slower (it just replaces a comma-separated list of maps on the command line with a file). It is the -z flag that does that, because it opens and closes the files again and again instead of keeping them open (am I right, @markusM?).

Moreover, I checked r.hants and r.series.lwr, which are based on r.series and have the same file option and -z flag. They all have different wording. Maybe we could agree here on the 3 of them. From my understanding, they are not entirely correct. Here the text snippets:

r.hants:
“Use the file option to analyze large amount of raster maps without hitting open files limit and the size limit of command line arguments. The computation is slower than with the input option method. For every single row in the output map(s) all input maps are opened and closed. The amount of RAM will rise linearly with the number of specified input maps. The input and file options are mutually exclusive. The option input is a comma separated list of raster map names and the option file is a text file with a new line separated list of raster map names. Note that the order of maps in one option or the other is very important.”

r.series.lwr:
“Use the -z flag to analyze large amounts of raster maps without hitting the open files limit and the size limit of command line arguments. This will however increase the processing time. For every single row in the output map(s) all input maps are opened and closed. The amount of RAM used will rise linearly with the number of specified input maps.The input and file options are mutually exclusive. Input is a text file with a new line separated list of raster map names”

I can change them accordingly once we understand correctly the effect of both file option and -z flag. Which other modules share this -z flag?

best,

Vero

The manual of r.series is IMHO now correct. The file option avoids hitting the size limit of command line arguments, and the -z flag avoids hitting the limit of the number of open files.

In all three manuals, the sections starting with “The maximum number of raster maps” (in r.series “Number of raster maps to be processed”) should be identical. Ideally, the manual of r.hants or r.series.lwr would be used as template, adding the latest change of r.series with regard to the file option and the -z flag, then sync all manuals.

> Which other modules share this -z flag?

r.series.accumulate

t.rast.series and t.rast.aggregate do not have the -z flag, but should probably have it because they call r.series.

Markus M

Perfect! I’ll sync them, and also r.series.accumulate :)

cheers,
Vero

Great, and then one cumulative manual backport and release 7.4.0 :)

best,
markusN

Markus Metz wrote:

Use the -z flag to analyze large amount of raster maps without hitting open
files limit

Or alternatively, see if you can increase the limit.

"ulimit -n" displays the soft limit, "ulimit -Hn" displays the hard
limit. If they aren't the same, you can use "ulimit -n <num>" to
increase the soft limit up to the hard limit (on my system, the soft
limit is 1024, the hard limit is 4096).

If you're at the hard limit, you can ask your sysadmin to increase it,
typically via /etc/security/limits.conf.

Increasing the limit won't have the performance cost of -z.
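
The steps above can be sketched as a short shell snippet. It is only an illustration: the values printed depend on the system, and the user name and value in the limits.conf comment are hypothetical.

```shell
# Show the current per-process open-file limits.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft limit: $soft, hard limit: $hard"

# Raise the soft limit up to the hard limit. This only affects the
# current shell and the processes started from it.
if [ "$soft" != "$hard" ]; then
    ulimit -Sn "$hard" 2>/dev/null || echo "could not raise the soft limit" >&2
fi
new_soft=$(ulimit -Sn)
echo "soft limit now: $new_soft"

# A persistent, higher hard limit is typically set by the sysadmin with a
# line like this in /etc/security/limits.conf (hypothetical user and value):
#   paulo  hard  nofile  8192
```

The change has to be made in the same shell from which GRASS is started, since the limit is inherited by child processes.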

--
Glynn Clements <glynn@gclements.plus.com>