[GRASS-dev] r.in.gdal and xargs

Dear all,

I am trying to import time series data using a combination of xargs and r.in.gdal:

cat current_datasets_age.txt | awk -v U="myunits" -v N="name" '{ print "r.in.gdal input=" $1 ".bil output=" $2 "_tmp title=\"" N " in " U " at " $3 "\" --o --q -o\0"}' | xargs -P 10 -I {} -0 bash -c {}

In some cases (5 out of ~20k) I get:

ERROR: Unable to make mapset element .tmp/HOST (/grassdata/ETRS_33N/timseries/.tmp): File exists

It does not happen regularly. Could this be a race condition?

I am using GRASS 7.2 (r70188) on Ubuntu 14.04 LTS.

I am grateful for any hint.

Kind regards,

Stefan

Hello Stefan,

Just a suggestion, inspired by how I usually use xargs:

Did you try putting the argument list in a file (say
my_ringdal_args.txt) and then running:
xargs -a my_ringdal_args.txt -P10 -n 3 r.in.gdal

(-n 3 tells xargs to pass 3 arguments to each call of r.in.gdal,
i.e. input=, output= and title=)
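
To make the file layout concrete, here is a rough dry-run sketch. The file contents and map names are made up for illustration, and "echo r.in.gdal" stands in for the real module so nothing is imported. It assumes GNU xargs (for -a), and relies on xargs honouring quotes in its input so that a title containing spaces stays one argument.

```shell
# Hypothetical argument file: three arguments per import. Quotes keep a
# title with spaces together as a single argument for xargs.
args_file=$(mktemp)
cat > "$args_file" <<'EOF'
input=a.bil output=a_tmp "title=name in myunits at 2000"
input=b.bil output=b_tmp "title=name in myunits at 2001"
EOF

# Dry run: "echo r.in.gdal" is a stand-in for the real module, so this
# only prints the command lines xargs would build (-a is a GNU option).
dry_run=$(xargs -a "$args_file" -n 3 echo r.in.gdal)
printf '%s\n' "$dry_run"

# Count the arguments each invocation receives.
arg_counts=$(xargs -a "$args_file" -n 3 sh -c 'echo "$#"' _)
printf '%s\n' "$arg_counts"
```

Inside a GRASS session one would drop the echo and add -P 10 again. This only changes how the commands are built, so it may well not change anything about temp-directory creation.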

Hope this helps!
Vincent.

_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-dev

Thanks Vincent for your swift reply.

In principle the pipe to xargs works, as 99% of the data is imported properly. Even in the cases where I get errors, the command is started properly...

Thus, I suspect that r.in.gdal can have issues when run in parallel, or that the problem lies in the creation of temp files, in particular the creation of the .tmp/HOST directory within the mapset...

Cheers
Stefan

Yes, perhaps it has something to do with how r.in.gdal handles temp files.

And what if you try to reduce the -P value?
Well, of course it will slow down the bulk import...

I can parallelize on a higher level, so I can set -P to 1 here.
If it is a bug I can open a ticket...

Hi again,

I have now investigated a bit more:

When I use just one process in xargs (-P 1) I get no error. r.mapcalc works nicely with the same xargs approach and 10 processes.
I think this can be tracked down to the creation of the directory .tmp/HOST, which seems to fail if several processes try to create it at (almost) the same time...

The temp-file creation behavior also seems to affect other modules, e.g. r.null.
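
For illustration, the suspected pattern can be reproduced outside GRASS with plain mkdir. This is only a sketch of the general race, not the actual GRASS code: the scratch directory is a stand-in for a mapset, and the variable names are made up. A plain mkdir reports "File exists" when another process got there first, while mkdir -p treats an existing directory as success.

```shell
# Scratch directory standing in for a mapset (hypothetical, not GRASS).
mapset=$(mktemp -d)

# A plain mkdir fails if the directory already exists. This is what a
# process sees when another one created the directory just before it.
mkdir "$mapset/.tmp"
if mkdir "$mapset/.tmp" 2>/dev/null; then
    second_mkdir=ok
else
    second_mkdir=failed
fi

# mkdir -p treats "already exists" as success, so ten concurrent
# creators should all exit with status 0.
if seq 10 | xargs -P 10 -I {} mkdir -p "$mapset/.tmp/HOST"; then
    parallel_mkdir=ok
else
    parallel_mkdir=failed
fi

echo "plain mkdir on existing dir: $second_mkdir"
echo "parallel mkdir -p: $parallel_mkdir"
```

If the library creates the directory in the first style, tolerating the "already exists" case there would presumably be the place to fix it.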

Opened a ticket on trac:
https://trac.osgeo.org/grass/ticket/3309

Cheers
Stefan
