[GRASS-dev] Copying Landsat MTL file(s) inside the cell_misc directory -- Python scripting

Dear list,

who wrote the import_landsat.py script featured in
<http://grasswiki.osgeo.org/wiki/LANDSAT#Automated_data_import>?

I would like to extend it a bit so that it copies the respective metadata
(*MTL.txt) files over to the (respective) "cell_misc" directory. I guess that
using "shutil" is the way to go, like:

# in the beginning
import shutil

# later on... but where?
env = grass.gisenv()
shutil.copy(metafile, env['GISDBASE'] + '/' + env['LOCATION_NAME'] + '/' +
            env['MAPSET'] + '/cell_misc')

Is that correct? It seems to work interactively. I do not know, however,
where to insert the above instructions. I tried, without success, within the
"import_tifs()" function, inside the first for loop, right after the
creation of a Mapset of interest.

Thanks for any hints, Nikos
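
A minimal sketch of the path construction discussed above, assuming the dictionary returned by grass.gisenv() carries the keys used in the message (GISDBASE, LOCATION_NAME, MAPSET). The helper name cell_misc_dir and the example values are hypothetical, and a plain dict stands in for grass.gisenv() so that the snippet also runs outside a GRASS session. os.path.join is used instead of concatenating '/' by hand.

```python
import os


def cell_misc_dir(env):
    """Build the path to the "cell_misc" directory of the current mapset.

    `env` is expected to look like the dict returned by grass.gisenv().
    """
    return os.path.join(env['GISDBASE'], env['LOCATION_NAME'],
                        env['MAPSET'], 'cell_misc')


# Stand-in for grass.gisenv(); the values are made up for illustration.
example_env = {'GISDBASE': '/grassdata',
               'LOCATION_NAME': 'landsat',
               'MAPSET': 'LE71610762013070PFS00'}
print(cell_misc_dir(example_env))
```

Inside a GRASS session the same helper could be called as cell_misc_dir(grass.gisenv()).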

Nikos Alexandris wrote:

> who wrote the import_landsat.py script featured in
> <http://grasswiki.osgeo.org/wiki/LANDSAT#Automated_data_import>?

This question is still valid :-). If my 2 proposed changes don't do something
seriously wrong, I would like to update the script in the wiki.

> I would like to extend it a bit so that it copies the respective metadata
> (*MTL.txt) files over to the (respective) "cell_misc" directory. I guess
> that using "shutil" is the way to go, like:

1) I managed to integrate another small function (thanks to Luca D for
off-list support), actually a clone of a small function already found in the
script. It works like:

# requires "import glob, os, shutil" at the top of the script
def copy_metafile(mapset):

    # get the metafile; return silently if the scene directory has none
    try:
        metafile = glob.glob(mapset + '/*MTL.txt')[0]
        print('\nThe identified metadata file is:\n %s\n'
              % os.path.basename(metafile))

    except IndexError:
        return

    # get environment variables & define path to "cell_misc"
    gisenv = grass.gisenv()
    CELL_MISC_DIR = os.path.join(gisenv['GISDBASE'], gisenv['LOCATION_NAME'],
                                 gisenv['MAPSET'], 'cell_misc')
    print('The identified metadata file will be copied to:\n %s\n'
          % CELL_MISC_DIR)

    # copy the metadata file
    shutil.copy(metafile, CELL_MISC_DIR)

I call this small function at the end of the "core" function
"import_tifs(mapset)" to copy the MTL file into the respective "cell_misc"
directory, i.e.:

copy_metafile(mapset)
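
The find-and-copy step can be sketched independently of GRASS as follows. This is only an illustration under stated assumptions: the name copy_mtl_file and its two explicit arguments are hypothetical (the function above takes only the mapset and asks grass.gisenv() for the target), and creating the target directory when it is missing is an addition that the original function does not make.

```python
import glob
import os
import shutil


def copy_mtl_file(scene_dir, target_dir):
    """Copy the first *MTL.txt file found in scene_dir into target_dir.

    Returns the path of the copy, or None when no metadata file exists.
    """
    matches = sorted(glob.glob(os.path.join(scene_dir, '*MTL.txt')))
    if not matches:
        return None
    # Assumption: create the target directory if it does not exist yet,
    # otherwise shutil.copy would write a plain file with that name.
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    shutil.copy(matches[0], target_dir)
    return os.path.join(target_dir, os.path.basename(matches[0]))
```

Returning the destination path (or None) lets the caller decide whether a missing MTL file deserves a warning.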

2) The current script checks inside the MTL file for the string
'ACQUISITION_DATE' to get the date of acquisition. The code (lines 56-57) goes
like:

    if 'ACQUISITION_DATE' in line:
        result['date'] = line.strip().split('=')[1].strip()

Since newer MTL files contain the string 'DATE_ACQUIRED', another check is
required. The script certainly works by adding an extra if statement, like:

    if 'DATE_ACQUIRED' in line:
        result['date'] = line.strip().split('=')[1].strip()

Nevertheless, I want to merge the two "if" statements into one. So, my attempt
goes like

    if 'DATE_ACQUIRED' or 'ACQUISITION_DATE' in line:
        result['date'] = line.strip().split('=')[1].strip()

This is wrong. I have no idea, though, why the former "simple" if
statements return a single line, while the latter if statement, which contains
the "or" operator, returns the complete MTL file content!

And, finally, the following works:

    if 'DATE_ACQUIRED' in line or 'ACQUISITION_DATE' in line:
        result['date'] = line.strip().split('=')[1].strip()

I know the following is a pure python question -- still, can someone shed some
light on why is there such a difference between

    if 'DATE_ACQUIRED' or 'ACQUISITION_DATE' in line:

and

    if 'DATE_ACQUIRED' in line or 'ACQUISITION_DATE' in line:

?

Thanks, Nikos

Hi Nikos,

On Tue, May 14, 2013 at 1:25 PM, Nikos Alexandris
<nik@nikosalexandris.net> wrote:

> I know the following is a pure python question -- still, can someone shed some
> light on why is there such a difference between
>
>     if 'DATE_ACQUIRED' or 'ACQUISITION_DATE' in line:
>
> and
>
>     if 'DATE_ACQUIRED' in line or 'ACQUISITION_DATE' in line:
>
> ?

Your error is due to operator precedence; what is really
written in the first line is:

if ( ('DATE_ACQUIRED') or ('ACQUISITION_DATE' in line) ):

Therefore the first operand, 'DATE_ACQUIRED', is always true because
it is a non-empty string.
Let me define a function like:

>>> def istrue(obj):
...     if obj:
...         return True
...     return False
...

Now if you try with an empty string you have:

>>> istrue('')
False

If you try with a string that contains something you get:

>>> istrue('DATE_ACQUIRED')
True

Therefore the second is correct because you check if the first string
is contained or the second string is contained in the line.
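
A small sketch that makes the difference visible, assuming a made-up metadata line that contains neither key. The variable names are only for illustration. Because "in" binds more tightly than "or", the first expression evaluates to the non-empty string itself, which every if statement treats as true.

```python
# Made-up MTL-style line that holds neither of the two date keys.
line = 'SENSOR_ID = "ETM"'

# Parsed as: 'DATE_ACQUIRED' or ('ACQUISITION_DATE' in line)
unparenthesised = 'DATE_ACQUIRED' or 'ACQUISITION_DATE' in line

# Each key is tested against the line separately.
explicit = 'DATE_ACQUIRED' in line or 'ACQUISITION_DATE' in line

print(repr(unparenthesised))  # the left-hand string, which is truthy
print(explicit)               # a real boolean
```

This is consistent with the observation that the first form matched every line of the MTL file.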

An easy way to debug this kind of problem is to use the Python debugger
[1] (possibly the IPython debugger! [2]) on the line before, like:

import ipdb; ipdb.set_trace() # or import pdb; pdb.set_trace()
# set_trace opens an interactive debugger prompt at this point...
if 'DATE_ACQUIRED' in line or 'ACQUISITION_DATE' in line:
[etc]

Have fun with python! :-)

All the best.

Pietro

[0] http://www.isthe.com/chongo/tech/comp/c/c-precedence.html
[1] http://docs.python.org/2/library/pdb.html
[2] https://pypi.python.org/pypi/ipdb

Hi Pietro,

thank you for taking the time to clarify this -- Perfect!
Note to self: use parentheses in complex if statements (and not only!).

Nikos

Pietro wrote:

> I know the following is a pure python question -- still, can someone shed some
> light on why is there such a difference between
>
> if 'DATE_ACQUIRED' or 'ACQUISITION_DATE' in line:
>
> and
>
> if 'DATE_ACQUIRED' in line or 'ACQUISITION_DATE' in line:
>
> ?

> Your error is due to operator precedence; what is really
> written in the first line is:
>
> if ( ('DATE_ACQUIRED') or ('ACQUISITION_DATE' in line) ):

It's not just the precedence, but also the semantics of the "or" and
"in" operators. If the first expression had been parenthesised
according to the presumed intent of the expression, i.e.:

  if ('DATE_ACQUIRED' or 'ACQUISITION_DATE') in line:

the result would have been equivalent to:

  if 'DATE_ACQUIRED' in line:

as the "or" operator returns the first argument if it is considered
true and the second argument otherwise. As all non-empty strings are
considered true, the "or" operator would have simply returned the
left-hand string, which would be used as the argument to the "in"
operator.
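
A short illustration of the point that "or" returns one of its operands rather than a boolean. The two sample lines are made up to mimic the old and new key names and are not taken from a real MTL file.

```python
# Illustrative lines only; real MTL files may format the value differently.
old_style = 'ACQUISITION_DATE = 2013-03-11'
new_style = 'DATE_ACQUIRED = 2013-03-11'

# "or" returns its first operand when that operand is considered true.
chosen = 'DATE_ACQUIRED' or 'ACQUISITION_DATE'
print(chosen)

# So ('DATE_ACQUIRED' or 'ACQUISITION_DATE') in line only tests one key.
print(chosen in new_style)
print(chosen in old_style)

# With a false first operand, the second operand is returned instead.
print('' or 'ACQUISITION_DATE')
```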

In general: False, None, 0, 0.0, the empty string and any empty
container (list, tuple, set, dictionary) are considered false, while
most other values are considered true (classes can customise their
behaviour by defining __nonzero__ (__bool__ in Python 3) or __len__ methods).
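
The truth-value rules can be checked directly with bool(). In this sketch the Scene class is a hypothetical example of a class that customises its truth value through __len__; it is not part of GRASS or of the script under discussion.

```python
falsy = [False, None, 0, 0.0, '', [], (), set(), {}]
truthy = [True, 1, -1.5, 'DATE_ACQUIRED', [0], (None,), {'k': 'v'}]


class Scene(object):
    """Hypothetical container whose truth value follows its band count."""

    def __init__(self, bands):
        self.bands = list(bands)

    def __len__(self):
        # With no explicit truth-value method, a length of 0 means "false".
        return len(self.bands)


print([bool(value) for value in falsy])
print([bool(value) for value in truthy])
print(bool(Scene([])), bool(Scene(['B1', 'B2'])))
```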

The "in" operator uses the __contains__ method of the right-hand
argument, i.e. "x in y" evaluates "y.__contains__(x)". For lists,
tuples and sets, "x in y" is true if x is an element of the list,
tuple or set; for dictionaries, "x in y" is true if x is a key in the
dictionary y. For strings, the expression is true if the left-hand
argument is a substring of the right-hand argument.
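
A brief sketch of these membership rules for the built-in types, plus one class that defines __contains__ itself. BandRange is a hypothetical class invented for this example.

```python
class BandRange(object):
    """Hypothetical class: membership means 'band number within range'."""

    def __init__(self, first, last):
        self.first, self.last = first, last

    def __contains__(self, band):
        return self.first <= band <= self.last


metadata = {'date': '2013-03-11'}

print(3 in [1, 2, 3])                        # element of a list
print('date' in metadata)                    # key of a dictionary
print('2013-03-11' in metadata)              # values are not searched
print('DATE' in 'DATE_ACQUIRED = 2013-03-11')  # substring of a string
print(8 in BandRange(1, 7))                  # calls BandRange.__contains__
```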

If you want to find out whether any one of a number of candidates is
contained within an object, you need to explicitly iterate over them,
e.g.:

  if any(x in line for x in ('DATE_ACQUIRED', 'ACQUISITION_DATE')):
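
Put together with the parsing expression quoted earlier in the thread, the any() test might be used as in the following sketch. The function name acquisition_date, the DATE_KEYS tuple and the sample lines are assumptions made for illustration; the actual script stores the value in result['date'] while looping over the file.

```python
DATE_KEYS = ('DATE_ACQUIRED', 'ACQUISITION_DATE')


def acquisition_date(lines):
    """Return the value of the first line naming one of DATE_KEYS, else None."""
    for line in lines:
        if any(key in line for key in DATE_KEYS):
            # Same parsing expression as in the script: text after the "=".
            return line.strip().split('=')[1].strip()
    return None


# Made-up excerpts in the newer and the older key style.
new_mtl = ['GROUP = PRODUCT_METADATA', '    DATE_ACQUIRED = 2013-03-11']
old_mtl = ['GROUP = PRODUCT_METADATA', '    ACQUISITION_DATE = 2013-03-11']

print(acquisition_date(new_mtl))
print(acquisition_date(old_mtl))
```

Keeping the keys in one tuple means a further spelling only needs one more entry.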

--
Glynn Clements <glynn@gclements.plus.com>

Pietro wrote:

> > I know the following is a pure python question -- still, can someone
> > shed some light on why is there such a difference between
> >
> > if 'DATE_ACQUIRED' or 'ACQUISITION_DATE' in line:
> > and
> >
> > if 'DATE_ACQUIRED' in line or 'ACQUISITION_DATE' in line:
> > ?
>
> Your error is due to the precedence of the operator, what it is really
> written in the first row is:

> if ( ('DATE_ACQUIRED') or ('ACQUISITION_DATE' in line) ):

Glynn wrote:

> It's not just the precedence, but also the semantics of the "or" and
> "in" operators. If the first expression had been parenthesised
> according to the presumed intent of the expression, i.e.:
>
>   if ('DATE_ACQUIRED' or 'ACQUISITION_DATE') in line:
>
> the result would have been equivalent to:
>
>   if 'DATE_ACQUIRED' in line:
>
> as the "or" operator returns the first argument if it is considered
> true and the second argument otherwise. As all non-empty strings are
> considered true, the "or" operator would have simply returned the
> left-hand string, which would be used as the argument to the "in"
> operator.

This explains why time-stamping did not work correctly, in the script
mentioned in the first post of this thread, when parsing an old(er) Landsat
metadata file (i.e. LE71610762013070PFS00_MTLold.txt which contains the string
'ACQUISITION_DATE').

> If you want to find out whether any one of a number of candidates is
> contained within an object, you need to explicitly iterate over them,
> e.g.:
>
>   if any(x in line for x in ('DATE_ACQUIRED', 'ACQUISITION_DATE')):

Works with any MTL file now, i.e. with both the LE71610762013070PFS00_MTL.txt
and the old(er) LE71610762013070PFS00_MTLold.txt Landsat metadata files.

Thanks, Nikos

On Wednesday 15 of May 2013 21:35:41 Nikos Alexandris wrote:
..

> Works with any MTL file now, i.e. with both the
> LE71610762013070PFS00_MTL.txt and the old(er)
> LE71610762013070PFS00_MTLold.txt Landsat metadata files.

I slightly modified the "import_landsat.py" script in the GRASS-Wiki:
<http://grasswiki.osgeo.org/wiki/LANDSAT#Automated_data_import>

to make it aware of Landsat 8 scenes. If there are no objections, I'd like to
"update" it. Here is the modified version:
<http://grasswiki.osgeo.org/wiki/User:NikosA/Landsat>. Even better, are there
any corrections?

Wouldn't a dedicated import script for Landsat products be useful? Like
r.in.aster for example -- though, maybe it would be better called
i.landsat.import?

Thanks, Nikos

Nikos wrote:

> Wouldn't a dedicated import script for Landsat products be useful? Like
> r.in.aster for example -- though, maybe it would be better called
> i.landsat.import?

see also r.in.modis

I think r.in.* is the correct pattern to follow.

regards,
Hamish