[GRASS-dev] [GRASS GIS] #2326: Command functions in grass.script.core miss a correct error reporting

#2326: Command functions in grass.script.core miss a correct error reporting
-------------------------+--------------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script | Platform: All
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
There is a lot of functions in grass.core which can run a GRASS module.
One is allowing this the other that but none of them is actually providing
the convenient interface to a (the most?) common case where you not only
want stdout as a string (and perhaps stdin as a string too) but also you
want an error message to be reported in program(mer)-friendly way.

The later is actually a motivation for this ticket because I see this as a
critical issue which is very dangerous for
[https://gis.stackexchange.com/questions/99773/problems-running-grass-
mapcalc/ beginners] (writing scripts with `run_command` or others) and not
checking the return code or stderr with an expectation that the function
will report the error (in Python sense, thus raising an exception). And
this issue is valid also for advanced GRASS Python programmers/users
because there is a need to still check output of each command and report
error manually. Moreover, the current state goes against the philosophy of
C library which takes the burden of dealing with errors from the
programmer.

The fact is that you then and up with implementing your own
`start_command` wrappers. For example,
[http://trac.osgeo.org/grass/browser/grass/trunk/gui/wxpython/animation/provider.py#L734
animation tool] uses its own function based on `start_command` returning
return code (`int`), stdout (`str`), stderr (`str`):

{{{
#!python
def read2_command(*args, **kwargs):
     kwargs['stdout'] = gcore.PIPE
     kwargs['stderr'] = gcore.PIPE
     ps = gcore.start_command(*args, **kwargs)
     stdout, stderr = ps.communicate()
     return ps.returncode, stdout, stderr
}}}

I recently used this code inside some function which gets stdin as string,
uses stdout and in case of non-zero return code throws/raises an exception
(`RuntimeError`) with error message containing module name and stderr:

{{{
#!python
proc = gcore.start_command('m.proj', input='-', separator=' , ',
                            flags='od',
                            stdin=gcore.PIPE,
                            stdout=gcore.PIPE,
                            stderr=gcore.PIPE)
proc.stdin.write(proj_in)
proc.stdin.close()
proc.stdin = None
proj_out, errors = proc.communicate()
if proc.returncode:
     raise RuntimeError("m.proj error: %s" % errors)
}}}

I would probably just commit the combination of the code samples above as
a new function but I want to be sure that it will be right this time. I
don't know whether my requirements are the same of the majority and
finally, I don't know what name to choose for the new function since it
seems that functions in `grass.script.core` already used every possible
name. Also, I'm not sure what is the PyGRASS's answer to this problem.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
-------------------------+--------------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script | Platform: All
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by glynn):

Replying to [ticket:2326 wenzeslaus]:

> The later is actually a motivation for this ticket because I see this as
a critical issue which is very dangerous for
[https://gis.stackexchange.com/questions/99773/problems-running-grass-
mapcalc/ beginners] (writing scripts with `run_command` or others) and not
checking the return code or stderr with an expectation that the function
will report the error (in Python sense, thus raising an exception).

Personally, I'd suggest just modifying the existing functions to raise
exceptions if the command fails.

Scripts which want to do their own error handling can just use
start_command() directly; the higher-level functions are only trivial
wrappers around this.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:1&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
-------------------------+--------------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script | Platform: All
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by zarch):

Replying to [ticket:2326 wenzeslaus]:
> Also, I'm not sure what is the PyGRASS's answer to this problem.

[[BR]]

At the moment in the Module class the returncode is not checked at all.

{{{
#!python
>>> from subprocess import PIPE
>>> from grass.pygrass.modules import Module
>>>
>>> greg = Module('g.region', flags='p', rast='ELEV', stdout_=PIPE,
stderr_=PIPE)
>>> greg.popen.returncode
1
>>> greg.outputs.stderr
'ERROR: Raster map <ELEV> not found\n'
}}}

[[BR]]

I can modify the Module class behaviour changing the default parameters
for stdout_ and stderr_ to PIPE (at the moment are None), and raise a
RuntimeError if the returncode is not 0.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:2&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
-------------------------+--------------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script | Platform: All
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by glynn):

Replying to [comment:2 zarch]:

> I can modify the Module class behaviour changing the default parameters
for stdout_ and stderr_ to PIPE

That's probably unwise. It would force the caller to either explicitly set
them to None or to consume the output, e.g. via the .communicate() method.

I wouldn't be surprised if many scripts do neither; the result being that
the call works fine unless it writes more than a buffer's worth of output,
in which case it will just hang.

I also wouldn't be surprised if scripts try to re-implement logic similar
to .communicate() but get it wrong. You need to use either threads or non-
blocking I/O. If you perform a blocking read on either pipe and more than
a buffer's worth of data is written to the other pipe, you get deadlock
(i.e. the script hangs).

Also, when both stdout and stderr are associated with different pipes,
it's impossible to reconstruct the "normal" output because there's no way
to determine the relative order. Meaning that you can't e.g. associate an
error message with any particular line of progress output. This is why the
support for "stdout=PIPE, stderr=STDOUT" exists.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:3&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
-------------------------+--------------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script | Platform: All
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by glynn):

Replying to [ticket:2326 wenzeslaus]:

I missed this issue in my previous reply:

>
{{{
proc.stdin.write(proj_in)
proc.stdin.close()
proc.stdin = None
proj_out, errors = proc.communicate()
}}}

You need to use
{{{
proj_out, errors = proc.communicate(proj_in)
}}}

If you try to .write() the data to stdin, and the child process prints
progress/error/etc messages as it processes the input, you'll get deadlock
if the amount of data written to either pipe exceeds the amount which will
fit in the pipe's buffer.

The .communicate() method uses either non-blocking I/O (Unix) or threads
(Windows) in order to consume output as it becomes available, rather than
waiting until all data has been written to stdin and hoping that the
pipe's buffer can contain everything written to stdout/stderr in the
meantime.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:4&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
-------------------------+--------------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script | Platform: All
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
Changes (by wenzeslaus):

  * priority: normal => major

Comment:

Here is the usage of `*_command` functions from `grass.script.core` and
how their output is checked. The statistics is alarming, so raising
priority.

|| function || checks rc || does not check ||
|| `run_command` || 109+33 || 285+501 ||
|| `write_command` || 7+2 || 17+33 ||

Both `run_command` (just runs the command synchronously) and
`write_command` (puts string to stdin) returns a subprocess return code
but most of the code does not check it (0.7 in trunk and 0.9 in addons).
We must expect that user scripts might be same or worse than addons.

|| function || usages ||
|| `start_command` || 4+2 ||
|| `pipe_command` || 24+1 ||
|| `feed_command` || 18+1 ||
|| `read_command` || 119+76 ||
|| `parse_command` || 66+9 ||

For `start_command`, `pipe_command` and `feed_command` it is hard to tell
if the return code is checked without really looking to the source code.

In case of `read_command` (returns stdout as string) and `parse_command`
(applies function to `read_command` result) we know that the return code
is not checked since the function should do this and it does not. In case
of `parse_command` the error might be at least generated during the
parsing. In case of `read_command`, the standard output might checked too.
In case the checking accepts things like empty sting or `None` or no error
is generated during parsing, it is completely the same as with
`run_command`.

Note that results are approximate, e.g., `(grass\.)?` might not be
completely correct. Used grep commands:

{{{
grep --exclude="*svn*" --exclude-dir="*dist.*" -IrnE
'(grass\.)?run_command' . | grep -E "(if | = ).*run_command"
grep --exclude="*svn*" --exclude-dir="*dist.*" -IrnE
'^\s*(grass\.)?run_command' .
grep --exclude="*svn*" --exclude-dir="*dist.*" -IrnE 'pipe_command' .
grep --exclude="*svn*" --exclude-dir="*dist.*" -IrnE 'feed_command' .
grep --exclude="*svn*" --exclude-dir="*dist.*" -IrnE 'read_command' .
grep --exclude="*svn*" --exclude-dir="*dist.*" -IrnE 'parse_command' .
grep --exclude="*svn*" --exclude-dir="*dist.*" -IrnE 'write_command' . |
grep -v "test_rcategory_doctest" | grep -E '(if | = ).*write_command'
grep --exclude="*svn*" --exclude-dir="*dist.*" -IrnE 'write_command' . |
grep -v "test_rcategory_doctest" | grep -vE '(if | = ).*write_command'
}}}

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:5&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script, exceptions | Platform: All
      Cpu: Unspecified |
--------------------------------+-------------------------------------------
Changes (by wenzeslaus):

  * keywords: script => script, exceptions

Comment:

Here are my suggestions for changes in `grass.script.core`.

The `read_command` function is similar to
[https://docs.python.org/2/library/subprocess.html#subprocess.check_output
subprocess.check_output] function, so I used the implementation from
Python subprocess
([http://hg.python.org/cpython/file/2145593d108d/Lib/subprocess.py#l508
cpython: Lib/subprocess.py]). It check the `stderr` parameter, uses
`communicate()` and `poll()` and raises on non-zero return code. The main
difference from `subprocess.check_output` (expect from GRASS parameter
handing) is that stderr is stored to exception to provide a detailed
message at the right place.

{{{
#!python
def read_command(*args, **kwargs):
     """Run command and return its output."""
     # implemenation inspired by subprocess.check_output() function
     if 'stderr' in kwargs:
         raise ValueError('stderr argument not allowed, it would be
overridden.')
     kwargs['stderr'] = PIPE
     process = pipe_command(*args, **kwargs)
     output, errors = process.communicate()
     returncode = process.poll()
     if returncode:
         cmd = kwargs.get("args")
         if cmd is None:
             cmd = args[0]
         raise CalledCommandError(returncode, cmd, errors=errors)
     return output
}}}

Note that `read_command()` is used by `parse_command()` function and that
`pipe_command()` might be replaced by `start_command()` and `stdout=PIPE`.

`write_command()` is similar to `read_command()` but without stdin and win
stdout as a string:

{{{
#!python
def write_command(*args, **kwargs):
     """!Passes all arguments to feed_command, with the string specified
     by the 'stdin' argument fed to the process' stdin.
     """
     # implemenation inspired by subprocess.check_output() function
     if 'stdin' not in kwargs:
         raise ValueError('stdin argument is required.')
     if 'stderr' in kwargs:
         raise ValueError('stderr argument not allowed, it would be
overridden.')
     stdin = kwargs['stdin'] # get a string for stdin
     kwargs['stdin'] = PIPE # to be able to send data to stdin
     kwargs['stderr'] = PIPE
     process = start_command(*args, **kwargs)
     unused, errors = process.communicate(stdin)
     returncode = process.poll()
     if returncode:
         cmd = kwargs.get("args")
         if cmd is None:
             cmd = args[0]
         raise CalledCommandError(returncode, cmd, erros=errors)
     # subprocess.check_call() function returns 0 for success
     # but this would create only confusion since we don't return for
failure
}}}

`write_read_command()` is a new proposed function which is a combination
of `write_command()` and `read_command()` functions.

{{{
#!python
def write_read_command(*args, **kwargs):
     """Uses stdin string as stdin and returns stdout"""
     # implemenation inspired by subprocess.check_output() function
     if 'stdin' not in kwargs:
         raise ValueError('stdin argument is required.')
     if 'stdout' in kwargs:
         raise ValueError('stdout argument not allowed, it would be
overridden.')
     if 'stderr' in kwargs:
         raise ValueError('stderr argument not allowed, it would be
overridden.')
     stdin = kwargs['stdin'] # get a string for stdin
     kwargs['stdin'] = PIPE # to be able to send data to stdin
     kwargs['stdout'] = PIPE
     kwargs['stderr'] = PIPE
     process = start_command(*args, **kwargs)
     output, errors = process.communicate(stdin)
     returncode = process.poll()
     if returncode:
         cmd = kwargs.get("args")
         if cmd is None:
             cmd = args[0]
         raise CalledCommandError(returncode, cmd, erros=errors)
     return output
}}}

I'm not sure with limitations of `write_command()`, `read_command()` and
`write_read_command()` considering large inputs and outputs but they are
using `out, err = p.communicate(stdin)` and `p.poll()` which should be the
best way according to what Glynn is saying, library (subprocess) is using,
and documentation is suggesting. Moreover, they are not worse that the
existing functions and I expect that for special cases you need to use
`start_command()` directly anyway.

The missing function is now a function which would [comment:3 connect
stdout and stderr] (`stdout=PIPE, stderr=STDOUT`). But this is again
something special (e.g. used in GUI) which might not need a convenient
function and user should use `start_command()` directly.

Considering how `run_command()` function is used I would suggest to change
it to raise an exception. This will break the code which was using it
correctly and this code will have to be fixed manually and perhaps adding
try-except will be necessary. However, most of the code will improve since
it will at least fail with proper error message at proper place and not
few lines further with hard-to-understand message. Adding `return 0` to
the end of the function may provide a compatibility with previous behavior
for cases when the return value was used. It will not create an error when
everything is OK but it will not use the error handling at that place when
the command fails. At the end, `return 0` should be removed anyway.

{{{
#!diff
- ps = start_command(*args, **kwargs)
- return ps.wait()
+ if 'stdin' not in kwargs:
+ raise ValueError('stdin argument is required.')
+ else:
+ stdin = kwargs['stdin']
+ del kwargs['stdin'] # clean non-standard argument
+ process = feed_command(*args, **kwargs)
+ unused, errors = process.communicate(stdin)
+ returncode = process.poll()
+ if returncode:
+ cmd = kwargs.get("args")
+ if cmd is None:
+ cmd = args[0]
+ raise CalledCommandError(returncode, cmd, erros=errors)
+ return 0
}}}

I don't see particular advantage in functions `pipe_command()` (sets
`kwargs['stdout'] = PIPE`) and `feed_command()` (sets `kwargs['stdin'] =
PIPE`). The the parameters can be set directly with the function
`start_command()` and for me the functions does not contribute to
readability because the ''feed'' does not tell me if the string is the
input (as for `write_command()` function) or it justs sets `stdin=PIPE`.
Similarly, ''pipe'' does not tell me if everything is piped (stdin, stdout
and stderr), stdin and stdout are piped (as for `write_read_command()`
function) or just one of stdin and stdout is piped. It is even more
confusing name when I consider piping in command line where I connect
multiple processes (which is not done by PIPE constant in
[https://docs.python.org/2/library/subprocess.html#replacing-shell-
pipeline subprocess]).

I wanted to change (instead of adding) the functions, especially
`run_command()`, originally but then I was afraid of problems with
releasing 7.0 and changing API. However, the usage statistics and the fact
that Glynn [comment:1 suggested the same] convinced me that we should
really do it. However, it is challenging, we have to change some code (at
least 150 occurrences of `run_command()` and `write_command()` in trunk
and addons but maybe also other functions) and I'm still not sure with
some detail such as if stderr should be always captured to get the right
error message (consider e.g. the result with `DEBUG=1`).

Another consideration is raising exception versus calling `fatal()`
function which can be set to raise exception but this exception would have
to contain everything in a message while a custom exception (I used one
derived from
[https://docs.python.org/2/library/subprocess.html#subprocess.CalledProcessError
subprocess.CalledProcessError]) can store return code, command (or
command/module name) and stderr separately.

For the 7.0 this actually should not be applied completely but the API
should be changed to fit the improved implementation/new API, so
`run_command()` function should have `return 0` but a waring in
documentation. Then perhaps `pipe_command()` and `feed_command()`
functions should be marked as depreciated.

Adding (not tested) draft of proposed changes.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:6&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script, exceptions | Platform: All
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by wenzeslaus):

Replying to [comment:6 wenzeslaus]:
>
> Another consideration is raising exception versus calling `fatal()`
function which can be set to raise exception but this exception would have
to contain everything in a message while a custom exception (I used one
derived from
[https://docs.python.org/2/library/subprocess.html#subprocess.CalledProcessError
subprocess.CalledProcessError]) can store return code, command (or
command/module name) and stderr separately.
>

To take advantage of custom exception with additional info and keeping
`fatal()` with exit as a default behavior we can use the following
function in the `*_command` functions.

{{{
#!python
def called_command_error(cmd, returncode, errors)
     global raise_on_error
     if raise_on_error:
         raise CalledCommandError(returncode, cmd, erros=errors)
     # the same as CalledCommandError.__str__ is doing
     msg = _("Command '{cmd}' returned non-zero exit status {rc}"
             " and the following errors.\n"
             "%{errors}").format(cmd=cmd, rc=returncode,
                                 errors=errors)
     # fatal function will test raise_on_error again
     # the alternative is to mirror code from fatal function completely
     # which would be: error(msg); sys.exit(1)
     fatal(msg)
}}}

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:7&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script, exceptions | Platform: All
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by glynn):

Replying to [comment:6 wenzeslaus]:

> Here are my suggestions for changes in `grass.script.core`.

These changes are excessive. All that is required is to check the exit
code and generate an error if it is non-zero. If the child process returns
a zero exit code, the existing behaviour should not be affected.

In particular, stderr should not be captured. It isn't limited to errors
(e.g. messages and percentages are written to stderr), and such
information should normally be sent to the terminal as its generated.

Also, checking kwargs!["args"] isn't useful.

In most cases, the failure to check exit codes was inherited from the
original shell script. run_command() replaces "simple" command execution,
read_command() replaces backticks. pipe_command() and feed_command() are
used for a GRASS command at one end of a pipeline. write_command()
replaces "echo data | command".

My suggestion would be that the functions which wait for the process to
terminate (run_command, read_command, write_command) should call a
check_status() function, e.g.
{{{
def check_status(p, args, kwargs):
     if p.wait() == 0:
         return 0
     raise CalledCommandError(p.returncode, args, kwargs)
}}}

run_command() and write_command() would replace
{{{
     return p.wait()
}}}
with
{{{
     return check_status(p)
}}}

This usage pattern would allow check_status() to be modified to provide
more options regarding error handling, e.g. raise an exception, call
fatal(), or just return the exit code, with the behaviour controlled by a
variable or a keyword argument.

Alternatively, we could just modify the Popen wrapper to provide a
modified .wait() method which does this automatically. This would probably
"do the right thing" for scripts which use start_command() etc and
subsequently call p.wait() themselves.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:8&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script, exceptions | Platform: All
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by wenzeslaus):

Replying to [comment:8 glynn]:
> Replying to [comment:6 wenzeslaus]:
>
> > Here are my suggestions for changes in `grass.script.core`.
>
> These changes are excessive. All that is required is to check the exit
code and generate an error if it is non-zero. If the child process returns
a zero exit code, the existing behaviour should not be affected.
>
> In particular, stderr should not be captured. It isn't limited to errors
(e.g. messages and percentages are written to stderr), and such
information should normally be sent to the terminal as its generated.
>

That's true but I don't know how to solve the following case.

I'm currently testing the testing framework. When I make a mistake in my
testing code, an exception (`ValueError` in this case) is raised:

{{{
> python -m unittest discover sandbox/wenzeslaus/gunittest
..DBMI-SQLite driver error:
Error in sqlite3_prepare():
SELECT cat, YEAR_BUILTX FROM bridges
no such column: YEAR_BUILTX

DBMI-SQLite driver error:
Error in sqlite3_prepare():
SELECT cat, YEAR_BUILTX FROM bridges
no such column: YEAR_BUILTX

ERROR: Column type not supported
E............................

ERROR: test_assertVectorFitsUnivar
(testsuite.test_assertions.TestRasterMapAssertations)
----------------------------------------------------------------------
Traceback (most recent call last):
   File
"/home/vasek/dev/grass/sandbox/wenzeslaus/gunittest/testsuite/test_assertions.py",
line 80, in test_assertVectorFitsUnivar
     reference=V_UNIVAR_BRIDGES_WIDTH_SUBSET)
   File "/usr/lib/python2.7/unittest/case.py", line 475, in assertRaises
     callableObj(*args, **kwargs)
   File "/home/vasek/dev/grass/sandbox/wenzeslaus/gunittest/case.py", line
102, in assertVectorFitsUnivar
     reference=reference, msg=msg, sep='=')
   File "/home/vasek/dev/grass/sandbox/wenzeslaus/gunittest/case.py", line
66, in assertCommandKeyValue
     ": %s\n" % (module, ", ".join(missing)))
ValueError: v.univar output does not contain the following keys provided
in reference: max, mean, min, n, nmissing, nnull, range, sum

----------------------------------------------------------------------
Ran 31 tests in 1.227s

FAILED (errors=1)
}}}

But of course the error report is misleading because the problem is not in
the "key provided" or "v.univar output", the problem is that I used
current version of `read_command` which does not raise. If the
`read_command` function raise, I would get `ScriptError` (or whatever)
with an error message saying that `v.univar` failed. But this is not
enough to fix the problem.

If we catch the stderr we can report what has happened. In this case I
would get a message about wrong column name. However, if we will let
stderr be printed to the console, we will have to search in the output for
the errors which does not contain any information about command which
caused them (unfortunately, in this case they are even wrong and not in
GRASS format).

The only option I see is to have both functions. One capturing the stderr
for cases when the module (command) is used more like a function and one
letting the stderr go wherever it is directed. But it don't like it
because this applies at least to three functions which would create 6
different functions. Parameter, as noted bellow, might be a more
acceptable solution.

> Also, checking kwargs!["args"] isn't useful.

If you mean `cmd = kwargs.get("args")`, this is from `Popen`, source code,
I don't know what exactly they are trying to accomplish.
>
> In most cases, the failure to check exit codes was inherited from the
original shell script. run_command() replaces "simple" command execution,
read_command() replaces backticks. pipe_command() and feed_command() are
used for a GRASS command at one end of a pipeline. write_command()
replaces "echo data | command".
>
Beginning and end of the pipe line still does not convince me about
usefulness of the functions. I still see them as unnecessary complication
of interface. If one need something like this, he or she can use
`start_commnad` directly. Direct working with `Popen` object cannot be
avoided in any case.

> My suggestion would be that the functions which wait for the process to
terminate (run_command, read_command, write_command) should call a
check_status() function, e.g.
> {{{
> def check_status(p, args, kwargs):
> if p.wait() == 0:
> return 0
> raise CalledCommandError(p.returncode, args, kwargs)
> }}}
>
> run_command() and write_command() would replace
> {{{
> return p.wait()
> }}}
> with
> {{{
> return check_status(p)
> }}}
>
I don't agree with the name. It does not `check_status` it waits and then
checks status, so I would suggest `_wait_check_status`.

> This usage pattern would allow check_status() to be modified to provide
more options regarding error handling, e.g. raise an exception, call
fatal(), or just return the exit code, with the behaviour controlled by a
variable or a keyword argument.
>
Sure, this is the way to go (`called_command_error` from comment:7 is
doing something like that).

> Alternatively, we could just modify the Popen wrapper to provide a
modified .wait() method which does this automatically. This would probably
"do the right thing" for scripts which use start_command() etc and
subsequently call p.wait() themselves.

I vote against. The lower level API (when using `Popen` object) should
behave in the same way as Python `Popen` to allow users/programmers switch
between them easily. Moreover, it would be a violation of some OOP
principles, although in case of Python not so big violation, I believe.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:9&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script, exceptions | Platform: All
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by wenzeslaus):

When trying to implement new version of what we are talking about I run
into the following problem. G7:g.message which is called by `core.fatal`
exits with non-zero return code.

{{{
# test runs of g.message
> g.message "no flag"; echo "$?"
no flag
0
> g.message -v "flag -v"; echo "$?"
0
> g.message -i "flag -i"; echo "$?"
flag -i
0
> g.message -w "flag -w"; echo "$?"
WARNING: flag -w
0
> g.message -e "flag -e"; echo "$?"
ERROR: flag -e
1
}}}

This causes `run_command` which is used to call `g.message` to call
`core.fatal` and infinite recursion goes on. `g.message` calls
`G_fatal_error` which in causes the exit with return code 1.

This can be fixed in `core.fatal` (`core.error`) for example by using
`start_command` directly. However, the behavior of `g.message` is not
expected. When it successfully prints the error, it should end
successfully. Wouldn't be better to change `g.message` to return 0 with
`-e` flag?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:10&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script, exceptions | Platform: All
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by wenzeslaus):

I've added new patch. Functions call one of the following error handing
functions:

{{{
#!python
def _called_command_error(returncode, args, kwargs):
     msg = _("Command %s %s returned non-zero exit status %s") % (args,
kwargs, returncode)
     if _RAISE_ON_COMMAND_ERROR:
         raise CalledCommandError(msg, returncode, args, kwargs)
     fatal(msg) # this exits or raises

def _called_command_return(returncode, args, kwargs):
     if _RETURN_ON_COMMAND_ERROR:
         return returncode
     elif returncode:
         _called_command_error(returncode, args, kwargs)
}}}

Examples of functions:

{{{
#!python
# we can return return code, raise or exit
def run_command(*args, **kwargs):
     ps = start_command(*args, **kwargs)
     returncode = ps.wait()
     return _called_command_return(returncode, args, kwargs)
}}}

{{{
#!python
# we want to always return output here, so raise or exit
def read_command(*args, **kwargs):
     process = pipe_command(*args, **kwargs)
     output = process.communicate()[0]
     returncode = process.poll()
     if returncode:
         _called_command_error(returncode, args, kwargs)
     return output
}}}

  * I don't call `wait()` or `poll()` in the checking method, this is done
in the functions in different (appropriate) ways.
  * There is no interface yet for the global variables which controls if
the functions will raise, exit/fatal or return (if applicable).
  * I'm using `message(msg, flag='w')` instead of `message(msg, flag='w')`
to avoid issue described in comment:10.
  * Besides global variables there may or probably should be also a
parameter which will apply to individual functions. But I still have a
feeling that I would like to use special function, no a function with
parameter.
  * Functions `read_command` and `write_read_command` capture stdout, so I
don't see a reason to not capture also stderr and print it (perhaps
begging and end) in the error message, except for inconsistency with other
functions. Again I still have a feeling that I want this also for other
functions.
  * I was looking to [source:grass/trunk/gui/wxpython/core/gcmd.py#L553
gui/wxpython/core/gcmd.py] and I'm wondering why there is so much code
around `subprocess.Popen` call and why it is not needed in `script.core`.
  * I'm still thinking how to include this changes into trunk and 7_0. It
is a change of API and it requires to go to almost 200 places in trunk and
addons (see comment:5).

Download the test, e.g. to `/lib/python/script/testsuite` directory, and
run it (inside GRASS session):
{{{
python -m unittest discover lib/python/script/testsuite
}}}

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:11&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script, exceptions | Platform: All
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by wenzeslaus):

To start GUI with the changes I had to add

{{{
#!python
import grass.script.core as gcore
gcore._RAISE_ON_COMMAND_ERROR = True
}}}

to `wxgui.py` which is expected but to make GUI start I had to add try-
except to `core.find_file()`:

{{{
#!python
def find_file(name, element='cell', mapset=None):
     if element == 'raster' or element == 'rast':
         verbose(_('Element type should be "cell" and not "%s"') % element)
         element = 'cell'
     try:
         s = read_command("g.findfile", flags='n', element=element,
file=name,
                          mapset=mapset)
     except CalledCommandError:
         return {'name': ''}
     return parse_key_val(s)
}}}

The try-except makes sense only when `_RAISE_ON_COMMAND_ERROR = True`, so
I would say that it should not be there. It's sure that function
`find_file` should not cause exit or raise when the file was not found. It
should return `False` or perhaps `None` in this case.

It seems that the fact that you cannot rely on the interface of the
functions hardens the implementation of new functions which are supposed
to be general. This disqualifies the global switching from being usable.

I think that this is different from `raise_on_error` which changes
behavior of `core.fatal()`. GRASS modules/program calls in general behaves
in in different ways considering error codes from error states in
functions. They are generally less predictable, although GRASS modules in
most of the cases behaves quite nicely and translating error code to
exception raise makes sense.

Considering these difficulties (g.findfile, g.message and number of
usages) it seems to me that the functions (their default behavior) cannot
be changed before the 7.0 release and also that global settings
_RAISE_ON_COMMAND_ERROR and _RETURN_ON_COMMAND_ERROR should not be part of
API and should not be used except for some testing purposes.

Thus, I now think that the best solution is to create new functions (in
different module but based on `start_command`) or to add a parameter to
existing functions but something tells me that I will anyway create a
wrapper function with the parameter set and creating new functions all
over again was what was motivation for this ticket.

Appendix: G7:g.findfile behavior with not existing map is to produce
output with empty values and return code 1. `find_file` ignores return
code and caller checks if some of the values is an empty string (or
`None`).

{{{
> g.findfile -n element=cell file=aaabbbcc; echo $?
name=
mapset=
fullname=
file=
1
}}}

Appendix: Usage of `raise_on_error`. The usage in GUI seems to be not well
settled.

{{{
$ cd gui/wxpython
$ grep --color=auto --exclude-dir={.svn,.git,.OBJ,locale,dist.*} -IrnE
"raise_on"
animation/dialogs.py:1646: gcore.set_raise_on_error(True)
animation/frame.py:46:gcore.set_raise_on_error(True)
iclass/g.gui.iclass.py:120: grass.set_raise_on_error(False)
vdigit/g.gui.vdigit.py:84: grass.set_raise_on_error(False)
psmap/dialogs.py:63:# grass.set_raise_on_error(True)
psmap/frame.py:1056: grass.set_raise_on_error(False)
dbmgr/g.gui.dbmgr.py:43: grass.set_raise_on_error(False)
}}}

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:12&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script, exceptions | Platform: All
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by glynn):

Replying to [comment:9 wenzeslaus]:

> That's true but I don't know how to solve the following case.
>
> I'm currently testing the testing framework.

Your testing framework probably shouldn't be using functions designed for
normal scripts. Conversely (and more importantly), features which are
necessary or useful for a testing framework shouldn't be imposed upon
normal scripts.

Write your own "run_command" function atop grass.script.start_command(),
and use that.

> > Also, checking kwargs!["args"] isn't useful.
>
> If you mean `cmd = kwargs.get("args")`, this is from `Popen`, source
code, I don't know what exactly they are trying to accomplish.

Python's Popen() is at a lower level than the grass.script functions. The
latter don't accept an "args" parameter, but use grass.script.make_command
to generate the argument list from the keyword arguments.

> > In most cases, the failure to check exit codes was inherited from the
original shell script. run_command() replaces "simple" command execution,
read_command() replaces backticks. pipe_command() and feed_command() are
used for a GRASS command at one end of a pipeline. write_command()
replaces "echo data | command".
> >
> Beginning and end of the pipe line still does not convince me about
usefulness of the functions. I still see them as unnecessary complication
of interface. If one need something like this, he or she can use
`start_commnad` directly.

All of the functions (except for exec_command) can be replaced by
start_command, as all of those functions are implemented using
start_command. They exist as convenience interfaces for the most common
cases (e.g. being able to use "data = read_command(...)" rather than
having to explicitly manage a Popen object).

The cases of a single pipe to either stdin or stdout are simpler (for the
caller) than those which require multiple pipes, as they avoid the
potential for deadlock. This is why Unix' popen() lets you read stdout or
write stderr but not both.

These interfaces could perhaps be simplified further by supporting the
context manager interface, so that scripts could use e.g.
{{{
     with grass.script.pipe_command(...) as p:
         for line in p.stdout:
             # whatever
}}}
This would avoid the need for an explicit check of the exit code (the
check would be performed in the context manager's `__exit__` method).

> > My suggestion would be that the functions which wait for the process
to terminate (run_command, read_command, write_command) should call a
check_status() function, e.g.
> >
> I don't agree with the name. It does not `check_status` it waits and
then checks status, so I would suggest `_wait_check_status`.

The "status" in question is the process' exit status, which doesn't exist
until the process has terminated. How about check_exit_status()?

> > Alternatively, we could just modify the Popen wrapper to provide a
modified .wait() method which does this automatically. This would probably
"do the right thing" for scripts which use start_command() etc and
subsequently call p.wait() themselves.
>
> I vote against. The lower level API (when using `Popen` object) should
behave in the same way as Python `Popen` to allow users/programmers switch
between them easily. Moreover, it would be a violation of some OOP
principles, although in case of Python not so big violation, I believe.

Another alternative is the same thing under a different name, e.g.
.finish(). This would still require modifying existing scripts to use that
rather than .wait(), but at least it keeps it as a method rather than a
function.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2326#comment:13&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script, exceptions | Platform: All
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by glynn):

Replying to [comment:12 wenzeslaus]:

> Thus, I now think that the best solution is to create new functions (in
different module but based on `start_command`) or to add a parameter to
existing functions but something tells me that I will anyway create a
wrapper function with the parameter set and creating new functions all
over again was what was motivation for this ticket.

I suggest adding an extra keyword parameter to control error checking. The
default should be to generate an error (raise an exception or call
fatal()) for a non-zero exit code.

The value should be stored in the grass.script.Popen() object so that non-
blocking functions (start_command, pipe_command, feed_command) behave
consistently with blocking functions (run_command, read_command,
write_command). I.e. the error-checking behaviour would always be
specified in the call which creates the process regardless of whether that
call waits for termination or returns a Popen object.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:14&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script, exceptions | Platform: All
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by wenzeslaus):

Replying to [comment:13 glynn]:
> Replying to [comment:9 wenzeslaus]:
>
> > That's true but I don't know how to solve the following case.
> >
> > I'm currently testing the testing framework.
>
> Your testing framework probably shouldn't be using functions designed
for normal scripts. Conversely (and more importantly), features which are
necessary or useful for a testing framework shouldn't be imposed upon
normal scripts.
>
> Write your own "run_command" function atop grass.script.start_command(),
and use that.
>
This is what I wanted to avoid, writing some other wrappers around
`start_command()` again and again. But now I think that at least this
time, for now, it it the best option. So, here they are
source:sandbox/wenzeslaus/gunittest/gmodules.py?rev=60979

The most interesting function is `call_module` (which might replace some
of the other functions):

{{{
#!python
def call_module(module, stdin=None, merge_stderr=False, **kwargs):
     if stdin:
         kwargs['stdin'] = subprocess.PIPE # to be able to send data to
stdin
     elif 'input' in kwargs and kwargs['input'] != '-':
         raise ValueError(_("stdin must be set when input='-'"))
     if 'stdout' in kwargs:
         raise ValueError(_("stdout argument not allowed, it would be
overridden"))
     if 'stderr' in kwargs:
         raise ValueError(_("stderr argument not allowed, it would be
overridden"))

     kwargs['stdout'] = subprocess.PIPE
     if merge_stderr:
         kwargs['stderr'] = subprocess.STDOUT
     else:
         kwargs['stderr'] = subprocess.PIPE
     process = start_command(module, **kwargs)
     # input=None means no stdin (our default)
     # for stderr=STDOUT errors in None which is fine for CalledModuleError
     output, errors = process.communicate(input=stdin)
     returncode = process.poll()
     if returncode:
         raise CalledModuleError(returncode, module, kwargs, errors)
     return output
}}}

I should probably add to the documentation that you should not use it for
huge I/O. If stdin is used it feeds the process with it (I should add a
check if it is a string). It always returns stdout. By default it captures
stderr (to be used in a exception) and with merge_stderr=True it redirects
stderr to stdout.

I made an experiment and used the word "module" rather than "command".

About your comment:14, it makes sense. I can try to implement it some time
later. But we probably cannot apply this to 7 since the change might be
too invasive as I realized. But it would be good to ensure right usages in
the future of 7, so I don't know what to do.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:15&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script, exceptions | Platform: All
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by wenzeslaus):

In r60993 I have added more parameters to `call_module()` function, so the
interface is more complicated but I expect (me) to use the defaults most
of the time and just add one or two parameters time to time. The other
functions (`run_`, `write_`, `read_`) are not so needed now, I may
consider removing them. Or at least reimplementing them using the general
`call_module()` function.

I hope that using
[https://docs.python.org/2/library/subprocess.html#subprocess.Popen.communicate
communicate()] will be safe most of the time. They say that ''The data
read is buffered in memory, so do not use this method if the data size is
large or unlimited.'' So, I should note this limitation in `call_module()`
too. Otherwise, they seem to believe that if you use `communicate()`, it
is safe (looking to the documentation and source code). What makes me not
sure are the things like
source:grass/trunk/gui/wxpython/core/gcmd.py?rev=60817#L553 and some
discussions on the internet.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:16&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script, exceptions | Platform: All
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by glynn):

Replying to [comment:16 wenzeslaus]:

> They say that ''The data read is buffered in memory, so do not use this
method if the data size is large or unlimited.''

If the data would otherwise be sent to the terminal, memory consumption
won't be an issue. But there are a few modules which can produce vast
amounts of data on stdout (e.g. r.out.ascii, r.stats).

Apart from memory consumption, many modules print progress output, and you
want that to appear as it's generated, not all at once when the module
terminates.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:17&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script, exceptions | Platform: All
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by wenzeslaus):

What do you (all) think about moving the function `call_module()` from
testing framework into the `grass.script.core` module?

The main point of this function is that it raises exception when return
code of the module is non-zero. Additionally, it provides an convenient
interface for capturing stdout and stderr and also for (optional)
providing stdin (as string, or stream). It uses the safest possible way to
achieve this which is `Popen.communicate()` method.

By default it captures stdout (stdout=PIPE) and returns it as string and
captures stderr (stderr=PIPE) and puts it into the exception when return
code was non-zero. This can be disabled or stderr can be merged to stdout
(stderr=STDOUT).

  * source:sandbox/wenzeslaus/gunittest/gmodules.py?rev=61134#L84

The function is universal and it can be used to implement the other
alternatives (`run_`, `write_`, and `read_`).

We can change the name to `call_command()` to be consistent with other
functions or we can use `call_module()` to emphasize that it have
different interface.

It raises exception derived from `subprocess`'s `CalledCommandError`. But
there is not special need for that. This is just for consistency with
`subprocess`'s `check_call` function which is not used by `call_command`
or in GRASS.

  * source:sandbox/wenzeslaus/gunittest/gmodules.py?rev=61134#L

It has tests to prove that it behaves how it should.

  *
source:sandbox/wenzeslaus/gunittest/testsuite/test_gmodules.py?rev=61134#L26

This of course does not solve the problems of wrong usage of `run_command`
and it does not enforce usage of fatal error. However, it allows to write
new code for libraries and scripts in a better, Python-friendly, way. The
usage for this function is where the module is used more like a function
then as a subprocess, in this case we don't care much about module
progress or messages unless there was an error. Typical usage are the
modules providing some stdout with key-value pairs.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:18&gt;
GRASS GIS <http://grass.osgeo.org>

#2326: Command functions in grass.script.core miss a correct error reporting
--------------------------------+-------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: major | Milestone: 7.1.0
Component: Python | Version: svn-trunk
Keywords: script, exceptions | Platform: All
      Cpu: Unspecified |
--------------------------------+-------------------------------------------

Comment(by glynn):

Replying to [comment:18 wenzeslaus]:
> What do you (all) think about moving the function `call_module()` from
testing framework into the `grass.script.core` module?

My concern is that people might use it when the existing commands (once
the error handling is fixed) would be more appropriate.

> The main point of this function is that it raises exception when return
code of the module is non-zero.

The existing commands should be changed to do that.

> Additionally, it provides an convenient interface for capturing stdout
and stderr and also for (optional) providing stdin (as string, or stream).
It uses the safest possible way to achieve this which is
`Popen.communicate()` method.
>
> By default it captures stdout (stdout=PIPE) and returns it as string

This shouldn't be the default. The default should be to inherit the
script's stdout. And if capturing stdout is the only feature required
(compared to run_command), read_command should be used.

> and captures stderr (stderr=PIPE) and puts it into the exception when
return code was non-zero.

This should never be done for normal scripts. Child process should inherit
the script's stderr.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2326#comment:19&gt;
GRASS GIS <http://grass.osgeo.org>