[GRASS-dev] [GRASS GIS] #2134: Create a general exit-safe interface to C libraries

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------
After discussion with Soeren, I'm opening a ticket with a request for a
general exit-safe interface to C libraries.

The purpose of this is (citing Soeren)
> to reduce the direct ctypes usage in the temporal
> framework, so that GUI elements can easily be implemented on top of
> it, without the risk to be exited by a `G_fatal_error()` call. But i
> don't want to use the script interface to avoid its module execution
> and parsing overhead, which slows things massively down.
Moreover, it protects the main application in case of segmentation faults.

The r58201 (#2127) and r58249 revisions provide `Messenger`
(source:grass/trunk/lib/python/pygrass/messages/__init__.py) and
`CLibrariesInterface`
(source:grass/trunk/lib/python/temporal/c_libraries_interface.py) classes
which implements the idea of server (a separate process) executing ctypes
(C library) functions (remote procedure call (RPC)). One is PyGRASS
interface for GRASS messages the other is the interface for GRASS C
functions needed by temporal framework.

One issue I see is that we need to implement such a class for each use
case. One for PyGRASS messages, other maybe for PyGRASS itself, for
temporal library, for GUI vector digitizer, for scatter plot... Is there a
way to create a general interface, or generated interface or make the
implementation simple?

The other issue is with implementation itself. Each class use a bit
different implementation to call functions. However, generally, some
identifiers are mapped to the ctypes functions. One issue is that it slows
down the function call. The other is that it is tedious (and error prone)
to implement (you need to create the mapping). And finally what is always
judged in Python code: it is not Pythonic (whatever that means, e.g. nice,
intuitive, straightforward). This of course highly relates to the first
issue.

Related documentation currently accesible from:
  *
http://grass.osgeo.org/programming7/namespacepython_1_1temporal_1_1c__libraries__interface.html
  *
http://grass.osgeo.org/programming7/classpython_1_1temporal_1_1c__libraries__interface_1_1CLibrariesInterface.html
  *
http://docs.python.org/2/library/multiprocessing.html#multiprocessing.Connection

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2134&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by huhabla):

I have implemented a new RPC server that can call any kind of Python
function directly in the server subprocess. The server subprocess will
restart in case the provided Python function was terminated (segfault,
exception, G_fatal_error, ...).

The Python file with the new RPC design is attached. Please have a look at
the doctest examples that demonstrate how to use the RPCServer.

The RPC server supports two methods to call remote functions {{{
call(function, args) }}} and {{{ call_no_wait(function, args)}}}. The
{{{call()}}} function will always wait for the server to commit the return
value of the called Python function although the function may have no
return value. The {{{call_no_wait()}}} function will not wait for the
Python function to finish and therefor not return any return value of the
called Python function.

Example:
{{{
def info(args):
     message = args[0]
     libgis.G_message(message)

def raster_map_exists(args)
     name = args[0]
     mapset = args[1]
     mapset = libgis.G_find_raster(name, mapset)

     if mapset:
         return True

     return False

rpc = RPCServer()
rpc.call_no_wait(function=info, args=("This is a message",))

ret = rpc.call(function=raster_map_exists, args=("test", "PERMANENT"))
}}}

I am still not sure about the best Python caller function design. Should
the error return values of GRASS C-library functions be returned, or
should an exception be raised?

Any suggestions are welcome.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:1&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by wenzeslaus):

Thanks Soeren, this is what I hoped for and I'm looking forward how this
will continue. I just quickly looked at this topic for now. So, just some
small notes now.

I needed to do this changes to get it working:
{{{
#!diff
@@ -19,12 +19,13 @@
  from multiprocessing import Process, Lock, Pipe
  import logging
  from ctypes import *
-from core import *
+from grass.script.core import *
  import grass.lib.gis as libgis
  import grass.lib.raster as libraster
  import grass.lib.vector as libvector
  import grass.lib.date as libdate
  import grass.lib.raster3d as libraster3d
+from datetime import datetime
}}}
I though I understand how it works with exceptions and doctest but
apparently not, even I tried verbose mode:
{{{
python -m doctest -v c_library_interface.py
}}}
Anyway test passes. Would it be possible to write a test also for the case
of waiting, killing and restarting the server? I mean the case when I run
something with `call_no_wait(function=aaa)`, it runs for a long time,
meanwhile I call `call(function=bbb)` but then `bbb` fails and kills the
server. Does this make sense?

About the interface. I'm not sure about how to report errors. The
functions might need to accept keyword arguments in some way. How to deal
with objects, e.g. how to use pyGRASS through this?

I was not thinking about how to deal with wrapping all C functions. But
maybe manual wrapping it is inevitable if we want really Python(ic)
interface. We already have pyGRASS for this reason.

The last not is about naming. Although it is a well established practice
in GRASS that things are named with two or more different names, I would
suggest to use only one name for newly created things. We are at the
beginning, we can always rename but it would be safer to use one name at
time. `*C*Interface` is a nice name (although it it might by also
`*Python*Interface`) but it is more for the particular implementation
(interface for temporal framework or to messages) than for the general
thing (server). `RPCServer` is very general, not sure about that. What
about `CFunctionCaller`? I also heard suggestion `NoGFatal_Huray!!!` from
someone here but that's not a valid Python identifier, although, with
unicode identifiers, there might be a way. No strong opinion here.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:2&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by wenzeslaus):

I forgot to link related a note on Trac wiki wiki:wxGUIDevelopment/VDigit
where I'm proposing the factory pattern as a way of creation of objects
which are an object-oriented interface to C vector modules but in fact are
connected to the `RPCServer` by that factory, so user does does not have
to pass the `RPCServer` to every object (but he or she has to use the
factory).

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:3&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by zarch):

Replying to [ticket:2134 wenzeslaus]:
> The r58201 (#2127) and r58249 revisions provide `Messenger`
> (source:grass/trunk/lib/python/pygrass/messages/__init__.py) and
> `CLibrariesInterface`
(source:grass/trunk/lib/python/temporal/c_libraries_interface.py)
> classes which implements the idea of server (a separate process)
> executing ctypes (C library) functions (remote procedure call (RPC)).
One is
> PyGRASS interface for GRASS messages the other is the interface for
GRASS C
> functions needed by temporal framework.
>
> One issue I see is that we need to implement such a class for each
> use case. One for PyGRASS messages, other maybe for PyGRASS itself,
> for temporal library, for GUI vector digitizer, for scatter plot...
> Is there a way to create a general interface, or generated interface
> or make the implementation simple?

I think that a is great to have a RPC integrated in GRASS, and could be
useful for many reasons, but personally, I think that to solve the problem
of the G_fatal, we should handling directly the C signals. Of course it
will not work in case of occurring of a Segmentation Fault, and in this
sense I think that we need a RPC approach, but speaking about pygrass,
that it is a lower API, I think that would be great to be able to manage
the C signals.

We already discuss about this topic (#1646), and I don't know If I should
discuss this here or in the previous ticket, any way, maybe we should
consider to integrate in the GRASS source a fault handler, I found this
project [0] that maybe could be integrated in GRASS, I believe there is
some issue with the License [1] that is under "BSD (2-clause)"[3], so I'm
not sure it is possible to copy directly inside GRASS, but maybe we can
write something from scratch with the same logic.

Or as already suggested by Glynn (#1646) wrap the G_add_error_handler,
maybe given a more abstract and easy to use interface, and possibly that
avoid to expose the users to ctypes. Do you think that would be feasible?

Pietro

[0] https://github.com/haypo/faulthandler
[1] https://github.com/haypo/faulthandler/blob/master/COPYING
[3] http://opensource.org/licenses/BSD-2-Clause

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:4&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by zarch):

Replying to [comment:4 zarch]:
> I found this project [0] that maybe could be integrated in GRASS,
> I believe there is some issue with the License [1] that is under
> "BSD (2-clause)"[3], so I'm not sure it is possible to copy directly
> inside GRASS, but maybe we can write something from scratch with
> the same logic.

Sorry for the noise, but I found out that is already integrated in the
python standard library, so maybe we should just understand how to use it.

http://docs.python.org/dev/library/faulthandler.html

Pietro

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:5&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by glynn):

Replying to [comment:4 zarch]:

> Or as already suggested by Glynn (#1646) wrap the G_add_error_handler,

Using an error handler allows you to avoid process termination. But once a
fatal error has occurred, you cannot safely call any GRASS function; doing
so may well result in a segfault.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2134#comment:6&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by huhabla):

In my opinion the RPC approach is only meaningful for persistent
applications that need fast access to C-library functions, or that need
low level API access for data modification (like digitizing).

My intention to write the RPC server was to make the temporal framework
usable in persistent applications and to be as fast as possible.

Extending the PyGRASS framework with signal handling will not solve the
problem: that memory structures may be corrupted in case of a
G_fatal_error() call or a segfault, leading to incorrect computation
results or to segmentation faults in the persistent application.

Implementation of persistent applications that use the GRASS C-library
functions in their process memory is IMHO not possible. We have had this
discussion for years.

The PyGRASS interface is well designed for module programming not for
persistent applications. Otherwise each C-function call should be handled
via RPC. From my point of view and some tests that i made slows the RPC
approach the processing significantly down.

Example:

{{{
from multiprocessing import Process, Pipe, Lock
import time

num = 50000

def func(a):
     b = a + 1
     c = b + 1
     d = c + b
     return d

# DICT
def dict_func():
     a = {}
     for i in xrange(num):
         a["%i%i%i%i"%(i,i,i,i)] = func

     start = time.time()

     for i in xrange(num):
         a["%i%i%i%i"%(i,i,i,i)](5)

     end = time.time()

     print end - start

def target(lock, conn):
     a = [func]*num
     while True:
         # Avoid busy waiting
         conn.poll(4)
         data = conn.recv()
         lock.acquire()
         ret = a[data[0]](data[1])
         conn.send(ret)
         lock.release()

def rpc_func():
     client_conn, server_conn = Pipe()
     lock = Lock()
     server = Process(target=target, args=(lock, server_conn))
     server.daemon = True

     server.start()

     start = time.time()
     for i in xrange(num):
         if server.is_alive() is True:
             data = [i, 5]
             client_conn.send(data)
             client_conn.recv()

     end = time.time()

     print end - start

if __name__ == "__main__":
     # Measuring the time to call of 50000 functions that are
     # registered in a dictionary
     dict_func()
     # Measuring the time to call 50000 functions
     # using an RPC approach
     rpc_func()
}}}

Running the script will show that calling functions from a dict is 50 time
faster than using a pipe with a subprocess.
{{{
python rpc_test.py
0.0984511375427
4.59985399246
}}}

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:7&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by zarch):

Replying to [comment:7 huhabla]:
> [big cut]

Thank you Soeren for the clarification and to share your knowledge, I
didn't understood the problem, sorry for the noise.

Pietro

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:8&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by wenzeslaus):

Replying to [comment:6 glynn]:
> Replying to [comment:4 zarch]:
>
> > Or as already suggested by Glynn (#1646) wrap the G_add_error_handler,
>
> Using an error handler allows you to avoid process termination. But once
a fatal error has occurred, you cannot safely call any GRASS function;
doing so may well result in a segfault.

One of the issues which `G_add_error_handler` is trying to solve is to
provide meaningful error message to the user. For example, failing to open
some temporary file causes `exit` with "`No such file /tmp/kjewbf8d38dj`".
This does not help user nor programmer to understand that the error
occurred (when does this happened, what is the stack trace, what are
consequences and what are suggestions to solve it). In other words,
sometimes the message provided by `G_fatal_error` caller is too low level.

Python `RPCServer` with wrapper functions throwing exceptions would help
to solve this issue. But it seems to me that #1646 remains valid for
pyGRASS (and possibly others) and C code itself.

Replying to [comment:7 huhabla]:
> In my opinion the RPC approach is only meaningful for persistent
applications that need fast access to C-library functions, or that need
low level API access for data modification (like digitizing).

And this is not only vector and raster digitizing, this is also new
scatter plot tool and in fact the whole `g.gui.iclass`, `nviz` (which is
unfortunately more complicated) and of course everything temporal-related
(everything started to be temporal-related).

> My intention to write the RPC server was to make the temporal framework
usable in persistent applications and to be as fast as possible.

I'm not sure how the speed of `RPCServer` compares to module call but the
speed is not the only advantage. Fine control of what is called and a
smoother interface (possibly, depending on wrappers) is the other
advantage. Calling subprocess from GUI for every single task and parsing
its output is cumbersome.

> The pyGRASS interface is well designed for module programming not for
persistent applications. Otherwise each C-function call should be handled
via RPC. From my point of view and some tests that i made slows the RPC
approach the processing significantly down.

So, in the next step, we need pyGRASS-like interface which is fail safe
and temporal library which is faster?

> Running the script will show that calling functions from a dict is 50
time faster than using a pipe with a subprocess.

It seems that this is something we need to take care of. And this is
something what my factory pattern suggestions is trying to address.

We would need to create two sets of class with identical interface. One
using `RPCServer` (safe) the other calling ctypes directly (fast). Objects
should be created by factory, so that the factory will put the `RPCServer`
into the objects, so the user does not take care of it. Maybe in Python we
can go beyond the classic factory pattern and create also the
`RPCServer`-dependent classes from the classes calling ctypes directly.

I realize that ''some code'' would be appreciated but I cannot dive into
this more now.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:9&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by huhabla):

Replying to [comment:9 wenzeslaus]:
> Replying to [comment:6 glynn]:
> > Replying to [comment:4 zarch]:
> >
> > > Or as already suggested by Glynn (#1646) wrap the
G_add_error_handler,
> >
> > Using an error handler allows you to avoid process termination. But
once a fatal error has occurred, you cannot safely call any GRASS
function; doing so may well result in a segfault.
>
> One of the issues which `G_add_error_handler` is trying to solve is to
provide meaningful error message to the user. For example, failing to open
some temporary file causes `exit` with "`No such file /tmp/kjewbf8d38dj`".
This does not help user nor programmer to understand that the error
occurred (when does this happened, what is the stack trace, what are
consequences and what are suggestions to solve it). In other words,
sometimes the message provided by `G_fatal_error` caller is too low level.
>
> Python `RPCServer` with wrapper functions throwing exceptions would help
to solve this issue. But it seems to me that #1646 remains valid for
pyGRASS (and possibly others) and C code itself.

I have re-designed the RPC interface, now the Python function wrapper will
return an exception and the result of the function calls, so that the RPC
server interface that provides the {{{call()}}} functions can raise these
exceptions (exceptions raised in the subprocess will kill the subprocess
and will not be catch'd in the parent process). Hence, the Python wrapper
functions transform the C-function return values into meaningful
exceptions that will be raised in the parent process.

While re-designing i concluded that a no wait function call
{{{call_no_wait()}}} is not meaningful when mixed with calls that wait to
receive data. There is only a limited number of C-functions that do not
return values or return states. It is better to wait for a function call
to finish, than risking a race condition in case a fatal error occur'd
meanwhile. An exception is the messaging interface, which should stay as
is.

However, maybe two RPC interfaces are meaningful: one that waits for
functions to return (expecting return values including exceptions) and one
that does not wait?

> Replying to [comment:7 huhabla]:
> > In my opinion the RPC approach is only meaningful for persistent
applications that need fast access to C-library functions, or that need
low level API access for data modification (like digitizing).
>
> And this is not only vector and raster digitizing, this is also new
scatter plot tool and in fact the whole `g.gui.iclass`, `nviz` (which is
unfortunately more complicated) and of course everything temporal-related
(everything started to be temporal-related).
>
> > My intention to write the RPC server was to make the temporal
framework usable in persistent applications and to be as fast as possible.
>
> I'm not sure how the speed of `RPCServer` compares to module call but
the speed is not the only advantage. Fine control of what is called and a
smoother interface (possibly, depending on wrappers) is the other
advantage. Calling subprocess from GUI for every single task and parsing
its output is cumbersome.

I have added benchmark runs to the rpc server script, to get an idea what
the performance loss and gain of the RPC interface is:
{{{
GRASS 7.0.svn (Test XY):~ > python c_library_interface.py
##############################################################
TESTS
ERROR: A fatal error
WARNING:root:Needed to restart the rpc server
ERROR: A fatal error
WARNING:root:Needed to restart the rpc server
##############################################################
##############################################################
Raster map exists benchmark
Time to call 1000 functions directly: 0.017043s
Time to call 1000 functions via RPC: 0.178600s
Time to perform 1000 g.findfile module runs: 30.343877s
##############################################################
##############################################################
Raster map info benchmark
Time to call 10000 functions directly: 0.856104s
Time to call 10000 functions via RPC: 7.189188s
Time to perform 10000 r.info module runs: 120.261683s
##############################################################
}}}

As you can see the RPC interface is for the two tested functions about 10
times slower then the direct Python function calls that wrap the GRASS
C-functions. But the RPC interface is about 17 to 600 times faster then
using the grass.script interface that calls GRASS modules (g.findfile and
r.info).

> > The pyGRASS interface is well designed for module programming not for
persistent applications. Otherwise each C-function call should be handled
via RPC. From my point of view and some tests that i made slows the RPC
approach the processing significantly down.
>
> So, in the next step, we need pyGRASS-like interface which is fail safe
and temporal library which is faster?

So you want to wrap all C-function calls in PyGRASS to be wrapped using
the RPC interface?

> > Running the script will show that calling functions from a dict is 50
time faster than using a pipe with a subprocess.
>
> It seems that this is something we need to take care of. And this is
something what my factory pattern suggestions is trying to address.
>
> We would need to create two sets of class with identical interface. One
using `RPCServer` (safe) the other calling ctypes directly (fast). Objects
should be created by factory, so that the factory will put the `RPCServer`
into the objects, so the user does not take care of it. Maybe in Python we
can go beyond the classic factory pattern and create also the
`RPCServer`-dependent classes from the classes calling ctypes directly.
>
> I realize that ''some code'' would be appreciated but I cannot dive into
this more now.

I am not sure if i understand your approach, so code examples would be
very helpful here. :slight_smile:

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:10&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by huhabla):

Replying to [comment:2 wenzeslaus]:
> Thanks Soeren, this is what I hoped for and I'm looking forward how this
will continue. I just quickly looked at this topic for now. So, just some
small notes now.
>
> I needed to do this changes to get it working:
> {{{
> #!diff
> @@ -19,12 +19,13 @@
> from multiprocessing import Process, Lock, Pipe
> import logging
> from ctypes import *
> -from core import *
> +from grass.script.core import *
> import grass.lib.gis as libgis
> import grass.lib.raster as libraster
> import grass.lib.vector as libvector
> import grass.lib.date as libdate
> import grass.lib.raster3d as libraster3d
> +from datetime import datetime
> }}}

Thanks for the fix, the new version should work now in any directory.

> I though I understand how it works with exceptions and doctest but
apparently not, even I tried verbose mode:
> {{{
> python -m doctest -v c_library_interface.py
> }}}

Please try:

{{{
python c_library_interface.py -v
}}}

> Anyway test passes. Would it be possible to write a test also for the
case of waiting, killing and restarting the server? I mean the case when I
run something with `call_no_wait(function=aaa)`, it runs for a long time,
meanwhile I call `call(function=bbb)` but then `bbb` fails and kills the
server. Does this make sense?

Yes it does, but i was not able to fix the deadlock that appeared when
mixing "waiting calls" with "no waiting calls" and a fatal error killed
the subprocess. So i removed the no wait calls from the server interface.

> About the interface. I'm not sure about how to report errors. The
functions might need to accept keyword arguments in some way. How to deal
with objects, e.g. how to use pyGRASS through this?

In the latest implementation the subprocess creates an exception that is
send through the pipe to the server and is raised there.
If an object is picklable then it can be send between processes.

>
> I was not thinking about how to deal with wrapping all C functions. But
maybe manual wrapping it is inevitable if we want really Python(ic)
interface. We already have pyGRASS for this reason.
>
> The last not is about naming. Although it is a well established practice
in GRASS that things are named with two or more different names, I would
suggest to use only one name for newly created things. We are at the
beginning, we can always rename but it would be safer to use one name at
time. `*C*Interface` is a nice name (although it it might by also
`*Python*Interface`) but it is more for the particular implementation
(interface for temporal framework or to messages) than for the general
thing (server). `RPCServer` is very general, not sure about that. What
about `CFunctionCaller`? I also heard suggestion `NoGFatal_Huray!!!` from
someone here but that's not a valid Python identifier, although, with
unicode identifiers, there might be a way. No strong opinion here.

The latest RPC server incarnation does not care what kind of functions it
should call, hence it is very generic and takes care about killed
subprocesses. You can define any arbitrary function with arguments that
are picklable and pass it to the RPC server to call it in the subprocess.

But maybe it would be better to use sharedctypes memory in case of
digitizing rather than a pipe?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:11&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by wenzeslaus):

Replying to [comment:10 huhabla]:
> Replying to [comment:9 wenzeslaus]:
> > Replying to [comment:7 huhabla]:
> > > Running the script will show that calling functions from a dict is
50 time faster than using a pipe with a subprocess.
> >
> > It seems that this is something we need to take care of. And this is
something what my factory pattern suggestions is trying to address.
> >
> > We would need to create two sets of class with identical interface.
One using `RPCServer` (safe) the other calling ctypes directly (fast).
Objects should be created by factory, so that the factory will put the
`RPCServer` into the objects, so the user does not take care of it. Maybe
in Python we can go beyond the classic factory pattern and create also the
`RPCServer`-dependent classes from the classes calling ctypes directly.
> >
> > I realize that ''some code'' would be appreciated but I cannot dive
into this more now.
>
> I am not sure if i understand your approach, so code examples would be
very helpful here. :slight_smile:

My original idea does not involved RCP server, it was about ctypes calls
versus module calls. It would require rewrite the whole pyGRASS using
modules. Here is sample pseudo code:

{{{
#!python
# in module/package pygrass
class Point(object):
     def __init__(self, x=0, y=0, z=None, **kargs):
         # ...
         libvect.Vect_append_point(self.c_points, x, y, z)

# in module/package failsafegrass
class Point(object):
     def __init__(self, x=0, y=0, z=None)
         data = "P 1 1\n{x} {y}\n1 1".format(x=x, y=y)
         # ... (fill file /tmp/ef5s with data)
         run_command('v.edit' -n tool='add' map=vectmap input="/tmp/ef5s")

# in some other modules/packages
class FailSafeFactory(object):
     def create_Point(self, args, kwargs):
         return failsafegrass.Point(*args, **kwargs)

class FastFactory(object):
     def create_Point(self, args, kwargs):
         return pygrass.Point(*args, **kwargs)

# in the user code (probably GUI)

# at the beginning
if be_safe:
     factory = FailSafeFactory()
else:
     factory = FastFactory()

# usage in code
point1 = factory.create_Point(x=500, y=600)

# now I don't know which point it is but the interface is the same
}}}

I'm not sure if factory pattern is the right thing to use with RCP server
(to hide it). But I think that it is the option which should be
considered. There are two things I don't understand yet. First, how to
work objects. I can pass them but how I call the methods and access
attributes in general?

{{{
#!python
class Point:
     def get_x(self):
        return rcpserver.call(self._unsafe_get_x())
}}}

But this cannot work for object containing pointers, so there actually
cannot be any `_unsafe_get_x` method. So, the second thing is that I
probably need some `ProxyPoint` which can be passed between processes. And
using some identifier it would be associated with the object on the other
site.

{{{
#!python
# in the GUI process
class ProxyPoint:
     def get_x(self):
        return rcpserver.method_call(self, 'get_x')

# somewhere in the other unsafe process
# proxy_object == self from above
# method == 'get_x' from above
real_object = objects[proxy_object.uid]
return getattr(real_object, method)()

# maybe pass only the uid is better idea,
# since server should not be passed through itself
}}}

I don't have more now. It seems to me that this might be possible. But my
understanding of problem is not complete and even now I see some problems
such as function/method parameters which are proxy objects.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:12&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by zarch):

Replying to wenzeslaus:

Replying to huhabla:

I am not sure if i understand your approach, so code examples would be
very helpful here. :slight_smile:

My original idea does not involved RCP server, it was about ctypes calls
versus module calls. It would require rewrite the whole pyGRASS using
modules. Here is sample pseudo code:

(IMHO) I don't see the point to make a python API using the modules
interface. It will be only slow and useless for any real task, already
pygrass for some operations it consume too much memory (probably it is due
to some circular references, that need to be solve...) or is too slow.

off-topic { Personally if I have to do the effort of rewriting pygrass I
would use Cython (or CPython), moreover I think that we should work to
reduce and clean the high redundancy of the code, a lot of Python objects
and functions are duplicate between script/pygrass/temporal/wxgui simply
using slightly different logic and this is unmaintainable, unfortunately I
have not the global view of the ~111000 rows (removing blank lines and
comments) of the code. But I do think that we should work on that
direction. }

Why not simply define your function (containing ctypes or ctypes-based
function/objects) and call through the RPC interface? In this way, from my
little understanding of the problem, you should be able to preserve the
GUI from a crash, avoiding at the same time to make another duplicate,
don't you?

Something like (not tested):

{{{
from grass.pygrass.vector import VectorTopo
from grass.pygrass.vector.geometry import Point
from grass.pygrass.function import get_mapset_vector

def add_points(vname, vmapset='', *points):
     """
     >>> add_points('new', (1, 2), (2, 3), (3, 4))
     """
     mapset = get_mapset_vector(vname, vmapset) # chek if already exist
     with VectorTopo(vname, mapset, mode='rw' if mapset else 'w') as vct:
         for x, y in points:
             vct.write(Point(x, y))

ciface = RPCServer()
check = ciface.call(function=add_points, args=('new', (1, 2), (2, 3), (3,
4)))
}}}

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:13&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by zarch):

Sorry the track lost the quote characters.

Replying to wenzeslaus:
> Replying to huhabla:
> > I am not sure if i understand your approach, so code examples would
> > be very helpful here. :slight_smile:
>
> My original idea does not involved RCP server, it was about ctypes
> calls versus module calls. It would require rewrite the whole pyGRASS
> using modules. Here is sample pseudo code:
> [cut]

(IMHO) I don't see the point to make a python API using the modules
interface. It will be only slow and useless for any real task, already
pygrass for some operations it consume too much memory (probably it is due
to some circular references, that need to be solve...) or is too slow.

off-topic { Personally if I have to do the effort of rewriting pygrass I
would use Cython (or CPython), moreover I think that we should work to
reduce and clean the high redundancy of the code, a lot of Python objects
and functions are duplicate between script/pygrass/temporal/wxgui simply
using slightly different logic and this is unmaintainable, unfortunately I
have not the global view of the ~111000 rows (removing blank lines and
comments) of the code. But I do think that we should work on that
direction. }

Why not simply define your function (containing ctypes or ctypes-based
function/objects) and call through the RPC interface? In this way, from my
little understanding of the problem, you should be able to preserve the
GUI from a crash, avoiding at the same time to make another duplicate,
don't you?

Something like (not tested):

{{{
from grass.pygrass.vector import VectorTopo
from grass.pygrass.vector.geometry import Point
from grass.pygrass.function import get_mapset_vector

def add_points(vname, vmapset='', *points):
     """
     >>> add_points('new', (1, 2), (2, 3), (3, 4))
     """
     mapset = get_mapset_vector(vname, vmapset) # check if already exist
     with VectorTopo(vname, mapset, mode='rw' if mapset else 'w') as vct:
         for x, y in points:
             vct.write(Point(x, y))

ciface = RPCServer()
check = ciface.call(function=add_points, args=('new', (1, 2), (2, 3), (3,
4)))
}}}

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:14&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by huhabla):

I have attached a new RPC server that is able to instantiate a persistent
PyGRASS object in a remote process that can be accessed using an RPC
interface, have a look at the attached file
[[attachment:pygrass_rpc_interface.py]].

To get it work i needed to patch the PyGRASS raster info object to avoid
C-pointer that can not be pickled, see [[attachment:pygrass_raster.diff]].
IMHO PyGRASS should avoid pointer and should use ctypes.byref() to pass
pointers to GRASS C-functions. I think (i am not sure about this) that
avoiding ctypes.pointer() may reduce memory leaks as well if objects like
Cell_head() of Range() are managed by the Python garbage collector?

Here an example how to use the PyGRASS RPC interface, reading some raster
map info's and all rows:

{{{
#!python
if __name__ == '__main__':
     import grass.script as grassscript
     import grass.pygrass.raster as pygrass

     grassscript.use_temp_region()
     grassscript.run_command("g.region", n=80.0, s=0.0, e=120.0, w=0.0,
                       t=1.0, b=0.0, res=10.0, res3=10.0)
     grassscript.run_command("r.mapcalc", expression="test = row()",
                             overwrite=True, quiet=True)

     # Start the server and create a RasterRow remote object
     # that is initialized with the raster map name
     kargs = {"name":"test"}
     server = RPCServer(pygrass.RasterRow, kargs)

     # Get some raster map info
     info = server.get_property("info")
     print "Rows:", info.rows
     print "Cols:", info.cols
     print "N:", info.north
     print "S:", info.south
     print "E:", info.east
     print "W:", info.west

     # Open the raster map for row access
     kargs = {}
     server.call("open", kargs)

     # Read all rows
     for row in xrange(info.rows):
         kargs = {"row":row}
         ret = server.call("get_row", kargs)
         print "Row:", row, ret

     # Close the raster map
     kargs = {}
     server.call("close", kargs)

     grassscript.del_temp_region()
}}}

Result:

{{{
GRASS 7.0.svn (nc_spm_08):~/src > python pygrass_rpc_interface.py
Rows: 8
Cols: 12
N: 80.0
S: 0.0
E: 120.0
W: 0.0
Row: 0 [1 1 1 1 1 1 1 1 1 1 1 1]
Row: 1 [2 2 2 2 2 2 2 2 2 2 2 2]
Row: 2 [3 3 3 3 3 3 3 3 3 3 3 3]
Row: 3 [4 4 4 4 4 4 4 4 4 4 4 4]
Row: 4 [5 5 5 5 5 5 5 5 5 5 5 5]
Row: 5 [6 6 6 6 6 6 6 6 6 6 6 6]
Row: 6 [7 7 7 7 7 7 7 7 7 7 7 7]
Row: 7 [8 8 8 8 8 8 8 8 8 8 8 8]
}}}

IMHO the RPC interface design should provide a blueprint how to implement
a remote digitizer using PyGRASS to access vector or raster maps. I hope
that most of the PyGRASS functionality will be accessible using this
approach.

Note: with this approach you will create a subprocess for each raster or
vector map that should be modified. Each subprocess will manage only one
object. The subprocess can manage several objects by using a session id.
But IMHO this is not a good idea. In case one object calls G_fatal_error()
or segfaults, all other objects will be killed as well.

PyGRASS may need more patching to make the objects that will be exchanged
between client and server picklable (see PyGRASS raster info object patch
[[attachment:pygrass_raster.diff]]).

This RPC interface will not replace the new PyGRASS messenger interface
nor the tgis specific raster info access RPC implementation. Calling
simple C-functions for each map in a space time raster datasets (STRDS)
using an RPC interface is much faster than creating a subprocess and a
raster object for each map in a STRDS.

Hence, there are different solutions for different needs. :slight_smile:

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:15&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by zarch):

Hi Soeren,

sorry for the late reply.

Replying to [comment:15 huhabla]:
> I have attached a new RPC server that is able to instantiate a
persistent
> PyGRASS object in a remote process that can be accessed using an RPC
> interface

Thank you for that, I think it is a good solution to the problem.

> To get it work i needed to patch the PyGRASS raster info object to
> avoid C-pointer that can not be pickled,
> IMHO PyGRASS should avoid pointer and should use ctypes.byref() to
> pass pointers to GRASS C-functions. I think (i am not sure about this)
> that avoiding ctypes.pointer() may reduce memory leaks as well
> if objects like Cell_head() of Range() are managed by the Python
> garbage collector?

Yes, exactly, remove the pointer to the C structs will make independent
all the objects to each other, so should solve the problem of the circular
reference and memory consumption that I cited in a previous email.

So I can remove all the pointers, or leave the pointer but instantiate an
independent C struct internally on the object, like looking at your diff
[[pygrass_raster.diff]], I could do in the init:

{{{

def __init__(self, ...):
     ...
     self.cs_region = libgis.Cell_head() # C struct region
     self.c_region = ctypes.pointer(self.cs_region) # C pointer
     ...

}}}

And all the time that I use a C struct pointer to instantiate a class I
need to copy the struct internally with memcpy. In this way, when the
python object is dereferenced, the garbage collector should be able to
manage and free the memory.
Of course from a performance point of view we loose something because
every time that we instantiate a geometry feature we are copying the
"MapInfo" struct.

Do you have a better idea?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:16&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
--------------------------------------------------+-------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing | Platform: All
      Cpu: Unspecified |
--------------------------------------------------+-------------------------

Comment(by huhabla):

Pietro and i investigated the possibility of a PyGRASS vector RPC
interface in more detail off-list. We came to the conclusion that the
design of PyGRASS does not allow the implementation of a RPC interface as
suggested in this ticket. The main problem is that the most important
PyGRASS vector geometry classes make use of C-library structures
internally to massively speed up processing. The vector C-library
structures (Map_info, line_pnts, line_cats, ...) are managed by the vector
C-library using dynamic memory allocation. Such structures can not be
automatically serialized (pickled) by Python. But instances of the
geometry classes (Point, Line, Isle, Area, ...) must be exchanged by the
RPC interface for reading and writing of vector maps. To allow the
exchange of geometries all these classes must be rewritten and reduced in
their functionality. All functions that may call G_fatal_error() must be
removed from the geometry classes. An additional abstraction layer must be
implemented to convert the line_pnts and line_cats data into picklable
Python objects and vice verse.

Making PyGRASS RPC conform will result in much slower vector processing
and reduced functionality. The only benefit would be that persistent
applications, like the GUI, can use PyGRASS via RPC.

However, Pietro and i came to the conclusion that the effort to modify
PyGRASS to be RPC conform is to large and the benefits to low. We suggest
a dedicated vector editing RPC interface that calls the vector C-library
functions in the RPC subprocess directly, or that make use of PyGRASS in
the RPC subprocess but converts the geometry objects into picklebale
Python objects.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2134#comment:17&gt;
GRASS GIS <http://grass.osgeo.org>

#2134: Create a general exit-safe interface to C libraries
----------------------------+-----------------------------------------------
  Reporter: wenzeslaus | Owner: grass-dev@…
      Type: enhancement | Status: closed
  Priority: normal | Milestone: 7.0.0
Component: Python ctypes | Version: svn-trunk
Resolution: wontfix | Keywords: G_fatal_error, exit, multiprocessing
  Platform: All | Cpu: Unspecified
----------------------------+-----------------------------------------------
Changes (by wenzeslaus):

  * status: new => closed
  * resolution: => wontfix

Comment:

The simpler solution with specialized interfaces for each use case was
implemented (e.g., for reporting messages to user in PyGRASS) and used in
r58201, r58555, r58386, r58575, r58579 and others.

According to comment:17 closing as wontfix. Use specialized solutions as
suggested in comment:17. Thanks Soeren and Pietro for the hard work and
thinking.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2134#comment:18&gt;
GRASS GIS <http://grass.osgeo.org>