[GRASS-dev] GSoC 2012: High level map interaction with Python

Hi everyone!

I'm Pietro Zambelli, a PhD student at the University of Trento. I would
like to apply to GSoC. My idea, in short, is to extend the GRASS Python
API to make it more Pythonic :-).

I would like to work with regions, raster maps and vector maps as
objects, and to use the maps and other GRASS functionality at a
higher, more abstract level.

For people used to `numpy`: I would like to interact with a map with
the same simplicity with which I interact with a matrix in `numpy`.

For curious people who want to understand my insane idea, I have added
a new page to the wiki [0] to try to explain what I mean. I have used
Python doctests [1] to give a brief idea of how we could interact
with the new Python module and, in case of consensus, to serve as the
basis for a test-driven development approach for the GSoC.
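
As a rough illustration of the doctest-driven approach described above, here is a minimal sketch. The `Region` class, its constructor arguments and its attributes are hypothetical stand-ins and not part of any existing GRASS API; the examples in the docstring double as the tests.

```python
import doctest


class Region:
    """Hypothetical region object; all names are illustrative only.

    >>> region = Region(north=10, south=0, east=10, west=0, res=2)
    >>> region.rows
    5
    >>> region.cols
    5
    """

    def __init__(self, north, south, east, west, res):
        self.north, self.south = north, south
        self.east, self.west = east, west
        self.res = res

    @property
    def rows(self):
        # Number of rows covered by the north-south extent.
        return int(round((self.north - self.south) / self.res))

    @property
    def cols(self):
        # Number of columns covered by the east-west extent.
        return int(round((self.east - self.west) / self.res))


def run_examples():
    """Run the docstring examples and return (failures, examples tried)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(Region.__doc__, {"Region": Region}, "Region", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.failures, runner.tries
```

Writing the docstring first and the class second is one way to read "test-driven" here: the examples fail until the implementation catches up.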

There are many aspects that could be improved and that must be
resolved; I have tried to highlight possible problems. But, of course,
I need some help to find more of them and to understand whether or not
the idea is a good contribution to the great GRASS software.
Please feel free to critique, add new examples, and raise problems,
doubts, etcetera. In particular, I would like to invite all developers
to check for incoherence or inconsistency with the GRASS and/or
Python philosophy [2].

Any suggestions and criticism are more than welcome!

Pietro

[0] http://grass.osgeo.org/wiki/GRASS_SoC_Ideas_2012/High_level_map_interaction
[1] http://docs.python.org/library/doctest.html
[2] http://www.python.org/dev/peps/pep-0020/

2012/4/1 Pietro <peter.zamb@gmail.com>:

Hi everyone!

Ciao Pietro

Any suggestions and criticism are more than welcome!

I like the idea; I have no suggestions or criticism about your proposal.
I could be the mentor, or the co-mentor if someone with more
experience proposes himself as mentor. I don't know ctypes very well,
but I can study it.

--
ciao
Luca

http://gis.cri.fmach.it/delucchi/
www.lucadelu.org

On Sun, Apr 1, 2012 at 12:16 AM, Pietro <peter.zamb@gmail.com> wrote:

Hi everyone!

Any suggestions and criticism are more than welcome!

I like the general concept of making the GRASS Python API more
Python-like. But I am not so sure about raw data access in a NumPy
way, with a matrix holding raster data. Actually I am against it,
because GIS data handling is, due to the potential size of the data,
different from, say, statistical data handling. When working with
raster data, there are a number of ways to deal with them in chunks or
rows to avoid memory issues. You can have a look at r.example, the
cache used by r.proj, the segment lib and the rowio lib in order to
better understand how GIS raster algorithm development works. Also the
file-based spatial index in the GRASS 7 vector lib can give you an
idea about GIS data handling. See also the recent post of Alex Mandel
in the user ml [0]. Raw data access should IMHO be left to existing
modules, in particular because you intend to provide an easy to use
GRASS Python API primarily for users.
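
To make the row-wise idea concrete, here is a minimal sketch that subtracts two "maps" while holding only one row of each in memory. It does not use the GRASS raster, segment or rowio libraries: flat binary files written with the standard `array` module stand in for raster maps, and every function name is an assumption made for illustration.

```python
import array
import os
import tempfile


def write_rows(path, rows):
    """Write rows of floats to a flat binary file, one row after another."""
    with open(path, "wb") as fh:
        for row in rows:
            array.array("d", row).tofile(fh)


def read_rows(path, cols):
    """Yield one row at a time so that only a single row is held in memory."""
    with open(path, "rb") as fh:
        while True:
            row = array.array("d")
            try:
                row.fromfile(fh, cols)
            except EOFError:
                break
            yield row


def subtract_maps(path_a, path_b, path_out, cols):
    """Compute a - b row by row and stream the result to path_out."""
    with open(path_out, "wb") as out:
        for row_a, row_b in zip(read_rows(path_a, cols), read_rows(path_b, cols)):
            array.array("d", (a - b for a, b in zip(row_a, row_b))).tofile(out)


def demo():
    """Run the sketch on two tiny 2x2 'maps' stored in temporary files."""
    with tempfile.TemporaryDirectory() as tmp:
        a, b, out = (os.path.join(tmp, n) for n in ("a.bin", "b.bin", "out.bin"))
        write_rows(a, [[5.0, 6.0], [7.0, 8.0]])
        write_rows(b, [[1.0, 2.0], [3.0, 4.0]])
        subtract_maps(a, b, out, cols=2)
        return [list(row) for row in read_rows(out, cols=2)]
```

Memory use depends on the row length only, not on the number of rows, which is the property the chunked approaches mentioned above are after.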

Markus M

[0] http://osgeo-org.1560.n6.nabble.com/Re-ArcGIS-vs-GRASS-notes-td4679637.html

Hi Pietro,
this is an interesting and very useful submission. A numpy matrix
interface would be a valuable addition but, as Markus Metz said, we
also need direct row access without reading the whole raster or vector
map into memory.

Actually, with the vtk-grass-bridge there is already an
object-oriented interface to the GRASS raster, vector and voxel
libraries, providing convenient high-level access classes and methods
for C++, Java and Python [1]. Perhaps you could use this interface as
orientation when designing the Python GRASS classes?

Best regards
Soeren

[1] http://code.google.com/p/vtk-grass-bridge

Example vector and raster processing in C++, Java and Python:
http://code.google.com/p/vtk-grass-bridge/source/browse/trunk/Modules/Cxx/v.sample.rast.cxx
http://code.google.com/p/vtk-grass-bridge/source/browse/trunk/Modules/Java/v_sample_rast.java
http://code.google.com/p/vtk-grass-bridge/source/browse/trunk/Modules/Python/v.sample.rast.py

vtk-grass-bridge Python unit tests:
http://code.google.com/p/vtk-grass-bridge/source/browse/trunk/Raster/Testing/Python/GRASSRasterMapReaderWriterTest.py
http://code.google.com/p/vtk-grass-bridge/source/browse/trunk/Vector/Testing/Python/GRASSVectorMapTopoReaderWriter.py
http://code.google.com/p/vtk-grass-bridge/source/browse/trunk/Raster3d/Testing/Python/GRASSRaster3dMapReaderWriterTest.py

Hi Pietro,
more below:

2012/4/4 Pietro <peter.zamb@gmail.com>:

Hi Sören,

2012/4/4 Sören Gebbert <soerengebbert@googlemail.com>:

Hi Pietro,
this is an interesting and very useful submission. A numpy matrix
interface would be a valuable addition but, as Markus Metz said, we
also need direct row access without reading the whole raster or vector
map into memory.

Exactly. I'm not interested in a numpy interface as such; I'm
interested in using the GRASS data libraries in a more Pythonic way,
similar to numpy but using all the smart things that GRASS has.

Actually, with the vtk-grass-bridge there is already an
object-oriented interface to the GRASS raster, vector and voxel
libraries, providing convenient high-level access classes and methods
for C++, Java and Python [1]. Perhaps you could use this interface as
orientation when designing the Python GRASS classes?
[1] http://code.google.com/p/vtk-grass-bridge

Thank you for the link, I didn't know this interface. It seems quite
far from Python style: the PEP 8 style guide says method names
should be lowercase, with words separated by underscores where that
improves readability ...

[snip]

The VTK GRASS bridge uses the VTK style [1]. This ensures the same
consistent coding style in C++, Java and Python. It is therefore not
Python-specific, but it integrates into the VTK pipeline framework, so
that the more than 500 existing VTK algorithms for image, voxel and
vector processing can be used.

For example the Delaunay triangulation is now as simple as it can be:
{{{
# Init grass variables
init = vtkGRASSInit()
init.Init("VectorDelaunayTriangulation")

# Build the VTK pipeline
# This reader does not need topology information
reader = vtkGRASSVectorPolyDataReader()
reader.SetVectorName("elev_lid792_randpts")

# The Delaunay triangulation
delaunay = vtkDelaunay2D()
delaunay.SetInputConnection(reader.GetOutputPort())

# Start the processing (update) and write the resulting
# grass vector map into the grass database
writer = vtkGRASSVectorPolyDataWriter()
writer.SetVectorName("delaunay_triangulation")
writer.SetInputConnection(delaunay.GetOutputPort())
writer.BuildTopoOn()
writer.Update()
}}}

The reason I am pointing you to this interface is not to have you use
its coding style, but to give you an idea of which classes are needed
and how they should be designed to access the low-level GRASS library
functions. You will have to think about this aspect too, since these
access methods are needed to implement more complex algorithms that
can handle massive data. So maybe the vtk-grass-bridge will give you
some interesting insights into how the C level of GRASS works?

Best regards
Soeren

[1] http://www.vtk.org/Wiki/VTK_Coding_Standards

btw.:
I included the developer list in CC to get this discussion back on the list

Hi Sören,

2012/4/4 Sören Gebbert <soerengebbert@googlemail.com>:

[cut]
The VTK GRASS bridge uses the VTK style [1]. This ensures the same
consistent coding style in C++, Java and Python. It is therefore not
Python-specific, but it integrates into the VTK pipeline framework, so
that the more than 500 existing VTK algorithms for image, voxel and
vector processing can be used.

Thank you Sören. I didn't know!

[cut]
The reason I am pointing you to this interface is not to have you use
its coding style, but to give you an idea of which classes are needed
and how they should be designed to access the low-level GRASS library
functions. You will have to think about this aspect too, since these
access methods are needed to implement more complex algorithms that
can handle massive data. So maybe the vtk-grass-bridge will give you
some interesting insights into how the C level of GRASS works?

Thank you! Although I'm new to C/C++, I will take a closer look at
the source code of vtk-grass-bridge!
Do you think that this project could be achieved during the Google
Summer of Code, or do you think that it is not realistic because a lot
of things are missing from the project idea?
Would you be interested in being my mentor?

Best regards.

Pietro

Hi Pietro,

... [snip]

Thank you! Although I'm new to C/C++, I will take a closer look at
the source code of vtk-grass-bridge!
Do you think that this project could be achieved during the Google
Summer of Code, or do you think that it is not realistic because a lot
of things are missing from the project idea?
Would you be interested in being my mentor?

I would like to be your mentor in this project. But actually I do not
know the GSoC procedure in detail; maybe an experienced mentor can
help me with this?

But I have to warn you: in my opinion the Python interface should use
the C library functions through the ctypes wrapper to implement
higher-level functionality. That means you need a deep understanding
of the core libraries and of how to use the library functions, since
the ctypes interface simply wraps the C library functions. But don't
worry, I have some knowledge about this and can help you in detail.
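
For readers unfamiliar with ctypes, the following minimal sketch shows what "simply wraps the C library functions" means in practice. It wraps `strlen` from the C standard library rather than a GRASS function, because a GRASS installation is not assumed here; the fallback to `CDLL(None)` is a POSIX-only assumption, and the `c_strlen` helper is hypothetical.

```python
import ctypes
import ctypes.util

# Load the C standard library. Falling back to the current process
# (CDLL(None)) is a POSIX-only assumption of this sketch.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Pythonic wrapper: accept a str and hand bytes to the C function."""
    return libc.strlen(text.encode("utf-8"))
```

The wrapper itself is thin: declaring argument and return types correctly, and knowing what the C function expects, is where the understanding of the underlying library comes in.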

I would suggest focusing first on the core and raster libraries and
all of their features. That means creating Python classes for
important concepts like modules, raster maps, rows, regions,
categories, colors, segmentation and so on. With these covered, we can
think about the Pythonization of these concepts.
Very simple example:

{{{
import grass.script as grass
import grass.obj as obj

region = obj.region(n=10, s=0, e=10, w=0, res=0.01)

region.use_as_temp_region()
region.set_as_current()
print(region)

map1 = obj.raster("precip")
map2 = obj.raster("evapo")
map3 = obj.raster("recharge")

# Simple math operation as suggested in the GSoC wiki page
# but using the ctypes interface internally

map3 = map1 - map2

# Implementing the math using the map interfaces
# which wraps ctypes functionality

map1.open()
map2.open()
map3.open_as_new()

for i in range(region.rows):
    # map[i] returns an object of type raster_row
    map3[i] = map1[i] - map2[i]

# Copy categories and color table
map3.categories = map1.categories
map3.colors = map1.colors

map1.close()
map2.close()
map3.close()

# Opening a map in segment mode for random access
# would allow map[i][j] addressing

# Simple module chaining

slap = obj.module("r.slope.aspect")
rinfo = obj.module("r.info")

# All modules have inputs, flags and outputs
# generated from the xml module description
# Inputs and outputs are internally type checked
# Standard options are set by default

print(slap.description)
slap.in.elevation = "elevation" # string
slap.out.slope = "tmp" # string
slap.flag.a = True # boolean
slap.flag.overwrite = True # boolean
slap.run()

print(slap.exitstat) # integer
print(slap.stdout) # string
print(slap.stderr) # string

rinfo.in.input = slap.out.slope
rinfo.flag.g = True
rinfo.run()

kv = grass.parse_key_val(rinfo.stdout)

}}}
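
The row-wise part of the example above (`map3[i] = map1[i] - map2[i]`) can be prototyped without GRASS. In the sketch below, `Raster` and `RasterRow` are hypothetical stand-ins that keep rows in memory; a real implementation would read and write rows through the C library instead.

```python
class RasterRow:
    """Hypothetical row buffer supporting element-wise subtraction."""

    def __init__(self, values):
        self.values = list(values)

    def __sub__(self, other):
        return RasterRow(a - b for a, b in zip(self.values, other.values))

    def __repr__(self):
        return "RasterRow(%r)" % (self.values,)


class Raster:
    """Hypothetical map object; rows live in a dict instead of on disk."""

    def __init__(self, name, rows=None):
        self.name = name
        self._rows = {i: RasterRow(r) for i, r in enumerate(rows or [])}

    def __getitem__(self, i):
        # A real implementation would read row i through the C library.
        return self._rows[i]

    def __setitem__(self, i, row):
        # A real implementation would write row i to the new map.
        self._rows[i] = row

    def __len__(self):
        return len(self._rows)


precip = Raster("precip", [[5, 6], [7, 8]])
evapo = Raster("evapo", [[1, 2], [3, 4]])
recharge = Raster("recharge")

for i in range(len(precip)):
    recharge[i] = precip[i] - evapo[i]
```

Putting the arithmetic on the row object rather than on the map keeps each operation bounded by the row length.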

This is just a suggestion and can of course be modified.

With the focus on core and raster functionality, I would say this is
doable in a GSoC project.

What do you think?

Best regards
Soeren

Pietro wrote:

I'm Pietro Zambelli, a PhD student at the University of Trento. I would
like to apply to GSoC. My idea, in short, is to extend the GRASS Python
API to make it more Pythonic :-).

I would like to work with regions, raster maps and vector maps as
objects, and to use the maps and other GRASS functionality at a
higher, more abstract level.

For people used to `numpy`: I would like to interact with a map with
the same simplicity with which I interact with a matrix in `numpy`.

Note that the GRASS Python scripting library already has a module
(grass.script.array) for reading and writing raster maps as
memory-mapped NumPy arrays.

While this avoids the issue of reading entire maps into memory,
applying almost any NumPy operations to a memory-mapped array will
create an in-memory array.

At one point, I looked into implementing lazy evaluation, so that you
could use NumPy-style operations but with r.mapcalc-like sequential
I/O.

The general idea is that values would be expressions; performing
operations on expressions would yield new expressions, essentially
creating a tree describing the overall operation. Evaluation would be
deferred until the result was actually required (e.g. writing an
output map), at which point the expression would be evaluated for
reasonably-sized blocks rather than for the entire map.
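
A minimal sketch of this lazy-evaluation idea follows. Operators build a tree of expression objects, and nothing is computed until a writer asks for blocks of cells. All class and function names are hypothetical, and plain Python lists stand in for raster maps.

```python
import operator


class Expr:
    """Node of a lazy expression tree; operators only build new nodes."""

    def __add__(self, other):
        return BinOp(operator.add, self, wrap(other))

    def __sub__(self, other):
        return BinOp(operator.sub, self, wrap(other))

    def __mul__(self, other):
        return BinOp(operator.mul, self, wrap(other))


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def eval_block(self, start, stop):
        return [self.value] * (stop - start)


class Source(Expr):
    """Leaf that reads cells on demand (a list stands in for a map)."""

    def __init__(self, cells):
        self.cells = cells

    def eval_block(self, start, stop):
        return self.cells[start:stop]


class BinOp(Expr):
    def __init__(self, func, left, right):
        self.func, self.left, self.right = func, left, right

    def eval_block(self, start, stop):
        left = self.left.eval_block(start, stop)
        right = self.right.eval_block(start, stop)
        return [self.func(a, b) for a, b in zip(left, right)]


def wrap(value):
    """Turn plain numbers into constant expressions."""
    return value if isinstance(value, Expr) else Const(value)


def write_output(expr, size, block=2):
    """Evaluate the tree block by block, as an output writer would."""
    result = []
    for start in range(0, size, block):
        result.extend(expr.eval_block(start, min(start + block, size)))
    return result


precip = Source([5.0, 6.0, 7.0])
evapo = Source([1.0, 2.0, 3.0])
recharge = (precip - evapo) * 2  # builds a tree, computes nothing yet
```

Only `write_output(recharge, size=3)` triggers any reads, and each read covers at most `block` cells.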

The main problem with this approach is that it isn't possible to wrap
it all up as a subclass of ndarray which could be passed to existing
NumPy functions. You would have to re-implement the entire API, even
if most of the functions can be reduced to a single expression (e.g.
once you create a ufunc class for lazy arrays, each individual ufunc
would just be an instance with the corresponding NumPy ufunc as a
parameter).
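
The "one ufunc class, many instances" remark can be sketched as follows. To stay free of dependencies, functions from the standard `math` module stand in for NumPy ufuncs; `LazyArray` and `LazyUfunc` are hypothetical names, not an existing API.

```python
import math


class LazyArray:
    """Deferred computation over a sequence; evaluated only on request."""

    def __init__(self, compute):
        self._compute = compute  # callable(start, stop) -> list of values

    @classmethod
    def from_cells(cls, cells):
        return cls(lambda start, stop: cells[start:stop])

    def evaluate(self, start, stop):
        return self._compute(start, stop)


class LazyUfunc:
    """One class serves every element-wise function; the function is a parameter."""

    def __init__(self, func):
        self.func = func

    def __call__(self, arr):
        # Return a new lazy array instead of computing anything now.
        return LazyArray(
            lambda start, stop: [self.func(v) for v in arr.evaluate(start, stop)]
        )


# Each "ufunc" is just an instance; math functions stand in for NumPy ufuncs.
sqrt = LazyUfunc(math.sqrt)
floor = LazyUfunc(math.floor)

cells = LazyArray.from_cells([4.0, 9.0, 16.0])
roots = sqrt(cells)  # nothing is evaluated here
```

The cost noted above still applies: every function of the API needs such an instance, because existing NumPy functions would not accept these objects.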

Also, making it look like NumPy may result in people expecting to be
able to use it like NumPy. E.g. performing a ufunc.reduce() operation
over the vertical axis could be made to work, but there's really no
way to make it practical for large maps.

--
Glynn Clements <glynn@gclements.plus.com>

Hi Sören,

2012/4/5 Sören Gebbert <soerengebbert@googlemail.com>:
... [snip]

Would you be interested in being my mentor?

I would like to be your mentor in this project. But actually I do not
know the GSoC procedure in detail; maybe an experienced mentor can
help me with this?

But I have to warn you: in my opinion the Python interface should use
the C library functions through the ctypes wrapper to implement
higher-level functionality. That means you need a deep understanding
of the core libraries and of how to use the library functions, since
the ctypes interface simply wraps the C library functions. But don't
worry, I have some knowledge about this and can help you in detail.

I'm happy to learn and to understand more about the C library functions! :-)

I would suggest focusing first on the core and raster libraries and
all of their features. That means creating Python classes for
important concepts like modules, raster maps, rows, regions,
categories, colors, segmentation and so on. With these covered, we can
think about the Pythonization of these concepts.

We are going to implement something new, so I would like to be
consistent, or at least as consistent as we can be, with GRASS and
Python.
I think that using API examples to write tests/doctests is one of the
best tools for highlighting weaknesses and limits of the
implementation. Furthermore, we can use this material as a starting
point for the development.

Very simple example:

...[snip]

slap = obj.module("r.slope.aspect")
slap.in.elevation = "elevation" # string
slap.out.slope = "tmp" # string
slap.flag.a = True # boolean
slap.flag.overwrite = True # boolean
slap.run()

I like your example; I think it is clearer than what I wrote on the wiki page!
The only thing that I would like to do differently is the part on
the `module`.
Your idea seems easier to implement, but it is more verbose...
But perhaps there is no better solution...
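
One practical detail about the `module` example quoted above: `in` is a reserved word in Python, so `slap.in.elevation` would not parse. The minimal sketch below uses `inputs`, `outputs` and `flags` as attribute names instead; these names and the `Module` class are assumptions for illustration only. The sketch assembles a command line and does not run GRASS.

```python
from types import SimpleNamespace


class Module:
    """Hypothetical module wrapper that only assembles a command line."""

    def __init__(self, name):
        self.name = name
        # `in` is a reserved word in Python, so `inputs` is used instead.
        self.inputs = SimpleNamespace()
        self.outputs = SimpleNamespace()
        self.flags = SimpleNamespace()

    def command(self):
        """Return the argument list that a run() method could execute."""
        args = [self.name]
        for name, enabled in sorted(vars(self.flags).items()):
            if enabled:
                # One-letter flags take a single dash, long flags a double dash.
                args.append(("-" if len(name) == 1 else "--") + name)
        for group in (self.inputs, self.outputs):
            for key, value in sorted(vars(group).items()):
                args.append("%s=%s" % (key, value))
        return args


slap = Module("r.slope.aspect")
slap.inputs.elevation = "elevation"
slap.outputs.slope = "tmp"
slap.flags.a = True
slap.flags.overwrite = True
```

Here `slap.command()` yields the module name, the enabled flags and the `key=value` options as a list, which could be handed to a process runner.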

With the focus on core and raster functionality, I would say this is
doable in a GSoC project.

OK, I'm going to modify the project page on Melange to follow your advice.

What do you think?

I think that you are pushing the idea forward and in the right direction.
Thanks

Best regards

Pietro