[GRASS-dev] Addons: Python error when checking if package is installed - r.randomforest

Hi,

I was checking why r.randomforest is not showing up on the Web site:
The log file gives an error here:

https://trac.osgeo.org/grass/browser/grass-addons/grass7/raster/r.randomforest/r.randomforest.py#L158
        grass.fatal("Python package <%s> not installed (python-sklearn). Exiting" % module_name)

but the sklearn package *is* installed.

Trying it manually on the server:

neteler@osgeo6:~$ python
Python 2.7.9 (default, Mar 1 2015, 12:57:24)
>>> import atexit, random, string, imp, re
>>> imp.find_module("sklearn")
(None, '/usr/lib/python2.7/dist-packages/sklearn', ('', '', 5))

... how come?

Checking if it is really installed:
neteler@osgeo6:~$ ls /usr/lib/python2.7/dist-packages/
antlr.py lxml/ Pyste/
[...]
defusedxml-0.4.1.egg-info PIL/ six.pyc
dns/ PILcompat/ sklearn/

--> installed.

neteler@osgeo6:~$ ls /usr/lib/python2.7/dist-packages/sklearn/
base.py feature_extraction/ lda.pyc qda.py
base.pyc feature_selection/ linear_model/ qda.pyc
[...]

What's wrong on the server? Or is the test wrong?
(I noted that various GRASS scripts use different ways for this check,
the test for 3rd party packages should be standardized.)
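
For reference, imp.find_module() raises ImportError when nothing is found, so the tuple returned in the manual check above means the package was located (type code 5 is imp.PKG_DIRECTORY). A minimal sketch of an equivalent presence check for Python 3, where the imp module is deprecated, could look like the following. The helper name `is_installed` is hypothetical and is not taken from r.randomforest.py:

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module or package can be located.

    The module itself is not imported; only the import machinery is
    asked whether it can find it. Intended for top-level names such as
    "sklearn", not for dotted submodule paths.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("json", "sklearn"):
        print(name, "installed" if is_installed(name) else "missing")
```

A check like this only says that the package can be found, not that importing it will succeed.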

Thanks for any hints.

Markus

On Mon, Jun 27, 2016 at 7:18 AM, Markus Neteler <neteler@osgeo.org> wrote:

(I noted that various GRASS scripts use different ways for this check,
the test for 3rd party packages should be standardized.)

So far I have promoted lazy import* as a solution to these dependencies. The
important part is that the import happens after calling grass.script.parser(),
so for the manual and the GUI, you don't need to have the dependencies at all.
For simple modules (where the dependency is used only in the main() function),
the import** is done in main() after the parser() call, and try-except with
ImportError is used to catch the missing dependency and report an error to
the user (example: r.colors.cubehelix***). More complicated modules (which
use the dependency explicitly outside of the main() function) must be solved
on a case-by-case basis (examples: v.class.ml and v.class.mlpy).

Older discussions:

https://lists.osgeo.org/pipermail/grass-dev/2015-February/073734.html
https://lists.osgeo.org/pipermail/grass-dev/2016-March/079610.html

Examples:

https://trac.osgeo.org/grass/browser/grass-addons/grass7/raster/r.colors.cubehelix/r.colors.cubehelix.py#L186
https://trac.osgeo.org/grass/browser/grass-addons/grass7/raster/r.flexure/r.flexure.py?rev=64452
https://trac.osgeo.org/grass/changeset/66482/

Vaclav

* By lazy import I mean an import which is not done at the beginning of the
file, but inside a function, right before it is actually needed.

** I think that import with try-except is just more straightforward than
testing the presence of the module/package before the import. It also
follows "it is easier to ask for forgiveness than for permission" (EAFP),
so one can say that it is more Pythonic.

*** r.colors.cubehelix is actually not the basic example, because it
provides a fallback when the dependency is not available and shows just a
warning and an error without exiting (it uses warning() and error() and
does not use fatal()).
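
The lazy import with try-except described above can be sketched roughly as follows. This is a minimal illustration, not code from any of the linked modules: `load_dependency` and the local `fatal` are hypothetical stand-ins so that the snippet runs without GRASS. In an actual script, grass.script.fatal() would take the place of `fatal`, and the call would sit in main() after grass.script.parser().

```python
import importlib
import sys


def fatal(message):
    """Hypothetical stand-in for grass.script.fatal(): print and exit."""
    sys.stderr.write("ERROR: %s\n" % message)
    sys.exit(1)


def load_dependency(module_name, package_hint):
    """Import a module only when it is needed (lazy import).

    Instead of testing for the module beforehand, the import is simply
    attempted and ImportError is caught (EAFP).
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        fatal("Python package <%s> not installed (%s). Exiting"
              % (module_name, package_hint))


def main():
    # In a real GRASS script, the parser() call would come first, so that
    # the manual page and the GUI can be generated without the dependency.
    sklearn = load_dependency("sklearn", "python-sklearn")
    # ... the dependency is used only from here on ...
    return sklearn
```

Because the import is attempted only inside main(), nothing at module level requires the dependency to be present.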

Thanks for this Vaclav. I’m confused as to why sklearn is not being found by testing for the presence of the module, but I can make an update to use the try-except method to get around this.

On Mon, Jun 27, 2016 at 8:00 AM, Vaclav Petras <wenzeslaus@gmail.com> wrote:

[...]

Hi,

On Mon, Jun 27, 2016 at 1:18 PM, Markus Neteler <neteler@osgeo.org> wrote:

Hi,

I was checking why r.randomforest is not showing up on the Web site:

FWIW:

After my installation of the needed Python packages on the server +
Steven's fixes in the Python script it now shows up:
https://grass.osgeo.org/grass70/manuals/addons/r.randomforest.html

Congrats,
Markus

Thank you Markus and Vaclav for all of the suggestions and I’m glad that it appears to be working now.

Unfortunately I did not think far enough ahead when I named ‘r.randomforest’. Once I have a relatively stable template for the machine learning classification/regression, I was planning to implement some other methods from the scikit-learn toolbox that are commonly applied in the geoscientific realm, e.g., logistic regression, SVM, boosted regression trees, and nearest neighbor, calling these r.scikit.randomforest, r.scikit.svm, r.scikit.nn, etc.

Alternatively, I could incorporate them within the same addon so that there are several different tabs for the parameters associated with each method, and have a ‘classifier’ option in the required section. I’m guessing that this is the best option, although it might still necessitate a name change to something like ‘r.scikit.learn’.

Steve

On Jul 8, 2016, at 3:04 AM, Markus Neteler <neteler@osgeo.org> wrote:

FWIW