[GRASS-user] scripting efficiency

hi GRASS community!

I’m amazed how well this software handles vector operations - especially the overlay operation seems unparalleled in open source software. Thank you very much to everyone who has been involved in the development process!

I would like to script a workflow that applies the same sequence of operations (v.in.ogr, several rounds of v.overlay, some database operations and v.out.ogr) to a few hundred sets of shapefiles. The shapefiles are 20-30 MB apiece and contain many polygons, each with many vertices.

Is there a difference in speed or processor efficiency between the different scripting approaches? I mean Python vs. Bash, and working inside the GRASS environment vs. calling the modules from outside it (e.g. via Python's grass.script).

Thank you for any opinions or advice!

-Wiley

On 16/10/13 07:59, Wiley Bogren wrote:

...

I see that no one ever answered this. AFAIK, speed will depend on how much work you do in the script itself and how much you leave to GRASS modules. The modules run at the same speed whichever scripting language calls them, so if you only use the script to chain GRASS modules together, I would expect the time difference between languages to be negligible.
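A minimal sketch of a script that only links GRASS modules, written in Python with grass.script.run_command, timing each call so you can see that the modules, not the interpreter, account for the run time. Assumptions: it runs inside a GRASS session; the map names, the second overlay input (other_map) and the output directory are hypothetical; the parameter names follow GRASS 6.4 (GRASS 7 renamed some of them, e.g. v.in.ogr dsn= became input=), so check them against the manual pages of your version; and the database step from the question is left out. The run argument exists only so the chain can be exercised without GRASS.

```python
import time


def process_set(shp_path, name, other_map, outdir, run=None):
    """Import one shapefile, overlay it with `other_map`, export the result.

    Returns a list of (module, seconds) pairs, one per module call.
    `other_map` and `outdir` are hypothetical placeholders.
    """
    if run is None:
        # grass.script is only importable inside a GRASS session
        from grass.script import run_command as run

    result = name + "_and"
    steps = [
        # GRASS 6.4-style parameter names; check the manual of your version
        ("v.in.ogr", {"dsn": shp_path, "layer": name, "output": name}),
        ("v.overlay", {"ainput": name, "binput": other_map,
                       "operator": "and", "output": result}),
        ("v.out.ogr", {"input": result, "type": "area", "dsn": outdir}),
    ]

    timings = []
    for module, params in steps:
        start = time.perf_counter()
        # the script only passes parameters; the module does the real work
        run(module, overwrite=True, **params)
        timings.append((module, time.perf_counter() - start))
    return timings
```

Calling process_set() once per shapefile in a plain loop and summing the timings per module should show where the time actually goes.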

Moritz

[Please leave discussions on the list]

On 18/10/13 15:40, Wiley Bogren wrote:

Thank you. Just to check, there's nothing like a startup cost each time
you call grass from an external (bash or python) script?

I think this "startup" cost is minimal, as all that "starting up" GRASS actually means is setting a few environment variables.
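Roughly what that amounts to, as a minimal Python sketch that prepares the environment for calling GRASS modules from an external script. It assumes a Linux-style installation under a GISBASE directory (e.g. a hypothetical /usr/lib/grass64) and an existing location and mapset. GISBASE, GISRC and the GISDBASE/LOCATION_NAME/MAPSET lines of the gisrc file are the standard names, but other platforms or versions may need additional variables.

```python
import os
import tempfile


def _prepend(env, var, *dirs):
    """Put dirs in front of the path-list variable `var`, dropping empty parts."""
    parts = list(dirs) + [p for p in env.get(var, "").split(os.pathsep) if p]
    env[var] = os.pathsep.join(parts)


def grass_env(gisbase, gisdbase, location, mapset, base_env=None):
    """Return (env, gisrc_path) for running GRASS modules via subprocess.

    The gisrc file is temporary; delete it when you are done.
    """
    env = dict(os.environ if base_env is None else base_env)

    # GISRC points to a small text file naming the database, location, mapset
    fd, gisrc = tempfile.mkstemp(prefix="gisrc_")
    with os.fdopen(fd, "w") as f:
        f.write("GISDBASE: %s\n" % gisdbase)
        f.write("LOCATION_NAME: %s\n" % location)
        f.write("MAPSET: %s\n" % mapset)

    env["GISBASE"] = gisbase
    env["GISRC"] = gisrc
    # module executables, their shared libraries and the grass Python package
    _prepend(env, "PATH", os.path.join(gisbase, "bin"),
             os.path.join(gisbase, "scripts"))
    _prepend(env, "LD_LIBRARY_PATH", os.path.join(gisbase, "lib"))
    _prepend(env, "PYTHONPATH", os.path.join(gisbase, "etc", "python"))
    return env, gisrc
```

With env, gisrc = grass_env(...) filled in with your own paths, a call such as subprocess.call(["g.region", "-p"], env=env) would run the module in that mapset. Since this only builds a dictionary and writes a three-line file, doing it once per batch costs next to nothing.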

Moritz

Hi Moritz,

Thank you. Just to check, there’s nothing like a startup cost each time you call grass from an external (bash or python) script?

-Wiley

On Fri, Oct 18, 2013 at 3:00 AM, Moritz Lennert <mlennert@club.worldonline.be> wrote:

...

Hi Wiley,

On Wed, Oct 16, 2013 at 6:59 AM, Wiley Bogren <wiley.bogren@gmail.com> wrote:

...

Sorry for the late response...
I've imported several files in parallel, using Python's multiprocessing module, with:

{{{
from multiprocessing import Queue, Process, cpu_count
from os.path import basename, splitext
from subprocess import Popen

from grass.pygrass.functions import findfiles

def worker(func, q_in, q_out):
    """Apply func to jobs from q_in until the (None, None) sentinel arrives.

    A module-level function (not a closure) so it can also be pickled when
    multiprocessing uses the 'spawn' start method (Windows, macOS)."""
    while True:
        path, cmdstr = q_in.get()
        if path is None:
            break
        q_out.put(func(path, cmdstr))

def mltp_importer(dirpath, match, cmdstr, func, nprocs=cpu_count()):
    q_in = Queue(1)  # bounded: the producer stays at most one job ahead
    q_out = Queue()
    procs = [Process(target=worker, args=(func, q_in, q_out))
             for _ in range(nprocs)]
    for proc in procs:
        proc.daemon = True
        proc.start()

    # send one job per matching file
    nsent = 0
    for path in findfiles(dirpath, match):
        q_in.put((path, cmdstr))
        nsent += 1
    # one sentinel per worker marks the end of the cycle
    for _ in procs:
        q_in.put((None, None))
    # collect the results before joining: a process that still has items
    # buffered in q_out may not terminate, so join() could hang
    results = [q_out.get() for _ in range(nsent)]
    for proc in procs:
        proc.join()
    return results

def importer(path, cmdstr):
    name = splitext(basename(path))[0]  # 'roads' from '.../roads.shp'
    popen = Popen(cmdstr.format(path=path, name=name), shell=True)
    popen.wait()
    return path, name, popen.returncode == 0

DIR = '/data/gis/data/Aviemore/shp'
CMD = 'v.in.ogr dsn={path} layer={name} output={name} -o --o'

if __name__ == '__main__':
    processed = mltp_importer(DIR, '*.shp', CMD, importer)
    # check for errors
    errors = [p for p in processed if not p[2]]
    if errors:
        # do something
        pass
}}}

I hope this helps...
The code is loosely based on: http://stackoverflow.com/a/16071616
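A lighter variant of the same idea, swapping in a thread pool (concurrent.futures.ThreadPoolExecutor, Python 3) for the hand-rolled queues. This is a sketch under stated assumptions, not Pietro's code: since each job only waits on an external v.in.ogr process, threads should be enough to keep several imports running at once, and passing each command as an argument list avoids shell quoting problems with paths that contain spaces. shp_import_commands() is a hypothetical helper that mirrors the CMD string above.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def shp_import_commands(paths):
    """Build one v.in.ogr call per shapefile, mirroring the CMD string above."""
    cmds = []
    for path in paths:
        name = os.path.splitext(os.path.basename(path))[0]
        cmds.append(["v.in.ogr", "dsn=" + path, "layer=" + name,
                     "output=" + name, "-o", "--o"])
    return cmds


def run_one(cmd):
    """Run one command given as an argument list; return (cmd, succeeded)."""
    return cmd, subprocess.call(cmd) == 0


def run_parallel(cmds, nworkers=None):
    """Run independent commands, at most `nworkers` at a time.

    Threads suffice because each one just waits on an external process.
    """
    nworkers = nworkers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        # map() yields results in the same order as `cmds`
        return list(pool.map(run_one, cmds))
```

For example, results = run_parallel(shp_import_commands(glob.glob('/data/gis/data/Aviemore/shp/*.shp'))) followed by [cmd for cmd, ok in results if not ok] lists the failed imports. Note that glob.glob() with this pattern only matches files directly in that directory.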

On Fri, Oct 18, 2013 at 5:27 PM, Pietro <peter.zamb@gmail.com> wrote:

...

I've imported several files, using a multiprocessing approach in python, with:

For the record: I have added Pietro's cool example to

http://grasswiki.osgeo.org/wiki/Python/pygrass#Sample_PyGRASS_scripts

Markus