[Geoserver-devel] WPS getting asynchronous execution

Hi,
in the next weeks I'll be working to add asynchronous execution
support for GeoServer.
I'd like to give you a heads up and discuss some design details.

The specification allows asynchronous requests when the caller asks for
storeExecuteResponse=true and status=true, meaning the actual response document
is stored somewhere and the status contained in it is updated while the process
proceeds.
(By spec, if status="true" and storeExecuteResponse is "false" then the service
shall raise an exception.)
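
The rule in parentheses can be sketched as a simple request check. This is only an illustration: ExecuteOptions and isAsynchronous are hypothetical names, not existing GeoServer code, and a real service would raise a WPS service exception rather than IllegalArgumentException.

```java
/** Hypothetical holder for the two flags of the Execute response form. */
final class ExecuteOptions {
    final boolean storeExecuteResponse;
    final boolean status;

    ExecuteOptions(boolean storeExecuteResponse, boolean status) {
        this.storeExecuteResponse = storeExecuteResponse;
        this.status = status;
    }

    /**
     * Returns true when the request should run asynchronously, that is,
     * when both flags are set as described above. Throws if status updates
     * are requested without a stored response, the combination that by
     * spec must raise an exception.
     */
    boolean isAsynchronous() {
        if (status && !storeExecuteResponse) {
            // Assumption: stand-in for the service exception a real WPS would raise
            throw new IllegalArgumentException(
                    "status=true requires storeExecuteResponse=true");
        }
        return storeExecuteResponse && status;
    }
}
```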

The location of the document is reported in the execute response, and the
document shall then be updated while the process performs the computation.
The spec does not say where this document should be located, but for ease
of implementation I propose to make it another service call, looking like:

wps?service=WPS&version=1.0&request=executionStatus&identifier=xyz

This makes it rather natural to implement with our current framework.

Process wise, we already have the factories pass down a ProgressListener
among the call arguments, so the process can update its status.

Now, how to handle the process asynch execution and tracking?
I was thinking to have a ProcessManager interface that the WPS service
code submits processes to and can ask about their status too.
It might look roughly like this:

interface ProcessManager {
  /**
   * Submits the process for execution, returns an id to refer to the
   * execution later
   */
  String submit(String processName, Map<String, Object> inputs);

  Status getStatus(String executionId);
}

Where Status is:

class Status {
  StatusType status; /* queued, paused, executing, complete, ... */
  double progress;
  Map<String, Object> output;
}

The default implementation of the process manager would use a fixed size thread
pool, callables and futures to handle the execution, but the interface will allow
plugging in (from the Spring context) other custom managers.
For example, people might want to run very long processes (several hours or more)
that can be restarted from known checkpoints. In that case the manager would also
need persistent storage of the processes in flight, so that they can be resumed
after a crash and restart of the WPS server. Others might want to enforce their own
execution policies (e.g., tie priority and number of processes executed to the user).
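
A minimal sketch of what such a default manager could look like, assuming processes are looked up by name in a simple registry. SimpleProcess, ThreadPoolProcessManager and the FAILED status are hypothetical names made up for this illustration, and progress reporting is left out for brevity.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

enum StatusType { QUEUED, EXECUTING, COMPLETE, FAILED }

final class Status {
    final StatusType status;
    final Map<String, Object> output; // null until the process is complete

    Status(StatusType status, Map<String, Object> output) {
        this.status = status;
        this.output = output;
    }
}

/** Hypothetical stand-in for a real process implementation. */
interface SimpleProcess {
    Map<String, Object> execute(Map<String, Object> inputs) throws Exception;
}

interface ProcessManager {
    String submit(String processName, Map<String, Object> inputs);

    Status getStatus(String executionId);
}

final class ThreadPoolProcessManager implements ProcessManager {
    private final ExecutorService pool;
    private final Map<String, SimpleProcess> registry;
    private final Map<String, Future<Map<String, Object>>> executions = new ConcurrentHashMap<>();
    private final Set<String> started = ConcurrentHashMap.newKeySet();

    ThreadPoolProcessManager(int threads, Map<String, SimpleProcess> registry) {
        // Daemon threads, so that this sketch never keeps the JVM alive
        this.pool = Executors.newFixedThreadPool(threads, runnable -> {
            Thread thread = new Thread(runnable);
            thread.setDaemon(true);
            return thread;
        });
        this.registry = registry;
    }

    @Override
    public String submit(String processName, Map<String, Object> inputs) {
        SimpleProcess process = registry.get(processName);
        if (process == null) {
            throw new IllegalArgumentException("Unknown process: " + processName);
        }
        String id = UUID.randomUUID().toString();
        // The Callable runs on a pool thread; until then the execution is queued
        Future<Map<String, Object>> future = pool.submit(() -> {
            started.add(id);
            return process.execute(inputs);
        });
        executions.put(id, future);
        return id;
    }

    @Override
    public Status getStatus(String executionId) {
        Future<Map<String, Object>> future = executions.get(executionId);
        if (future == null) {
            throw new IllegalArgumentException("Unknown execution: " + executionId);
        }
        if (!future.isDone()) {
            StatusType type = started.contains(executionId)
                    ? StatusType.EXECUTING : StatusType.QUEUED;
            return new Status(type, null);
        }
        try {
            return new Status(StatusType.COMPLETE, future.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new Status(StatusType.FAILED, null);
        } catch (ExecutionException e) {
            return new Status(StatusType.FAILED, null);
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The WPS service code would call submit() once and then getStatus() on every status request. A persistent or remote manager would implement the same two methods without any of the java.util.concurrent machinery.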

Now, the above might look fine but there is a problem: streaming execution and
result persistence.

Streaming execution means that most vector processes, and raster ones too,
calculate the result as data gets pulled from them via iterators or tile access,
so the process will actually exit from asynch execution without having computed
anything, potentially taking its dear time to actually compute when the results
are finally accessed.
Also, the inputs might not be there anymore when the result is being accessed
(think of a source layer that was removed in the meantime).

If the result is not streaming, but fully loaded in memory, there is the problem of
how many results we can keep in memory (and for how long; this should be configurable
too).

Long story short, imho we want to write out the results to disk as soon as possible, and
I guess include that in the "execution" phase from the user's point of view.

This changes the process manager, which at this point should take care of laying
out the results on disk and, when the process is done, returning not a map of outputs
but a link to the file that contains the response XML. That file in turn might link
to other documents, which happens if the user asked for the outputs to be returned
as references (common if you are generating a TIFF: you probably would not like it
base64 encoded inline in the XML).
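
As a rough sketch of the idea, the manager could drain the result to a file while still inside the execution and hand back only its location. ResultStore and the one-item-per-line text file are hypothetical placeholders for this illustration; the real thing would write the response XML and any referenced outputs.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that persists a streaming result as part of the execution. */
final class ResultStore {
    private final Path root;

    ResultStore(Path root) {
        this.root = root;
    }

    /**
     * Drains the (possibly lazy) result into a file named after the execution id
     * and returns the file location. Iterating here forces the computation to
     * happen inside the execution instead of when the client fetches the result.
     */
    Path store(String executionId, Iterable<?> result) {
        try {
            Files.createDirectories(root);
            Path target = root.resolve(executionId + ".txt");
            try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
                for (Object item : result) {
                    writer.write(String.valueOf(item));
                    writer.newLine();
                }
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this the status would report "complete" only after store() returns, so neither a removed source layer nor memory limits can affect the result later on.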

This "lay out on disk" part would be pretty common among the various implementations,
so I guess I'll make a helper object for that part that the various ProcessManager
implementations can reuse.

Opinions, suggestions?

Cheers
Andrea

--
-------------------------------------------------------
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-------------------------------------------------------

Sounds pretty interesting. The general approach all sounds good. Although you lost me at the part about streaming and result persistence. Admittedly I know very little about how the WPS works. How is it a process will exit without doing anything? Is the data not being “pulled” in that thread?

Aside from that, there is a general comment that the number of components that do their own thread/job/task management seems to be growing… Although a few of them are still just community modules. I wonder if it’s worth at some point trying to come up with a central task manager of sorts. If the task manager you come up with for WPS processes is generic enough it might be worth trying to throw it in the core for other components to use.

On Fri, Oct 28, 2011 at 2:44 PM, Andrea Aime <andrea.aime@anonymised.com> wrote:

Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Fri, Oct 28, 2011 at 5:06 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

Sounds pretty interesting. The general approach all sounds good. Although
you lost me at the part about streaming and result persistence. Admittedly I
know very little about how the WPS works. How is it a process will exit
without doing anything? Is the data not being "pulled" in that thread?

Nope, good processes return a "processing feature collection", which is actually
an empty container that computes the results only as you iterate over it
(to scale up: if we kept all of the results in memory we'd OOM with large
result sets).
So many processes return instantly and only start doing work as you call
next() on the iterator (possibly computing only the next feature), which happens
only while encoding the results in XML or whatever output format was chosen.
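
To make the behaviour concrete, here is a small self-contained sketch. ProcessingCollection and LazyProcessDemo are hypothetical stand-ins written for this example (not the actual GeoTools classes), with a plain list of integers playing the role of the features.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical lazy container: wraps a source and a per-item computation. */
final class ProcessingCollection<S, T> implements Iterable<T> {
    private final Iterable<S> source;
    private final Function<S, T> computation;

    ProcessingCollection(Iterable<S> source, Function<S, T> computation) {
        this.source = source;
        this.computation = computation;
    }

    @Override
    public Iterator<T> iterator() {
        Iterator<S> delegate = source.iterator();
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return delegate.hasNext();
            }

            @Override
            public T next() {
                // The actual work happens here, one item at a time
                return computation.apply(delegate.next());
            }
        };
    }
}

final class LazyProcessDemo {
    /** Counts how many items have actually been computed. */
    static final AtomicInteger COMPUTED = new AtomicInteger();

    /** The "process": returns immediately, nothing has been computed yet. */
    static Iterable<Integer> execute(List<Integer> input) {
        return new ProcessingCollection<Integer, Integer>(input, value -> {
            COMPUTED.incrementAndGet();
            return value * 2;
        });
    }

    public static void main(String[] args) {
        Iterable<Integer> result = execute(Arrays.asList(1, 2, 3));
        System.out.println("computed after execute(): " + COMPUTED.get());
        for (Integer value : result) { // plays the role of the output encoder
            System.out.println(value);
        }
        System.out.println("computed after encoding: " + COMPUTED.get());
    }
}
```

The counter is still zero when execute() returns and only reaches three once the loop, standing in for the encoder, has pulled every item.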

Aside from that, there is a general comment that the number of components
that do their own thread/job/task management seems to be growing... Although
a few of them are still just community modules. I wonder if it's worth at some
point trying to come up with a central task manager of sorts. If the task
manager you come up with for WPS processes is generic enough it might be
worth trying to throw it in the core for other components to use.

I see... hum... due to handling the encoding and some other WPS specific
stuff I do not believe it's reusable.

I believe that if we see significant similarities between the various approaches
we can roll a base/helper/reusable class later.

Cheers
Andrea


Good stuff Andrea.
FWIW, you may want to look at some prior art:
<https://github.com/GeoNode/geonode/tree/synth/src/geoserver-geonode-ext/src/main/java/org/geonode/process/control>
<https://github.com/GeoNode/geonode/tree/synth/src/geoserver-geonode-ext/src/main/java/org/geonode/process/batchdownload>

That never found the funding to be pushed onto WPS, but was designed
with that in mind.

Cheers,
Gabriel
On Fri, Oct 28, 2011 at 10:44 AM, Andrea Aime
<andrea.aime@anonymised.com> wrote:


--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

On Fri, Oct 28, 2011 at 8:00 PM, Gabriel Roldan <groldan@anonymised.com> wrote:

Good stuff Andrea.
FWIW, you may want to look at some prior art:
<https://github.com/GeoNode/geonode/tree/synth/src/geoserver-geonode-ext/src/main/java/org/geonode/process/control>
<https://github.com/GeoNode/geonode/tree/synth/src/geoserver-geonode-ext/src/main/java/org/geonode/process/batchdownload>

That never found the funding to be pushed onto WPS, but was designed
with that in mind.

Interesting. Questions:
- I see you have a "cancelled" status that is not part of the WPS
  specification, though I believe it makes sense if one day we want to
  expose a way for the admin to terminate a running process
- why submit a synchronous process? You would queue and pool it anyways?

As far as I can see the interface does not address the issue of processes
exiting right away and giving back a "processing collection" (one that
actually generates results as the caller scrolls over it).

Cheers
Andrea


Hi,
in the next weeks I’ll be working to add asynchronous execution
support for GeoServer.
I’d like to give you some heads up and discuss some design details.

That is actually a pretty big job; there is some support in GeoTools for “ProcessExecutor” with the idea
that GeoServer should provide its own implementation (with suitable ability to audit currently running processes
and kill any that get out of control).

In particular a ProcessExecutor works with a subclass of Future called Progress. The Progress object allows status to be returned; with an eye towards fulfilling the Async response document.

If it is not too much trouble I would encourage you to stick with this framework (with a custom GeoServer implementation). Mostly so processes that do chaining can be written at the GeoTools API
level; and as long as they make use of the GeoServer provided ProcessExecutor we can still see what is going on.

The specification allows asynchronous requests when the caller asks for
storeExecuteResponse=true and status=true, meaning the actual response document
is stored somewhere and the status contained in it is updated while the process
proceeds.
(by spec, If status=“true” and storeExecuteResponse is “false” then the service
shall raise an exception)

I was under the impression you could also ask that a result be stored to an FTP site or something. Perhaps that is part of “storeExecuteResponse=true”.

The location of the document is reported in the execute response and then
shall be updated while the process performs the computation.
The spec does not say where this document should be located, but for ease
of implementation I propose to make it into another service call, looking like:

wps?service=WPS&version=1.0&request=executionStatus&identifier=xyz

This makes it rather natural to implement with our current framework.

See above; I was expecting the ProcessExecutor to track this information; and the Progress data structure be the core of this response document.

Process wise, we already have the factories pass down a ProgressListener
among the call arguments, so the process can update its status.

Now, how to handle the process asynch execution and tracking?
I was thinking to have a ProcessManager interface that the WPS service
code submits processes to and can ask about their status too.
It might look roughly like this:

interface ProcessManager {
  /**
   * Submits the process for execution, returns an id to refer to the
   * execution later
   */
  String submit(String processName, Map<String, Object> inputs);

  Status getStatus(String executionId);
}

Where Status is:

Status {
StatusType status; /* queued, paused, executing, complete, … */
double progress;
Map<String, Object> output;
}

The default implementation of the process manager would use a fixed size thread
pool, callables and futures to handle the execution, but the interface will allow
plugging in (from the Spring context) other custom managers.
For example, people might want to run very long processes (several hours or more)
that can be restarted from known checkpoints. In that case the manager would also
need persistent storage of the processes in flight, so that they can be resumed
after a crash and restart of the WPS server. Others might want to enforce their own
execution policies (e.g., tie priority and number of processes executed to the user).

The design is the same; ProcessManager extends ProcessExecutor; Status extends Progress.

Opinions, suggestions?

Exciting that this work is being done; let us try and compare notes to prevent duplication.

Jody

On Sat, Oct 29, 2011 at 1:36 PM, Jody Garnett <jody.garnett@anonymised.com> wrote:

Hi,
in the next weeks I'll be working to add asynchronous execution
support for GeoServer.
I'd like to give you some heads up and discuss some design details.

That is actually a pretty big job; there is some support in GeoTools for
"ProcessExecutor" with the idea
that GeoServer should provide its own implementation (with suitable ability
to audit currently running processes
and kill any that get out of control).
In particular a ProcessExecutor works with a subclass of Future called
Progress. The Progress object allows status to be returned; with an eye
towards fulfilling the Async response document.
If it is not too much trouble I would encourage you to stick with this
framework (with a custom GeoServer implementation). Mostly so processes that
do chaining can be written at the GeoTools API
level; and as long as they make use of the GeoServer provided
ProcessExecutor we can still see what is going on.

Looked at that, seems quite a bit more than what I need, and harder to use
from the point of view of polling.
I'll stick to something more similar to what Gabriel did.

The specification allows asynchronous requests when the caller asks for
storeExecuteResponse=true and status=true, meaning the actual response document
is stored somewhere and the status contained in it is updated while the
process
proceeds.
(by spec, If status="true" and storeExecuteResponse is "false" then the
service
shall raise an exception)

I was under the impression you could also ask that a result be stored to an
FTP site or something. Perhaps that is part of "storeExecuteResponse=true".

The spec says you have to return a URL with the location of the document,
so an FTP site is unlikely to be what they had in mind.
There is a sequence diagram showing an FTP server for the results, but the
response to be updated is shown as stored on an HTTP server.
Both of them seem highly overkill to me in the simple case, though being
able to store the results (especially large ones) on an FTP server could
be a nice idea since it allows restarting downloads.
Anyways, a problem for another day, I'm not getting there right now :)

The location of the document is reported in the execute response and then
shall be updated while the process performs the computation.
The spec does not say where this document should be located, but for ease
of implementation I propose to make it into another service call, looking
like:
wps?service=WPS&version=1.0&request=executionStatus&identifier=xyz
This makes it rather natural to implement with our current framework.

See above; I was expecting the ProcessExecutor to track this information;
and the Progress data structure be the core of this response document.

Process wise, we already have the factories pass down a ProgressListener
among the call arguments, so the process can update its status.

Yep, that's what I want to use. But Progress is no good, does not allow
for cancellation, does not have the notion of "paused" state
and forces the usage of Future, when doing restartable processes
against persistent storage and checkpoints people should be free to
use whatever they want imho.
A progress listener is all we need.

Now, how to handle the process asynch execution and tracking?
I was thinking to have a ProcessManager interface that the WPS service
code submits processes to and can ask about their status too.
It might look roughly like this:
interface ProcessManager {
  /**
   * Submits the process for execution, returns an id to refer to the
   * execution later
   */
  String submit(String processName, Map<String, Object> inputs);

  Status getStatus(String executionId);
}

Where Status is:

Status {
  StatusType status; /* queued, paused, executing, complete, ... */
  double progress;
  Map<String, Object> output;
}

The default implementation of the process manager would use a fixed size thread
pool, callables and futures to handle the execution, but the interface will allow
plugging in (from the Spring context) other custom managers.
For example, people might want to run very long processes (several hours or more)
that can be restarted from known checkpoints. In that case the manager would also
need persistent storage of the processes in flight, so that they can be resumed
after a crash and restart of the WPS server. Others might want to enforce their own
execution policies (e.g., tie priority and number of processes executed to the user).

The design is the same; ProcessManager extends ProcessExecutor; Status
extends Progress.

As said above, the interfaces over there do not seem suitable for what I need.
I need interfaces that talk about what can be done in a WPS server and allow
for polling state and cancellation.
Afaik the GeoTools ones are probably suitable for a desktop tool instead, but
in practice nothing is using them?
If so maybe we can do like with the processes, port back reusable portions
to GeoTools once the dust is settled.

In any case I'll keep them on the desk along with Gabriel's work and see what
I can learn from them while building asynch support for GeoServer.

Cheers
Andrea


Looked at that, seems quite a bit more than what I need, and harder to use
from the point of view of polling.
I’ll stick to something more similar to what Gabriel did.

Not sure I understand that one; you can poll for the current progress? The progress listener is used between the executing code (if it has actually started) and the value stored in the Progress for other code to check on.

See above; I was expecting the ProcessExecutor to track this information;
and the Progress data structure be the core of this response document.

Process wise, we already have the factories pass down a ProgressListener
among the call arguments, so the process can update its status.

Yep, that’s what I want to use. But Progress is no good, does not allow
for cancellation, does not have the notion of “paused” state
and forces the usage of Future, when doing restartable processes
against persistent storage and checkpoints people should be free to
use whatever they want imho.

Good point about “paused”; it does actually allow cancellation, last I checked.
In any case I have not started using ProcessExecutor & Progress yet myself
so if a solution is going to be in flux for a while I will hold off taking gt-process
to supported status.

A progress listener is all we need.

Not strictly true as we need something to hold the result (or reference to the result).

The design is the same; ProcessManager extends ProcessExecutor; Status
extends Progress.

As said above, the interfaces over there do not seem suitable for what I need.
I need interfaces that talk about what can be done in a WPS server and allow
for polling state and cancellation.

Those two are covered; it is the pausing that is not covered.

Afaik the GeoTools ones are probably suitable for a desktop tool instead, but
in practice nothing is using them?

Correct; I have not used them in a desktop tool yet either.

If so maybe we can do like with the processes, port back reusable portions
to GeoTools once the dust is settled.

In any case I’ll keep them on the desk along with Gabriel's work and see what
I can learn from them while building asynch support for GeoServer.

No worries; what is your timeframe Andrea? I kind of want this stuff settled before
I sink more work into it.

Jody

On Sat, Oct 29, 2011 at 4:09 PM, Jody Garnett <jody.garnett@anonymised.com> wrote:

Looked at that, seems quite a bit more than what I need, and harder to use
from the point of view of polling.
I'll stick to something more similar to what Gabriel did.

Not sure I understand that one; you can poll for the current progress? The
progress listener is used between the executing code (if it has actually
started) and the value stored in the Progress for other code to check on.

See above; I was expecting the ProcessExecutor to track this information;
and the Progress data structure be the core of this response document.
Process wise, we already have the factories pass down a ProgressListener
among the call arguments, so the process can update its status.

Yep, that's what I want to use. But Progress is no good, does not allow
for cancellation, does not have the notion of "paused" state
and forces the usage of Future, when doing restartable processes
against persistent storage and checkpoints people should be free to
use whatever they want imho.

Good point about "paused"; it does actually allow cancellation, last I checked.

Ah, I see what you mean: Future has cancellation support, though I don't want
to support the boolean argument it is provided with.
Still, it offers both too much and not enough at the same time.

The interface forces the following:

    V get(long timeout, TimeUnit unit)

which one might get for free if the thread control is managed inside the VM,
but only gets in the way otherwise (e.g., it's one more thing to implement
that provides no value WPS wise).
I want to be able to support a situation in which the WPS server is just a
frontend for processes that run outside the VM, maybe in a grid, where the
process manager would actually offload the whole process and not run anything
locally, with the ability to reattach to the process running remotely even in
case of a WPS server restart.
I don't want to tie myself to an API that assumes the process is running
locally using java.util.concurrent: although that will be the default
implementation, I want to be free to support other variations.
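
To make that concrete, here is a rough sketch of a manager contract that keeps java.util.concurrent out of the API. Only submit(processName, inputs) comes from the proposal earlier in the thread; isDone, cancel, LocalProcessManager and the registry map are names made up for illustration, not existing GeoServer or GeoTools API.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical contract: only ids and plain values cross the API, never a Future.
interface ProcessManager {
    String submit(String processName, Map<String, Object> inputs);
    boolean isDone(String executionId);
    void cancel(String executionId);
}

// Assumed default, in-VM variant: the thread pool and the futures stay private.
class LocalProcessManager implements ProcessManager {
    private final Map<String, Function<Map<String, Object>, Map<String, Object>>> registry;
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<String, Future<Map<String, Object>>> futures = new ConcurrentHashMap<>();

    LocalProcessManager(Map<String, Function<Map<String, Object>, Map<String, Object>>> registry) {
        this.registry = registry;
    }

    @Override
    public String submit(String processName, Map<String, Object> inputs) {
        Function<Map<String, Object>, Map<String, Object>> process = registry.get(processName);
        if (process == null) {
            throw new IllegalArgumentException("Unknown process: " + processName);
        }
        String id = UUID.randomUUID().toString();
        Future<Map<String, Object>> future = pool.submit(() -> process.apply(inputs));
        futures.put(id, future);
        return id;
    }

    @Override
    public boolean isDone(String executionId) {
        Future<?> future = futures.get(executionId);
        return future != null && future.isDone();
    }

    @Override
    public void cancel(String executionId) {
        Future<?> future = futures.get(executionId);
        if (future != null) {
            future.cancel(true); // the boolean flag never leaks into the public contract
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

A grid-backed manager could implement the same three methods by forwarding the process name and inputs to a remote scheduler and persisting the execution id, which is what would allow reattaching after a restart.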

In any case I have not started using ProcessExecutor & Progress yet myself
so if a solution is going to be in flux for a while I will hold off taking
gt-process to supported status.

A progress listener is all we need.

Not strictly true as we need something to hold the result (or reference to
the result).

I should have said "a progress listener is all we need status wise"

The design is the same; ProcessManager extends ProcessExecutor; Status
extends Progress.

As said above, the interfaces over there do not seem suitable for what I
need.
I need interfaces that talk about what can be done in a WPS server and allow
for polling state and cancellation.

Those two are covered; it is the pausing that is not covered.

How do I know if a process is queued or running?
Assuming getProgress()=0% implies queued is really lame, what if the process
is very long and the % does not go up for 10 minutes straight, or because of
its structure the process just reports status in very discrete steps?
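
A minimal sketch of what I mean, with the state carried explicitly instead of being inferred from the percentage. The state names follow the ones listed in my first mail (queued, paused, executing, complete); the class layout and the isQueued helper are only illustrative.

```java
// Hypothetical status snapshot: the state is a field of its own.
enum StatusType { QUEUED, PAUSED, EXECUTING, COMPLETE }

final class Status {
    final StatusType type;
    final double progress; // percentage, 0 to 100

    Status(StatusType type, double progress) {
        this.type = type;
        this.progress = progress;
    }

    // Answered from the explicit state, never guessed from the percentage.
    boolean isQueued() {
        return type == StatusType.QUEUED;
    }
}
```

With this, a long process that has started but still reports 0% is new Status(StatusType.EXECUTING, 0.0), and isQueued() correctly returns false for it.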

Besides, as said, an interface that hands me back the map of outputs
won't take into account the need to generate the results on disk before
returning (see my first mail about streaming processes).
What I need here is some WPS server specific way to handle this,
otherwise streaming processes will just return at once without having
computed a thing and will start loading the server only as the external
output form (gml, shapefile, tiff) is being written.
These processes do run in the same VM as the WPS, but they also pretty much
defeat the progress control: as said, they return instantly and the real
work is done later.
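
The effect can be shown with a toy example. This is not GeoServer code: plain integers stand in for features, and StreamingDemo, buffer and encode are made-up names. The "process" returns a lazy view immediately, and the computation only happens while the encoder pulls results.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;

class StreamingDemo {
    // Counts how many elements have actually been computed so far.
    static final AtomicInteger computed = new AtomicInteger();

    // A "streaming process": returns at once, wrapping the input in a lazy view.
    static Iterator<Integer> buffer(Iterator<Integer> features) {
        return new Iterator<Integer>() {
            @Override
            public boolean hasNext() {
                return features.hasNext();
            }

            @Override
            public Integer next() {
                computed.incrementAndGet(); // the real work happens here
                return features.next() * 2;
            }
        };
    }

    // Stand-in for the output encoder (GML, shapefile, ...): pulling drives the work.
    static String encode(Iterator<Integer> results) {
        StringBuilder sb = new StringBuilder();
        while (results.hasNext()) {
            sb.append(results.next()).append(' ');
        }
        return sb.toString().trim();
    }
}
```

After buffer(...) returns, the counter is still zero, so a progress tracker watching only the process call would report it finished before any work was done. The counter only moves while encode(...) is running.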

Afaik the GeoTools ones are probably suitable for a desktop tool instead,
but in practice nothing is using them?

correct; I have not used them in a desktop tool yet either.

If so maybe we can do like with the processes, port back reusable portions
to GeoTools once the dust is settled.
In any case I'll keep them on the desk along with Gabriel's work and see what
I can learn from them while building asynch support for GeoServer.

No worries; what is your timeframe Andrea? I kind of want this stuff settled
before I sink more work into it.

I need to be able to start working on this Monday and wrap up later in the week,
or at the latest at the beginning of the next.

Cheers
Andrea

--
-------------------------------------------------------
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-------------------------------------------------------