Hi,
in the next few weeks I'll be working on adding asynchronous execution
support to the GeoServer WPS.
I'd like to give you a heads up and discuss some design details.
The specification allows asynchronous requests when the caller asks for
storeExecuteResponse=true and status=true, meaning the actual response document
is stored somewhere and the status contained in it is updated as the process
proceeds.
(By spec, if status="true" and storeExecuteResponse="false", the service
shall raise an exception.)
The location of the document is reported in the execute response, and the
document is then updated while the process performs the computation.
The spec does not say where this document should be located; for ease of
implementation I propose exposing it as another service request, along the lines of:
wps?service=WPS&version=1.0&request=executionStatus&identifier=xyz
This makes it rather natural to implement with our current framework.
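A minimal sketch of how the status location could be built from the request proposed above. The class name, the base URL used in the example and the choice of UUID-like execution ids are hypothetical; only the query string follows the proposal.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the status location that the execute response
// would report, using the executionStatus request proposed above.
final class StatusLocation {
    private StatusLocation() {}

    static String build(String wpsBaseUrl, String executionId) {
        // Encode the id in case it contains characters that are not URL-safe
        String id = URLEncoder.encode(executionId, StandardCharsets.UTF_8);
        return wpsBaseUrl + "?service=WPS&version=1.0&request=executionStatus&identifier=" + id;
    }
}
```

For example, build("http://localhost:8080/geoserver/wps", "xyz") yields exactly the URL shown above appended to that base. Keeping it a plain KVP request means the existing dispatcher can route it like any other operation.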
On the process side, our process factories already pass a ProgressListener
down among the call arguments, so the process can update its status.
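To illustrate the idea, here is a toy process reporting progress through a listener. SimpleProgressListener is a hypothetical, trimmed-down stand-in for the real ProgressListener (which has more methods), and the 0..100 range is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down stand-in for the ProgressListener that the
// process factories pass down (the real interface has more methods).
interface SimpleProgressListener {
    void progress(float percent); // 0..100 in this sketch
    boolean isCanceled();
}

final class ToyProcess {
    // Handles the features one at a time, reports progress after each one,
    // and stops early if the caller has cancelled the execution.
    static int execute(List<String> features, SimpleProgressListener listener) {
        int processed = 0;
        for (String feature : features) {
            if (listener.isCanceled()) {
                break;
            }
            processed++; // the real work on `feature` would happen here
            listener.progress(100f * processed / features.size());
        }
        return processed;
    }
}

// Listener that just records the updates, as a status document writer might.
final class RecordingListener implements SimpleProgressListener {
    final List<Float> updates = new ArrayList<>();

    @Override
    public void progress(float percent) { updates.add(percent); }

    @Override
    public boolean isCanceled() { return false; }
}
```

Running it over four features produces the updates 25, 50, 75 and 100, which is the kind of information the stored status document would expose.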
Now, how should we handle asynchronous execution and tracking of processes?
I was thinking of having a ProcessManager interface that the WPS service
code submits processes to, and that it can also query for their status.
It might look roughly like this:
import java.util.Map;

interface ProcessManager {
    /**
     * Submits the process for execution, returns an id to refer to the
     * execution later
     */
    String submit(String processName, Map<String, Object> inputs);

    /** Returns the current status of the given execution */
    Status getStatus(String executionId);
}
Where Status is:
class Status {
    StatusType status; /* queued, paused, executing, complete, ... */
    double progress;
    Map<String, Object> output;
}
The default ProcessManager implementation would use a fixed-size thread
pool, Callable and Future objects to handle the execution, but the interface
will allow other custom managers to be plugged in (from the Spring context).
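A minimal sketch of such a default manager, assuming processes can be looked up by name as plain functions (a hypothetical simplification; the real lookup would go through the process factories) and leaving out progress reporting. The QUEUED/EXECUTING distinction comes from a flag the Callable sets when it starts.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

enum StatusType { QUEUED, EXECUTING, COMPLETE, FAILED }

// Trimmed-down Status: progress reporting through the listener is left out.
record Status(StatusType status, Map<String, Object> output) {}

interface ProcessManager {
    String submit(String processName, Map<String, Object> inputs);
    Status getStatus(String executionId);
}

final class ThreadPoolProcessManager implements ProcessManager {
    // Hypothetical name -> process lookup standing in for the process factories
    private final Map<String, Function<Map<String, Object>, Map<String, Object>>> processes;
    private final ExecutorService pool;
    private final Map<String, Future<Map<String, Object>>> executions = new ConcurrentHashMap<>();
    private final Set<String> started = ConcurrentHashMap.newKeySet();

    ThreadPoolProcessManager(int threads,
            Map<String, Function<Map<String, Object>, Map<String, Object>>> processes) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.processes = processes;
    }

    @Override
    public String submit(String processName, Map<String, Object> inputs) {
        Function<Map<String, Object>, Map<String, Object>> process = processes.get(processName);
        if (process == null) {
            throw new IllegalArgumentException("Unknown process: " + processName);
        }
        String id = UUID.randomUUID().toString();
        // The Callable marks the execution as started, then runs the process
        Future<Map<String, Object>> future = pool.submit(() -> {
            started.add(id);
            return process.apply(inputs);
        });
        executions.put(id, future);
        return id;
    }

    @Override
    public Status getStatus(String executionId) {
        Future<Map<String, Object>> future = executions.get(executionId);
        if (future == null) {
            throw new IllegalArgumentException("Unknown execution: " + executionId);
        }
        if (!future.isDone()) {
            return new Status(started.contains(executionId) ? StatusType.EXECUTING : StatusType.QUEUED, null);
        }
        try {
            return new Status(StatusType.COMPLETE, future.get()); // done, so get() does not block
        } catch (ExecutionException e) {
            return new Status(StatusType.FAILED, null);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new Status(StatusType.FAILED, null);
        }
    }

    /** Stops accepting work; queued and running executions still complete. */
    void shutdown() {
        pool.shutdown();
    }
}
```

The fixed pool size is what bounds server load: extra submissions simply wait in the executor queue and show up as QUEUED.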
For example, people might want to run very long processes (several hours
or more) that can be restarted from known checkpoints; in that case the
manager would also need persistent storage of the processes in flight, so
that they can be resumed after a crash and restart of the WPS server.
Other managers might enforce their own execution policies (e.g., tying
priority and the number of processes executed to the user).
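As an example of the kind of per-user policy a custom manager might enforce, here is a hypothetical limit on how many processes each user can have in flight at once. It only covers the "amount of processes" part; priorities would need a different structure, such as a priority queue in front of the pool.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical execution policy: each user may have at most maxPerUser
// processes running at the same time.
final class PerUserLimit {
    private final int maxPerUser;
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

    PerUserLimit(int maxPerUser) {
        this.maxPerUser = maxPerUser;
    }

    /** Returns true if the user may start another process right now. */
    boolean tryAcquire(String user) {
        return permits.computeIfAbsent(user, u -> new Semaphore(maxPerUser)).tryAcquire();
    }

    /** To be called when one of the user's processes ends, successfully or not. */
    void release(String user) {
        Semaphore s = permits.get(user);
        if (s != null) {
            s.release();
        }
    }
}
```

A manager would call tryAcquire before submitting and either reject or queue the request when it returns false, releasing the permit in a finally block around the execution.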
Now, the above might look fine, but there is a catch: streaming execution and
result persistence.
Streaming execution means that most vector processes, and raster ones too,
calculate the result as data gets pulled from them via iterators or tile access.
So the process will actually return from the asynchronous execution without
having computed anything, and may take its sweet time to actually compute
when the results are finally accessed.
Also, the inputs might no longer be available when the result is accessed
(think of a source layer that was removed in the meantime).
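A toy illustration of the streaming problem, not actual GeoTools code: the "process" returns a lazy Iterable immediately, and a counter shows that no work has happened until someone iterates. A manager that marks such an execution complete as soon as the process returns has done none of the computation.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

// Toy "streaming" result: nothing is computed when it is created; each value
// is computed only when a client pulls it through the iterator.
final class LazySquares implements Iterable<Integer> {
    final AtomicInteger computed = new AtomicInteger(); // how much work has actually run
    private final int count;

    LazySquares(int count) {
        this.count = count;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < count;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                computed.incrementAndGet(); // the "expensive" work happens here
                int i = next++;
                return i * i;
            }
        };
    }
}
```

Creating new LazySquares(5) leaves computed at 0; only summing the values drives it to 5. Any input the iterator depends on (a layer, a coverage) must therefore still exist at that later time.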
If the result is not streaming, but fully loaded in memory, there is the
problem of how many results we can keep in memory (and for how long; this
should be configurable too).
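A minimal sketch of an in-memory result store with the two configurable knobs mentioned above, maximum number of results and time to live. The class and parameter names are hypothetical, and eviction is simple insertion order rather than anything smarter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// In-memory result store bounded by count (maxEntries) and age (ttlMillis).
final class ResultCache<V> {
    private record Stored<V>(V value, long storedAt) {}

    private final long ttlMillis;
    private final LongSupplier clock; // System::currentTimeMillis in practice, injectable for tests
    private final LinkedHashMap<String, Stored<V>> entries;

    ResultCache(int maxEntries, long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // Insertion-ordered map that drops the oldest result once over capacity
        this.entries = new LinkedHashMap<>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Stored<V>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized void put(String executionId, V result) {
        entries.put(executionId, new Stored<>(result, clock.getAsLong()));
    }

    /** Returns the result, or null if it was never stored, was evicted, or has expired. */
    synchronized V get(String executionId) {
        Stored<V> s = entries.get(executionId);
        if (s == null) {
            return null;
        }
        if (clock.getAsLong() - s.storedAt() > ttlMillis) {
            entries.remove(executionId);
            return null;
        }
        return s.value();
    }
}
```

Even with both limits in place, a client that polls too late simply loses its result, which is part of why writing results to disk looks more attractive.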
Long story short, IMHO we want to write the results to disk as soon as
possible, and I guess include that in the "execution" phase from the user's
point of view.
This changes the process manager, which at this point should take care of laying
out the results on disk and, when the process is done, return not a map of
outputs but a link to the file that contains the response XML. That file might
in turn link to other documents, which happens if the user asked for the outputs
to be returned as references (common if you are generating a TIFF: you probably
would not want it base64-encoded inline in the XML).
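One possible shape for the revised status, sketched under the assumption that a completed execution carries a URI to the stored response document instead of the outputs map. The field names and the 0..100 progress range are hypothetical.

```java
import java.net.URI;
import java.util.Map;

enum StatusType { QUEUED, EXECUTING, COMPLETE, FAILED }

// Revised Status: once COMPLETE, it points at the response XML stored on disk
// (which may in turn reference other stored files) instead of holding outputs.
record Status(StatusType status, double progress, URI responseDocument) {
    Status {
        if (status == StatusType.COMPLETE && responseDocument == null) {
            throw new IllegalArgumentException("COMPLETE requires a response document");
        }
    }
}

interface ProcessManager {
    // Unchanged: submit returns an execution id to poll later
    String submit(String processName, Map<String, Object> inputs);

    Status getStatus(String executionId);
}
```

Since results no longer live in memory, the manager only has to keep small status records around, and the executionStatus request can serve the stored document directly.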
This "lay out on disk" part would be pretty common among the various
implementations, so I guess I'll make a helper object for it that different
ProcessManager implementations can reuse.
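A minimal sketch of what such a reusable helper could look like, assuming a hypothetical layout of one directory per execution holding response.xml plus one file per output requested as a reference. Building the actual WPS XML is delegated to a caller-supplied function, and output names are assumed to be safe file names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical reusable helper: lays out one execution's results on disk as
//   <root>/<executionId>/response.xml  plus one file per referenced output.
final class ResultLayout {
    private final Path root;

    ResultLayout(Path root) {
        this.root = root;
    }

    /**
     * Writes each referenced output to its own file, then the response document.
     * xmlBuilder receives output name -> written file, so it can emit the references.
     */
    Path store(String executionId, Map<String, byte[]> referencedOutputs,
               Function<Map<String, Path>, String> xmlBuilder) throws IOException {
        Path dir = Files.createDirectories(root.resolve(executionId));
        Map<String, Path> written = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : referencedOutputs.entrySet()) {
            Path file = dir.resolve(e.getKey()); // assumes the output name is a safe file name
            Files.write(file, e.getValue());
            written.put(e.getKey(), file);
        }
        // The response XML is written last, after all the files it references exist
        Path response = dir.resolve("response.xml");
        Files.writeString(response, xmlBuilder.apply(written), StandardCharsets.UTF_8);
        return response;
    }
}
```

Any ProcessManager implementation, in memory or persistent, could call store() at the end of the execution phase and put the returned path in its status.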
Opinions, suggestions?
Cheers
Andrea
--
-------------------------------------------------------
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 962313
http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf
-------------------------------------------------------