[Geoserver-devel] Queue position for asynchronous WPS jobs

Hi All,

As per previous correspondence from Julian Atkinson, we are in the
process of implementing our own WPS processes using GeoServer as
described in
http://docs.geoserver.org/latest/en/developer/programming-guide/wps-services/implementing.html.
In particular, we are implementing processes that will allow us to
generate collections of netcdf files for non-gridded data (one netcdf
file per CF feature instance at the moment) and to aggregate and subset
large collections of gridded netcdf files.

Depending on the amount of data selected, generation of output files may
be quite time consuming, so we have been using the asynchronous
processing option to run these processes (noting that netcdf is a
non-streaming format).

In addition to publishing these processes to collaborators to use to
access our data in netcdf format, we also make use of these processes
within our data portal to power download of data in netcdf format.
We'd like to be able to do so in as user friendly a way as possible
within the current constraints of the WPS protocol (1.0 at the moment).

We've raised one usability issue as we see it with the current GeoServer
WPS asynchronous support. The execution time limit applied to
asynchronous jobs currently includes the time spent in the queue. This
means that a few large jobs can prevent other small jobs from ever being
run, because the small jobs reach the limit while they are still queued,
which doesn't seem fair to the users that submitted them. It would be
good if we could work out a fairer way of allocating processing
resources. We've put up a pull request with one option for addressing
this at https://github.com/geoserver/geoserver/pull/1188. In this pull
request, we have modified the execution time limit for jobs to exclude
queuing time and added a new total time limit. It seemed more intuitive
to us that the execution time limit would not include queuing time,
which is why we chose to modify the behaviour of this limit. We haven't
received any feedback on that PR, so I'm wondering whether others think
this doesn't make sense and whether we should modify our approach here.
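
To make the difference between the two limits concrete, here is a minimal standalone sketch in plain Java. It is not GeoServer code and not the code in the pull request; the class name `TimeLimits` and the parameters `maxExecutionMillis` and `maxTotalMillis` are made up for illustration, and treating 0 as "no limit" is an assumption.

```java
// Hypothetical illustration only: not GeoServer code, not the code in the pull request.
final class TimeLimits {
    private final long maxExecutionMillis; // limit on time spent running (0 = no limit)
    private final long maxTotalMillis;     // limit on queue time plus run time (0 = no limit)

    TimeLimits(long maxExecutionMillis, long maxTotalMillis) {
        this.maxExecutionMillis = maxExecutionMillis;
        this.maxTotalMillis = maxTotalMillis;
    }

    /**
     * @param submittedAt time the job entered the queue, in ms
     * @param startedAt   time the job started running, in ms, or -1 if still queued
     * @param now         current time, in ms
     */
    boolean isExpired(long submittedAt, long startedAt, long now) {
        // The execution limit only counts time since the job actually started.
        if (maxExecutionMillis > 0 && startedAt >= 0
                && now - startedAt > maxExecutionMillis) {
            return true;
        }
        // The total limit counts everything since submission, queued or running.
        return maxTotalMillis > 0 && now - submittedAt > maxTotalMillis;
    }

    public static void main(String[] args) {
        TimeLimits limits = new TimeLimits(60_000, 600_000);
        // Queued for 5 minutes, running for 30 seconds: within both limits.
        System.out.println(limits.isExpired(0, 300_000, 330_000));
        // Queued for 5 minutes, running for 90 seconds: over the execution limit.
        System.out.println(limits.isExpired(0, 300_000, 390_000));
        // Still queued after 11 minutes: over the total limit.
        System.out.println(limits.isExpired(0, -1, 660_000));
    }
}
```

With limits like these, a small job that waits behind large ones is only cancelled if it exceeds the total limit, not because its queue time used up the execution limit.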

Another area where we'd like to improve user friendliness is in giving
better feedback on how a job is progressing. At the moment we can do
that while the job is executing, using GeoServer's percentCompleted
support. However, we can't do that while the job is queued (and, as
noted above, a job can be queued for as long as it executes, or longer
if we have our way). Looking at the WPS 1.0 specification, the only way
to do this, as far as I can tell, is to use the ProcessAccepted element.
From the WPS 1.0 specification:

"... The contents of this human-readable text string is left open to
definition by each server, but is expected to include any messages the
server wishes to let the clients know. Such information could include
how long the queue is, or any warning conditions that may have been
encountered. The client may display this text to a human user."

Would it be worthwhile for us to look at using the ProcessAccepted
element to return queue position information? If so, we could look at
how we might go about doing this and put up a more detailed proposal or
pull request for consideration. Any other options or suggestions on how
to go about this are welcome.
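
As a rough idea of what a client might see, the sketch below builds a WPS 1.0 Status fragment whose ProcessAccepted text carries the queue position. The class name `AcceptedMessage` and the wording of the message are assumptions made for illustration, since the specification leaves the text up to the server; namespace declarations are omitted.

```java
// Hypothetical illustration only: the message wording is an assumption.
final class AcceptedMessage {

    /** Builds the free-text content that would go inside ProcessAccepted. */
    static String queueText(int position, int queueLength) {
        return "Process accepted, position " + position + " of " + queueLength
                + " in the queue";
    }

    /** Wraps the text in a WPS 1.0 Status element (namespace declarations omitted). */
    static String statusElement(String creationTime, int position, int queueLength) {
        return "<wps:Status creationTime=\"" + creationTime + "\">"
                + "<wps:ProcessAccepted>" + queueText(position, queueLength)
                + "</wps:ProcessAccepted>"
                + "</wps:Status>";
    }

    public static void main(String[] args) {
        System.out.println(statusElement("2016-01-11T16:45:00Z", 3, 7));
    }
}
```

Because the content is only a human-readable string, a client can display it but should not be expected to parse it.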

Thanks,
Craig Jones
Integrated Marine Observing System

University of Tasmania Electronic Communications Policy (December, 2014).
This email is confidential, and is for the intended recipient only. Access, disclosure, copying, distribution, or reliance on any of it by anyone outside the intended recipient organisation is prohibited and may be a criminal offence. Please delete if obtained in error and email confirmation to the sender. The views expressed in this email are not necessarily the views of the University of Tasmania, unless clearly intended otherwise.

On Mon, Jan 11, 2016 at 4:45 PM, Craig Jones <Craig.Jones@anonymised.com>
wrote:

> We've raised one usability issue as we see it with the current GeoServer
> WPS asynchronous support. In particular, the execution time limit
> currently applied for asynchronous job execution currently includes the
> time spent in the queue, this means that a few large jobs can prevent
> other small jobs from ever being run which doesn't seem fair to the
> users that submitted them. It would be good if we could work out a
> fairer way of allocating processing resources. We've put up a pull
> request with one option for addressing this at
> https://github.com/geoserver/geoserver/pull/1188. In this pull request,
> we have modified the execution time limit for jobs to exclude queuing
> time and added a new total time limit - it seemed more intuitive to us
> that the execution time limit would not include queuing time - which is
> why we chose to modify the behaviour of this limit. We haven't received
> any feedback on that PR so I'm wondering if others don't think that
> makes sense and whether we should modify our approach here.

Hi,
I believe I'm the reason you're stuck there... I'm the current maintainer
of the WPS module. The pull request in question touches very sensitive
parts of the WPS engine, and I need some peace and quiet to review it and
make sure it's not breaking something else. Unfortunately I haven't had
much of that in the past few months, and I devoted most of it to handling
easier-to-review pull requests, bug fixing, and preparing the code base
for the Wicket 7 upgrade that we'll start working on tomorrow:
https://wiki.osgeo.org/wiki/GeoServer_Code_Sprint_2016

I'll try to find some time once I'm back home, but it may well be sometime
in February at this point.

> Another area where we'd like to be able to improve the user friendliness
> of the process is to be able to give better feedback on how a job is
> progressing. At the moment we can do that while the job is executing
> using geoServers percentCompleted support, however, we can't do that
> while the job is queued (which we've established can be as long as or if
> we have our way, longer than a job can actually be executing). Looking
> at the WPS 1.0 specification, the only way to do this as far as I can
> tell, is to use the ProcessAccepted element. From the WPS 1.0
> specification:
>
> "... The contents of this human-readable text string is left open to
> definition by each server, but is expected to include any messages the
> server wishes to let the clients know. Such information could include
> how long the queue is, or any warning conditions that may have been
> encountered. The client may display this text to a human user."
>
> Would it be worthwhile us looking at making use of the ProcessAccepted
> element to return queue position information? If so, we could have a
> look at how we may go about doing this and put up a more detailed
> proposal or pull request for consideration. Any other options or
> suggestions on how to go about this welcome.

Hum hum... to make this work in general you'll probably need to expand
the ProcessManager interface somehow.

In GeoServer you can have multiple ProcessManager implementations, each one
taking on specific processes and using a different queuing mechanism. A basic
GeoServer will only have the DefaultProcessManager of course, but I know of a
few installations that rolled custom ones for custom processes (e.g., to
connect to a computing grid that is actually running the processes, where
GeoServer is mostly a protocol proxy, except for a few processes that are
actually running inside GeoServer, and inside the DefaultProcessManager).

In the specific case of DefaultProcessManager, each of the thread pool
executors is backed by a LinkedBlockingQueue, which you could keep a reference
to so that you can scroll over it and find how many processes there are before
the one that you want to report about.
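
As a rough illustration of that idea, here is a standalone sketch using only java.util.concurrent. It is not the DefaultProcessManager code; the class `QueuePositionDemo` and the method `tasksAhead` are hypothetical names. It assumes tasks are handed over with execute(), so the queue holds the original Runnable objects. With submit() the queue holds FutureTask wrappers instead, so a real implementation would need another way to recognise its entry.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: not GeoServer code.
final class QueuePositionDemo {

    /** Returns how many tasks are queued ahead of the given one, or -1 if it is not queued. */
    static int tasksAhead(LinkedBlockingQueue<Runnable> queue, Runnable task) {
        int ahead = 0;
        // The iterator is weakly consistent: it is safe while the executor keeps
        // taking tasks, but the result is a snapshot and may be stale on return.
        for (Runnable queued : queue) {
            if (queued == task) {
                return ahead;
            }
            ahead++;
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        // Keep a reference to the queue that backs the executor.
        LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        ThreadPoolExecutor executor =
                new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS, queue);

        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            started.countDown();
            try {
                release.await(); // occupy the only worker thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Runnable second = () -> { };
        Runnable third = () -> { };

        executor.execute(blocker);
        started.await();           // the blocker is now running, not queued
        executor.execute(second);
        executor.execute(third);

        System.out.println("ahead of third: " + tasksAhead(queue, third));
        System.out.println("blocker queued: " + (tasksAhead(queue, blocker) >= 0));

        release.countDown();
        executor.shutdown();
    }
}
```

Scanning the queue is linear in its length, which should be acceptable for the occasional status request but is worth keeping in mind for very long queues.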

So, in general, you'd first have to find which process manager is running
your process, see if it's still in the queued state, then ask it how many
processes are before it in that particular queue.
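
Those three steps could look roughly like the sketch below. None of these types exist in GeoServer: `QueueAwareManager`, `InMemoryManager` and `findTasksAhead` are hypothetical stand-ins, and the shape a real ProcessManager extension would take is exactly what the proposal would need to settle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical sketch: none of these types exist in GeoServer.
final class QueueLookup {

    /** Stand-in for a process manager that can report queue positions. */
    interface QueueAwareManager {
        /** True if this manager is responsible for the given execution. */
        boolean handles(String executionId);

        /** Processes queued ahead of this one, or empty if it is not queued here. */
        OptionalInt tasksAhead(String executionId);
    }

    /** Toy manager that keeps its pending execution ids in submission order. */
    static final class InMemoryManager implements QueueAwareManager {
        private final List<String> known = new ArrayList<>();
        private final List<String> queued = new ArrayList<>();

        void submit(String executionId) {
            known.add(executionId);
            queued.add(executionId);
        }

        void start(String executionId) {
            queued.remove(executionId); // no longer waiting once it runs
        }

        @Override
        public boolean handles(String executionId) {
            return known.contains(executionId);
        }

        @Override
        public OptionalInt tasksAhead(String executionId) {
            int index = queued.indexOf(executionId);
            return index < 0 ? OptionalInt.empty() : OptionalInt.of(index);
        }
    }

    /** Finds the responsible manager, then asks it for the queue position. */
    static OptionalInt findTasksAhead(
            List<? extends QueueAwareManager> managers, String executionId) {
        for (QueueAwareManager manager : managers) {
            if (manager.handles(executionId)) {
                return manager.tasksAhead(executionId);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        InMemoryManager local = new InMemoryManager();
        InMemoryManager grid = new InMemoryManager();
        local.submit("a");
        local.submit("b");
        local.submit("c");
        grid.submit("x");
        local.start("a"); // "a" is running, so "b" and "c" remain queued

        List<QueueAwareManager> managers = Arrays.asList(local, grid);
        System.out.println(findTasksAhead(managers, "c")); // one process ahead
        System.out.println(findTasksAhead(managers, "a")); // running, not queued
    }
}
```

Returning an empty value for anything that is not queued keeps "running", "finished" and "unknown" out of the queue-position question; those states would still be reported through the existing status mechanism.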

Given you're going to modify a public extension point, a formal proposal is
required:
https://github.com/geoserver/geoserver/wiki/Proposals

Hope this helps

Cheers
Andrea

--

GeoServer Professional Services from the experts! Visit
http://goo.gl/it488V for more information.

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

The information in this message and/or attachments, is intended solely for
the attention and use of the named addressee(s) and may be confidential or
proprietary in nature or covered by the provisions of privacy act
(Legislative Decree June, 30 2003, no.196 - Italy's New Data Protection
Code). Any use not in accord with its purpose, any disclosure, reproduction,
copying, distribution, or either dissemination, either whole or partial, is
strictly forbidden except previous formal approval of the named
addressee(s). If you are not the intended recipient, please contact
immediately the sender by telephone, fax or e-mail and delete the
information in this message that has been received in error. The sender
does not give any warranty or accept liability as the content, accuracy or
completeness of sent messages and accepts no responsibility for changes
made after they were sent or for other risks which arise as a result of
e-mail transmission, viruses, etc.

-------------------------------------------------------

Hi Andrea,

Yes, that helps. We'll put together a proposal for the queue position changes. Thanks for pointing us in the right direction.

Thanks
CraigJ

On 18/01/16 16:44, Andrea Aime wrote:
> Given you're going to modify a public extension point, a formal proposal is
> required:
> https://github.com/geoserver/geoserver/wiki/Proposals
