[Geoserver-devel] Discussion for GSIP 119, WPS clustering of asynchronous requests

Andrea_Aime4 · October 15, 2014, 1:13pm

Hi,
here is a proposal to modify the WPS module so that it can track asynchronous WPS
requests across a load balanced cluster (without sticky sessions):

https://github.com/geoserver/geoserver/wiki/GSIP%20119%20WPS%20clustering%20of%20asynchronous%20requests

Feedback welcomed.
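The core mechanism can be sketched in Java, assuming a status store that every node can read and write. With such a store, whichever node receives a status request can answer it, no matter which node is running the process. All names here (`StatusStore`, `ExecutionStatus`, `InMemoryStatusStore`, `Node`) are hypothetical and are not the API defined by GSIP 119. The in-memory map only stands in for a real shared backend such as a database or a shared file system.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lifecycle phases of an asynchronous execution.
enum Phase { QUEUED, RUNNING, SUCCEEDED, FAILED }

// Hypothetical status record, keyed by a cluster-wide unique execution id.
final class ExecutionStatus {
    final String executionId;
    final String nodeId;   // node that is actually running the process
    final Phase phase;
    final int progress;    // percentage, 0..100

    ExecutionStatus(String executionId, String nodeId, Phase phase, int progress) {
        this.executionId = executionId;
        this.nodeId = nodeId;
        this.phase = phase;
        this.progress = progress;
    }
}

// Hypothetical store that every node in the cluster can read and write.
interface StatusStore {
    void save(ExecutionStatus status);
    Optional<ExecutionStatus> get(String executionId);
}

// Stand-in for a real shared backend: nodes share it here only because they live in one JVM.
final class InMemoryStatusStore implements StatusStore {
    private final Map<String, ExecutionStatus> statuses = new ConcurrentHashMap<>();

    @Override
    public void save(ExecutionStatus status) {
        statuses.put(status.executionId, status);
    }

    @Override
    public Optional<ExecutionStatus> get(String executionId) {
        return Optional.ofNullable(statuses.get(executionId));
    }
}

// A cluster node: it runs what it receives, but can report on any execution.
final class Node {
    final String nodeId;
    private final StatusStore store;

    Node(String nodeId, StatusStore store) {
        this.nodeId = nodeId;
        this.store = store;
    }

    // Accept an async Execute request and return the id the client will poll with.
    String submit() {
        String executionId = UUID.randomUUID().toString();
        store.save(new ExecutionStatus(executionId, nodeId, Phase.QUEUED, 0));
        return executionId;
    }

    // Called by the process running on this node as it advances.
    void reportProgress(String executionId, Phase phase, int progress) {
        store.save(new ExecutionStatus(executionId, nodeId, phase, progress));
    }

    // Answer a status poll, even for executions running on another node.
    Optional<ExecutionStatus> status(String executionId) {
        return store.get(executionId);
    }
}
```

Behind a load balancer without sticky sessions, a client's status poll may reach node B while node A runs the process. Because both nodes read the same store, node B can still return the current progress.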

Cheers
Andrea

–

==

GeoServer Professional Services from the experts! Visit
http://goo.gl/NWWaa2 for more information.

==

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

NOTICE PURSUANT TO LEGISLATIVE DECREE 196/2003


The information in this message and/or attachments, is intended solely for the attention and use of the named addressee(s) and may be confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree June, 30 2003, no.196 - Italy’s New Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying, distribution, or either dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named addressee(s). If you are not the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the information in this message that has been received in error. The sender does not give any warranty or accept liability as the content, accuracy or completeness of sent messages and accepts no responsibility for changes made after they were sent or for other risks which arise as a result of e-mail transmission, viruses, etc.

jive · October 15, 2014, 10:26pm

That is quite a complete proposal, Andrea.

During the meeting you mentioned a bit more work (user interfaces, cancel operation, etc.); or is this the API to enable a series of changes?

How well is the GeoTools process / future / process listener holding up? I would still like to lock that down once we are happy with it; instead, we keep building more and more functionality on top.

Jody


Jody Garnett



Geoserver-devel mailing list
Geoserver-devel@anonymised.comsts.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Andrea_Aime4 · October 16, 2014, 6:16am

On Thu, Oct 16, 2014 at 12:26 AM, Jody Garnett <jody.garnett@anonymised.com> wrote:

> That is quite the complete proposal Andrea.
>
> During the meeting you mentioned a bit more work (user interfaces, cancel
> operation, etc...) .... or is this the API to enable a series of changes?

Yes, more will come (I have two more WPS-related proposals down the road,
but they are orthogonal to this one). The UI to track running processes is
part of this one, though; I forgot to mention it.

> How well is the geotools process / future / process listener holding up? I
> would still like to lock that down when we are happy with it. Instead we
> keep building more and more functionality on top.

GeoTools process is OK; the progress listener is not so great: it does too
much and not enough at the same time. That's why I rolled other listeners
into this proposal. We are not using GeoTools futures at all, and the WPS
architecture makes no assumptions about how a ProcessManager runs a process
(it may well be calling a remote process; we also have an incoming community
module that does exactly that).
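The decoupling described above can be illustrated with a small Java sketch. It assumes the WPS layer only needs to ask "who can run this process?" and get back a future. `ProcessRunner`, `LocalRunner`, `RemoteRunner` and `Dispatcher` are hypothetical names and are not the GeoServer `ProcessManager` API. The remote call is faked with a plain function rather than an HTTP client.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical contract: the WPS layer never sees how a process is actually run.
interface ProcessRunner {
    boolean canHandle(String processName);
    CompletableFuture<Map<String, Object>> submit(String processName, Map<String, Object> inputs);
}

// Runs processes inside this JVM, on the common fork/join pool.
final class LocalRunner implements ProcessRunner {
    private final Map<String, Function<Map<String, Object>, Map<String, Object>>> processes;

    LocalRunner(Map<String, Function<Map<String, Object>, Map<String, Object>>> processes) {
        this.processes = Collections.unmodifiableMap(new HashMap<>(processes));
    }

    @Override
    public boolean canHandle(String processName) {
        return processes.containsKey(processName);
    }

    @Override
    public CompletableFuture<Map<String, Object>> submit(String processName, Map<String, Object> inputs) {
        Function<Map<String, Object>, Map<String, Object>> process = processes.get(processName);
        return CompletableFuture.supplyAsync(() -> process.apply(inputs));
    }
}

// Delegates to another server; here the network call is faked by a plain function.
final class RemoteRunner implements ProcessRunner {
    private final String namePrefix;
    private final Function<Map<String, Object>, Map<String, Object>> remoteCall;

    RemoteRunner(String namePrefix, Function<Map<String, Object>, Map<String, Object>> remoteCall) {
        this.namePrefix = namePrefix;
        this.remoteCall = remoteCall;
    }

    @Override
    public boolean canHandle(String processName) {
        return processName.startsWith(namePrefix);
    }

    @Override
    public CompletableFuture<Map<String, Object>> submit(String processName, Map<String, Object> inputs) {
        return CompletableFuture.supplyAsync(() -> remoteCall.apply(inputs));
    }
}

// The WPS side: picks the first runner that claims the process, nothing more.
final class Dispatcher {
    private final List<ProcessRunner> runners;

    Dispatcher(ProcessRunner... runners) {
        this.runners = Arrays.asList(runners);
    }

    CompletableFuture<Map<String, Object>> execute(String processName, Map<String, Object> inputs) {
        for (ProcessRunner runner : runners) {
            if (runner.canHandle(processName)) {
                return runner.submit(processName, inputs);
            }
        }
        throw new IllegalArgumentException("No runner can handle " + processName);
    }
}
```

The dispatcher only depends on the interface. Adding a runner that forwards to another server therefore needs no change on the WPS side.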

Cheers
Andrea


Alessio_Fabiani · October 16, 2014, 7:52am

Hi Andrea, the proposal seems quite complete to me too. There are also good ideas for sharing the status and for changing the model from polling to notification.
Even though you mentioned it in a previous email, the proposal does not talk about the UI. As far as I understand, there will be an improvement to show the list of running processes, right?
Are the limits also part of this proposal, or will there be another one about that topic in the future?

In any case, from me it is of course {GSIP 119}++
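The move from polling to notification can be pictured as a plain observer. The running execution pushes each change to registered listeners, for example something that updates the shared status store, or the UI listing running processes. Those components no longer have to keep asking for the status. The Java sketch below assumes that reading. `ProgressObserver`, `ExecutionNotifier` and `RecordingObserver` are hypothetical names and are not the listeners actually introduced by the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback interface: observers are told about changes as they happen.
interface ProgressObserver {
    void onProgress(String executionId, int percent);
    void onCompleted(String executionId);
}

// Owned by the running execution; pushes every change to all registered observers.
final class ExecutionNotifier {
    // Copy-on-write so observers can be added while notifications are being sent.
    private final List<ProgressObserver> observers = new CopyOnWriteArrayList<>();

    void addObserver(ProgressObserver observer) {
        observers.add(observer);
    }

    void progress(String executionId, int percent) {
        for (ProgressObserver o : observers) {
            o.onProgress(executionId, percent);
        }
    }

    void completed(String executionId) {
        for (ProgressObserver o : observers) {
            o.onCompleted(executionId);
        }
    }
}

// Example observer, e.g. what a "running processes" page might keep up to date.
// Not thread-safe: a sketch for single-threaded use.
final class RecordingObserver implements ProgressObserver {
    final List<String> events = new ArrayList<>();

    @Override
    public void onProgress(String executionId, int percent) {
        events.add(executionId + ":" + percent);
    }

    @Override
    public void onCompleted(String executionId) {
        events.add(executionId + ":done");
    }
}
```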


Ing. Alessio Fabiani
@alfa7691
Founder/Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 331 6233686

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


Brad_Hards · October 16, 2014, 10:21am

On Wed, 15 Oct 2014 03:13:59 PM Andrea Aime wrote:

> Hi,
> here is a proposal to modify the WPS module so that it can track
> asynchronous WPS requests across a load balanced cluster (without sticky
> sessions):
>
> https://github.com/geoserver/geoserver/wiki/GSIP%20119%20WPS%20clustering%20of%20asynchronous%20requests

As I understand this proposal, it really focuses on making sure that the
client interactions return the right thing. I know nothing about WPS or
GeoServer that would make me suggest it isn't a reasonable approach. Maybe
only the first three words of that sentence were needed

However there are a couple of things that occurred to me:
- Would it be possible to avoid the use of the shared stores, which could be
a potential failure mode and maybe a disk-space management problem? Instead of
shared status storage, perhaps some kind of cluster-internal message passing
(either a broadcast "hey everyone, I just finished Task (id)", or a query /
response "what is the status of Task (id)" / "it's about 40%, come back
later"). For the artifacts store, perhaps the node that gets the request could
just proxy it to the node that has the data (again using the message-based
status broadcasts or query / response to identify who has the data). That
isn't too far from the current interfaces, just a different implementation.

- Is assignment of the actual processing task to a node orthogonal to this
proposal? Or are we always assuming that it's going to be run on the node that
takes the initial async request?
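The query/response alternative could look roughly like the following Java sketch. It assumes each node keeps the status of its own executions in memory and asks its peers when a client polls for an id it does not know. `PeerNode` and its methods are hypothetical, and the direct method call on a peer stands in for a network round trip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical node that keeps status only for the executions it runs itself.
final class PeerNode {
    final String name;
    private final Map<String, Integer> localProgress = new ConcurrentHashMap<>();
    private final List<PeerNode> peers = new ArrayList<>();

    PeerNode(String name) {
        this.name = name;
    }

    // Make two nodes aware of each other (a real cluster would use discovery).
    void connect(PeerNode other) {
        peers.add(other);
        other.peers.add(this);
    }

    // Record progress for an execution running on this node.
    void update(String executionId, int percent) {
        localProgress.put(executionId, percent);
    }

    // Response side: answer only from local state ("it's about 40%, come back later").
    Optional<Integer> answer(String executionId) {
        return Optional.ofNullable(localProgress.get(executionId));
    }

    // Client-facing lookup: local first, then ask each peer "what is the status of Task (id)?".
    Optional<Integer> status(String executionId) {
        Optional<Integer> mine = answer(executionId);
        if (mine.isPresent()) {
            return mine;
        }
        for (PeerNode peer : peers) {
            Optional<Integer> theirs = peer.answer(executionId); // a network round trip in practice
            if (theirs.isPresent()) {
                return theirs;
            }
        }
        return Optional.empty();
    }
}
```

The trade-off shows in the code. The status lives only in the owning node's memory, so if that node goes down, no other node can report it. Keeping the status somewhere all nodes can reach avoids that problem.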

Brad

Andrea_Aime4 · October 16, 2014, 10:41am

On Thu, Oct 16, 2014 at 12:21 PM, Brad Hards <bradh@anonymised.com> wrote:

> As I understand this proposal, it really focuses on making sure that the
> client interactions return the right thing. I know nothing about WPS or
> GeoServer that would make me suggest it isn't a reasonable approach. Maybe
> only the first three words of that sentence were needed

You completely lost me there. What is not reasonable?

> However there are a couple of things that occurred to me:
> - Would it be possible to avoid the use of the shared stores, which could be
> a potential failure mode and maybe disk-space management problem? Instead of
> shared status storage, perhaps some kind of cluster-internal message passing
> (either a broadcast "hey everyone, I just finished Task (id)", or a query /
> response "what is the status of Task (id)" / "its about 40%, come back
> later"). For the artifacts store, perhaps the node that gets the request could
> just proxy it to the node that has the data (again using the message-based
> status broadcasts or query / response to identify who has the data). That
> isn't too far from the current interfaces, just a different implementation.

The disk space management problem cannot be solved: the results need to
stay available for a while, since the spec does not say they will be deleted
the first time you request them. Also, some processes compute data while they
are encoding the output (known as streaming processes: they return feature
collections that compute the results as you pull them, or grid coverages
backed by JAI operations that compute tiles as you request them), and we
cannot keep the database connections open until the client asks for the
results either.
Finally, WPS clients can do something crazy like sending over a 2GB request
with the input data embedded in it, and then asking for lineage in the
response, which requires repeating the whole request in the response, so the
request needs to be stored on disk somewhere.
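The point about streaming processes can be sketched in Java. The sketch assumes results are written to an artifacts directory when the execution completes. The lazily computed output is drained while the resources behind it (for example a database connection) are still open. Later result requests then read the stored file, as many times as needed. `ArtifactStore` and `FakeStreamingProcess` are hypothetical names, and the temporary directory stands in for whatever storage the nodes would actually use.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical stand-in for a streaming process: rows are computed only when pulled.
final class FakeStreamingProcess {
    static Iterator<String> run(int rows, AtomicInteger computed) {
        return IntStream.range(0, rows)
                .mapToObj(i -> {
                    computed.incrementAndGet(); // count how many rows were actually computed
                    return "feature-" + i;
                })
                .iterator();
    }
}

// Hypothetical artifacts store backed by a directory.
final class ArtifactStore {
    private final Path dir;

    ArtifactStore(Path dir) {
        this.dir = dir;
    }

    static ArtifactStore inTempDirectory() {
        try {
            return new ArtifactStore(Files.createTempDirectory("wps-artifacts"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drain the lazy output completely, at execution time, into a file.
    void materialize(String executionId, Iterator<String> lazyRows) {
        try (BufferedWriter out = Files.newBufferedWriter(file(executionId), StandardCharsets.UTF_8)) {
            while (lazyRows.hasNext()) {
                out.write(lazyRows.next());
                out.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serve a result request from the stored copy; this can be repeated any number of times.
    List<String> read(String executionId) {
        try {
            return Files.readAllLines(file(executionId), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private Path file(String executionId) {
        return dir.resolve(executionId + ".txt");
    }
}
```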

> - Is assignment of the actual processing task to a node orthogonal to this
> proposal? Or are we always assuming that its going to be run on the node that
> takes the initial async request?

The latter: assigning the processing to a different node is out of scope
for this proposal, but if you have time to work on it, you'd be welcome to.

Cheers
Andrea


jive · October 16, 2014, 4:17pm

> > How well is the geotools process / future / process listener holding up? I
> > would still like to lock that down when we are happy with it. Instead we
> > keep building more and more functionality on top.
>
> GeoTools process is ok, progress listener is not so great, does too much
> and not enough at the same time, that's why I rolled other listeners in
> this proposal, not using GeoTools futures at all, the WPS architecture is
> not making assumptions about how a ProcessManager runs a process (it can
> well be calling a remote process, we also have a community module incoming
> that does exactly that).

I would prefer if we could feed your requirements back into GeoTools (so
we can finalise that API). I had expected GeoServer to implement its own
ProcessManager (so that, on the off chance a process chains another process,
the chain would be visible in the GeoServer process management UI).

In short, if we are not using part of the API, it should probably be cut down
to fit.
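The chaining point can be illustrated with a small Java sketch. It assumes every process, including one called from inside another process, is executed through the same manager. That manager can then list what is running and record which execution started which. `TrackingProcessManager` is a hypothetical name and is not the GeoTools or GeoServer API. For brevity, the sketch only detects chaining when the inner process runs synchronously on the caller's thread.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical manager through which every process, chained or not, is executed.
final class TrackingProcessManager {
    private final Set<String> running = ConcurrentHashMap.newKeySet();
    private final Map<String, String> parentOf = new ConcurrentHashMap<>();
    // Execution currently active on this thread, used to detect chained calls.
    private final ThreadLocal<String> current = new ThreadLocal<>();

    <T> T execute(String executionId, Supplier<T> process) {
        String parent = current.get();
        if (parent != null) {
            parentOf.put(executionId, parent); // called from inside another process
        }
        running.add(executionId);
        current.set(executionId);
        try {
            return process.get();
        } finally {
            running.remove(executionId);
            if (parent == null) {
                current.remove();
            } else {
                current.set(parent); // restore the outer execution
            }
        }
    }

    // Snapshot of what a "running processes" UI could show.
    Set<String> running() {
        return new HashSet<>(running);
    }

    // The execution that started this one, or null for a top-level request.
    String parentOf(String executionId) {
        return parentOf.get(executionId);
    }
}
```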
--
Jody

Andrea_Aime4 · October 16, 2014, 4:21pm

On Thu, Oct 16, 2014 at 6:17 PM, Jody Garnett <jody.garnett@anonymised.com> wrote:


> I would prefer if we could feed your requirements back in to GeoTools (so
> we can finalise that api). I had expected GeoServer to implement its own
> ProcessManager (on the off chance a process chains another process the
> chain would show up visible to GeoServer process management UI).

I have no time to work on that right now: all my spare time is sucked up by
the CSS module, pull request reviews, bug fixing, and other "mandatory"
community activities (mandatory as in: if nobody does them, the community
basically falls apart).

> In short if we are not using part of the API it should probably be cut
> down to fit.

Hmm... not so sure. I would not conclude that if GeoServer is not using
something from GeoTools, then that something is not useful; it's a
dangerous path.

Cheers
Andrea


jive · October 16, 2014, 4:26pm

> > I would prefer if we could feed your requirements back in to GeoTools (so
> > we can finalise that api). I had expected GeoServer to implement its own
> > ProcessManager (on the off chance a process chains another process the
> > chain would show up visible to GeoServer process management UI).
>
> I have no time to work on that right now, all my spare time is sucked up
> by the CSS module, pull request reviews, bug fixing, and other "mandatory"
> community activities (mandatory as in, if nobody does them, the community
> basically falls apart)

That is fine, Andrea. If you can continue to communicate what is not being
used (or is annoying to work around), it would be appreciated.

> > In short if we are not using part of the API it should probably be cut down
> > to fit.
>
> Hum hum... not so sure. I would not imply that if GeoServer is not using
> something out of GeoTools, then that something is not useful, it's a
> dangerous path.

How about this: if it is not being used by GeoServer or uDig, it is worth
cutting down in size. The process API is unsupported due to lack of
feedback... so this is a chance for me to learn more about what is needed.
--
Jody

Phil_Scadden · October 16, 2014, 8:01pm

+0 from me. This involves GeoServer usage I am unfamiliar with (clustering) and technical detail on which I don't think I have any useful opinion to add.

Notice: This email and any attachments are confidential.
If received in error please destroy and immediately notify us.
Do not copy or disclose the contents.

Jukka_Rahkonen · October 16, 2014, 8:54pm

+0 from me too.
BTW, I have a feeling that some day there may be demand for Web Coverage Processing Services, and clustering and the ability to handle asynchronous requests would be useful for those too. Is this GSIP general enough to accept coverages as input for WPS?

-Jukka Rahkonen-


Simone_Giannecchini3 · October 16, 2014, 8:57pm

+0


Regards,
Simone Giannecchini


Ing. Simone Giannecchini
@simogeo
Founder/Director

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 333 8128928

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


Brad_Hards · October 16, 2014, 9:42pm

On Thu, 16 Oct 2014 12:41:49 PM Andrea Aime wrote:

> On Thu, Oct 16, 2014 at 12:21 PM, Brad Hards <bradh@anonymised.com> wrote:
> > As I understand this proposal, it really focuses on making sure that the
> > client interactions return the right thing. I know nothing about WPS or
> > GeoServer that would make me suggest it isn't a reasonable approach. Maybe
> > only the first three words of that sentence were needed
>
> Completely lost you there? What is not reasonable?

Sorry, I'll try to be clearer:
- I'm clearly the new guy, who doesn't know much about WPS or GeoServer.
- I (obviously) respect your experience.
- Like others, I respect the work you've put into the proposal: it's very
detailed.
- From those three things, anything below could well be a stupid idea.

> However there are a couple of things that occurred to me:
> - Would it be possible to avoid the use of the shared stores, which could be
> a potential failure mode and maybe a disk-space management problem? Instead of
> shared status storage, perhaps some kind of cluster-internal message passing
> (either a broadcast "hey everyone, I just finished Task (id)", or a query /
> response "what is the status of Task (id)" / "it's about 40%, come back
> later"). For the artifacts store, perhaps the node that gets the request could
> just proxy it to the node that has the data (again using the message-based
> status broadcasts or query / response to identify who has the data). That
> isn't too far from the current interfaces, just a different implementation.

The disk space management problem cannot be solved: the results need to
stay available for a while, since the spec does not say that they are deleted
the first time you request them. And some processes compute data while they
are encoding the output (known as streaming processes: they return feature
collections that compute the results as you pull them, or grid coverages
backed by JAI operations that compute tiles as you request them), so we
cannot keep the database connections open until the client asks for the
results either.
Also, WPS clients can do something crazy like sending over a 2GB request
with the input data embedded in it, and then ask for lineage in the
response, which requires repeating the whole request in the response, so
the request needs to be stored on disk somewhere.
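The point about echoing a huge request back in the response can be illustrated with a small sketch: instead of holding a multi-GB Execute request in memory, a node streams it to a file as it arrives and streams it back out when the response has to repeat the inputs. `RequestSpool`, `spool` and `replay` are hypothetical names, not GeoServer's actual WPS code, and the sketch assumes the spool directory lives on storage that whichever node writes the final response can reach.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical helper (not part of GeoServer): spools an incoming Execute request
 * to disk so that it can be echoed back later when the client asks for lineage.
 */
class RequestSpool {
    private final Path spoolDir;

    RequestSpool(Path spoolDir) throws IOException {
        // Assumption: spoolDir is visible to every node that may write the response.
        this.spoolDir = Files.createDirectories(spoolDir);
    }

    /** Streams the request body to <executionId>.xml without buffering it in memory. */
    Path spool(String executionId, InputStream requestBody) throws IOException {
        Path target = spoolDir.resolve(executionId + ".xml");
        Files.copy(requestBody, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    /** Copies the stored request into the response being written, again as a stream. */
    long replay(String executionId, OutputStream response) throws IOException {
        return Files.copy(spoolDir.resolve(executionId + ".xml"), response);
    }
}
```

Because both directions are streamed, memory use stays bounded regardless of the request size; disk usage does not, which is exactly the space-management concern raised here.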

OK. My question is whether it needs to be on *shared* disk?
If we need to have some fast shared disk (maybe a SAN, perhaps some networked
disk with all the locking issues), then the cost in equipment and management /
support goes up.

One alternative would be to store only on the local disk of the server doing
the processing.

The shared part would be a messaging system (e.g., Apache ActiveMQ, although the
specific choice may not be important).

So, using the activity diagram shown in Figure 2 of the WPS spec: the client
sends the Execute Request to some server in the cluster (#1 for this example),
which sends back the Execute Response, and (perhaps some time later) the
processing starts on #1.
Now the client sends the "Show me the latest Execute Response" request, and
it hits server #2. That server is not #1, so it needs to find out the status
from #1. Instead of looking in the shared notification store, server #2
could just send a message to the cluster asking which server has the job, or
asking for its status by GUID or whatever. Server #1 tells server #2 that it
has the job, or provides back the status. Then server #2 can send back the
"latest Execute Response".
If the job is finished, the client will send the "Send me the outputs"
request, which hits server #3, which proxies the request to server #1 (perhaps
via some caching of which server responded to the status message, or via a new
query).
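The query/response variant described above can be sketched in a few dozen lines. This is a single-JVM simulation under loose assumptions: `Cluster`, `ClusterNode` and `JobStatus` are hypothetical names, the "broadcast" is a plain loop standing in for a real messaging system such as ActiveMQ, and node failures and network latency are ignored entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical status answer: which node owns the job and how far along it is. */
final class JobStatus {
    final String jobId;
    final String ownerNode;
    final int percentComplete;

    JobStatus(String jobId, String ownerNode, int percentComplete) {
        this.jobId = jobId;
        this.ownerNode = ownerNode;
        this.percentComplete = percentComplete;
    }
}

/** Stand-in for a message bus: a status query is "broadcast" to every node in turn. */
class Cluster {
    private final List<ClusterNode> nodes = new ArrayList<>();

    ClusterNode addNode(String name) {
        ClusterNode node = new ClusterNode(name, this);
        nodes.add(node);
        return node;
    }

    /** "Who has job X?": returns the answer from the node that owns it, if any. */
    Optional<JobStatus> broadcastStatusQuery(String jobId) {
        for (ClusterNode node : nodes) {
            Optional<JobStatus> answer = node.answerStatusQuery(jobId);
            if (answer.isPresent()) {
                return answer;
            }
        }
        return Optional.empty();
    }
}

class ClusterNode {
    final String name;
    private final Cluster cluster;
    // Progress of jobs executing on this node, held only locally (no shared store).
    private final Map<String, Integer> localJobs = new ConcurrentHashMap<>();

    ClusterNode(String name, Cluster cluster) {
        this.name = name;
        this.cluster = cluster;
    }

    void startJob(String jobId) { localJobs.put(jobId, 0); }

    void reportProgress(String jobId, int percent) { localJobs.put(jobId, percent); }

    /** Replies to a broadcast query only if this node is running the job. */
    Optional<JobStatus> answerStatusQuery(String jobId) {
        Integer percent = localJobs.get(jobId);
        return percent == null
                ? Optional.empty()
                : Optional.of(new JobStatus(jobId, name, percent));
    }

    /** Entry point for a client status request, whichever node the load balancer picked. */
    Optional<JobStatus> handleStatusRequest(String jobId) {
        Optional<JobStatus> local = answerStatusQuery(jobId);
        return local.isPresent() ? local : cluster.broadcastStatusQuery(jobId);
    }
}
```

In the scenario above, server #3 would use the returned `ownerNode` to decide where to proxy the request for the outputs. Only the small status message travels over the "bus" here; moving the large artifacts themselves through a message broker is a separate question, discussed further down the thread.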

> - Is assignment of the actual processing task to a node orthogonal to this
> proposal? Or are we always assuming that it's going to be run on the node
> that takes the initial async request?

The latter: assignment of a different processing node is out of scope for
this proposal, but if you have time to work on it, you'd be welcome.

I don't really have the time or experience yet, but I can see that some kind
of messaging system might be useful for the processing node assignment.
However, the status store might also be able to do some of that.

As I said (or tried to say) in my last attempt, I don't think the shared
storage vs messaging implementation detail will make much difference in the
interface design.

So if messaging is too much for this proposal, then maybe it's just a
consideration for the detailed implementation stage. Or if you're already done,
don't let this "dreaming" stop the development / merging.

Brad

jive · October 17, 2014, 12:58am

+1

Please make a note about the UI when you get a chance.


Jody Garnett

On Thu, Oct 16, 2014 at 9:26 AM, Jody Garnett <jody.garnett@anonymised.com> wrote:

That is fine, Andrea. If you can continue to communicate what is not being used (or is annoying to work around), it would be appreciated.

How about this: if it is not being used by GeoServer or uDig, it is worth cutting down in size. The process API is unsupported due to lack of feedback, so this is a chance for me to learn more about what is needed.

Jody

I have no time to work on that right now: all my spare time is sucked up by the CSS module, pull request reviews, bug fixing, and other “mandatory” community activities (mandatory as in, if nobody does them, the community basically falls apart).

I would prefer if we could feed your requirements back into GeoTools (so we can finalise that API). I had expected GeoServer to implement its own ProcessManager (so that, on the off chance a process chains another process, the chain would show up in the GeoServer process management UI).

Hum hum… not so sure. I would not imply that if GeoServer is not using something out of GeoTools, then that something is not useful; it's a dangerous path.

In short, if we are not using part of the API, it should probably be cut down to fit.

Andrea_Aime4 · October 17, 2014, 8:42am

On Thu, Oct 16, 2014 at 10:54 PM, Rahkonen Jukka (Tike) <jukka.rahkonen@anonymised.com> wrote:

+0 from me too.
BTW, I have a feeling that some day there may be demand for Web Coverage
Processing Services, and clustering and the ability to handle asynchronous
requests would be useful for those too. Is this GSIP general enough to
accept coverages as input for WPS?

This GSIP is completely orthogonal to what inputs a WPS can take: it's
about how we can run asynchronous requests over a clustered installation,
and the request itself can be of any nature. Indeed, one of the reasons why
there is a separate large artifacts store in the proposal is to allow large
inputs and outputs to be passed to, and generated from, processes.

That said, GeoServer WPS can already handle raster data both as input
and as output, but this does not make it a WCPS.

WCPS is a special language to express operations over (multidimensional)
coverages. As far as I understand, it's almost a 1:1 standardization of
rasql, the query language of Rasdaman, and the network layer is not an
extension on top of WPS, but on top of WCS instead (look for the "processing
extension" in the WCS standard page), where a new ProcessCoverage operation
is added to the base protocol.

Cheers
Andrea


Andrea_Aime4 · October 17, 2014, 9:16am

On Thu, Oct 16, 2014 at 11:42 PM, Brad Hards <bradh@anonymised.com> wrote:

> The disk space management problem cannot be solved: the results need to
> stay available for a while, since the spec does not say that they are deleted
> the first time you request them. And some processes compute data while they
> are encoding the output (known as streaming processes: they return feature
> collections that compute the results as you pull them, or grid coverages
> backed by JAI operations that compute tiles as you request them), so we
> cannot keep the database connections open until the client asks for the
> results either.
> Also, WPS clients can do something crazy like sending over a 2GB request
> with the input data embedded in it, and then ask for lineage in the
> response, which requires repeating the whole request in the response, so
> the request needs to be stored on disk somewhere.

OK. My question is whether it needs to be on *shared* disk?

No, it does not need to, but it's a valid solution.

One of the mandates of a GSIP is that whatever is presented can be worked on
fully by the party that's presenting it, without requiring help from the
outside and without leaving the code in a half-done state. This requires
scope control: the proposal either stays within the available resources to
implement it, or it's not done at all.

At the same time, it does have to be general enough that it can grow beyond
the limits of the initial funding/timelines.
My hope is that the ProcessArtifactsStore interface is general enough to
allow future extensions beyond the shared file system, and to be implemented
with some other technology:
https://github.com/geoserver/geoserver/wiki/GSIP%20119%20WPS%20clustering%20of%20asynchronous%20requests#processartifactsstore

If you think it is too limited, then yes, we have a problem I have to
address before the proposal can pass. But if you simply don't like the
shared filesystem approach, then it's up to you to provide the resources
to implement a different solution; I just need to make sure you can
implement it if you want to.
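To make the "general enough interface" argument concrete, here is a sketch of what a pluggable, stream-oriented artifacts store might look like, with a shared-filesystem backend. `ArtifactsStoreSketch` and `SharedFileSystemArtifactsStore` are hypothetical names and signatures, deliberately not the actual `ProcessArtifactsStore` API described on the wiki page; a message-passing or other backend would be a second implementation of the same interface.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical store for large process inputs/outputs (NOT the GSIP 119 ProcessArtifactsStore API). */
interface ArtifactsStoreSketch {
    /** Opens a stream to write an artifact; callers must close it. */
    OutputStream openOutput(String executionId, String artifactName) throws IOException;

    /** Opens a stream to read a previously written artifact; callers must close it. */
    InputStream openInput(String executionId, String artifactName) throws IOException;

    /** Removes every artifact of an execution, e.g. once its results have expired. */
    void clear(String executionId) throws IOException;
}

/** Backend assuming all nodes mount the same directory (e.g. an NFS or SAN mount). */
class SharedFileSystemArtifactsStore implements ArtifactsStoreSketch {
    private final Path root;

    SharedFileSystemArtifactsStore(Path root) {
        this.root = root;
    }

    // Assumption: execution ids are generated server-side (e.g. UUIDs), so they are
    // safe to use as directory names without further validation.
    private Path executionDir(String executionId) {
        return root.resolve(executionId);
    }

    @Override
    public OutputStream openOutput(String executionId, String artifactName) throws IOException {
        Path dir = Files.createDirectories(executionDir(executionId));
        // Streaming keeps memory use flat even for GB-sized outputs.
        return Files.newOutputStream(dir.resolve(artifactName));
    }

    @Override
    public InputStream openInput(String executionId, String artifactName) throws IOException {
        return Files.newInputStream(executionDir(executionId).resolve(artifactName));
    }

    @Override
    public void clear(String executionId) throws IOException {
        Path dir = executionDir(executionId);
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> toDelete;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order deletes files before the directories that contain them.
            toDelete = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
    }
}
```

Keeping the contract stream-based rather than file-based is what would let a non-filesystem backend implement the same three methods; whether the real interface needs more than this is exactly the kind of weakness worth reporting back.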

One alternative would be to store only on the local disk of the server
doing the processing.

The shared part would be a messaging system (e.g., Apache ActiveMQ, although the
specific choice may not be important).

If we think in terms of interface, the specific choice is indeed not
important, but in terms of the first implementation of this proposal, it
very much is. WPS can have large inputs and large outputs (as in GB or even
TB large), and the solution needs to support large file transfers.

Message passing is normally not a good solution for streaming large results, e.g.:
http://activemq.apache.org/can-i-send-really-large-files-over-activemq.html

Also, assuming there is large enough local storage simply does not match my
experience of large installations (e.g., the ones that fund this kind of
proposal), where the local disks often do not even go beyond a few hundred GB,
while the network disks are often pretty large.

Also, we have experience with message passing technology (the configuration
clustering module we just donated is based on ActiveMQ), and I can tell you
it's not always welcome.
First, people see the need for an external server as a complication, often
too much of one, so in that solution we made ActiveMQ an integral part of the
plugin by default (you can still use an external one if you want): it runs as
a library and automatically discovers its peers.
That is cool, but it often does not work because multicast is banned from the
network, at which point you have to provide a list of TCP addresses for the
various bits to communicate with each other.

Recently we have also been playing with Hazelcast, and I'm also considering
using it for the status sharing part (as a replacement for the database), but
besides its evident coolness it suffers from the same issues as an
embedded/clustered ActiveMQ solution: it either has to use multicast for
discovery, or needs a list of TCP addresses that will form the core of the
cluster (and that list needs to be known before starting up the cluster).
I can tell you that we've been trying to push this kind of technology for a
while, but the resistance is strong: it can be a solution, but it clearly
cannot be _the solution_, it cannot be the only option.
I'm actually trying to push it right now for sharing small, short-lived data
across a cluster in a customer project I'm following. For that task
Hazelcast is clearly easier and faster than a database, and yet there's a
good chance I'll have to implement the database one instead, because the
network and database admins are against it (the maintenance and politics
angles often play a very significant role and can supersede the technical
merits).

That said, I believe that also having an artifacts store based on message
passing, as an option, would be cool, and you're more than welcome to work
alongside this proposal to implement it. If you try, just let me know if the
current interface shows weaknesses and we'll try to address them.

Cheers
Andrea


Andrea_Aime4 · October 20, 2014, 2:09pm

On Fri, Oct 17, 2014 at 2:58 AM, Jody Garnett <jody.garnett@anonymised.com> wrote:

+1

Please make a note about the UI when you get a chance.

Done. It's just a description: the actual code is yet to be written, so I
don't have a screenshot, but it will really be the usual GeoServer table.

Cheers
Andrea



