[Geoserver-devel] Dealing with raw input/outputs in WPS

Hi,
I would like to add support for “raw” inputs for WPS processes, that is, processes
that have either inputs or outputs that are binary streams, and that do their own
parsing (e.g., let’s say we are wrapping some command line utility such as gdal_translate).

In terms of inputs we need:

  • a way for the process to advertise the supported mime types (so that we can generate
    the process description accordingly)
  • a way for the process to grab the raw data
  • a way for the process to know which mime type the raw data is in

Advertising wise, I would add a new item in the Parameter metadata map,
called MIME_TYPES, that would contain a static list of accepted mime types.

A process that needs to receive a raw input would then have a special argument type,
let’s call it RawBinary, that conveys both the data and the selected mime type:

interface RawBinary {
    String getMimeType();
    InputStream getInputStream();
}

GeoServer WPS would notice the special type, and feed the process with the
raw input.
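
To make the input contract a bit more concrete, here is a minimal sketch of how such a type could be implemented and consumed. Only the two interface methods come from the proposal; the in-memory implementation (ByteArrayRawBinary), the toy countBytes "process" and the sample payload are assumptions made up for illustration, not existing GeoServer code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class RawBinarySketch {

    /** The interface from the proposal: raw bytes plus the mime type selected in the request. */
    interface RawBinary {
        String getMimeType();

        InputStream getInputStream();
    }

    /** Hypothetical in-memory implementation, for illustration only. */
    static final class ByteArrayRawBinary implements RawBinary {
        private final byte[] data;
        private final String mimeType;

        ByteArrayRawBinary(byte[] data, String mimeType) {
            this.data = data.clone();
            this.mimeType = mimeType;
        }

        @Override
        public String getMimeType() {
            return mimeType;
        }

        @Override
        public InputStream getInputStream() {
            return new ByteArrayInputStream(data);
        }
    }

    /** Toy "process" doing its own parsing: it simply counts the bytes it is fed. */
    static int countBytes(RawBinary input) {
        try (InputStream in = input.getInputStream()) {
            byte[] buffer = new byte[4096];
            int total = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RawBinary input = new ByteArrayRawBinary(
                "{\"type\":\"Point\"}".getBytes(StandardCharsets.UTF_8), "application/json");
        System.out.println(input.getMimeType() + ": " + countBytes(input) + " bytes");
    }
}
```

The process only sees a stream and a mime type string, so it stays free to hand the stream to whatever parser or external tool it wraps.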

For raw outputs the situation would be a bit more complicated. All the process
needs to do is generate an InputStream that GeoServer can read from,
but we’d need to:

  • advertise the list of supported output mime types
  • pass down to the process the chosen mime type

I’d say that for this case in the Parameter we’d have two new entries:
MIME_TYPES, just like for the inputs (list of supported output mime types in this case),
and REQUESTED_OUTPUT_MIME, the name of an input parameter that GeoServer would use
to pass down the chosen mime type (which of course would not be necessary in case there
is a single output mime type).
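
As a rough sketch of how these two metadata entries could be used on the GeoServer side: the key names MIME_TYPES and REQUESTED_OUTPUT_MIME come from the proposal, while their string values, the helper method names, and the choice of defaulting to the first advertised type when the request names none are all assumptions for the sake of the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OutputMimeSketch {
    // Key names follow the proposal; the string values are assumptions.
    static final String MIME_TYPES = "mimeTypes";
    static final String REQUESTED_OUTPUT_MIME = "requestedOutputMime";

    /** Builds the metadata map for a raw output (hypothetical helper). */
    static Map<String, Object> describeRawOutput(String selectionParameter, String... mimeTypes) {
        Map<String, Object> metadata = new HashMap<>();
        metadata.put(MIME_TYPES, Arrays.asList(mimeTypes));
        if (selectionParameter != null) {
            // name of the input parameter that will receive the chosen mime type
            metadata.put(REQUESTED_OUTPUT_MIME, selectionParameter);
        }
        return metadata;
    }

    /** Picks the mime type to pass down to the process, validating it against the advertised list. */
    static String resolveOutputMime(Map<String, Object> metadata, String requested) {
        @SuppressWarnings("unchecked")
        List<String> supported = (List<String>) metadata.get(MIME_TYPES);
        if (supported == null || supported.isEmpty()) {
            throw new IllegalStateException("No output mime types advertised");
        }
        if (requested == null) {
            // Assumption: fall back to the first advertised type when none is requested
            return supported.get(0);
        }
        if (!supported.contains(requested)) {
            throw new IllegalArgumentException("Unsupported output mime type: " + requested);
        }
        return requested;
    }

    public static void main(String[] args) {
        Map<String, Object> metadata =
                describeRawOutput("outputMimeType", "application/json", "text/xml");
        System.out.println("Pass to parameter '" + metadata.get(REQUESTED_OUTPUT_MIME)
                + "' the value " + resolveOutputMime(metadata, "text/xml"));
    }
}
```

With a single advertised mime type the selection parameter can simply be left out, which matches the remark that it is not needed in that case.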

Java wise, given that all these things are quite WPS specific and not something
a generic non WPS process writer should be concerned with, my idea would be
for GeoServer to have custom extra annotations describing these parameters:

@DescribeRawResult(name = "result", description = "Output raster",
    mimeTypes = {"application/json", "text/xml"}, selection = "outputMimeType")
RawBinary execute( ...
    @DescribeRawParameter(name = "data", description = "Input features",
        mimeTypes = {"application/json", "text/xml"}) RawBinary myInput
    ...
    String outputMimeType);

Alternatively, the existing Java annotations could become open ended by having a KVP sub-annotation array in a meta field:

@DescribeResult(name = "result", description = "Output raster",
    meta = {@KVP(key = "mimeTypes", values = {"application/json", "text/xml"}),
            @KVP(key = "selection", value = "outputMimeType")})
RawBinary execute( ...
    @DescribeParameter(name = "data", description = "Input features",
        meta = {@KVP(key = "mimeTypes", values = {"application/json", "text/xml"})}) RawBinary myInput
    ...
    String outputMimeType);

Personally I somewhat prefer the first, as it’s more compact for the user, even if

Then there is the Python scripting side. There we have it easier: since we are talking about bindings
that can be used only for WPS anyway, I guess we can just add some entries in the parameter description,
mimeTypes and mimeTypeSelection, and be done with it. What do you think?

I’m not sure if the streams should be wrapped into something more pythonic before passing them down
to Jython.

Feedback welcomed!

Cheers
Andrea

==
Meet us at GEO Business 2014! in London! Visit http://goo.gl/fES3aK
for more information.

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


Hey Andrea,

The design sounds ok to me. One thing: given that a raw input/output might be text based, like in your examples of xml and json, I wonder if RawBinary is perhaps not the best name. I like RawInput / RawOutput a bit better I think, or just RawData if we only need one class.

Regarding the script bindings (it would be nice to consider more than just Python), agreed, just adding some metadata makes sense. As for whether to wrap the stream or not, it is a good question. For Python we could use PyFile to provide something that looks like a Python file, but then I am not sure how easy it is to get the original stream back out of it. And given that it’s Jython, the likelihood of using another Java library that requires normal Java streams is high. Anyway, I would leave it as is for now. If the script writer wants to wrap it in a Python file they can do it with one line.

$0.02

-Justin

Justin Deoliveira
Vice President, Engineering | Boundless
jdeolive@anonymised.com
@j_deolive

On Mon, May 26, 2014 at 2:47 PM, Justin Deoliveira <
jdeolive@anonymised.com> wrote:

Hey Andrea,

The design sounds ok to me. One thing: given that a raw input/output
might be text based, like in your examples of xml and json, I wonder if
RawBinary is perhaps not the best name. I like RawInput / RawOutput a bit
better I think, or just RawData if we only need one class.

Good thinking, I'll call it RawData

Regarding the script bindings (it would be nice to consider more than just
Python), agreed, just adding some metadata makes sense. As for whether to wrap
the stream or not, it is a good question. For Python we could use PyFile to
provide something that looks like a Python file, but then I am not sure how
easy it is to get the original stream back out of it. And given that it's
Jython, the likelihood of using another Java library that requires normal Java
streams is high. Anyway, I would leave it as is for now. If the script
writer wants to wrap it in a Python file they can do it with one line.

Sounds reasonable to me

Thanks for the feedback!

Cheers
Andrea


-------------------------------------------------------

On Fri, May 23, 2014 at 4:23 PM, Andrea Aime <andrea.aime@anonymised.com>
wrote:

Java wise, given that all these things are quite WPS specific and not
something a generic non WPS process writer should be concerned with, my idea
would be for GeoServer to have custom extra annotations describing these
parameters:

@DescribeRawResult(name = "result", description = "Output raster",
    mimeTypes = {"application/json", "text/xml"}, selection = "outputMimeType")
RawBinary execute( ...
    @DescribeRawParameter(name = "data", description = "Input features",
        mimeTypes = {"application/json", "text/xml"}) RawBinary myInput
    ...
    String outputMimeType);

Alternatively, the existing Java annotations could become open ended by
having a KVP sub-annotation array in a meta field:

@DescribeResult(name = "result", description = "Output raster",
    meta = {@KVP(key = "mimeTypes", values = {"application/json", "text/xml"}),
            @KVP(key = "selection", value = "outputMimeType")})
RawBinary execute( ...
    @DescribeParameter(name = "data", description = "Input features",
        meta = {@KVP(key = "mimeTypes", values = {"application/json", "text/xml"})}) RawBinary myInput
    ...
    String outputMimeType);

Speaking of annotations, I've been going over both approaches, and both are
making me bleed somehow.
The first one basically requires duplicating the GeoTools annotations
fully, since annotation inheritance is not available in Java,
which makes it a long term maintenance nightmare.

The second approach still looks rather verbose, but maybe it can be made
more tolerable this way:

@DescribeResult(name = "result", description = "Output raster",
    meta = {@KVP("mimeTypes=application/json,text/xml"),
            @KVP("selection=outputMimeType")})
RawData execute( ...
    @DescribeParameter(name = "data", description = "Input features",
        meta = {@KVP("mimeTypes=application/json,text/xml")}) RawData myInput
    ...
    String outputMimeType);

A bit more compact. Upside: it's open ended, so it could be useful for other
purposes in the future. Drawback: it's still somewhat verbose, and if one types
"mimeType" instead of "mimeTypes", for example, there will be no
feedback (typo prone...)
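
To illustrate the compact "key=value" form and one possible way to soften the typo problem, here is a small sketch of a parser for such entries. The accepted key set (mimeTypes, selection), the comma-separated value convention and the decision to reject unknown keys at parse time are assumptions for illustration; nothing like this is implied to exist in GeoTools or GeoServer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class KvpParsingSketch {
    // Assumed set of recognized keys; checking against it gives early feedback on typos.
    static final Set<String> KNOWN_KEYS = new HashSet<>(Arrays.asList("mimeTypes", "selection"));

    /** Parses entries such as "mimeTypes=application/json,text/xml" into key -> list of values. */
    static Map<String, List<String>> parse(String... entries) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String entry : entries) {
            int idx = entry.indexOf('=');
            if (idx <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + entry);
            }
            String key = entry.substring(0, idx).trim();
            if (!KNOWN_KEYS.contains(key)) {
                throw new IllegalArgumentException(
                        "Unknown key '" + key + "', expected one of " + KNOWN_KEYS);
            }
            List<String> values = new ArrayList<>();
            for (String value : entry.substring(idx + 1).split(",")) {
                String trimmed = value.trim();
                if (!trimmed.isEmpty()) {
                    values.add(trimmed);
                }
            }
            result.put(key, values);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("mimeTypes=application/json,text/xml", "selection=outputMimeType"));
        try {
            parse("mimeType=application/json"); // typo: missing trailing "s"
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

A check like this only fires when the process is loaded, not at compile time, so it reduces the typo risk rather than removing it.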

Alternatively, an extra annotation could be added in GeoServer, Raw, that
would work as follows:

@Raw(mimeTypes = {"application/json", "text/xml"}, selection = "outputMimeType")
@DescribeResult(name = "result", description = "Output raster")
RawData execute( ...
    @Raw(mimeTypes = {"application/json", "text/xml"})
    @DescribeParameter(name = "data", description = "Input features") RawData myInput
    ...
    String outputMimeType);
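
For reference, a side annotation like this could be declared and read back via reflection roughly as follows. This is only a sketch under assumptions: the attribute set of Raw mirrors the usage in this mail, EchoProcess is a made-up process, byte[] stands in for RawData to keep the example self-contained, and the lookup helpers are hypothetical.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;

class RawAnnotationSketch {

    /** Hypothetical declaration of the Raw annotation, usable on the method (result) and on parameters. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.METHOD, ElementType.PARAMETER})
    @interface Raw {
        String[] mimeTypes();

        String selection() default "";
    }

    /** Made-up process; byte[] stands in for RawData. */
    static class EchoProcess {
        @Raw(mimeTypes = {"application/json", "text/xml"}, selection = "outputMimeType")
        public byte[] execute(
                @Raw(mimeTypes = {"application/json", "text/xml"}) byte[] data,
                String outputMimeType) {
            return data;
        }
    }

    /** Finds the public execute method, or null if the class has none. */
    static Method findExecute(Class<?> processClass) {
        for (Method method : processClass.getMethods()) {
            if ("execute".equals(method.getName())) {
                return method;
            }
        }
        return null;
    }

    /** Raw annotation describing the result, or null if absent. */
    static Raw resultAnnotation(Class<?> processClass) {
        Method execute = findExecute(processClass);
        return execute == null ? null : execute.getAnnotation(Raw.class);
    }

    /** Raw annotation on the parameter at the given index, or null if absent. */
    static Raw parameterAnnotation(Class<?> processClass, int index) {
        Method execute = findExecute(processClass);
        if (execute == null) {
            return null;
        }
        for (Annotation annotation : execute.getParameterAnnotations()[index]) {
            if (annotation instanceof Raw) {
                return (Raw) annotation;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Raw result = resultAnnotation(EchoProcess.class);
        System.out.println("result: " + Arrays.toString(result.mimeTypes())
                + ", selection = " + result.selection());
        Raw input = parameterAnnotation(EchoProcess.class, 0);
        System.out.println("input: " + Arrays.toString(input.mimeTypes()));
    }
}
```

Unlike the string based KVP form, the attribute names here are checked by the compiler, which is the main thing this variant buys.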

An annoying bit would be that if we have multiple outputs, then we'd have
the following:

@DescribeResults({
  @DescribeResult(name = "a", description = "Intersection"),
  @DescribeResult(name = "b", description = "everything else")
})
@RawResults({
  @Raw(name = "a", mimeTypes = {"application/json", "text/xml"}, selection = "aMimeType"),
  @Raw(name = "b", mimeTypes = {"application/json", "text/xml"}, selection = "bMimeType")
})
Map execute( ...
    @Raw(mimeTypes = {"application/json", "text/xml"})
    @DescribeParameter(name = "data", description = "Input features") RawData myInput
    ...
    String aMimeType, String bMimeType);

Which is honestly.. uuuugh...

Right now I'm sort of leaning towards the meta/@KVP thing... feedback
welcomed

Cheers
Andrea


-------------------------------------------------------

On Fri, May 23, 2014 at 4:23 PM, Andrea Aime <andrea.aime@anonymised.com>
wrote:

Hi,
I would like to add support for "raw" inputs for WPS processes, that is,
processes that have either inputs or outputs that are binary streams, and
that do their own parsing (e.g., let's say we are wrapping some command line
utility such as gdal_translate).

And here we go with a pull request with changes, tests and docs.
Let me know if you are interested in reviewing it :)

Cheers
Andrea


-------------------------------------------------------