[Geoserver-devel] Encoding binary data in WPS (XML) output

Hi,
I’m looking into having the binary outputs get encoded in the WPS document response and
failing to get a grip on how that would be done with the current Encoder architecture.

When binary data needs to be encoded inside the WPS response XML document, we basically
need to put it as the content of a ComplexData element, as base64 encoded data.

Now, to do that the code is using EncoderDelegate instances, and there are a few in use,
but only XMLEncoderDelegate actually works.
The ComplexDataTypeBinding code in the wps-core module has this code:

    @Override
    public List getProperties(Object object) throws Exception {
        ComplexDataType complex = (ComplexDataType) object;
        if ( !complex.getData().isEmpty() && complex.getData().get( 0 ) instanceof XMLEncoderDelegate ) {
            XMLEncoderDelegate delegate = (XMLEncoderDelegate) complex.getData().get( 0 );
            List properties = new ArrayList();
            // each property is an (element, value) pair
            properties.add( new Object[] {
                delegate.getProcessParameterIO().getElement(), delegate } );

            return properties;
        }

        return null;
    }

Which makes sense, as the XML encoder delegate will generate an XML subtree under
the ComplexData element.

But what about the existing CDATAEncoderDelegate and BinaryEncoderDelegate? If one
of those is used, nothing is generated in the output.

I guess the biggest difficulty here is that the EncoderDelegate in this case will not generate
a sub-element, it will generate the body of the ComplexData element instead.
Is this supported by the Encoder, and if so, what would be the name of the property to be used?

Or if not, do you have some rough indications on how to modify the GeoTools xsd Encoder to handle that case?

Cheers
Andrea

==
Meet us at GEO Business 2014! in London! Visit http://goo.gl/fES3aK
for more information.

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


What does the WPS spec say that binary encoded data is supposed to look like? Hard to visualize this one without some context but my first impression is that we could implement the encode() method for the binding and insert the binary/cdata encoded content directly into the dom element that is provided.
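
A minimal sketch of the DOM approach suggested above, assuming plain JDK classes (java.util.Base64, org.w3c.dom) rather than the GeoTools binding API. The names DomBase64Sketch, encodeInline and inlineText are hypothetical. Note that it holds the whole encoded payload in memory, so it only suits small outputs:

```java
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomBase64Sketch {

    /** Appends the base64 form of data as child text of target and returns target. */
    static Element encodeInline(byte[] data, Document document, Element target) {
        String base64 = Base64.getEncoder().encodeToString(data);
        target.setAttribute("encoding", "base64");
        target.appendChild(document.createTextNode(base64));
        return target;
    }

    /** Hypothetical helper: builds a ComplexData element and returns its text content. */
    static String inlineText(byte[] data) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element complexData = doc.createElement("ComplexData");
            doc.appendChild(complexData);
            return encodeInline(data, doc, complexData).getTextContent();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The three ASCII bytes of "Man" encode to "TWFu"
        System.out.println(inlineText(new byte[] {77, 97, 110}));
    }
}
```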

On Wed, Jun 4, 2014 at 11:03 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

> I guess the biggest difficulty here is that the EncoderDelegate in this case will not generate
> a sub-element, it will generate the body of the ComplexData element instead.
> Is this supported by the Encoder, and if so, what would be the name of the property to be used?


Justin Deoliveira
Vice President, Engineering | Boundless
jdeolive@anonymised.com
@j_deolive

On Thu, Jun 5, 2014 at 4:01 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

> What does the WPS spec say that binary encoded data is supposed to look
> like? Hard to visualize this one without some context but my first
> impression is that we could implement the encode() method for the binding
> and insert the binary/cdata encoded content directly into the dom element
> that is provided.

Eh, I could not find an example in the spec, but this is what I expect it to
look like:

<ComplexData mimeType="image/tiff" encoding="base64">
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
...
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=
</ComplexData>

Cheers
Andrea


-------------------------------------------------------

On Thu, Jun 5, 2014 at 4:01 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

> What does the WPS spec say that binary encoded data is supposed to look
> like? Hard to visualize this one without some context but my first
> impression is that we could implement the encode() method for the binding
> and insert the binary/cdata encoded content directly into the dom element
> that is provided.

So had a look around and found this example of direct encoding, in
SimpleContentComplexEMFBinding:

    /**
     * Calls getValue() and appends the result as child text of <tt>value</tt>.
     */
    public Element encode(Object object, Document document, Element value)
            throws Exception {
        EObject eobject = (EObject) object;
        if ( EMFUtils.has( eobject, "value") ) {
            Object v = EMFUtils.get( ((EObject)object), "value" );
            if ( v != null ) {
                value.appendChild( document.createTextNode( v.toString() ) );
            }
        }
        return value;
    }

I guess the binding has to either implement encode, or return the
properties, right?
One thing that bothers me about the above example is that it's memory
bound. If I have a large binary to put in the output, I'd rather go with a
ContentHandler instead, which is what the binary delegates expect anyway,
for example:

public class RawDataEncoderDelegate implements EncoderDelegate {

    private RawData rawData;

    public RawDataEncoderDelegate(RawData rawData) {
        this.rawData = rawData;
    }

    public void encode(ContentHandler output) throws Exception {
        InputStream is = null;
        try {
            is = rawData.getInputStream();
            byte[] buffer = new byte[4096];
            int read = 0;
            while ((read = is.read(buffer)) > 0) {
                char[] chars;
                if (read == 4096) {
                    chars = new String(Base64.encodeBase64(buffer)).toCharArray();
                } else {
                    // short read: encode only the bytes actually read
                    byte[] reducedBuffer = new byte[read];
                    System.arraycopy(buffer, 0, reducedBuffer, 0, read);
                    chars = new String(Base64.encodeBase64(reducedBuffer)).toCharArray();
                }

                // stream the encoded chunk out as character data
                output.characters(chars, 0, chars.length);
            }
        } finally {
            IOUtils.closeQuietly(is);
        }
    }

    public void encode(OutputStream os) throws IOException {
        IOUtils.copy(rawData.getInputStream(), os);
    }
}

Is this possible?
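
One thing to watch when base64-encoding chunk by chunk: every chunk whose length is not a multiple of 3 gets its own '=' padding, and 4096 is not a multiple of 3 (nor is a short read guaranteed to be), so the concatenated output may not decode as a single base64 value. A minimal sketch of a chunked encoder that avoids this, assuming java.util.Base64 from the JDK instead of the commons-codec Base64 class used above; ChunkedBase64Sketch and its methods are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Base64;

final class ChunkedBase64Sketch {

    // 4095 = 3 * 1365, so every full chunk encodes without '=' padding
    static final int CHUNK = 4095;

    /** Reads until the buffer is full or the stream ends; returns the bytes read. */
    static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    /** Encodes the stream chunk by chunk, appending each encoded piece to out. */
    static void encode(InputStream in, Appendable out) throws IOException {
        Base64.Encoder encoder = Base64.getEncoder();
        byte[] buffer = new byte[CHUNK];
        int read;
        while ((read = readFully(in, buffer)) > 0) {
            // only the last chunk can be shorter than CHUNK, so only it can be padded
            byte[] chunk = (read == CHUNK) ? buffer : Arrays.copyOf(buffer, read);
            out.append(encoder.encodeToString(chunk));
        }
    }

    /** Convenience wrapper for in-memory data, used to check the result. */
    static String encodeToString(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try {
            encode(new ByteArrayInputStream(data), sb);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

The chunked result should match a one-shot encoding of the same bytes, which is an easy property to check in a unit test.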

Cheers
Andrea


-------------------------------------------------------

On Thu, Jun 5, 2014 at 8:25 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

> I guess the binding has to either implement encode, or to return the
> properties, right?

If I remember correctly it can do both if necessary. Properties are always
mapped to child elements.

> One thing that bothers me about the above example is that it's memory
> bound, if I have a large binary to put in the output module, I'd rather
> go with a ContentHandler instead, which is what the binary delegates
> expect anyways.
>
> Is this possible?

Sorry... not sure what you are asking here.

--
Justin Deoliveira
Vice President, Engineering | Boundless
jdeolive@anonymised.com
@j_deolive

On Fri, Jun 6, 2014 at 4:24 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

>> I guess the binding has to either implement encode, or to return the
>> properties, right?
>
> If I remember correctly it can do both if necessary. Properties are always
> mapped to child elements.

Ah right, indeed I noticed that in SimpleContentComplexEMFBinding after
sending the mail.

>> Is this possible?
>
> Sorry... not sure what you are asking here.

So basically, when running something like:

    value.appendChild( document.createTextNode( v.toString() ) );

I am creating a text node by loading all of its contents in memory.
Unfortunately, a WPS result can be gigabytes in size, so I would like to
avoid that.

Instead, the RawDataEncoderDelegate code above, using the ContentHandler,
generates the output XML in a streaming fashion.
My question is: would it be possible to do a streaming encode from the
encode() method of a binding? What I see is that when we return an encoder
delegate as a property, the delegate can indeed write the large content in
a streaming fashion.

I guess that the solution might be to return the binary encoder delegates
in the ComplexData parent, and then have the delegate write both the
ComplexData tag and its content?
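
A rough sketch of that last idea, where the delegate emits the ComplexData start tag, the base64 body in chunks, and the end tag, all through a SAX ContentHandler. It assumes only JDK classes; StreamingComplexDataSketch, encode, fill and demo are hypothetical names, namespaces are left out for brevity, and the JDK identity TransformerHandler stands in for whatever handler the Encoder would actually pass in:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Base64;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;

final class StreamingComplexDataSketch {

    /** Writes the ComplexData element and its base64 body, one chunk at a time. */
    static void encode(InputStream in, String mimeType, ContentHandler output)
            throws Exception {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "mimeType", "mimeType", "CDATA", mimeType);
        atts.addAttribute("", "encoding", "encoding", "CDATA", "base64");
        output.startElement("", "ComplexData", "ComplexData", atts);

        byte[] buffer = new byte[3 * 1024]; // multiple of 3: no padding between chunks
        int filled;
        while ((filled = fill(in, buffer)) > 0) {
            char[] chars = Base64.getEncoder()
                    .encodeToString(Arrays.copyOf(buffer, filled)).toCharArray();
            output.characters(chars, 0, chars.length);
        }
        output.endElement("", "ComplexData", "ComplexData");
    }

    /** Fills the buffer as far as the stream allows; returns the byte count. */
    private static int fill(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    /** Serializes the SAX events to a string so the result can be inspected. */
    static String demo(byte[] data) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            handler.setResult(new StreamResult(writer));
            handler.startDocument();
            encode(new ByteArrayInputStream(data), "image/tiff", handler);
            handler.endDocument();
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(new byte[] {77, 97, 110}));
    }
}
```

Only one chunk is held in memory at a time, so the size of the payload does not affect the heap needed to encode it.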

Cheers
Andrea


-------------------------------------------------------

Chiming in where nobody calls me, but what if the response returned a URL from which to get the raw data, instead of the Base64 encoded data? Too 90’s I know (ArcIMS memories anyone?), but I’d seriously consider that given the options. An expiration time for the response, whether it’ll be saved to disk and deleted after some time or computed on the fly out of the URL hints, etc. would be up to each process to decide?

On Fri, Jun 6, 2014 at 11:45 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

> I am creating a text node by loading all of its contents in memory.
> Unfortunately, a WPS result can be GB large, so I would like to avoid that.

Gabriel Roldán

Software Developer | Boundless

groldan@anonymised.com

@gabrielroldan


On Sat, Jun 7, 2014 at 12:34 AM, Gabriel Roldan <groldan@anonymised.com> wrote:

> Chiming in where nobody calls me, but what if the response returned a URL
> where to get the raw data from instead of the Base64 encoded data? Too 90's
> I know (ArcIMS memories anyone?), but I'd seriously consider that given the
> options. An expiration time for the response, whether it'll be saved to disk
> and deleted after some time, or computed on the fly out of the URL hints,
> etc would be up to each process to decide?

Yes, WPS allows the binary data to be linked instead of being embedded in
the response document (or being a direct raw return). However, it's not the
server's choice: it's the client that requests things one way or the other,
and if nothing specific is requested, the standard mandates the inline
base64 response.

So, we need to be able to support inline base64 encoding, linked
responses, and raw direct responses.
We do the last two (the most reasonable ones imho) but we don't support the
first. This has to be amended, since it makes the server not specification
compliant.
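
The three delivery modes described above can be sketched as a simple selection. This is only an illustration: the flags rawDataOutput and asReference are hypothetical stand-ins for whatever the client's Execute request asks for, and OutputModeSketch is not an existing GeoServer class:

```java
final class OutputModeSketch {

    /** How a binary process output is delivered to the client. */
    enum OutputMode { INLINE_BASE64, LINKED_REFERENCE, RAW }

    /**
     * Picks the delivery mode from what the client asked for.
     * rawDataOutput: the client asked for the bare output as the HTTP response.
     * asReference: the client asked for a link in the response document.
     * With neither set, the output is embedded inline as base64.
     */
    static OutputMode choose(boolean rawDataOutput, boolean asReference) {
        if (rawDataOutput) {
            return OutputMode.RAW;
        }
        if (asReference) {
            return OutputMode.LINKED_REFERENCE;
        }
        return OutputMode.INLINE_BASE64;
    }

    public static void main(String[] args) {
        // No specific request from the client: fall back to inline base64
        System.out.println(choose(false, false));
    }
}
```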

Cheers
Andrea


-------------------------------------------------------