Hello everyone,
I'd like to enhance WFS responses in JSON/JSONP format. In some of our Web GIS Applications, one may download up to several thousand Simple Features through WFS. Some Layers may have between 400 and 500 columns.
Although JSON produces much smaller responses than XML/GML, such responses quickly reach sizes between 60 and 120 MB (uncompressed).
The idea is to remove some redundancy from the GeoJSON format, which provides a feature's attributes in a key/value map (JSON Object):
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "bugsites.3",
      "geometry": {
        "type": "Point",
        "coordinates": [590529, 4914625]
      },
      "geometry_name": "the_geom",
      "properties": {
        "cat": 3,
        "str1": "Beetle site"
      }
    },
    {
      "type": "Feature",
      "id": "bugsites.4",
      "geometry": {
        "type": "Point",
        "coordinates": [590546, 4915353]
      },
      "geometry_name": "the_geom",
      "properties": {
        "cat": 4,
        "str1": "Beetle site"
      }
    }
  ],
  "totalFeatures": 2,
  "numberMatched": 2,
  "numberReturned": 2,
  "timeStamp": "2022-09-13T08:44:45.118Z",
  "crs": {
    "type": "name",
    "properties": {
      "name": "urn:ogc:def:crs:EPSG::26713"
    }
  }
}
Here, the property names ("cat", "str1") are repeated for every feature returned. Also, the "geometry_name" member is repeated for every feature.
With lots of features and columns (the latter do not necessarily have short names), this repeated schema information can quickly become the dominating factor regarding the size of the response. (Of course, that also depends on the type and complexity of the geometry.)
Likely the repeated "type":"Feature" could be omitted as well. It's just there in order to satisfy the GeoJSON specs. Maybe the "level of compaction" could be specified for a request.
By including schema information only once in the FeatureCollection object, a more compact form of the JSON response may look like this:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "bugsites.3",
      "geometry": {
        "type": "Point",
        "coordinates": [590529, 4914625]
      },
      "properties": [
        3,
        "Beetle site"
      ]
    },
    {
      "type": "Feature",
      "id": "bugsites.4",
      "geometry": {
        "type": "Point",
        "coordinates": [590546, 4915353]
      },
      "properties": [
        4,
        "Beetle site"
      ]
    }
  ],
  "totalFeatures": 2,
  "numberMatched": 2,
  "numberReturned": 2,
  "timeStamp": "2022-09-13T08:44:45.118Z",
  "crs": {
    "type": "name",
    "properties": {
      "name": "urn:ogc:def:crs:EPSG::26713"
    }
  },
  "schema": {
    "geometry_name": "the_geom",
    "properties": [
      "cat",
      "str1"
    ]
  }
}
Here, a new "schema" object in the root "FeatureCollection" object contains the name of the geometry field as well as the names of the other properties as an array. The "properties" member of each "Feature" object has become an array as well, containing the property values only.
Both arrays are "parallel", that is:
Key: "schema"."properties"[0] => cat
Val: features[N]."properties"[0] => 3 or 4 (depending on N)
Key: "schema"."properties"[1] => str1
Val: features[N]."properties"[1] => Beetle site
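To make the pairing concrete, here is a minimal Python sketch of how a client might rebuild the usual key/value properties from the compact form. It is only an illustration of the proposed layout, not GeoServer code; the function name expand_properties is made up for this example.

```python
def expand_properties(collection):
    """Rebuild GeoJSON-style features from the proposed compact collection."""
    schema = collection["schema"]
    names = schema["properties"]
    features = []
    for feature in collection["features"]:
        values = feature["properties"]
        # The two arrays are parallel, so their lengths must agree.
        if len(values) != len(names):
            raise ValueError("property count does not match schema")
        expanded = dict(feature)
        expanded["geometry_name"] = schema["geometry_name"]
        expanded["properties"] = dict(zip(names, values))
        features.append(expanded)
    return features


compact = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "bugsites.3",
         "geometry": {"type": "Point", "coordinates": [590529, 4914625]},
         "properties": [3, "Beetle site"]},
        {"type": "Feature", "id": "bugsites.4",
         "geometry": {"type": "Point", "coordinates": [590546, 4915353]},
         "properties": [4, "Beetle site"]},
    ],
    "schema": {"geometry_name": "the_geom", "properties": ["cat", "str1"]},
}

features = expand_properties(compact)
print(features[0]["properties"])  # {'cat': 3, 'str1': 'Beetle site'}
```

In practice a client would probably not expand the features at all but look up the column index once and read values by position.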
In the above example, with only two short-named fields, savings are almost zero. However, with requests returning several thousand features, each having 300+ fields, savings may be quite significant.
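As a rough way to check this, the following Python sketch converts a regular FeatureCollection into the compact form and compares the serialized sizes. The layer is synthetic (made-up column names and values), the helper to_compact is hypothetical, and it assumes that all features share the property names of the first feature.

```python
import json


def to_compact(collection):
    """Move property names and geometry_name into one shared "schema" object."""
    features = collection["features"]
    # Assumption: every feature has the same property names as the first one.
    names = list(features[0]["properties"]) if features else []
    geometry_name = features[0].get("geometry_name") if features else None
    compact_features = []
    for feature in features:
        slim = {k: v for k, v in feature.items() if k != "geometry_name"}
        slim["properties"] = [feature["properties"][name] for name in names]
        compact_features.append(slim)
    result = dict(collection)
    result["features"] = compact_features
    result["schema"] = {"geometry_name": geometry_name, "properties": names}
    return result


# Synthetic layer (made-up data): 200 features with 300 long-named columns.
columns = [f"some_fairly_long_column_name_{i:03d}" for i in range(300)]
verbose = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": f"layer.{n}",
            "geometry": {"type": "Point", "coordinates": [590529 + n, 4914625 + n]},
            "geometry_name": "the_geom",
            "properties": {name: n for name in columns},
        }
        for n in range(200)
    ],
}

compact = to_compact(verbose)
verbose_size = len(json.dumps(verbose, separators=(",", ":")))
compact_size = len(json.dumps(compact, separators=(",", ":")))
print("verbose bytes:", verbose_size)
print("compact bytes:", compact_size)
```

The actual ratio depends on the length of the column names and on the size of the values and geometries, so the numbers printed here say nothing about real layers.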
The new compact format is not GeoJSON, of course. However, what GeoServer currently returns is already not really compatible with the GeoJSON spec, which, for example, only permits EPSG:4326 coordinates. In any case, the compact format is likely much smaller for the requests described above.
On the wire, these responses are typically compressed (deflate, brotli etc.). However, compressing smaller amounts of data typically results in smaller compressed chunks. Also, it is not only about transferring the data; the data must be created and compressed by GeoServer and must be decompressed and parsed in the client. Overall, smaller chunks of data should consume fewer resources than larger ones.
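Whether the difference survives compression can be measured rather than guessed. The sketch below builds both forms of a small synthetic layer (made-up column names and values) and compares raw and deflate-compressed sizes using Python's zlib; brotli is not covered, and the result for real data may look different.

```python
import json
import zlib

# Synthetic layer (made-up data): 200 features with 100 long-named columns.
columns = [f"some_fairly_long_column_name_{i:03d}" for i in range(100)]
rows = [[n for _ in columns] for n in range(200)]

verbose = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": f"layer.{n}", "geometry": None,
         "geometry_name": "the_geom", "properties": dict(zip(columns, row))}
        for n, row in enumerate(rows)
    ],
}
compact = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": f"layer.{n}", "geometry": None, "properties": row}
        for n, row in enumerate(rows)
    ],
    "schema": {"geometry_name": "the_geom", "properties": columns},
}


def sizes(document):
    """Return (raw size, deflate-compressed size) in bytes."""
    raw = json.dumps(document, separators=(",", ":")).encode("utf-8")
    return len(raw), len(zlib.compress(raw, 6))


verbose_raw, verbose_deflated = sizes(verbose)
compact_raw, compact_deflated = sizes(compact)
print("verbose:", verbose_raw, "->", verbose_deflated)
print("compact:", compact_raw, "->", compact_deflated)
```

Since repeated keys compress well, the relative gain after compression is expected to be smaller than the gain on the raw JSON.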
I'd like to add this new compact format directly into GeoServer's core WFS JSON code:
gs-wfs: org.geoserver.wfs.json.GeoJSONGetFeatureResponse
More or less, only method encodeSimpleFeatures must be modified in order to implement this new compact JSON format.
However, that method iterates over a list of FeatureCollection objects (argument List<FeatureCollection> resultsList). Could there be more than one FeatureCollection if the request returns simple features only? Shouldn't all simple features requested through WFS be of the same type? In other words, must I expect to deal with several distinct "schema" objects when requesting simple features? (I don't think so.)
Since using this format, of course, is optional, I need a mechanism to tell the server whether to return compact JSON or not (and maybe the level of compaction it should use).
One quite obvious option is to use the vendor parameter "format_options". Here, an option like "json:compact" could trigger responding with compact JSON. This seems quite simple to implement.
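A minimal Python sketch of that check, assuming format_options keeps its usual "key:value;key:value" syntax. The "json:compact" option is the hypothetical one proposed here and does not exist in GeoServer today; "callback" is, as far as I recall, the option already used for JSONP.

```python
def parse_format_options(raw):
    """Split a "key:value;key:value" string into a dict."""
    options = {}
    for part in raw.split(";"):
        if not part.strip():
            continue
        key, _, value = part.partition(":")
        options[key.strip()] = value.strip()
    return options


def wants_compact_json(raw):
    # "json:compact" is the proposed (hypothetical) option, not an existing one.
    return parse_format_options(raw).get("json") == "compact"


print(wants_compact_json("callback:handleFeatures;json:compact"))  # True
print(wants_compact_json("callback:handleFeatures"))               # False
```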
Another idea is to use different outputFormat MIME types or MIME types with additional parameters:
application/json; type=compact; omitTypes=true
text/javascript; type=compact
Although distinguishing the format via the MIME type may take somewhat more work, I prefer this approach over the "format_options" way.
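For the MIME type variant, the server would have to split the outputFormat value into the media type and its parameters. A minimal Python sketch of that step follows; the parameter names "type" and "omitTypes" are the ones suggested above and are not recognized by GeoServer today, and parse_output_format is a made-up helper that ignores quoted parameter values.

```python
def parse_output_format(value):
    """Split 'type/subtype; key=value; ...' into (media_type, parameters)."""
    media_type, *params = [part.strip() for part in value.split(";")]
    options = {}
    for param in params:
        if not param:
            continue
        key, _, val = param.partition("=")
        options[key.strip()] = val.strip()
    return media_type.lower(), options


media_type, options = parse_output_format(
    "application/json; type=compact; omitTypes=true"
)
# Both parameters are hypothetical and default to the current behaviour.
compact = options.get("type") == "compact"
omit_types = options.get("omitTypes", "false").lower() == "true"
print(media_type, compact, omit_types)  # application/json True True
```

Note that in a KVP GET request such a value would have to be URL-encoded, since it contains ";", "=" and spaces.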
What else do I have to consider?
Many thanks in advance for your ideas on this.
Regards, Carsten