Alessio Fabiani wrote:
Even using closing intervals directly won't break the FAO implementation, so use whichever solution you prefer ... however, using the closingIntervals parameter would give finer control over rule generation, so in my opinion we should go for it.
I concur; let's do what is needed. It sounds like for these corner cases there is no mathematical definition that captures the "right" way; we have been asked to make equal-size buckets and this is the best we can do...
But really this just seems to be a problem of generating the document we want based on the resulting categorization object, right? Do we need to change the generated ranges, or just how we interpret them?
As for closing intervals or not, we could use separate function names, so it is really explicit what is going on (rather than having a magic boolean flag). I have suggested we check how the "official" function recently defined in the SE 1.1 specification handles this, but thus far nobody has done it...
So let me look it up, and tag in Eclesia who actually cares about this stuff...
They have:
- Categorization: The transformation of continuous values to distinct values. This is for example needed to generate choropleth maps from continuous attributes. Another example would be the stepwise selection of different text heights or line widths in dependence from such an attribute.
- Interpolation: Transformation of continuous values by a function defined on a number of nodes. This is used to adjust the value distribution of an attribute to the desired distribution of a continuous symbolization control variable (like size, width, color, etc).
- Recoding: Transformation of discrete values to any other values. This is needed when integers have to be translated into text or, reversely, text contents into other texts or numeric values or colors.
So it looks like "Categorization" is the one we care about ... here is the example:
<SvgParameter name="stroke-width">
  <Categorize fallbackValue="1">
    <LookupValue>
      <ogc:PropertyName>vehiclesPerHour</ogc:PropertyName>
    </LookupValue>
    <Value>1</Value>
    <Threshold>5000</Threshold>
    <Value>2</Value>
    <Threshold>15000</Threshold>
    <Value>3</Value>
    <Threshold>40000</Threshold>
    <Value>4</Value>
    <Threshold>75000</Threshold>
    <Value>5</Value>
  </Categorize>
</SvgParameter>
To be interpreted as (4999 or less: 1 pixel; 5000..14999: 2 pixel; 15000..39999: 3 pixel; 40000..74999: 4 pixel; 75000+: 5 pixel).
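Just to spell out the lookup semantics, here is a tiny Java sketch of the "find the last threshold not exceeded" interpretation above; the names are mine, for illustration only, not the actual rendering code:

// Illustration of the Categorize lookup semantics described above.
// Thresholds and values mirror the vehiclesPerHour example.
public class CategorizeExample {
    static final double[] THRESHOLDS = {5000, 15000, 40000, 75000};
    static final int[] VALUES = {1, 2, 3, 4, 5}; // one more value than thresholds

    static int lookup(double x) {
        int i = 0;
        // advance while the value is at or above the next threshold
        while (i < THRESHOLDS.length && x >= THRESHOLDS[i]) {
            i++;
        }
        return VALUES[i];
    }

    public static void main(String[] args) {
        System.out.println(lookup(4999));   // 1
        System.out.println(lookup(5000));   // 2
        System.out.println(lookup(39999));  // 3
        System.out.println(lookup(75000));  // 5
    }
}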
So the function we currently have should be renamed "categorization", and we can make up a new function for the "closed" intervals?
Thinking ...
If we really want to represent closed intervals, I would generate the same data structure and produce something like this ...
x <= 0 no data
0 <= x <= 10 yellow
10 < x <= 20 orange
20 < x <= 30 red
30 < x no data
That is, use the same boundary conditions, and just explicitly map where the no-data sections are.
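To make the idea concrete, here is a rough Java sketch of turning the same break values into explicit rules, with the out-of-range sections mapped to no data; the class and variable names are invented for illustration, this is not the real rule builder:

import java.util.ArrayList;
import java.util.List;

// Sketch only: turns a set of class breaks into filter-like rule strings,
// mapping the open ends explicitly to "no data" as proposed above.
public class ClosedIntervalRules {
    public static void main(String[] args) {
        double[] breaks = {0, 10, 20, 30};
        String[] colors = {"yellow", "orange", "red"};
        List<String> rules = new ArrayList<>();

        // values below the first break map to no data
        rules.add("x < " + breaks[0] + " -> no data");
        // first class is closed on both ends
        rules.add(breaks[0] + " <= x <= " + breaks[1] + " -> " + colors[0]);
        // remaining classes are open on the left, closed on the right
        for (int i = 1; i < breaks.length - 1; i++) {
            rules.add(breaks[i] + " < x <= " + breaks[i + 1] + " -> " + colors[i]);
        }
        // values above the last break map to no data as well
        rules.add(breaks[breaks.length - 1] + " < x -> no data");

        rules.forEach(System.out::println);
    }
}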
The second issue is dealing with odd but not uncommon data distributions such as {0 0 0 0 2 4 6 8}. If you ask the current quantile classifier to classify that into 4 intervals, you'll get {0 0}, {0 0}, {2 4}, {6 8}, which would result in very confusing rules (see the attached screenshot for an example).
One idea I had is to make the quantile function detect the flat areas and have it create separate classes for them, then classify the rest using the standard-sized classes. The above example would then become {0 0 0 0} {2 4} {6 8}.
Another example: {0 1 2 3 3 3 4 5}, 2 classes -> {0 1 2} {3 3 3} {4 5}, that is, first isolate the flat area, then try to build classes of 4 elements with the rest of the data, accepting the fact that more classes than requested may be generated.
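Something along these lines, just as a sketch; this is not the existing quantile classifier, the names are invented, and it assumes the input is already sorted:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration of the "isolate the flat areas first" idea described above.
public class FlatAwareQuantile {

    static List<double[]> classify(double[] sorted, int requestedClasses) {
        int targetSize = Math.max(1, sorted.length / requestedClasses);
        List<double[]> classes = new ArrayList<>();
        List<Double> current = new ArrayList<>();
        int i = 0;
        while (i < sorted.length) {
            // find the end of the run of identical values starting at i
            int runEnd = i + 1;
            while (runEnd < sorted.length && sorted[runEnd] == sorted[i]) {
                runEnd++;
            }
            if (runEnd - i > 1) {
                // a flat area: close the class being built, then emit the
                // whole run as a class of its own, whatever its size
                flush(classes, current);
                classes.add(Arrays.copyOfRange(sorted, i, runEnd));
                i = runEnd;
            } else {
                current.add(sorted[i]);
                i++;
                if (current.size() == targetSize) {
                    flush(classes, current);
                }
            }
        }
        flush(classes, current);
        return classes;
    }

    static void flush(List<double[]> classes, List<Double> current) {
        if (!current.isEmpty()) {
            classes.add(current.stream().mapToDouble(Double::doubleValue).toArray());
            current.clear();
        }
    }

    public static void main(String[] args) {
        // expected: {0 0 0 0} {2 4} {6 8}
        classify(new double[] {0, 0, 0, 0, 2, 4, 6, 8}, 4)
                .forEach(c -> System.out.println(Arrays.toString(c)));
        // expected: {0 1 2} {3 3 3} {4 5}
        classify(new double[] {0, 1, 2, 3, 3, 3, 4, 5}, 2)
                .forEach(c -> System.out.println(Arrays.toString(c)));
    }
}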
Another possible approach, which is easier because we can handle it in post-processing (that is, when building the rules), would be to simply kill subsequent classes that have the same min and max. With the above examples, the rules would become:
{0 0 0 0 2 4 6 8}
x = 0 (killing the second x = 0 interval)
0 < x <= 4
4 < x <= 8
or if you prefer, {0 0 0 0} {2 4} {6 8}
and:
{0 1 2 3 3 3 4 5}
0 <= x <= 3
3 < x <= 5
that is {0 1 2 3 3 3} {4 5}
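As a sketch, the post-processing could look roughly like this; again illustrative only, fed with the min/max of the classes from the first example above:

import java.util.ArrayList;
import java.util.List;

// Sketch of the post-processing alternative: build the rules from the raw
// quantile classes, dropping any class whose min/max duplicate the previous one.
public class RulePostProcessing {
    public static void main(String[] args) {
        // min/max of each class as produced by the current quantile classifier
        // for {0 0 0 0 2 4 6 8} with 4 requested classes
        double[][] classes = {{0, 0}, {0, 0}, {2, 4}, {6, 8}};
        List<String> rules = new ArrayList<>();
        double[] previous = null;
        for (double[] c : classes) {
            if (previous != null && previous[0] == c[0] && previous[1] == c[1]) {
                continue; // same min and max as the class before: kill it
            }
            if (c[0] == c[1]) {
                rules.add("x = " + c[0]);
            } else if (previous == null) {
                rules.add(c[0] + " <= x <= " + c[1]);
            } else {
                rules.add(previous[1] + " < x <= " + c[1]);
            }
            previous = c;
        }
        // prints: x = 0.0, 0.0 < x <= 4.0, 4.0 < x <= 8.0
        rules.forEach(System.out::println);
    }
}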
If the latter is good for FAO as well, I can just modify the way
we build the rules and avoid touching the underlying quantile
classification function. Opinions?
I would prefer that the quantile function simply generate the right rules; I don't think post-processing would be a good approach. I would rather modify the client behaviour, accepting the fact that quantile can generate a different number of classes than requested.
Cheers
Andrea
--
-------------------------------------------------------
Eng. Alessio Fabiani
Vice-President /CTO GeoSolutions S.A.S.
Via Carignoni 51
55041 Camaiore (LU)
Italy
phone: +39 0584983027
fax: +39 0584983027
mob: +39 349 8227000
http://www.geo-solutions.it
-------------------------------------------------------