[Geoserver-devel] sldService patches

Hi,
in the project we're following we need to make a couple of variants
to the sldService, so I guess I'll need to discuss them a bit to make
sure we don't break FAO users of that module.

The first thing I need is to change the generated rules so that they
all use closed intervals, such as:
0 <= x <= 10
10 < x <= 20
20 < x <= 30
as opposed to today's result:
x <= 10
10 < x <= 20
x > 20

In order to do that without breaking current users, I was thinking
of adding a new parameter, closedIntervals=true/false to the POST
call that generates the intervals.
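Here is a minimal sketch of how a rule builder might honour such a flag. It assumes the classifier hands back ascending break values [min, b1, ..., max]. `build_rules` and its `closed_intervals` argument are hypothetical names, not the sldService code.

```python
def build_rules(breaks, closed_intervals=False):
    """Turn ascending class breaks [min, b1, ..., max] into filter strings.

    Hypothetical helper: breaks holds one more entry than there are classes.
    """
    n = len(breaks) - 1
    rules = []
    for i in range(n):
        lo, hi = breaks[i], breaks[i + 1]
        if closed_intervals:
            # Every class is bounded; only the first one includes its lower bound.
            if i == 0:
                rules.append(f"{lo} <= x <= {hi}")
            else:
                rules.append(f"{lo} < x <= {hi}")
        else:
            # Open style: the first and last classes are unbounded on the outside.
            if i == 0:
                rules.append(f"x <= {hi}")
            elif i == n - 1:
                rules.append(f"x > {lo}")
            else:
                rules.append(f"{lo} < x <= {hi}")
    return rules


print(build_rules([0, 10, 20, 30], closed_intervals=True))
print(build_rules([0, 10, 20, 30]))
```

Defaulting the flag to false keeps today's output for existing callers.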

The second issue is dealing with odd but not uncommon data histograms
like, for example {0 0 0 0 2 4 6 8}. If you ask the current quantile
classifier to classify that with 4 intervals, you'll get {0 0}, {0 0},
{2 4} {6 8}, which would result in very confusing rules (see attached
screenshot for an example).
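The snippet below shows where {0 0} {0 0} comes from. It is a simplified equal-count quantile split, written on the assumption that the GeoTools quantile function behaves like this on these inputs; it is not that function's actual code.

```python
def quantile_classes(values, n_classes):
    """Split the sorted values into n_classes groups of (nearly) equal size."""
    data = sorted(values)
    total = len(data)
    classes = []
    for k in range(n_classes):
        # Integer boundaries spread any remainder across the classes.
        start = k * total // n_classes
        end = (k + 1) * total // n_classes
        classes.append(data[start:end])
    return classes


print(quantile_classes([0, 0, 0, 0, 2, 4, 6, 8], 4))
```

Two of the four classes contain only zeros, so both resulting rules match x = 0.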

One idea I had is to make the quantile function detect the flat areas
and make it create separate classes for them, and then classify the
rest using the standard sized classes. The above example would then become: {0 0 0 0} {2 4} {6 8}.
Another example: {0 1 2 3 3 3 4 5}, 2 classes -> {0 1 2} {3 3 3} {4 5},
that is, first isolate the flat area, then try to build classes of 4
elements with the rest of the data, accepting the fact that more classes
than requested may be generated.
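One possible reading of the flat-area idea works as a single greedy pass. Classes target ceil(N / n) elements, and a run of identical values that would cross a class boundary is emitted as its own class. The function name and the greedy strategy are assumptions, not an agreed algorithm, but the sketch reproduces both examples above.

```python
import math
from itertools import groupby


def quantile_with_flat_areas(values, n_classes):
    """Equal-count classes that never split a run of identical values.

    Hypothetical helper: a run that would straddle a class boundary becomes
    its own class, so more classes than requested may be returned.
    """
    data = sorted(values)
    size = math.ceil(len(data) / n_classes)  # target elements per class
    classes, current = [], []
    for _, group in groupby(data):
        run = list(group)
        if len(run) > 1 and len(current) + len(run) > size:
            # Flat area crossing a boundary: close the pending class, isolate the run.
            if current:
                classes.append(current)
            classes.append(run)
            current = []
        else:
            current.extend(run)
            if len(current) >= size:
                classes.append(current)
                current = []
    if current:
        classes.append(current)
    return classes


print(quantile_with_flat_areas([0, 1, 2, 3, 3, 3, 4, 5], 2))
```

Runs are never split, so each value lands in exactly one class and the resulting rules cannot overlap.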

Another possible approach, which is easier because we can handle it
in post processing, that is, when building the rules, would be to simply
drop subsequent classes that have the same min and max as the previous one.
With the above examples, the rules would become:
{0 0 0 0 2 4 6 8}
x = 0 (killing the second x = 0 interval)
0 < x <= 4
4 < x <= 8
or if you prefer, {0 0 0 0} {2 4} {6 8}
and:
{0 1 2 3 3 3 4 5}
0 <= x <= 3
3 < x <= 5
that is {0 1 2 3 3 3} {4 5}
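The post-processing variant can be sketched as one pass over the classes. The pass skips any class whose min and max equal those of the previous class, then writes closed-interval rules, using the previous maximum as the exclusive lower bound. `closed_rules_dropping_duplicates` is a hypothetical helper; it reproduces the rules listed above.

```python
def closed_rules_dropping_duplicates(classes):
    """Build closed-interval rules, skipping a class whose min and max
    equal those of the previous class (hypothetical helper)."""
    rules = []
    prev = None  # (min, max) of the last class kept
    for cls in classes:
        lo, hi = min(cls), max(cls)
        if prev == (lo, hi):
            continue  # duplicate class, e.g. the second {0 0}
        if prev is None:
            rules.append(f"x = {lo}" if lo == hi else f"{lo} <= x <= {hi}")
        else:
            # The lower bound is the previous class maximum, excluded.
            rules.append(f"{prev[1]} < x <= {hi}")
        prev = (lo, hi)
    return rules


print(closed_rules_dropping_duplicates([[0, 0], [0, 0], [2, 4], [6, 8]]))
```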

If the latter is good for FAO as well, I can just modify the way
we build the rules and avoid touching the underlying quantile classification function. Opinions?

Cheers
Andrea

Andrea Aime wrote:
...

The second issue is dealing with odd but not uncommon data histograms
like, for example {0 0 0 0 2 4 6 8}. If you ask the current quantile
classifier to classify that with 4 intervals, you'll get {0 0}, {0 0},
{2 4} {6 8}, which would result in very confusing rules (see attached
screenshot for an example).

And here is the attachment.
Cheers
Andrea

(attachments)

quantileConfusion.jpg

Hi Andrea,
see responses below…

On Thu, May 22, 2008 at 12:22 PM, Andrea Aime <aaime@anonymised.com> wrote:

Hi,
in the project we’re following we need to make a couple of variants
to the sldService, so I guess I’ll need to discuss them a bit to make
sure we don’t break FAO users of that module.

The first thing I need is to change the generated rules so that they
all use closed intervals, such as:
0 <= x <= 10
10 < x <= 20
20 < x <= 30
as opposed to today’s result:
x <= 10
10 < x <= 20
x > 20

In order to do that without breaking current users, I was thinking
of adding a new parameter, closedIntervals=true/false to the POST
call that generates the intervals.

Even switching directly to closed intervals won’t break the FAO implementation, so use the solution you prefer... however, using the closedIntervals parameter would give more control over rule generation, so in my opinion we should go for it.

The second issue is dealing with odd but not uncommon data histograms
like, for example {0 0 0 0 2 4 6 8}. If you ask the current quantile
classifier to classify that with 4 intervals, you’ll get {0 0}, {0 0},
{2 4} {6 8}, which would result in very confusing rules (see attached
screenshot for an example).

One idea I had is to make the quantile function detect the flat areas
and make it create separate classes for them, and then classify the
rest using the standard sized classes. The above example would then become: {0 0 0 0} {2 4} {6 8}.
Another example: {0 1 2 3 3 3 4 5}, 2 classes → {0 1 2} {3 3 3} {4 5},
that is, first isolate the flat area, then try to build classes of 4
elements with the rest of the data, accepting the fact that more classes
than requested may be generated.

Another possible approach, which is easier because we can handle it
in post processing, that is, when building the rules, would be to simply
drop subsequent classes that have the same min and max as the previous one.
With the above examples, the rules would become:
{0 0 0 0 2 4 6 8}
x = 0 (killing the second x = 0 interval)
0 < x <= 4
4 < x <= 8
or if you prefer, {0 0 0 0} {2 4} {6 8}
and:
{0 1 2 3 3 3 4 5}
0 <= x <= 3
3 < x <= 5
that is {0 1 2 3 3 3} {4 5}

If the latter is good for FAO as well, I can just modify the way
we build the rules and avoid touching the underlying quantile classification function. Opinions?

I would prefer that the quantile function simply generates the right rules; I don’t think post-processing would be a good approach. I would rather modify the client behaviour, accepting the fact that quantile can generate a different number of classes than requested.

Cheers
Andrea

Eng. Alessio Fabiani
Vice-President /CTO GeoSolutions S.A.S.
Via Carignoni 51
55041 Camaiore (LU)
Italy

phone: +39 0584983027
fax: +39 0584983027
mob: +39 349 8227000

http://www.geo-solutions.it


Alessio Fabiani wrote:

Hi Andrea,
see responses below...

On Thu, May 22, 2008 at 12:22 PM, Andrea Aime <aaime@anonymised.com> wrote:

    Hi,
    in the project we're following we need to make a couple of variants
    to the sldService, so I guess I'll need to discuss them a bit to make
    sure we don't break FAO users of that module.

    The first thing I need is to change the generated rules so that they
    all use closed intervals, such as:
    0 <= x <= 10
    10 < x <= 20
    20 < x <= 30
    as opposed to today's result:
    x <= 10
    10 < x <= 20
    x > 20

    In order to do that without breaking current users, I was thinking
    of adding a new parameter, closedIntervals=true/false to the POST
    call that generates the intervals.

Even switching directly to closed intervals won't break the FAO implementation, so use the solution you prefer... however, using the closedIntervals parameter would give more control over rule generation, so in my opinion we should go for it.

Ok, I'll add the parameter.

    The second issue is dealing with odd but not uncommon data histograms
    like, for example {0 0 0 0 2 4 6 8}. If you ask the current quantile
    classifier to classify that with 4 intervals, you'll get {0 0}, {0 0},
    {2 4} {6 8}, which would result in very confusing rules (see attached
    screenshot for an example).

    One idea I had is to make the quantile function detect the flat areas
    and make it create separate classes for them, and then classify the
    rest using the standard sized classes. The above example would then
    become: {0 0 0 0} {2 4} {6 8}.
    Another example: {0 1 2 3 3 3 4 5}, 2 classes -> {0 1 2} {3 3 3} {4 5},
    that is, first isolate the flat area, then try to build classes of 4
    elements with the rest of the data, accepting the fact that more classes
    than requested may be generated.

    Another possible approach, which is easier because we can handle it
    in post processing, that is, when building the rules, would be to
    simply drop subsequent classes that have the same min and max as the
    previous one. With the above examples, the rules would become:
    {0 0 0 0 2 4 6 8}
    x = 0 (killing the second x = 0 interval)
    0 < x <= 4
    4 < x <= 8
    or if you prefer, {0 0 0 0} {2 4} {6 8}
    and:
    {0 1 2 3 3 3 4 5}
    0 <= x <= 3
    3 < x <= 5
    that is {0 1 2 3 3 3} {4 5}

    If the latter is good for FAO as well, I can just modify the way
    we build the rules and avoid touching the underlying quantile
    classification function. Opinions?

I would prefer that the quantile function simply generates the right rules; I don't think post-processing would be a good approach. I would rather modify the client behaviour, accepting the fact that quantile can generate a different number of classes than requested.

It's not that simple for two reasons:
* the way the current quantile function works is mathematically sound,
   it's the data in the samples above that should not be fed into it.
   That is, {0 0} {0 0} {2 4} {6 8} is the right answer from a
   mathematical standpoint, 4 classes each of which has the same number
   of elements in it, and thus the same probability of representing
   a feature in the map.
* uDig is already using it as is, I cannot just go and change the
   functionality of it under its feet.

I tried to discuss this on the gt2-devel mailing list, but did not have
much success. That's why post processing seems the easiest and most
effective way of dealing with it.
Cheers
Andrea

Mmm, I see... is there no chance of getting some feedback from the uDig guys? In your opinion, could introducing a parameter here too be a suitable solution?

On Thu, May 22, 2008 at 3:05 PM, Andrea Aime <aaime@anonymised.com> wrote:

It’s not that simple for two reasons:

  • the way the current quantile function works is mathematically sound,
    it’s the data in the samples above that should not be fed into it.
    That is, {0 0} {0 0} {2 4} {6 8} is the right answer from a
    mathematical standpoint, 4 classes each of which has the same number
    of elements in it, and thus the same probability of representing
    a feature in the map.
  • uDig is already using it as is, I cannot just go and change the
    functionality of it under its feet.

I tried to discuss this on the gt2-devel mailing list, but did not have
much success. That’s why post processing seems the easiest and most
effective way of dealing with it.
Cheers
Andrea


Alessio Fabiani wrote:

Mmm, I see... is there no chance of getting some feedback from the uDig guys? In your opinion, could introducing a parameter here too be a suitable solution?

I frankly don't know. Jody proposed a solution already, but it was like
making up category names:

0-0 Category A
0-0 Category B
4-6 Category C
6-8 Category D

I don't believe this solution is acceptable. That's why, after a few
days since my mails, I'm not looking for alternate routes.
Cheers
Andrea

Ok,

I don’t want you to waste any more time, so go for the solution you prefer; since the FAO project is still at an early stage regarding SLD color quantization, it would not be a problem to adapt the code accordingly.

On Thu, May 22, 2008 at 3:16 PM, Andrea Aime <aaime@anonymised.com> wrote:

I don’t believe this solution is acceptable. That’s why, after a few
days since my mails, I’m not looking for alternate routes.

Andrea Aime wrote:

I don't believe this solution is acceptable. That's why, after a few
days since my mails, I'm not looking for alternate routes.

Typo: "I'm now looking for alternate routes"
Cheers
Andrea

Alessio Fabiani wrote:

Ok,

I don't want you to waste any more time, so go for the solution you prefer; since the FAO project is still at an early stage regarding SLD color quantization, it would not be a problem to adapt the code accordingly.

Ok, I added a parameter to the rules generator to allow both the closed
and the open approach, and a parameter to the POST request so that
it's possible to choose which approach to follow.

For the xxxGenerator classes you added, I just defaulted them to use
the open approach for backwards compatibility, feel free to add
a param there too if you want it.

Cheers
Andrea

Alessio Fabiani wrote:

Even switching directly to closed intervals won't break the FAO implementation, so use the solution you prefer... however, using the closedIntervals parameter would give more control over rule generation, so in my opinion we should go for it.

I concur; let's do what is needed. It sounds like, when we get to these corner cases, there is no mathematical definition that captures the right way; we have been asked to make equal-size buckets and this is the best we could do...

But really this just seems to be a problem of generating the document we want based on the resulting categorization object, right? Do we need to change the generated ranges, or just how we interpret them...

As for closing intervals or not, we could make separate function names, so it is really explicit what is going on (rather than having a magic boolean flag). I have suggested we check what the "official" function for this work does, as recently defined in the SE 1.1 specification, but thus far nobody has done it...

So let me look it up, and tag in Eclesia who actually cares about this stuff...
They have:
- Categorization: The transformation of continuous values to distinct values. This is for example needed to generate choropleth maps from continuous attributes. Another example would be the stepwise selection of different text heights or line widths in dependence from such an attribute.
- Interpolation: Transformation of continuous values by a function defined on a number of nodes. This is used to adjust the value distribution of an attribute to the desired distribution of a continuous symbolization control variable (like size, width, color, etc).
- Recoding: Transformation of discrete values to any other values. This is needed when integers have to be translated into text or, reversely, text contents into other texts or numeric values or colors.

So it looks like "Categorization" is the one we care about... here is the example:

<SvgParameter name="stroke-width">
  <Categorize fallbackValue="1">
    <LookupValue>
      <ogc:PropertyName>vehiclesPerHour</ogc:PropertyName>
    </LookupValue>
    <Value>1</Value>
    <Threshold>5000</Threshold>
    <Value>2</Value>
    <Threshold>15000</Threshold>
    <Value>3</Value>
    <Threshold>40000</Threshold>
    <Value>4</Value>
    <Threshold>75000</Threshold>
    <Value>5</Value>
  </Categorize>
</SvgParameter>

To be interpreted as (4999 or less: 1 pixel; 5000..14999: 2 pixel; 15000..39999: 3 pixel; 40000..74999: 4 pixel; 75000+: 5 pixel).
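The interpretation in parentheses can be checked with a small lookup sketch. It assumes, as the parenthetical implies, that a value equal to a threshold belongs to the class starting at that threshold, and that the fallback value is returned when the attribute is missing. `categorize` is a hypothetical helper, not an SE or GeoTools API.

```python
from bisect import bisect_right


def categorize(value, thresholds, values, fallback):
    """Map a continuous value onto a category, Categorize style (sketch).

    values has one more entry than thresholds; a value equal to a threshold
    falls into the class that starts at that threshold.
    """
    if value is None:
        return fallback  # assumed use of fallbackValue for a missing lookup
    # Number of thresholds <= value picks the category index.
    return values[bisect_right(thresholds, value)]


THRESHOLDS = [5000, 15000, 40000, 75000]
WIDTHS = [1, 2, 3, 4, 5]
print(categorize(14999, THRESHOLDS, WIDTHS, 1))
```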

So should the function we currently have be renamed "categorization", and should we make up a new function for the "closed" intervals?

Thinking ...

If we really want to represent closed intervals, I would generate the same data structure and produce something like this...

x < 0 no data
0 <= x <= 10 yellow
10 < x <= 20 orange
20 < x <= 30 red
30 < x no data

That is, use the same boundary conditions; just explicitly map where the no-data sections are.

_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel
  

Jody Garnett wrote:

Alessio Fabiani wrote:

...

But really this just seems to be a problem of generating the document we want based on the resulting categorization object, right? Do we need to change the generated ranges, or just how we interpret them...

I had to do both. The end user found anything other than closed intervals
too difficult to understand, so I had to change the way we generate
the rules from the classification output (this is a presentation change
only).
Yet, I also had to work around intervals like {0 0} {0 0} that do
not make sense once you try to apply them, since the first rule you
generate will catch all the features whose attribute is 0.
In the end my code just ignores subsequent intervals that do match
the rule maxInterval(i) == maxInterval(i-1). This could have been done
in the quantile function, but as I noted in gt2, this initial suggestion
did not seem to find good feedback so I just gave up.
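A sketch of the filter described above follows. It assumes the classifier output is reduced to the list of class maxima, standing in for `maxInterval(i)`; `drop_repeated_maxima` is a hypothetical name, not the actual sldService code. Unlike a "same min and max" test, it also catches {1 3 3 3} split into two classes, {1 3} {3 3}. There, the second closed rule would be the empty 3 < x <= 3.

```python
def drop_repeated_maxima(class_maxima):
    """Keep only classes whose maximum differs from the previous class maximum.

    class_maxima[i] plays the role of maxInterval(i): class i is skipped when
    maxInterval(i) == maxInterval(i-1), because with closed intervals the
    earlier rule already catches every feature with that value.
    """
    kept = []
    for i, hi in enumerate(class_maxima):
        if i > 0 and hi == class_maxima[i - 1]:
            continue
        kept.append(hi)
    return kept


print(drop_repeated_maxima([0, 0, 4, 8]))
```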

As for closing intervals or not, we could make separate function names, so it is really explicit what is going on (rather than having a magic boolean flag). I have suggested we check what the "official" function for this work does, as recently defined in the SE 1.1 specification, but thus far nobody has done it...

This is not a matter of what is official, it's a matter of what the
customer wants. They don't give a damn about OGC standards; I already
tried, and they found WMS calls (the simplest example of an OGC standard)
way too complex.

So let me look it up, and tag in Eclesia who actually cares about this stuff...
They have:
- Categorization: The transformation of continuous values to distinct values. This is for example needed to generate choropleth maps from continuous attributes. Another example would be the stepwise selection of different text heights or line widths in dependence from such an attribute.
- Interpolation: Transformation of continuous values by a function defined on a number of nodes. This is used to adjust the value distribution of an attribute to the desired distribution of a continuous symbolization control variable (like size, width, color, etc).
- Recoding: Transformation of discrete values to any other values. This is needed when integers have to be translated into text or, reversely, text contents into other texts or numeric values or colors.

So it looks like "Categorization" is the one we care about... here is the example:

<SvgParameter name="stroke-width">
  <Categorize fallbackValue="1">
    <LookupValue>
      <ogc:PropertyName>vehiclesPerHour</ogc:PropertyName>
    </LookupValue>
    <Value>1</Value>
    <Threshold>5000</Threshold>
    <Value>2</Value>
    <Threshold>15000</Threshold>
    <Value>3</Value>
    <Threshold>40000</Threshold>
    <Value>4</Value>
    <Threshold>75000</Threshold>
    <Value>5</Value>
  </Categorize>
</SvgParameter>

To be interpreted as (4999 or less: 1 pixel; 5000..14999: 2 pixel; 15000..39999: 3 pixel; 40000..74999: 4 pixel; 75000+: 5 pixel).

Interesting, I did not know SE allowed for this. Wow, this means that
in SE it's possible to make crosstab-like maps easily, for example
having the line width depend on one attribute and the line color
depend on another.

So should the function we currently have be renamed "categorization", and should we make up a new function for the "closed" intervals?
Thinking ...

If we really want to represent closed intervals, I would generate the same data structure and produce something like this...

x < 0 no data
0 <= x <= 10 yellow
10 < x <= 20 orange
20 < x <= 30 red
30 < x no data

Nope, when the user saw the open intervals for the first and last rules
he thought the meaning was exactly like the one you proposed and asked
us to remove the useless rules (first and last). What he wants is exactly this:
0 <= x <= 10 yellow
10 < x <= 20 orange
20 < x <= 30 red

The first implementation of the sldService by FAO used to output this
instead and confused the hell out of the users:
x <= 10 yellow
10 < x <= 20 orange
20 < x red

But as we both noted, this is just a matter of interpreting the
classification function outputs.

Cheers
Andrea

Andrea Aime wrote:

the rule maxInterval(i) == maxInterval(i-1). This could have been done
in the quantile function, but as I noted in gt2, this initial suggestion
did not seem to find good feedback so I just gave up.

Not good, there was some kind of communication gap; I was happy with your suggestion. The way you asked it made me think you wanted to know what was correct; acuster did some research into what was correct (for your flat areas), and it sounded to me like we were all happy with your ideas?

As for closing intervals or not, we could make separate function names, so it is really explicit what is going on (rather than having a magic boolean flag). I have suggested we check what the "official" function for this work does, as recently defined in the SE 1.1 specification, but thus far nobody has done it...

This is not a matter of what is official, it's a matter of what the customer wants. They don't give a damn about OGC standards; I already tried, and they found WMS calls (the simplest example of an OGC standard) way too complex.

I did not care what the customer wanted; only what you wanted (and I thought you were asking what was correct). As such we looked at what R stats did, and I figured the OGC function for this purpose may have something useful to say.

To be interpreted as (4999 or less: 1 pixel; 5000..14999: 2 pixel; 15000..39999: 3 pixel; 40000..74999: 4 pixel; 75000+: 5 pixel).

Interesting, I did not know SE allowed for this. Wow, this means that
in SE it's possible to make crosstab-like maps easily, for example
having the line width depend on one attribute and the line color
depend on another.

As I have been saying for a couple of months, Eclesia is doing some interesting work and is going in there with very little feedback from the community.
And yes, this is exactly why I am excited and paying attention to his activities (much to his annoyance, I am sure).

Nope, when the user saw the open intervals for the first and last rules
he thought the meaning was exactly like the one you proposed and asked
us to remove the useless rules (first and last). What he wants is exactly this:
0 <= x <= 10 yellow
10 < x <= 20 orange
20 < x <= 30 red

Okay, sounds fine; let's do it. We display the same concept to users in uDig (and have an "else" clause for the rest).

Cheers,
Jody