[GRASS-dev] some detail questions on i.segment

Hi Markus,

I'm working on potentially improving the i.segment.uspo addon and am looking at the possibility of including the goodness of fit output map somehow in the evaluation of the quality of the segmentation.

For that, I need to exactly understand the goodness of fit measure.

As a starter: why is the threshold parameter (globals->alpha) squared before being used in create_isegs.c (and in the calculation of the goodness of fit) ? Is it because i.segment works with the squared distance and not the actual distance ?

IIUC, the worst goodness of fit measure (i.e. 1 - difference) is equal to 1 - the threshold parameter value. This means that if one wanted to compare segmentations done with different threshold values by comparing mean goodness of fit, for example, the values would have to be scaled to take the respective parameter value into account. Would something like

( goodness of fit - (1 - threshold parameter value) ) / threshold parameter value

make sense ?
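
The proposed rescaling can be written out as a small Python sketch. It only illustrates the formula as posted; the function name rescale_gof and the sample threshold values are hypothetical, and it takes the assumption above (worst fit equals 1 - threshold) at face value.

```python
def rescale_gof(gof, threshold):
    """Map goodness of fit from [1 - threshold, 1] onto [0, 1]."""
    return (gof - (1.0 - threshold)) / threshold


# Hypothetical threshold values, for illustration only.
for threshold in (0.05, 0.1, 0.2):
    worst = 1.0 - threshold  # worst fit assumed in the message above
    # The worst assumed fit maps to about 0, a perfect fit to about 1.
    print(threshold, rescale_gof(worst, threshold), rescale_gof(1.0, threshold))
```

A goodness of fit below 1 - threshold would map to a negative value under this formula.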

BTW, in write_output.c, in the comments starting at line 82, there is mention of a globals->threshold, but there is no threshold in the globals structure... I guess this should read globals->alpha or threshold->answer, or ?

Moritz

Hi Moritz,

On Wed, May 18, 2016 at 6:36 PM, Moritz Lennert
<mlennert@club.worldonline.be> wrote:

Hi Markus,

I'm working on potentially improving the i.segment.uspo addon and am looking
at the possibility of including the goodness of fit output map somehow in
the evaluation of the quality of the segmentation.

For that, I need to exactly understand the goodness of fit measure.

As a starter: why is the threshold parameter (globals->alpha) squared before
being used in create_isegs.c (and in the calculation of the goodness of fit)
? Is it because i.segment works with the squared distance and not the actual
distance ?

Yes, i.segment works with the squared distance to avoid sqrt() which
is slow. All that matters is if the distance is larger or smaller than
threshold, and this relation is the same with squared values.
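
A minimal Python sketch of why squaring does not change the decision: for non-negative values, d < t holds exactly when d*d < t*t. The function names and sample numbers are hypothetical and not taken from i.segment.

```python
import math


def within_threshold_sqrt(diffs, threshold):
    """Compare the Euclidean distance itself to the threshold."""
    return math.sqrt(sum(d * d for d in diffs)) < threshold


def within_threshold_squared(diffs, threshold):
    """Compare the squared distance to the squared threshold; no sqrt needed."""
    return sum(d * d for d in diffs) < threshold * threshold


# Hypothetical per-band differences between two objects, with a threshold each.
cases = [([0.01, 0.02], 0.05), ([0.10, 0.20], 0.05), ([0.03, 0.04], 0.06)]
results = [
    (within_threshold_sqrt(d, t), within_threshold_squared(d, t)) for d, t in cases
]
```

Both variants give the same answer for every case, so the sqrt() call can be skipped.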

IIUC, the worst goodness of fit measure (i.e. 1 - difference) is equal to
1 - the threshold parameter value. This means that if one wanted to
compare segmentations done with different threshold values by comparing mean
goodness of fit, for example, the values would have to be scaled to take the
respective parameter value into account. Would something like

( goodness of fit - (1 - threshold parameter value) ) / threshold parameter
value

make sense ?

The goodness of fit is currently 1 - similarity, obtained by comparing
the current cell values to the object's mean values. Similarity is in the
range [0, 1]: 0 means identical, 1 means maximum possible difference.
With the region growing algorithm, that difference can actually be
larger than the given threshold if a cell is included in an object and
subsequent growing of the object shifts the mean away.
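
The effect of the shifting mean can be reproduced with a toy example. This is a simplified, single-band, greedy growing loop written for illustration only; it is not the i.segment algorithm, and the function name grow and the cell values are hypothetical. It assumes values already normalized to [0, 1] and squared distances compared to the squared threshold.

```python
def grow(values, threshold):
    """Greedy single-band growing: add each value whose squared distance
    to the current object mean is below threshold**2."""
    alpha2 = threshold * threshold
    members = [values[0]]
    for v in values[1:]:
        mean = sum(members) / len(members)
        if (v - mean) ** 2 < alpha2:
            members.append(v)
    return members


threshold = 0.1
members = grow([0.50, 0.58, 0.63, 0.66, 0.69], threshold)
final_mean = sum(members) / len(members)

# Goodness of fit per cell: 1 - squared distance to the final object mean.
gof = [1.0 - (v - final_mean) ** 2 for v in members]

# What the worst fit would be if no cell could exceed the threshold.
worst_expected = 1.0 - threshold * threshold
```

Every cell was within the threshold when it was added, yet the first cell (0.50) ends up further than the threshold from the final mean, so min(gof) falls below worst_expected.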

BTW, in write_output.c, in the comments starting at line 82, there is
mention of a globals->threshold, but there is no threshold in the globals
structure... I guess this should read globals->alpha or threshold->answer,
or ?

The comments starting at line 82 in write_output.c are an idea for
goodness of fit, the actual goodness of fit is calculated in lines 168
and 182.

HTH,

Markus

Hi,

Another question concerning i.segment's details:

IIUC, the threshold used in region-growing is normalized using a common denominator defined by:

divisor = globals->nrows + globals->ncols;
(BTW, why '+', not '*' ?)

Row and column numbers come from

globals->nrows = Rast_window_rows();
globals->ncols = Rast_window_cols();

The threshold is then adjusted to take into account object size, i.e. to favor merging of smaller regions compared to merging of larger regions:

adjthresh = pow(alpha2, 1. + (double) smaller / divisor);

It is this adjusted threshold that is used to decide whether to merge regions or not, depending on whether their similarity is smaller than this adjusted threshold or not:

if (compare_double(Ri_similarity, adjthresh) == -1)
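
To see how the adjustment behaves, here is a Python transcription of the pow() line above. It assumes that alpha2 is the squared threshold and that compare_double(a, b) == -1 means a < b; the sample sizes and the similarity value are hypothetical.

```python
def adjusted_threshold(alpha, smaller, nrows, ncols):
    """Python transcription of: pow(alpha2, 1. + (double) smaller / divisor)."""
    alpha2 = alpha * alpha      # assumption: alpha2 is the squared threshold
    divisor = nrows + ncols
    return alpha2 ** (1.0 + smaller / divisor)


alpha, nrows, ncols = 0.1, 500, 500   # hypothetical threshold and region size
sizes = [0, 1, 10, 100, 1000]         # cell count of the smaller of two regions
thresholds = [adjusted_threshold(alpha, s, nrows, ncols) for s in sizes]

# Hypothetical similarity value; merge when it is below the adjusted threshold.
similarity = 0.005
merge = [similarity < t for t in thresholds]
```

Because alpha2 is below 1, a larger exponent gives a smaller adjusted threshold, so the same similarity value merges small regions but not large ones.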

Ri_similarity is normalized by

val /= globals->max_diff;

where globals->max_diff is defined as the difference between max and min values in the input file, as obtained by

Rast_get_fp_range_min_max(&(fp_range[n]), &min[n], &max[n]);

I hope that I've understood all of this correctly.

Now my question:

Are nrows and ncols region-dependent, i.e. will the divisor in the calculation of the adjusted threshold vary depending on the region I defined ?

And for max_diff: do I understand correctly that Rast_get_fp_range_min_max() is region-independent, i.e. that if I take different regions of the same image, I will always get the same max_diff ?

If this is correct, does this mean the region size might determine whether some objects (or pixels) are merged or not ?

This would put into question the determination of a good threshold by testing on small regions as the same threshold might not have the same effect in larger regions, or ?

Moritz

--
Département Géosciences, Environnement et Société
Université Libre de Bruxelles
Bureau: S.DB.6.138
CP 130/03
Av. F.D. Roosevelt 50
1050 Bruxelles
Belgique

tél. + 32 2 650.68.12 / 68.11 (secr.)
fax + 32 2 650.68.30

On Wed, Jun 1, 2016 at 2:24 PM, Moritz Lennert
<mlennert@club.worldonline.be> wrote:

Hi,

Another question concerning i.segment's details:

IIUC, the threshold used in region-growing is normalized using a common
denominator defined by:

divisor = globals->nrows + globals->ncols;
(BTW, why '+', not '*' ?)

The divisor would become too large and the adjustment would have no effect.
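
A quick numeric check of this point, as a Python sketch. The region size, object size and threshold are hypothetical, and alpha2 is assumed to be the squared threshold.

```python
alpha = 0.1
alpha2 = alpha * alpha           # assumption: alpha2 is the squared threshold
nrows, ncols = 1000, 1000        # hypothetical computational region
smaller = 500                    # hypothetical cell count of the smaller region

# Divisor as in the code: sum of rows and columns.
adj_sum = alpha2 ** (1.0 + smaller / (nrows + ncols))

# Alternative asked about above: product of rows and columns.
adj_product = alpha2 ** (1.0 + smaller / (nrows * ncols))

# Adjusted threshold relative to the unadjusted alpha2.
ratio_sum = adj_sum / alpha2
ratio_product = adj_product / alpha2
```

With the sum, the adjusted threshold drops to roughly a third of alpha2; with the product, the exponent stays so close to 1 that the threshold is practically unchanged.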

Row and column numbers come from

globals->nrows = Rast_window_rows();
globals->ncols = Rast_window_cols();

The threshold is then adjusted to take into account object size, i.e. to
favor merging of smaller regions compared to merging of larger regions:

adjthresh = pow(alpha2, 1. + (double) smaller / divisor);

It is this adjusted threshold that is used to decide whether to merge
regions or not, depending on whether their similarity is smaller than this
adjusted threshold or not:

if (compare_double(Ri_similarity, adjthresh) == -1)

Ri_similarity is normalized by

val /= globals->max_diff;

where globals->max_diff is defined as the difference between max and min
values in the input file, as obtained by

Rast_get_fp_range_min_max(&(fp_range[n]), &min[n], &max[n]);

I hope that I've understood all of this correctly.

Now my question:

Are nrows and ncols region-dependent, i.e. will the divisor in the
calculation of the adjusted threshold vary depending on the region I defined ?

Yes, nrows and ncols come from the current region.

And for max_diff: do I understand correctly that Rast_get_fp_range_min_max()
is region-independent, i.e. that if I take different regions of the same
image, I will always get the same max_diff ?

Yes, this way you can test settings on a small region before applying
them to a larger region.
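
The following Python sketch illustrates why a map-wide range keeps the normalization stable across regions. It is a single-band toy example with hypothetical values, and it assumes that a squared difference is scaled by the squared range; the exact form of max_diff in i.segment is not shown in this thread.

```python
def normalized_sq_diff(a, b, value_range):
    """Squared difference scaled by the squared range (assumed form of max_diff)."""
    return (a - b) ** 2 / (value_range ** 2)


# Hypothetical single-band map, flattened, and two subregions of it.
whole_map = [10.0, 12.0, 40.0, 42.0, 90.0, 250.0]
region_a = whole_map[:4]   # does not contain the map maximum
region_b = whole_map[2:]   # does not contain the map minimum

# Range taken from the whole map: the same whatever region is processed.
map_range = max(whole_map) - min(whole_map)

# The cell pair (40, 42) lies in both subregions.
d_map = normalized_sq_diff(40.0, 42.0, map_range)

# For contrast only: a per-region range would change the result.
d_a = normalized_sq_diff(40.0, 42.0, max(region_a) - min(region_a))
d_b = normalized_sq_diff(40.0, 42.0, max(region_b) - min(region_b))
```

With the map-wide range the pair gets the same normalized difference in either subregion, whereas the per-region variant gives two different values.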

If this is correct, does this mean the region size might determine whether
some objects (or pixels) are merged or not ?

Yes. In effect, the computational region size determines whether an
object is large or small.
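
As a rough illustration, the same object and the same similarity can fall on different sides of the adjusted threshold depending on the region. This is a Python sketch of the pow() line quoted above; all numbers are hypothetical and alpha2 is assumed to be the squared threshold.

```python
def adjusted_threshold(alpha, smaller, nrows, ncols):
    """Python transcription of: pow(alpha2, 1. + (double) smaller / divisor)."""
    return (alpha * alpha) ** (1.0 + smaller / (nrows + ncols))


alpha = 0.1          # hypothetical threshold
smaller = 200        # hypothetical cell count of the smaller region
similarity = 0.005   # hypothetical normalized similarity of the two regions

# Same object, two different computational regions.
adj_small_region = adjusted_threshold(alpha, smaller, 100, 100)
adj_large_region = adjusted_threshold(alpha, smaller, 2000, 2000)

merged_in_small_region = similarity < adj_small_region
merged_in_large_region = similarity < adj_large_region
```

In the 100 x 100 region the object counts as large and is not merged; in the 2000 x 2000 region it counts as small and is merged.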

This would put into question the determination of a good threshold by
testing on small regions as the same threshold might not have the same
effect in larger regions, or ?

Indeed. There are no comments in the code (apart from "TODO: better")
explaining the reason for this adjustment. In theory it makes sense to
me to favour merging of smaller regions, or more precisely, to avoid
merging of larger regions. "Small" and "large" depend on the
computational region. When testing for the previous GSoC project on
i.segment, I did not notice drastic differences when changing the
computational region.

Markus

On 03/06/16 10:05, Markus Metz wrote:

Indeed. There are no comments in the code (apart from "TODO: better")
explaining the reason for this adjustment. In theory it makes sense to
me to favour merging of smaller regions, or more precisely, to avoid
merging of larger regions.

I completely agree with this approach.

"Small" and "large" depend on the
computational region. When testing for the previous GSoC project on
i.segment, I did not notice drastic differences when changing the
computational region.

I would imagine that differences are not dramatic, but we will test this, here. But maybe an option would be to use the number of rows and cols in the entire image as divisor, not the current region's ?
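
A small Python sketch of what this option would change. It is not a patch for i.segment: the image and region sizes are hypothetical, and alpha2 is assumed to be the squared threshold.

```python
def adjusted_threshold(alpha, smaller, rows, cols):
    """Python transcription of: pow(alpha2, 1. + (double) smaller / divisor)."""
    return (alpha * alpha) ** (1.0 + smaller / (rows + cols))


image_rows, image_cols = 4000, 4000                   # hypothetical full image
regions = [(200, 200), (1000, 1000), (4000, 4000)]    # hypothetical test regions
alpha, smaller = 0.1, 200

# Current behaviour: divisor from the computational region.
current = [adjusted_threshold(alpha, smaller, r, c) for r, c in regions]

# Proposed option: divisor always from the full image, whatever the region.
proposed = [
    adjusted_threshold(alpha, smaller, image_rows, image_cols) for _ in regions
]
```

With the image-based divisor, a threshold tuned on a small test region gives the same adjusted threshold for an object of a given size when applied to the full image.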

Moritz