[GRASS-dev] v.kernel: should the module take into account the resolution for default output ?

Hi,

AFAIU kernel density calculations, one takes a number of points and redistributes this total number across the entire region using a specified kernel function as estimator as to the spatial pattern of this redistribution. The total sum should correspond to the total number of points in the input. Is this understanding correct ?

In v.kernel, this seems to be dependent on the resolution:

echo "4.5,4.5" | v.in.ascii in=- sep=comma out=testpoint

g.region n=9 s=0 w=0 e=9 res=1
v.kernel in=testpoint out=testrast radius=5 kernel=gaussian --o
r.univar testrast
[...]
sum: 0.999544365566944

but
g.region n=9 s=0 w=0 e=9 res=2
v.kernel in=testpoint out=testrast radius=5 kernel=gaussian --o
r.univar testrast
[...]
sum: 0.308567902849234

IMHO, the sum should always be close to 1, or ?

If we consider that the value per pixel is in points/squared map unit, i.e. in a meter-projection, points/m2, then we would have to multiply the output value, which can be done with the multiplier parameter. In this case, a resolution of 2, means we have 4m2, so:

v.kernel in=testpoint out=testrast radius=5 kernel=gaussian multiplier=4 --o
r.univar testrast
[...]
sum: 1.23427161139693

So this is close, but not very precise.

So, do I understand correctly that the density calculated is by squared map unit ? Would it make more sense to make the output by pixel ? If not, and so if the multiplier option is the way to go, this should probably go into the manual, which currently sounds as if this parameter is only for the network version of v.kernel:

"The multiplier option is needed to overcome the limitation that the resulting density in case of a vector map output is stored as category (integer). The density result stored as category may be multiplied by this number. "

Moritz

On Mon, May 28, 2018 at 6:16 PM, Moritz Lennert <mlennert@club.worldonline.be> wrote:

Hi,

AFAIU kernel density calculations, one takes a number of points and redistributes this total number across the entire region using a specified kernel function as estimator as to the spatial pattern of this redistribution. The total sum should correspond to the total number of points in the input. Is this understanding correct ?

In v.kernel, this seems to be dependent on the resolution:

echo “4.5,4.5” | v.in.ascii in=- sep=comma out=testpoint

g.region n=9 s=0 w=0 e=9 res=1
v.kernel in=testpoint out=testrast radius=5 kernel=gaussian --o
r.univar testrast
[…]
sum: 0.999544365566944

but
g.region n=9 s=0 w=0 e=9 res=2
v.kernel in=testpoint out=testrast radius=5 kernel=gaussian --o
r.univar testrast
[…]
sum: 0.308567902849234

IMHO, the sum should always be close to 1, or ?

I think not, because the Gaussian kernel is a general Gaussian function with user-defined sigma = dmax / 4 [0]. The sum would be close to 1 only for a normal function (special case of the Gaussian function) with sigma determined from the observed distances. For the Gaussian kernel, the sum of the output raster should increase with higher resolution and constant sigma.

Markus M

[0] https://trac.osgeo.org/grass/browser/grass/trunk/vector/v.kernel/main.c#L434

If we consider that the value per pixel is in points/squared map unit, i.e. in a meter-projection, points/m2, then we would have to multiply the output value, which can be done with the multiplier parameter. In this case, a resolution of 2, means we have 4m2, so:

v.kernel in=testpoint out=testrast radius=5 kernel=gaussian multiplier=4 --o
r.univar testrast
[…]
sum: 1.23427161139693

So this is close, but not very precise.

So, do I understand correctly that the density calculated is by squared map unit ? Would it make more sense to make the output by pixel ? If not, and so if the multiplier option is the way to go, this should probably go into the manual, which currently sounds as if this parameter is only for the network version of v.kernel:

"The multiplier option is needed to overcome the limitation that the resulting density in case of a vector map output is stored as category (integer). The density result stored as category may be multiplied by this number. "

Moritz


grass-dev mailing list
grass-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-dev

Le Tue, 29 May 2018 08:59:47 +0200,
Markus Metz <markus.metz.giswork@gmail.com> a écrit :

On Mon, May 28, 2018 at 6:16 PM, Moritz Lennert <
mlennert@club.worldonline.be> wrote:
>
> Hi,
>
> AFAIU kernel density calculations, one takes a number of points
> and
redistributes this total number across the entire region using a
specified kernel function as estimator as to the spatial pattern of
this redistribution. The total sum should correspond to the total
number of points in the input. Is this understanding correct ?
>
> In v.kernel, this seems to be dependent on the resolution:
>
> echo "4.5,4.5" | v.in.ascii in=- sep=comma out=testpoint
>
> g.region n=9 s=0 w=0 e=9 res=1
> v.kernel in=testpoint out=testrast radius=5 kernel=gaussian --o
> r.univar testrast
> [...]
> sum: 0.999544365566944
>
>
> but
> g.region n=9 s=0 w=0 e=9 res=2
> v.kernel in=testpoint out=testrast radius=5 kernel=gaussian --o
> r.univar testrast
> [...]
> sum: 0.308567902849234
>
> IMHO, the sum should always be close to 1, or ?

I think not, because the Gaussian kernel is a general Gaussian
function with user-defined sigma = dmax / 4 [0]. The sum would be
close to 1 only for a normal function (special case of the Gaussian
function) with sigma determined from the observed distances. For the
Gaussian kernel, the sum of the output raster should increase with
higher resolution and constant sigma.

Thanks. I'll have to think about this a bit more when I have the time
to fully understand. I imagine there should be a way to calculate the
multiplier necessary to reach the correct sum using some
combination region extent, radius and resolution... But no time for
such thinking right now.

BTW, in my previous mail I wrote:

g.region n=9 s=0 w=0 e=9 res=2

[...]

v.kernel in=testpoint out=testrast radius=5 kernel=gaussian
multiplier=4 --o
r.univar testrast
[...]
sum: 1.23427161139693

So this is close, but not very precise.

This was wrong as the g.region call leads to a region with res=1.8 and
so the v.kernel call should have used multiplier=3.24 which does lead
to a sum of 0.99.

But when I reduce or increase the radius this doesn't work anymore.
Don't know if this is linked in any way to the fact that extent of 9
divided by radius of 5 equals 1.8 which is exactly the resolution, or
whether this is pure chance... :wink:

Moritz

On Tue, May 29, 2018 at 2:21 PM, Moritz Lennert <mlennert@club.worldonline.be> wrote:

Le Tue, 29 May 2018 08:59:47 +0200,
Markus Metz <markus.metz.giswork@gmail.com> a écrit :

On Mon, May 28, 2018 at 6:16 PM, Moritz Lennert <
mlennert@club.worldonline.be> wrote:

Hi,

AFAIU kernel density calculations, one takes a number of points
and
redistributes this total number across the entire region using a
specified kernel function as estimator as to the spatial pattern of
this redistribution. The total sum should correspond to the total
number of points in the input. Is this understanding correct ?

In v.kernel, this seems to be dependent on the resolution:

echo “4.5,4.5” | v.in.ascii in=- sep=comma out=testpoint

g.region n=9 s=0 w=0 e=9 res=1
v.kernel in=testpoint out=testrast radius=5 kernel=gaussian --o
r.univar testrast
[…]
sum: 0.999544365566944

but
g.region n=9 s=0 w=0 e=9 res=2
v.kernel in=testpoint out=testrast radius=5 kernel=gaussian --o
r.univar testrast
[…]
sum: 0.308567902849234

IMHO, the sum should always be close to 1, or ?

I think not, because the Gaussian kernel is a general Gaussian
function with user-defined sigma = dmax / 4 [0]. The sum would be
close to 1 only for a normal function (special case of the Gaussian
function) with sigma determined from the observed distances. For the
Gaussian kernel, the sum of the output raster should increase with
higher resolution and constant sigma.

Thanks. I’ll have to think about this a bit more when I have the time
to fully understand. I imagine there should be a way to calculate the

multiplier necessary to reach the correct sum

I don’t think there is something like “the correct sum”.

v.kernel provides an estimate for each output cell, how evenly the input points are distributed around this cell. The definition of “evenly” depends on the kernel function and bandwidth.

Markus M

using some
combination region extent, radius and resolution… But no time for
such thinking right now.

BTW, in my previous mail I wrote:

g.region n=9 s=0 w=0 e=9 res=2
[…]
v.kernel in=testpoint out=testrast radius=5 kernel=gaussian
multiplier=4 --o
r.univar testrast
[…]
sum: 1.23427161139693

So this is close, but not very precise.

This was wrong as the g.region call leads to a region with res=1.8 and
so the v.kernel call should have used multiplier=3.24 which does lead
to a sum of 0.99.

But when I reduce or increase the radius this doesn’t work anymore.
Don’t know if this is linked in any way to the fact that extent of 9
divided by radius of 5 equals 1.8 which is exactly the resolution, or
whether this is pure chance… :wink:

Moritz

Le Tue, 29 May 2018 15:19:39 +0200,
Markus Metz <markus.metz.giswork@gmail.com> a écrit :

On Tue, May 29, 2018 at 2:21 PM, Moritz Lennert <
mlennert@club.worldonline.be> wrote:
>
> Le Tue, 29 May 2018 08:59:47 +0200,
> Markus Metz <markus.metz.giswork@gmail.com> a écrit :
>
> > On Mon, May 28, 2018 at 6:16 PM, Moritz Lennert <
> > mlennert@club.worldonline.be> wrote:
> > >
> > > Hi,
> > >
> > > AFAIU kernel density calculations, one takes a number of points
> > > and
> > redistributes this total number across the entire region using a
> > specified kernel function as estimator as to the spatial pattern
> > of this redistribution. The total sum should correspond to the
> > total number of points in the input. Is this understanding
> > correct ?
> > >
> > > In v.kernel, this seems to be dependent on the resolution:
> > >
> > > echo "4.5,4.5" | v.in.ascii in=- sep=comma out=testpoint
> > >
> > > g.region n=9 s=0 w=0 e=9 res=1
> > > v.kernel in=testpoint out=testrast radius=5 kernel=gaussian --o
> > > r.univar testrast
> > > [...]
> > > sum: 0.999544365566944
> > >
> > >
> > > but
> > > g.region n=9 s=0 w=0 e=9 res=2
> > > v.kernel in=testpoint out=testrast radius=5 kernel=gaussian --o
> > > r.univar testrastr s
> > > [...]
> > > sum: 0.308567902849234
> > >
> > > IMHO, the sum should always be close to 1, or ?
> >
> > I think not, because the Gaussian kernel is a general Gaussian
> > function with user-defined sigma = dmax / 4 [0]. The sum would be
> > close to 1 only for a normal function (special case of the
> > Gaussian function) with sigma determined from the observed
> > distances. For the Gaussian kernel, the sum of the output raster
> > should increase with higher resolution and constant sigma.
> >
>
> Thanks. I'll have to think about this a bit more when I have the
> time to fully understand. I imagine there should be a way to
> calculate the multiplier necessary to reach the correct sum

I don't think there is something like "the correct sum".

You are right. Correct is not the right term. However:

v.kernel provides an estimate for each output cell, how evenly the
input points are distributed around this cell. The definition of
"evenly" depends on the kernel function and bandwidth.

I do not have the exact same understanding of kernel estimation: For me
it distributes an existing population (defined in GRASS as one per
point - others provide option to use a population attribute) across a
space. How it is distributed depends on the kernel function and
bandwidth. The idea being that for a series of observations in space
the exact location of that observation is somewhat random and that you
can thus distribute the actual observations across space.

If you only have a sample of observations, then you can use a
multiplier to estimate total observations as long as your sample is
well distributed. But if you have the total population, then you
distribute this population in space and the sum across this space
should correspond to your population.

Here's how they explain it in the ArcGIS manual (hardly a reference I
know, but still):

"Conceptually, a smoothly curved surface is fitted over each point. The
surface value is highest at the location of the point and diminishes
with increasing distance from the point, reaching zero at the Search
radius distance from the point. Only a circular neighborhood is
possible. The volume under the surface equals the Population field
value for the point, or 1 if NONE is specified. The density at each
output raster cell is calculated by adding the values of all the kernel
surfaces where they overlay the raster cell center."[1]

Often texts start by saying that the simplest form of the density
estimation is the histogram. Kernel functions then allow to make this
smoother, but IMU the surface beneath the curve should stay the same.

See also the answer by whuber at [2] using the sand analogy. And the
reference mentioned in the next post [3], where it is said

"In Figure 4‑43, for each point (7,8,9,12 and 14) we have provided a
Normal distribution curve with central value (the mean) at the point in
question and with an average spread (standard deviation) of 2 units. We
can then add the areas under each of these curves together to obtain a
(cumulative) curve with two peaks, and then divide this curve by 5 to
adjust the area under the curve back to 1 giving the lower red curve
shown. When adjusted in this way the values are often described as
probability densities, and when extended to two dimensions, the
resulting surface is described as a probability density surface, rather
than a density surface. We now have a density value for every position
along the original line, with smooth transitions between the values,
which is exactly what we were trying to achieve."

I guess the question is whether we are speaking about normalized or not
normalized density ?

Moritz

[1]
https://pro.arcgis.com/en/pro-app/tool-reference/spatial-analyst/how-kernel-density-works.htm#GUID-3BCBF5CA-CAC7-4547-A3CF-B5E30FDE584E
[2]
https://gis.stackexchange.com/questions/1553/how-to-interpret-grass-v-kernel-results
[3]
http://www.spatialanalysisonline.com/HTML/?density__kernels_and_occupancy.htm

On Tue, May 29, 2018 at 3:48 PM, Moritz Lennert <mlennert@club.worldonline.be> wrote:

Le Tue, 29 May 2018 15:19:39 +0200,
Markus Metz <markus.metz.giswork@gmail.com> a écrit :

On Tue, May 29, 2018 at 2:21 PM, Moritz Lennert <
mlennert@club.worldonline.be> wrote:

Le Tue, 29 May 2018 08:59:47 +0200,
Markus Metz <markus.metz.giswork@gmail.com> a écrit :

On Mon, May 28, 2018 at 6:16 PM, Moritz Lennert <
mlennert@club.worldonline.be> wrote:

Hi,

AFAIU kernel density calculations, one takes a number of points
and
redistributes this total number across the entire region using a
specified kernel function as estimator as to the spatial pattern
of this redistribution. The total sum should correspond to the
total number of points in the input. Is this understanding
correct ?

In v.kernel, this seems to be dependent on the resolution:

echo “4.5,4.5” | v.in.ascii in=- sep=comma out=testpoint

g.region n=9 s=0 w=0 e=9 res=1
v.kernel in=testpoint out=testrast radius=5 kernel=gaussian --o
r.univar testrast
[…]
sum: 0.999544365566944

but
g.region n=9 s=0 w=0 e=9 res=2
v.kernel in=testpoint out=testrast radius=5 kernel=gaussian --o
r.univar testrastr s
[…]
sum: 0.308567902849234

IMHO, the sum should always be close to 1, or ?

I think not, because the Gaussian kernel is a general Gaussian
function with user-defined sigma = dmax / 4 [0]. The sum would be
close to 1 only for a normal function (special case of the
Gaussian function) with sigma determined from the observed
distances. For the Gaussian kernel, the sum of the output raster
should increase with higher resolution and constant sigma.

Thanks. I’ll have to think about this a bit more when I have the
time to fully understand. I imagine there should be a way to
calculate the multiplier necessary to reach the correct sum

I don’t think there is something like “the correct sum”.

You are right. Correct is not the right term. However:

v.kernel provides an estimate for each output cell, how evenly the
input points are distributed around this cell. The definition of
“evenly” depends on the kernel function and bandwidth.

I do not have the exact same understanding of kernel estimation: For me
it distributes an existing population (defined in GRASS as one per
point - others provide option to use a population attribute) across a
space. How it is distributed depends on the kernel function and
bandwidth. The idea being that for a series of observations in space
the exact location of that observation is somewhat random and that you
can thus distribute the actual observations across space.

If you only have a sample of observations, then you can use a
multiplier to estimate total observations as long as your sample is
well distributed. But if you have the total population, then you
distribute this population in space and the sum across this space
should correspond to your population.

Here’s how they explain it in the ArcGIS manual (hardly a reference I
know, but still):

“Conceptually, a smoothly curved surface is fitted over each point. The
surface value is highest at the location of the point and diminishes
with increasing distance from the point, reaching zero at the Search
radius distance from the point. Only a circular neighborhood is
possible. The volume under the surface equals the Population field
value for the point, or 1 if NONE is specified. The density at each
output raster cell is calculated by adding the values of all the kernel
surfaces where they overlay the raster cell center.”[1]

Often texts start by saying that the simplest form of the density
estimation is the histogram. Kernel functions then allow to make this
smoother, but IMU the surface beneath the curve should stay the same.

See also the answer by whuber at [2] using the sand analogy. And the
reference mentioned in the next post [3], where it is said

“In Figure 4‑43, for each point (7,8,9,12 and 14) we have provided a
Normal distribution curve with central value (the mean) at the point in
question and with an average spread (standard deviation) of 2 units. We
can then add the areas under each of these curves together to obtain a
(cumulative) curve with two peaks, and then divide this curve by 5 to
adjust the area under the curve back to 1 giving the lower red curve
shown. When adjusted in this way the values are often described as
probability densities, and when extended to two dimensions, the
resulting surface is described as a probability density surface, rather
than a density surface. We now have a density value for every position
along the original line, with smooth transitions between the values,
which is exactly what we were trying to achieve.”

I guess the question is whether we are speaking about normalized or not
normalized density ?

When you have a curve or a curved surface, you can normalize to 1. However, the output raster is a sample of the curved surface at discrete locations (cell centers). Therefore the sum should increase with higher resolutions, keeping everything else constant.

Markus M

Moritz

[1]
https://pro.arcgis.com/en/pro-app/tool-reference/spatial-analyst/how-kernel-density-works.htm#GUID-3BCBF5CA-CAC7-4547-A3CF-B5E30FDE584E
[2]
https://gis.stackexchange.com/questions/1553/how-to-interpret-grass-v-kernel-results
[3]
http://www.spatialanalysisonline.com/HTML/?density__kernels_and_occupancy.htm