[GeoNetwork-devel] Related document indexing eg. kml and wfs indexing

Hi list, this is more an experiment on how to index related documents
which could be referenced in a metadata record.

For example, given a KML document or a related WFS service in the
distribution section, we could try to retrieve the document (GML
or KML) and index its content in the full-text search field (i.e. any).
This will certainly slow down the indexing process, but could be useful in some ways.

Attached a quick patch adding the feature to the index mechanism for
iso19139 records.
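
To illustrate the idea only (this is not the attached patch, which hooks into the iso19139 indexing mechanism), here is a minimal Python sketch that collects the text content of a retrieved KML or GML document so it could be appended to the full-text field. The function name `extract_text` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_text(xml_source: str) -> str:
    """Return all non-empty text nodes of an XML document, space-separated."""
    root = ET.fromstring(xml_source)
    # itertext() walks every element and yields its text and tail strings.
    parts = (chunk.strip() for chunk in root.itertext())
    return " ".join(part for part in parts if part)


# Hypothetical KML document standing in for a remote resource.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Lake Geneva</name>
      <description>Sampling station 12</description>
    </Placemark>
  </Document>
</kml>"""

if __name__ == "__main__":
    print(extract_text(SAMPLE_KML))
```

In a real setup the string would come from an HTTP request to the URL found in the distribution section, with a timeout so that one slow server does not block indexing.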

Any thoughts? Is anyone working in that direction?

Ciao.
Francois

(attachments)

wfs-and-kml-indexing.patch (3.77 KB)

Hi Thijs,

2009/9/25 Thijs Brentjens <lists@anonymised.com>:

Great idea. Could be very powerful! Just to get it right for me: this patch
indexes data directly (if referred to in a metadata record) and adds this
information to the metadata records to improve search results.

That's the point.

Possible practical issue: for WFS, even if you're using maxFeatures (as in
the patch), still the indexes could grow quickly, so I think one does want
to use a relatively small amount of features for indexing.

True. One idea could be to also drop all non-text fields, which
do not seem really useful at first glance.
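
A rough sketch of both points, assuming a WFS 1.1.0 key-value request and a simple heuristic for what counts as "non-text" (values made only of digits, signs and separators). The helper names, the sample response and the heuristic are assumptions for illustration, not what the patch does.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def get_feature_url(base_url: str, type_name: str, max_features: int = 10) -> str:
    """Build a WFS 1.1.0 GetFeature request limited to max_features."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
        "maxFeatures": max_features,
    }
    return base_url + "?" + urlencode(params)


# Assumption: a value made only of digits, whitespace, signs and separators
# (numbers, coordinate lists, dates) is not worth indexing as text.
NON_TEXT = re.compile(r"^[\d\s.,+\-]+$")


def textual_values(gml: str) -> list:
    """Return the element values of a GML response that look like text."""
    root = ET.fromstring(gml)
    values = []
    for element in root.iter():
        text = (element.text or "").strip()
        if text and not NON_TEXT.match(text):
            values.append(text)
    return values


# Hypothetical response with one feature.
SAMPLE_GML = """<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml" xmlns:ex="http://example.org/ex">
  <gml:featureMember>
    <ex:station>
      <ex:name>Harbour buoy</ex:name>
      <ex:depth>12.5</ex:depth>
      <gml:pos>46.2 6.1</gml:pos>
    </ex:station>
  </gml:featureMember>
</wfs:FeatureCollection>"""

if __name__ == "__main__":
    print(get_feature_url("http://example.org/wfs", "stations", 5))
    print(textual_values(SAMPLE_GML))
```

Dropping numeric and coordinate values keeps the index smaller, at the cost of not being able to match on codes or identifiers that happen to be numeric.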

But if using just a few features, maybe the data returned is not
representative enough. So there is some balance to find here (worth
experimenting). But still, I think it improves matching search results
to queries.

And in some cases, when data could change quickly in time, the indexes may
become outdated, possibly resulting in incorrect search results.

True also, but the index is updated for a record every time somebody
looks at it (due to the popularity increase), and related documents will
be parsed again (maybe we should only update the popularity value in the
index, but for the time being the full record is reindexed).

But again: this is just in very rare cases. I think these are just minor
issues; things to find out if they really do occur. Do you have some
results / demo maybe?

Not really, just had a try with some WFS I know about.

And to enable this feature, maybe add an extra queryable as well? To search
on the data (only) or maybe disable searches on data somehow? Would that be
possible?

For that, we could create a specific field in the index: "any"
would contain the metadata full-text info, and another field would
store the data info. Easy.
Maybe this field could be updated on a regular basis in a background task.
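
To make the two-field idea concrete, here is a toy in-memory index in Python. It is only a sketch: the real index is Lucene-based, and the field name `data` and the class `FieldedIndex` are hypothetical.

```python
from collections import defaultdict


class FieldedIndex:
    """Tiny in-memory inverted index with one posting map per field."""

    def __init__(self):
        # field name -> term -> set of record ids
        self._postings = defaultdict(lambda: defaultdict(set))

    def add(self, record_id, field, text):
        """Index the whitespace-separated terms of text under the given field."""
        for term in text.lower().split():
            self._postings[field][term].add(record_id)

    def search(self, term, fields=("any",)):
        """Return the ids of records containing term in any of the fields."""
        hits = set()
        for field in fields:
            hits |= self._postings[field].get(term.lower(), set())
        return hits


if __name__ == "__main__":
    index = FieldedIndex()
    index.add("rec-1", "any", "Bathymetry of Lake Geneva")   # metadata text
    index.add("rec-1", "data", "Harbour buoy")               # remote data text
    print(index.search("buoy"))                          # metadata only
    print(index.search("buoy", fields=("any", "data")))  # metadata and data
```

Keeping the data terms in their own field means a search on "any" behaves as before, and a background task could rebuild only the `data` postings without touching the metadata ones.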

Thanks for the comments.
Francois.

best regards,
Thijs

------------------------------------------------------------------------

------------------------------------------------------------------------------
Come build with us! The BlackBerry® Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9-12, 2009. Register now!
http://p.sf.net/sfu/devconf
------------------------------------------------------------------------

_______________________________________________
GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at
http://sourceforge.net/projects/geonetwork

Hi All,

Looks good Francois - with regard to the WFS, the proposal crosses over with the proposal to harvest metadata from a WFS by converting features to ISO metadata fragments which can be linked into records (ComposedMetadata proposal in the list of proposals on http://trac.osgeo.org/geonetwork/proposals). I guess by comparison the composed metadata approach is an attempt to structure the info from the WFS rather than dump it directly into the index for free text search. Both are valid approaches: composing the metadata records requires more work but permits targeted searching, and because it uses a GN harvester and the xlink cache, indexing is still speedy.

Would also be interesting to index content from attached document resources like pdf or doc files, maybe using the Apache Tika content analysis toolkit too? (http://lucene.apache.org/tika/)

Cheers,
Simon


Apologies, the link is http://trac.osgeo.org/geonetwork/wiki/ComposedMetadataRecords


Hello all,

Just wanted to mention that this proposal is conceptually equivalent to
http://trac.osgeo.org/geonetwork/wiki/ComponentsAndComposites. We are definitely interested in participating…

Ted


==== Ted Habermann ===========================
Enterprise Data Systems Group Leader
NOAA, National Geophysical Data Center
V: 303.497.6472 F: 303.497.6513
“If you want to go quickly, go alone.
If you want to go far, go together”
Old Proverb
==== Ted.Habermann@anonymised.com ==================

(attachments)

Ted Habermann.vcf (390 Bytes)

Hi Ted,

Thanks - I think the proposal is at pains to point out that the idea of composed (or componentized or composite or even 'relational') metadata is not new :) (is there anything new 'under the sun'?). I think we even discussed it on this group some time ago using terms like 'woolly' to cover the vague ideas we were throwing around at the time. I'll add a link to the ComponentsAndComposites proposal so this conceptual equivalence is clear, and thanks for the memory jog!

There is an implementation of the composed metadata proposal for the case where the composed metadata records are created from fragments harvested out of a WFS and we're working on moving our THREDDS metadata harvester to use the same fragment based idea. Both are (and will continue for the time being) in the BlueNetMEST sandbox.

For the WFS harvesting we have been using deegree WFS as it has nice support for modelling database table relationships and server-side XML transformations using XSLT. I'd like to add a test case using GeoServer WFS which makes sense as GeoServer comes with GeoNetwork.

Very much like to develop our cooperation over this and other ideas we have discussed elsewhere.

Cheers,
Simon


Simon,

it has been a while since we worked on this (too many things going on). I believe that we were working on the Citation > Contact > OnlineResource chain of fragments. These three are all fragments (components) in the current tool we are using so it made sense to us. We are also very interested in DS_Series as a root for ISO metadata, which makes MD/MI_Metadata a fragment. I think this is actually relevant to the WMS problem where a service with several layers may end up being represented as some sort of aggregate…

One thing that you point out in your proposal is the xlink resolver. This is going to be a critical and interesting bit. Also, how (if) the other elements of the xlink standard might be used. For example, we were thinking of xlink:role as a role for a CI_ResponsibleParty that was represented by the fragment. I believe this is being done in SML, but am not sure it is really copacetic.
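
As a way of picturing what such a resolver might do, here is a small Python sketch that fills elements carrying an xlink:href with a fragment taken from a cache and reports the xlink:role found on each link. It is not the resolver from the proposal: the element names, the URN, the role URI and the dictionary cache are all made up for illustration.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK_NS}}}href"
ROLE = f"{{{XLINK_NS}}}role"


def resolve_xlinks(root, cache):
    """Append the cached fragment to every element whose xlink:href is known.

    Returns a list of (href, role) pairs for the links that were resolved.
    """
    resolved = []
    # Snapshot the elements first so appended fragments are not revisited.
    for element in list(root.iter()):
        href = element.get(HREF)
        if href in cache:
            element.append(ET.fromstring(cache[href]))
            resolved.append((href, element.get(ROLE)))
    return resolved


# Hypothetical record and fragment cache.
RECORD = (
    '<record xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<contact xlink:href="urn:example:party:42" '
    'xlink:role="http://example.org/role/pointOfContact"/>'
    "</record>"
)
CACHE = {"urn:example:party:42": "<party><name>Example Org</name></party>"}

if __name__ == "__main__":
    record = ET.fromstring(RECORD)
    print(resolve_xlinks(record, CACHE))
    print(ET.tostring(record, encoding="unicode"))
```

The role is returned alongside the href rather than written into the fragment, so how it maps onto a CI_ResponsibleParty role is left open here.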

Ted

On Sep 27, 2009, at 8:01 AM, Simon Pigot wrote:

Hi Ted,

Thanks - I think the proposal is at pains to point out that the idea of composed (or componentized or composite or even ‘relational’) metadata is not new :slight_smile: (is there anything new ‘under the sun’?). I think we even discussed it on this group sometime ago using terms like ‘woolly’ to cover the vague ideas we were throwing around at the time. I’ll add a link to ComponentsAndComposites proposal so this conceptual equivalence is clear and thanks for the memory jog!

There is an implementation of the composed metadata proposal for the case where the composed metadata records are created from fragments harvested out of a WFS and we’re working on moving our THREDDS metadata harvester to use the same fragment-based idea. Both are (and will be for the time being) in the BlueNetMEST sandbox.

For the WFS harvesting we have been using deegree WFS as it has nice support for modelling database table relationships and server-side XML transformations using XSLT. I’d like to add a test case using GeoServer WFS which makes sense as GeoServer comes with GeoNetwork.

Very much like to develop our cooperation over this and other ideas we have discussed elsewhere.

Cheers,
Simon

Ted Habermann wrote:

Hello all,

Just wanted to mention that this proposal is conceptually equivalent to http://trac.osgeo.org/geonetwork/wiki/ComponentsAndComposites. We are definitely interested in participating…

Ted

Subject: Re: [GeoNetwork-devel] Related document indexing eg. kml and wfs indexing

Hi All,

Looks good Francois - with regard to the WFS, the proposal crosses over with the proposal to harvest metadata from a WFS by converting features (ComposedMetadata proposal in the list of proposals on http://trac.osgeo.org/geonetwork/proposals). I guess by comparison the composed metadata records harvested from WFS approach is an attempt to structure the info from the WFS rather than dump it directly into the index for free text search (both are valid approaches - composing the metadata records requires more work but [...] and because it uses a GN harvester & the xlink cache indexing is still speedy).

Would also be interesting [...] from attached document resources like pdf or doc files, maybe using the apache tika content analysis toolkit too? (http://lucene.apache.org/tika/)
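
For comparison, the "dump it directly into the index" approach could be sketched roughly as below. This is not the attached patch, only an illustration under stated assumptions: the function name `gml_to_index_text`, the `max_features` cap, the `app` namespace and the sample data are all made up, and the response is assumed to be a WFS 1.x style feature collection using gml:featureMember. Properties with child elements (such as geometries) and purely numeric values are skipped, since they add little to a free-text search.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"


def _is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def gml_to_index_text(gml, max_features=10):
    """Collect free text from the first `max_features` feature members."""
    root = ET.fromstring(gml)
    members = root.findall(f"{{{GML_NS}}}featureMember")[:max_features]
    words = []
    for member in members:
        for feature in member:
            for prop in feature:
                if len(prop):
                    # Complex value (e.g. a geometry): not useful as text.
                    continue
                text = (prop.text or "").strip()
                if not text or _is_number(text):
                    continue
                words.append(text)
    return " ".join(words)


sample = """<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml" xmlns:app="http://example.org/app">
  <gml:featureMember>
    <app:station>
      <app:name>Lake Geneva buoy</app:name>
      <app:depth>309.7</app:depth>
      <app:geom><gml:Point><gml:pos>46.45 6.55</gml:pos></gml:Point></app:geom>
    </app:station>
  </gml:featureMember>
  <gml:featureMember>
    <app:station><app:name>Rhone outlet</app:name></app:station>
  </gml:featureMember>
</wfs:FeatureCollection>"""

all_text = gml_to_index_text(sample)
first_only = gml_to_index_text(sample, max_features=1)
```

The resulting string could then be appended to the full-text field of the record, or stored in a field of its own if searches on data and on metadata need to be kept apart.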

Cheers,

Simon



==== Ted Habermann ===========================
Enterprise Data Systems Group Leader
NOAA, National Geophysical Data Center
V: 303.497.6472 F: 303.497.6513
"If you want to go quickly, go alone.
If you want to go far, go together"
Old Proverb
==== Ted.Habermann@anonymised.com ==================


(attachments)

Ted Habermann.vcf (390 Bytes)
