Apologies, the link is ComposedMetadataRecords (GeoNetwork opensource Developer website).
________________________________________
From: Simon Pigot [Simon.Pigot@anonymised.com]
Sent: Saturday, 26 September 2009 7:56 PM
To: Francois Prunayre
Cc: Devel geonetwork-devel@lists.sourceforge.net
Subject: Re: [GeoNetwork-devel] Related document indexing eg. kml and wfs indexing
Hi All,
Looks good Francois. With regard to the WFS, this crosses over with the
proposal to harvest metadata from a WFS by converting features to ISO
metadata fragments which can be linked into records (the ComposedMetadata
proposal in the list of proposals on
http://trac.osgeo.org/geonetwork/proposals). By comparison, I guess the
composed metadata approach is an attempt to structure the info harvested
from the WFS rather than dump it directly into the index for free text
search. Both are valid approaches: composing the metadata records requires
more work but permits targeted searching and, because it uses a GN
harvester and the xlink cache, indexing is still speedy.
It would also be interesting to index content from attached document
resources like PDF or DOC files, maybe using the Apache Tika content
analysis toolkit too?
Cheers,
Simon
Francois Prunayre wrote:
Hi Thijs,
2009/9/25 Thijs Brentjens <lists@anonymised.com>:
Great idea. Could be very powerful! Just to get it right for me: this patch
indexes data directly (if referred to in a metadata record) and adds this
information to the metadata records to improve search results.
That's the point.
Possible practical issue: for WFS, even if you're using maxFeatures (as in
the patch), the indexes could still grow quickly, so I think one does want
to use a relatively small number of features for indexing.
True. One idea could be to also remove all non-text fields, which
do not sound really useful at first glance.
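
The two ideas above (cap the number of features, drop non-text fields) could be sketched roughly as follows. This is only an illustration in Python, not the code from the patch: the sample GML, the `ex` namespace, and the function names are made up, and "non-text" is assumed here to mean values that parse as numbers plus properties with child elements (such as geometries).

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# Hypothetical WFS response, small enough to read at a glance.
SAMPLE = """<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml" xmlns:ex="http://example.org/ex">
  <gml:featureMember>
    <ex:Station>
      <ex:name>Hobart tide gauge</ex:name>
      <ex:depth>12.5</ex:depth>
      <ex:geom><gml:Point><gml:pos>147.3 -42.9</gml:pos></gml:Point></ex:geom>
    </ex:Station>
  </gml:featureMember>
  <gml:featureMember>
    <ex:Station>
      <ex:name>Sydney buoy</ex:name>
      <ex:depth>30</ex:depth>
    </ex:Station>
  </gml:featureMember>
</wfs:FeatureCollection>"""


def is_text_value(value: str) -> bool:
    """Assumption: a value is 'text' if it is non-empty and not a number."""
    value = value.strip()
    if not value:
        return False
    try:
        float(value)
        return False
    except ValueError:
        return True


def indexable_text(gml: str, max_features: int = 10) -> list[str]:
    """Collect text property values from at most max_features features."""
    root = ET.fromstring(gml)
    members = root.findall(f"{{{GML_NS}}}featureMember")[:max_features]
    values = []
    for member in members:
        for feature in member:
            for prop in feature:
                # Properties with child elements (e.g. geometries) are skipped.
                if len(prop) == 0 and prop.text and is_text_value(prop.text):
                    values.append(prop.text.strip())
    return values


print(indexable_text(SAMPLE))
print(indexable_text(SAMPLE, max_features=1))
```

Date strings would still count as text under this rule, so a real filter would probably need the feature type schema (DescribeFeatureType) rather than guessing from values.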
But if using just
a few features, maybe the data returned is not representative enough. So
there is some balance to find here (worth experimenting..). But still, I
think it improves matching search results to queries.
And in some cases, when data could change quickly in time, the indexes may
become outdated, possibly resulting in incorrect search results.
True also, but the index is updated for a record every time somebody
looks at it (due to the popularity increase), and related documents will
be parsed again (maybe we should only update the popularity value in the
index, but for the time being the full record is reindexed).
But again:
this is just in very rare cases.. I think these are just minor issues;
things to find out if they really do occur. Do you have some results / demo
maybe?
Not really, just had a try with some WFS I know about.
And to enable this feature, maybe add an extra queryable as well? To search
on the data (only) or maybe disable searches on data somehow? Would that be
possible?
For that, we could create a specific field in the index: "any"
contains the metadata full text info, and another field would store the data info.
Easy.
Maybe this field could be updated on a regular basis in a background task.
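
To make the separate-field idea concrete, here is a toy sketch in Python. It is a stand-in for the real index, not GeoNetwork code: the class `FieldedIndex`, the field name "data", and the record ids are all hypothetical. It only shows that keeping data terms in their own field allows searching metadata only, data only, or both, and that the data field can be refreshed (for example from a background task) without touching "any".

```python
from collections import defaultdict


class FieldedIndex:
    """Toy inverted index keeping one posting table per field."""

    def __init__(self):
        # field name -> term -> set of record ids
        self._postings = defaultdict(lambda: defaultdict(set))

    def add(self, record_id, field, text):
        for term in text.lower().split():
            self._postings[field][term].add(record_id)

    def replace_field(self, record_id, field, text):
        """Re-index one field of a record; other fields are left untouched."""
        for ids in self._postings[field].values():
            ids.discard(record_id)
        self.add(record_id, field, text)

    def search(self, term, fields=("any", "data")):
        hits = set()
        for field in fields:
            hits |= self._postings[field].get(term.lower(), set())
        return hits


index = FieldedIndex()
index.add("rec-1", "any", "Tide gauge network metadata")
index.add("rec-1", "data", "Hobart Sydney")        # terms taken from the remote data
index.add("rec-2", "any", "Sydney harbour bathymetry")

both = index.search("sydney")                       # metadata and data
metadata_only = index.search("sydney", fields=("any",))

# A background refresh of the data field only:
index.replace_field("rec-1", "data", "Darwin")
after_refresh = index.search("sydney", fields=("data",))
```

Here `both` contains rec-1 and rec-2, `metadata_only` contains only rec-2, and after the refresh the data-only search for "sydney" no longer matches rec-1.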
Thanks for the comments.
Francois.
best regards,
Thijs
Francois Prunayre wrote:
Hi list, this is more an experiment on how to index related documents
which could be referenced in a metadata record.
For example, with a KML document or a related WFS service in the
distribution section, we could try to retrieve the document (GML
or KML) and index the content of that remote document in the full text
search criteria (i.e. any).
This will slow down the index process for sure but could be useful in some
ways.
Attached a quick patch adding the feature to the index mechanism for
iso19139 records.
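
For readers without the patch at hand, the KML half of the idea might look something like the sketch below. It is written in Python purely for illustration and is not the attached patch; the sample KML, the function names, and the choice to skip `coordinates` elements are assumptions. Fetching the remote document is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET

# Assumption: numeric-only elements add nothing to a full text search.
SKIPPED = {"coordinates"}

# Hypothetical KML document standing in for the one linked from the record.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <name>Monitoring stations</name>
    <Placemark>
      <name>Hobart</name>
      <description>Tide gauge on the wharf</description>
      <Point><coordinates>147.33,-42.88,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix so any KML version is handled alike."""
    return tag.rsplit("}", 1)[-1]


def kml_text(kml: str) -> str:
    """Concatenate the text content of a KML document for the 'any' field."""
    root = ET.fromstring(kml)
    parts = []
    for element in root.iter():
        if local_name(element.tag) in SKIPPED:
            continue
        if element.text and element.text.strip():
            parts.append(element.text.strip())
    return " ".join(parts)


print(kml_text(SAMPLE_KML))
```

The resulting string would then be appended to whatever already goes into the "any" field for the record.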
Any thoughts? Is anyone working in that direction?
Ciao.
Francois
_______________________________________________
GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
geonetwork-devel List Signup and Options
GeoNetwork OpenSource is maintained at
http://sourceforge.net/projects/geonetwork