#623: Implement NullCheck operator in CSW
--------------------------------+-------------------------------------------
Reporter: josegar74 | Owner: geonetwork-devel@…
Type: defect | Status: new
Priority: major | Milestone: v2.6.5
Component: Metadata standards | Version: v2.6.3
Keywords: |
--------------------------------+-------------------------------------------
The !NullCheck operator is mandatory in INSPIRE. !GeoNetwork advertises
the operator in its Capabilities document, but does not implement it.
Searching for null/empty values with Lucene is not straightforward. One
solution that seems to work well is to index these values as a predefined
dummy string, as proposed in some Lucene-related forums. Future versions
of Lucene appear likely to handle null values better, so this will need
some changes when the Lucene version in !GeoNetwork is updated.
With this solution, a !PropertyIsNull filter in a CSW query such as:
{{{
<PropertyIsNull>
  <PropertyName>Abstract</PropertyName>
</PropertyIsNull>
}}}
is translated to a Lucene query like this (assuming the dummy value is
ZZZZZZZZZZZZZZZ):
{{{
abstract: ZZZZZZZZZZZZZZZ
}}}
One limitation is that the approach breaks if the "dummy" string occurs
as a real value in any document, but choosing an unusual value should make
this unlikely.
I will try to provide a patch for testing later; any comments on this
solution are very welcome.
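
To make the proposal concrete, here is a minimal sketch of the dummy-value idea using a toy in-memory index in Python rather than Lucene itself. The names `build_index`, `search_null` and `NULL_SENTINEL` are hypothetical and are not part of GeoNetwork; the sentinel is the example value supposed above.

```python
from collections import defaultdict

# Hypothetical sentinel; the ticket only supposes this value as an example.
NULL_SENTINEL = "ZZZZZZZZZZZZZZZ"


def build_index(records, fields):
    """Map (field, value) -> set of record ids.

    Empty or missing fields are indexed under the sentinel value so that
    they can be found with an ordinary term lookup.
    """
    index = defaultdict(set)
    for record_id, record in records.items():
        for field in fields:
            value = (record.get(field) or "").strip()
            index[(field, value or NULL_SENTINEL)].add(record_id)
    return index


def search_null(index, field):
    """Toy equivalent of the query `field: <sentinel>`."""
    return index.get((field, NULL_SENTINEL), set())


records = {
    1: {"title": "Rivers", "abstract": "Hydrography of the region"},
    2: {"title": "Roads", "abstract": ""},
    3: {"title": "Soils"},  # abstract missing entirely
}
index = build_index(records, ["title", "abstract"])
null_abstract_ids = sorted(search_null(index, "abstract"))
print(null_abstract_ids)  # [2, 3]
```

The sketch also shows the limitation mentioned above: a record whose abstract really is the sentinel string would be reported as null.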
--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/623>
GeoNetwork opensource Developer website <http://sourceforge.net/projects/geonetwork/>
GeoNetwork opensource is a standards based, Free and Open Source catalog application to manage spatially referenced resources through the web. It provides powerful metadata editing and search functions as well as an embedded interactive web map viewer. This website contains information related to the development of the software.
Comment(by heikki):
It sounds OK to me, but two remarks:
* maybe use an even less common dummy value than ZZZZZZZZZZ, for example
"dsfgerg3453fdsgdfgdf";
* I think you should make sure that these values are filtered or
transformed away whenever the metadata is viewed, edited, exported,
retrieved by CSW, etc.
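
As an illustration of the second remark, here is a small Python sketch of filtering the dummy value away before indexed values are shown or exported. The function `strip_sentinel` and the dictionary layout are assumptions made for the example, not GeoNetwork code.

```python
# Example dummy value taken from the remark above; any rare string would do.
NULL_SENTINEL = "dsfgerg3453fdsgdfgdf"


def strip_sentinel(indexed_fields):
    """Return a copy of the fields with sentinel values turned back into ""."""
    return {
        name: ("" if value == NULL_SENTINEL else value)
        for name, value in indexed_fields.items()
    }


stored = {"title": "Roads", "abstract": NULL_SENTINEL}
visible = strip_sentinel(stored)
print(visible)  # {'title': 'Roads', 'abstract': ''}
```

Every code path that reads values back from the index (view, edit, export, CSW responses) would need to go through a step like this, which is the maintenance cost of the dummy-value approach.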
--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/623#comment:1>
Comment(by josegar74):
Instead of storing a dummy value in the actual Lucene index fields, and to
avoid side effects in XSL/Java code that uses Lucene index values, I am
implementing this solution:
If a metadata field is empty, a field named {{{fieldName_Null}}} is
indexed with the value "yes". Searches for null values in the field become:
{{{
fieldName_Null: yes
}}}
This way, the content of the actual Lucene index fields does not change.
I also checked that this does not double the number of indexed fields in
Lucene: when the field {{{fieldName_Null}}} is indexed, the field
{{{fieldName}}} is not indexed, and vice versa.
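
A rough Python sketch of this marker-field scheme follows, again on plain dictionaries rather than real Lucene documents. The helper names `index_fields` and `is_null` are hypothetical; only the `_Null` suffix and the value "yes" come from the description above.

```python
def index_fields(record, fields):
    """Build the index entries for one record.

    A non-empty value is indexed under the field's own name. An empty or
    missing value is indexed as `<field>_Null: yes` instead, so each
    metadata field contributes exactly one index entry.
    """
    indexed = {}
    for field in fields:
        value = (record.get(field) or "").strip()
        if value:
            indexed[field] = value
        else:
            indexed[field + "_Null"] = "yes"
    return indexed


def is_null(indexed, field):
    """Toy equivalent of the query `<field>_Null: yes`."""
    return indexed.get(field + "_Null") == "yes"


doc = index_fields({"title": "Roads", "abstract": ""}, ["title", "abstract"])
print(doc)                       # {'title': 'Roads', 'abstract_Null': 'yes'}
print(is_null(doc, "abstract"))  # True
print(is_null(doc, "title"))     # False
```

Because the real field is left untouched, code that reads `abstract` from the index never sees a placeholder value.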
--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/623#comment:2>
Comment(by simonp):
Maybe the last suggestion in
http://www.gossamer-threads.com/lists/lucene/java-dev/64663 is worth a try?
--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/623#comment:3>
Comment(by josegar74):
Hi Simon
Changed to:
{{{
<xsl:template match="ogc:PropertyIsNull">
  <BooleanQuery>
    <BooleanClause required="true" prohibited="false">
      <MatchAllDocsQuery required="true" prohibited="false"/>
    </BooleanClause>
    <BooleanClause required="false" prohibited="true">
      <RangeQuery fld="{ogc:PropertyName}" lowerTxt="*" upperTxt="*"
                  inclusive="true"/>
    </BooleanClause>
  </BooleanQuery>
</xsl:template>
}}}
The query is translated to:
{{{
+(+*:* -abstract:[ TO ]) +_isTemplate:n
}}}
One problem with this solution is that the * is removed by the analyzer.
I also tried this query in Luke:
{{{
+(+*:* -abstract:[* TO *])
}}}
but it returns all results. The linked Lucene thread includes this
comment: "although i can't remember if *:* is a Solr extension of part of
hte core QueryParser", so it is not clear whether this should work.
Anyway, if you find a bug in the new expression that prevents it from
working properly, I can change it (I am not a Lucene expert). Otherwise
I'll go with the proposed patch, and if a simpler solution turns up later
we can update it.
--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/623#comment:4>
Comment(by simonp):
Jose,
A variation on that email seems to work for me. My Lucene index has 31
documents, 2 of which have a value indexed in the field altTitle
(alternate title).
If I enter this query in the search tab of Luke (1.0.1):
*:* -altTitle:*
I get the 29 documents that have no value indexed for altTitle. However,
this requires 'Allow leading * in wildcard queries' to be checked in the
QueryParser settings of Luke's search tab, so some additional work may be
needed to make sure the GeoNetwork Lucene calls can do this too.
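
The logic of `*:* -altTitle:*` can be sketched as a set difference. The Python below is a toy model with made-up terms and document ids, not Lucene code; only the counts (31 documents, 2 with altTitle) come from the test above. As far as I know, the `*:*` part is needed because a Lucene boolean query made only of prohibited clauses matches nothing.

```python
def docs_with_field(index, field):
    """Ids of documents with at least one term indexed in `field`.

    This plays the role of the `field:*` wildcard clause.
    """
    matches = set()
    for (indexed_field, _term), doc_ids in index.items():
        if indexed_field == field:
            matches |= doc_ids
    return matches


def docs_without_field(all_doc_ids, index, field):
    """`*:* -field:*`: every document minus those with a value in `field`."""
    return set(all_doc_ids) - docs_with_field(index, field)


all_doc_ids = set(range(1, 32))  # 31 documents, ids 1..31
index = {
    ("altTitle", "rivers"): {4},   # made-up terms and ids
    ("altTitle", "roads"): {17},
    ("title", "soils"): {1, 2, 3},
}
missing = docs_without_field(all_doc_ids, index, "altTitle")
print(len(missing))  # 29
```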
--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/623#comment:5>
Comment(by simonp):
Actually the following in csw/filter-to-lucene.xsl:
{{{
<xsl:template match="ogc:PropertyIsNull">
  <BooleanQuery>
    <BooleanClause required="true" prohibited="false">
      <MatchAllDocsQuery required="true" prohibited="false"/>
    </BooleanClause>
    <BooleanClause required="false" prohibited="true">
      <WildcardQuery fld="{ogc:PropertyName}" txt="*"/>
    </BooleanClause>
  </BooleanQuery>
</xsl:template>
}}}
and a CSW query with:
{{{
<PropertyIsNull>
  <PropertyName>altTitle</PropertyName>
</PropertyIsNull>
}}}
generates a Lucene query in GeoNetwork like the one in Luke:
Lucene query: +(+(+*:* -altTitle:*) +_isTemplate:n) +(_op0:2 _op0:1 _op0:0 _op0:-1 _owner:1)
which seems to work, i.e. it returns all the records that don't have an
altTitle field indexed.
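
For readers who want to see the translation step outside XSLT, here is a minimal Python sketch that turns a PropertyIsNull filter into the core of the query string shown above. The function `property_is_null_to_query` is hypothetical and is not how GeoNetwork does it (GeoNetwork uses the stylesheet above). It ignores XML namespaces, does not escape special query characters, and does not add the `_isTemplate` or `_op0`/`_owner` clauses.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix, if any."""
    return tag.rsplit("}", 1)[-1]


def property_is_null_to_query(filter_xml):
    """Translate a PropertyIsNull element to `+*:* -<property>:*`."""
    root = ET.fromstring(filter_xml)
    if local_name(root.tag) != "PropertyIsNull":
        raise ValueError("expected a PropertyIsNull element")
    for child in root:
        name = (child.text or "").strip()
        if local_name(child.tag) == "PropertyName" and name:
            # Match everything, then prohibit documents with any value.
            return "+*:* -{}:*".format(name)
    raise ValueError("PropertyIsNull has no PropertyName")


query = property_is_null_to_query(
    "<PropertyIsNull><PropertyName>altTitle</PropertyName></PropertyIsNull>"
)
print(query)  # +*:* -altTitle:*
```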
--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/623#comment:6>
Comment(by josegar74):
I tried it too and it works for me. I'm going to commit it.
Many thanks Simon, you're great!
--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/623#comment:7>
Changes (by josegar74):
* status: new => closed
* resolution: => fixed
--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/623#comment:8>