[GeoNetwork-devel] Remote search; use of repositories.xml

The remote search functionality is currently "hidden
away" in main-page.xsl and elsewhere;
I need it, so I've been
using "main.home?intermap=off&remote=on&extended=on"
to make it visible in my browser.

I guess remote searching is not going back into
the trunk soon (or is it?), so the following is not in any
way urgent - I report this here because it
matters to me . . .

I spent some time scratching my head trying to figure
out the meaning of repositories.xml - it is not
obvious!

In the repositories.xml(.tem) file supplied with GN,
there are exactly the same number of "Collection",
"Repository", and "Instance" elements, and
the instance_dn, collection_dn, and repository_dn
attributes match up beautifully byte-for-byte.

I think this hides an error in main-page.xsl.

From my reading of the JZKit source code (and the
GN code which uses JZKit) it seems that Z39.50 remote
searching is only by "Collection", not by "Instance".
The idea is that a "Collection" may be accessible
from more than one "Repository" - for each
"Collection", there will be one or more "Instance"s
of "Repository"s (Z39.50 servers) that can be used
to search that "Collection". (It turns out the JZKit
code always uses the last "Instance" when choosing
which "Repository" to connect to in order to search
a "Collection". For the record, this is in the second
createTask() method of
jzkit/src/com/k_int/hss/HeterogeneousSetOfSearchable.java.)
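
To make the relationship concrete, here is a minimal Python sketch. It is not JZKit code: it only imitates the "last Instance wins" behaviour described above. The attribute names (collection_dn, repository_dn, instance_dn, and so on) are the ones mentioned in this message; the RepositoryDirectory root element, the sample values, and the function name are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical repositories.xml fragment: one Collection that is reachable
# through two Repository servers. All values are invented for illustration.
SAMPLE = """
<RepositoryDirectory>
  <Collection collection_dn="coll-a" collection_name="Collection A"/>
  <Repository repository_dn="server-1" name="Server 1"/>
  <Repository repository_dn="server-2" name="Server 2"/>
  <Instance instance_dn="inst-1" collection_dn="coll-a"
            repository_dn="server-1" local_name="A on server 1"/>
  <Instance instance_dn="inst-2" collection_dn="coll-a"
            repository_dn="server-2" local_name="A on server 2"/>
</RepositoryDirectory>
"""


def repository_for_collection(root, collection_dn):
    """Return the repository_dn of the last Instance listed for a Collection.

    Returns None when no Instance refers to the given collection_dn.
    """
    chosen = None
    for inst in root.findall("Instance"):
        if inst.get("collection_dn") == collection_dn:
            # Later matches overwrite earlier ones, so the last Instance wins.
            chosen = inst.get("repository_dn")
    return chosen


root = ET.fromstring(SAMPLE)
print(repository_for_collection(root, "coll-a"))  # prints server-2
```

With a file like this, defining the first "Instance" has no effect on which server is searched.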

JZKit does not appear to support
searching "Instance"s, only "Collection"s.

So the GN remote search page should present a list of
the "Collection"s, _not_ a list of "Instance"s,
and the <servers> values sent back by the search
form (which are passed on to JZKit) should be the
collection_dn values, not the instance_dn values.
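
The difference only shows up once a "Collection" has more than one "Instance". The following Python sketch, under the same assumptions as before (invented sample values, hypothetical function names, a RepositoryDirectory root element), builds the (value, label) pairs for the "servers" list both ways. It illustrates the idea and is not the GN code itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one Collection with two Instances (values invented).
SAMPLE = """
<RepositoryDirectory>
  <Collection collection_dn="coll-a" collection_name="Collection A"/>
  <Instance instance_dn="inst-1" collection_dn="coll-a"
            repository_dn="server-1" local_name="A on server 1"/>
  <Instance instance_dn="inst-2" collection_dn="coll-a"
            repository_dn="server-2" local_name="A on server 2"/>
</RepositoryDirectory>
"""


def options_by_instance(root):
    """One option per Instance: value is instance_dn, label is the
    collection_name of the Collection the Instance points at."""
    names = {c.get("collection_dn"): c.get("collection_name")
             for c in root.findall("Collection")}
    return [(i.get("instance_dn"), names.get(i.get("collection_dn")))
            for i in root.findall("Instance")]


def options_by_collection(root):
    """One option per Collection: value is collection_dn."""
    return [(c.get("collection_dn"), c.get("collection_name"))
            for c in root.findall("Collection")]


root = ET.fromstring(SAMPLE)
print(options_by_instance(root))
# prints [('inst-1', 'Collection A'), ('inst-2', 'Collection A')]
print(options_by_collection(root))
# prints [('coll-a', 'Collection A')]
```

Listing by "Instance" shows the same collection twice and sends back instance_dn values; listing by "Collection" shows it once and sends back the collection_dn.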

If I'm right, the attached patch should do the
trick. As I said earlier, this makes absolutely no
difference to the generated HTML search page
_at the moment_ because everything in the supplied
repositories.xml.tem file matches up so perfectly.

If I'm wrong, I must be _really_ confused - someone
please enlighten me!

--
Richard Walker
Software Improvements Pty Ltd
Phone: +61 2 6273 2055
Fax: +61 2 6273 2082

(attachments)

gnpatch6.txt (1011 Bytes)

Hi Richard,
I can't give exact feedback on your suggested fix yet (I would need more time for that, and others are also working on Z39.50), but I can say that although the remote search interface is not visible on the homepage now, it is something that we can bring back. I see there's a need for it, as several people have asked. Archie was working on improving the Z39.50 search, as was Simon. Some time ago I integrated the search panel back into the homepage, but I didn't commit that code because some things at the back end needed checking and Archie was working on that.
The repositories file is something that was used in that form by JZKit as you noticed. It's an ugly format with lots of repetition. Archie was looking at moving the repositories information to a table so it can be updated more easily. Same thing for searching "new/unknown" catalogs on the fly.
So I would just encourage you, Simon and Archie to work on the Z39.50 integration and improvement and write a short proposal on the wiki, and then we can decide to move it back in.
Ciao,
Jeroen

On Apr 3, 2008, at 7:36 AM, Software Improvements gn-devel wrote:

Index: web/geonetwork/xsl/main-page.xsl

--- web/geonetwork/xsl/main-page.xsl (revision 1246)
+++ web/geonetwork/xsl/main-page.xsl (working copy)
@@ -1003,11 +1003,9 @@
          <td class="padded">
            <select class="content" name="servers" size="6" multiple="true"
              onchange="serverSelected()">
- <xsl:for-each select="/root/gui/repositories/Instance">
- <xsl:variable name="name" select="@instance_dn"/>
- <xsl:variable name="collection" select="@collection_dn"/>
- <xsl:variable name="description"
- select="/root/gui/repositories/Collection[@collection_dn=$collection]/@collection_name"/>
+ <xsl:for-each select="/root/gui/repositories/Collection">
+ <xsl:variable name="name" select="@collection_dn"/>
+ <xsl:variable name="description" select="@collection_name"/>
                <option>
                  <xsl:if
                    test="/root/gui/searchDefaults/servers/server[string(.)=$name]">

Jeroen Ticheler wrote:

The repositories file is something that was used in that form by JZKit as you noticed. It's an ugly format with lots of repetition.

Well, to be more generous:

The format of repositories.xml is not too bad - it offers
a lot of flexibility that, at least in theory, is
worth having. I gave the example of having multiple
"Instance"s of a "Collection". It's just that the JZKit
code doesn't let you use that flexibility - there's
no point in defining more than one "Instance" of
a "Collection" because the code always chooses
the last one.

The "problem" of the repetition in repositories.xml
could be solved using a small amount of XSL. For
example, you could remove all of the Collection
and Instance elements (i.e., leaving just
the Repository elements), and use something
like the following script to add them again:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

   <xsl:output
       method="xml"
       encoding="UTF-8"
       indent="yes"
       />

   <xsl:template match="RepositoryDirectory">
     <xsl:copy>
       <TypeMapping type="Z3950" class="com.k_int.z3950.IRClient.Z3950Origin" />
       <TypeMapping type="HSS" class="com.k_int.srw.client.SRWSearchable" />
       <xsl:apply-templates select="*" />
     </xsl:copy>
   </xsl:template>

   <xsl:template match="Repository">
     <xsl:variable name="dn" select="@repository_dn" />
     <xsl:variable name="name" select="@name" />
     <Collection collection_dn="{$dn}" collection_name="{$name}" />
     <xsl:copy-of select="." />
     <Instance instance_dn="{$dn}" collection_dn="{$dn}" repository_dn="{$dn}" local_name="{$name}" />
   </xsl:template>

</xsl:stylesheet>
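
For anyone who wants to check the result without running an XSLT processor, here is a rough Python rendition of the same idea. It is a sketch rather than a replacement for the stylesheet: the function name and sample values are made up, and unlike the stylesheet it silently drops any child element that is not a "Repository".

```python
import copy
import xml.etree.ElementTree as ET


def expand(directory):
    """Rebuild a directory holding only Repository elements, adding the
    TypeMapping entries plus one Collection and one Instance per Repository."""
    out = ET.Element(directory.tag)
    ET.SubElement(out, "TypeMapping", {
        "type": "Z3950", "class": "com.k_int.z3950.IRClient.Z3950Origin"})
    ET.SubElement(out, "TypeMapping", {
        "type": "HSS", "class": "com.k_int.srw.client.SRWSearchable"})
    for repo in directory.findall("Repository"):
        dn = repo.get("repository_dn")
        name = repo.get("name")
        ET.SubElement(out, "Collection", {
            "collection_dn": dn, "collection_name": name})
        out.append(copy.deepcopy(repo))  # keep the Repository unchanged
        ET.SubElement(out, "Instance", {
            "instance_dn": dn, "collection_dn": dn,
            "repository_dn": dn, "local_name": name})
    return out


# Hypothetical stripped-down input containing a single Repository.
source = ET.fromstring(
    '<RepositoryDirectory>'
    '<Repository repository_dn="server-1" name="Server 1"/>'
    '</RepositoryDirectory>')
expanded = expand(source)
print([child.tag for child in expanded])
# prints ['TypeMapping', 'TypeMapping', 'Collection', 'Repository', 'Instance']
```

Each "Repository" ends up with a "Collection" and an "Instance" that share its repository_dn, which is the one-to-one layout of the file supplied with GN.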

--
Richard Walker
Software Improvements Pty Ltd
Phone: +61 2 6273 2055
Fax: +61 2 6273 2082