[Geonetwork-devel] Re: FAO-EDINA CSW 2.0 RFQ proposal (the lack of) (fwd)

Hi Chris,
As discussed on the phone with you, here’s a list of things you could think about when starting to work in the GeoNetwork opensource project. We can discuss these in greater detail once you have done a first assessment.

1- Finish the DescribeRecord operation

2- Create a CQL parser (in java, as a library) and improve the XSL stylesheet that converts CQL into Filter expressions.

3- Investigate how to implement all spatial queries. Currently a number of them are implemented through a Lucene index, but not all. As discussed on the phone, this may need to be implemented as a separate search routine that produces a result set that is then combined with the Lucene-built result set. (This requires a good knowledge of GN’s internals.)

4- We currently don’t implement metadata services. To add them we have to improve the search engine.

5- Implement the harvest operations (another complex task)

6- Implement the transaction operations (another complex task)

7- Design an approach for the use of controlled vocabularies. I agree with Rob that we should no longer ignore these :-)

8- Think about approaches to handle data set and feature level metadata. We do too ;-)

OK, that’s more than enough I think for a first go ;-)
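
As an illustration of item 3, a separate spatial search routine whose result set is combined with the Lucene result set could look roughly like the sketch below. It is Python rather than GeoNetwork's Java, and every name in it (BBox, spatial_search, combine_results) is hypothetical. The Lucene hits are stood in for by a plain ranked list of record ids, only bounding-box intersection is shown, and boxes crossing the antimeridian are not handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class BBox:
    """Geographic bounding box in decimal degrees (hypothetical helper type)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes intersect unless one lies entirely to one side of the other.
        return not (
            self.east < other.west
            or other.east < self.west
            or self.north < other.south
            or other.north < self.south
        )


def spatial_search(extents: Dict[str, BBox], query: BBox) -> Set[str]:
    """Separate search routine: ids of all records whose extent intersects the query box."""
    return {rid for rid, box in extents.items() if box.intersects(query)}


def combine_results(text_hits: List[str], spatial_hits: Set[str]) -> List[str]:
    """Keep only text hits that also matched spatially, preserving the text ranking."""
    return [rid for rid in text_hits if rid in spatial_hits]


# Stand-in for a ranked list of record ids returned by the full-text index.
text_hits = ["a", "b", "c"]
extents = {
    "a": BBox(0, 0, 10, 10),
    "b": BBox(20, 20, 30, 30),
    "c": BBox(5, 5, 15, 15),
}
result = combine_results(text_hits, spatial_search(extents, BBox(8, 8, 12, 12)))
```

Intersecting on record ids keeps the two searches independent of each other, at the cost of materialising the full spatial result set before the merge.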

I suggest that in designing and implementing these, we keep up active communication and discussion. We may need to consider a face-to-face meeting between us at some stage of the process. We are planning a similar meeting with FGDC and we could think of doing something combined!?

I have copied Doug Nebert and Michelle Anthony to ensure they are also fully aware of what is going on here. I really want to make every effort to ensure we don’t implement things twice or take divergent approaches that may result in forks. In the long run I think we need to jointly work on a design plan toward GeoNetwork opensource 3.0. I hope to start such a design early next year, once we get GeoNetwork opensource moved into the OSGEO foundation.

Looking forward to hearing from you, greetings from Rome,
Jeroen


Jeroen Ticheler
FAO-UN
Tel: +39 06 57056041
http://www.fao.org/geonetwork
42.07420°N 12.34343°E

On May 10, 2006, at 3:04 PM, Rob Atkinson wrote:

The key issue in metadata creation, and searching, is the use of controlled vocabularies.

A CSW implementation can either ignore these (vocabularies are someone else's problem) or actually build them in. Using the ebRIM profile, vocabularies are first-class objects within a registry, and can be ordered into hierarchical classification schemes.
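
To make the idea of vocabularies as first-class, hierarchically ordered objects concrete, here is a small sketch. It assumes a vocabulary stored as broader/narrower concept links and a search that expands a query concept to all of its narrower concepts. The class and function names are hypothetical and this is not the ebRIM object model itself, only the general pattern.

```python
class ClassificationScheme:
    """A hierarchical vocabulary: each concept may have narrower concepts (hypothetical)."""

    def __init__(self):
        self._narrower = {}  # concept -> set of directly narrower concepts

    def add(self, concept, broader=None):
        """Register a concept, optionally under a broader concept."""
        self._narrower.setdefault(concept, set())
        if broader is not None:
            self._narrower.setdefault(broader, set()).add(concept)

    def expand(self, concept):
        """Return the concept together with all transitively narrower concepts."""
        seen = set()
        stack = [concept]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self._narrower.get(current, ()))
        return seen


def search_by_concept(records, scheme, concept):
    """Ids of records tagged with the concept or any narrower concept, sorted."""
    wanted = scheme.expand(concept)
    return sorted(rid for rid, keywords in records.items() if wanted & set(keywords))


scheme = ClassificationScheme()
scheme.add("land cover")
scheme.add("forest", broader="land cover")
scheme.add("mangrove", broader="forest")
scheme.add("hydrology")

# Hypothetical metadata records, keyed by id, each with its vocabulary keywords.
records = {"r1": ["mangrove"], "r2": ["hydrology"], "r3": ["land cover"]}
hits = search_by_concept(records, scheme, "land cover")
```

A catalogue that ignores vocabularies would only find r3 for "land cover"; with the hierarchy built in, r1 is found as well because "mangrove" sits under "forest" under "land cover".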

The work that EDINA will need to do for MOTIIVE is going to exploit this concept. It would be useful to consider the “information architecture” of the proposed work, the existing codebase and GeoNetwork's planned future.

Interoperability with service catalogues will depend on flexibility to handle multiple artefact types required for service binding, and the integration of vocabulary concepts.

Rob Atkinson

Chris Higgins wrote:

Hi Jeroen,

We are happy to continue this dialogue and will have plenty of opportunity
to discuss next month.

Meanwhile though, here on the ground in Edinburgh, we have an immediate
requirement to make our metadata catalogue CSW 2.0 compliant and have an
experienced software engineer ready to start work on this asap, eg,
Monday.

In our discussions last week, you mentioned that the metadata catalogue
component of GeoNetwork was more or less already CSW 2.0 compliant and
that you were considering putting it into Beta in the next few months for
a release in September.

Would it be possible for EDINA to get an immediate copy of the code as it
is and commence work under the direction of the technical lead? Depends
on how you want to play it, and what works for you. What we need is our
metadata catalogue made CSW 2.0 compliant and we don't think it makes much
sense for us to be rolling our own if there is a quality open source
solution. Even better if we can contribute.

Longer term, I suggest we consider the discussion with the OGC. If we can
find a way of collaborating on GeoNetwork that works for both of us, then
EDINA will be testing interoperability with GoGeo (our UK academic sector
geoportal). I would think it likely, and desirable, that interoperability
with the catalogues used for the INSPIRE data harmonisation projects
Motiive and RISE also come into the equation at the appropriate point.

Regards

Chris

---------- Forwarded message ----------
Date: Tue, 9 May 2006 07:49:40 -0400
From: George Percivall <gpercivall@anonymised.com>
To: Jeroen Ticheler <Jeroen.Ticheler@anonymised.com>
Cc: Mark Reichardt <mreichardt@anonymised.com>,
John Latham <John.Latham@anonymised.com>, Chris Higgins <chris.higgins@anonymised.com>,
Rob Atkinson <rob@anonymised.com>, Doug Nebert <ddnebert@anonymised.com>,
Michelle L Anthony <anthony@…99…>,
Andrea Carboni <acarboni@anonymised.com>,
Mick Wilson <Mick.Wilson@anonymised.com>, Raj Singh <rsingh@anonymised.com>
Subject: Re: FAO-EDINA CSW 2.0 RFQ proposal (the lack of)

Jeroen,

Thank you for your message and interest in OGC catalog developments.

The emphasis on CSW-ebRIM-SOAP in OWS-4/GPW is driven by sponsor
requirements that were informed by the OWS-3 results on catalog
(ebRIM for registering a variety of items) and the target environment
for the sponsor (SOAP). This will not be a CITE Reference
Implementation.

The CSW Reference Implementation in OWS-4/CITE is not restricted to a
profile, which presents some challenges: without identifying a
profile, verifying interoperability is difficult.

Looking forward I would encourage your group to have a continued
discussion about developing a Reference Implementation for CSW ISO
profile. There are several possible routes to achieve this. A
Reference Implementation can go directly to the CITE SC for
consideration to be approved by the PC. Your group might consider an
Interoperability Experiment to develop the RI. Additionally your
group could develop sponsorship to conduct an IP Testbed focused on
the topic. Raj Singh and I are here to help you in considering these
approaches.

George

On May 9, 2006, at 4:26 AM, Jeroen Ticheler wrote:

Dear George and Mark,

As you may have noticed, we have not put in a proposal for the CSW
2.0 Reference Implementation. It was my intention to call you about
this on Friday, but I got distracted. However, I wish to take this
opportunity to provide you some background on why we did not submit
a proposal in the end.

We had discussions with key people in the Catalog Services
“world” (Chris Higgins, Rob Atkinson and Doug Nebert) as we tried
to make our submission a joint proposal between FAO and EDINA with
support from FGDC, UNEP and a wider UN community and CGIAR
community using GeoNetwork opensource as their base spatial catalog
application.

FAO’s initial idea was to focus on the implementation of the ISO
profile and then implement the ebRIM profile with the support of
EDINA, making GeoNetwork opensource a candidate open source
reference implementation for both profiles. This implementation
would go hand in hand with a set of CITE test suite scripts based
on the NG engine.

However, after going in depth through all requirements and through
a drafting phase, we concluded that the requirements from the RFQ
were restricted to the implementation of the ebRIM profile. To our
common understanding, the ebRIM profile still lacks essential
mapping parts for us to be able to implement it in a consistent and
indisputable manner. At that point we concluded that there was
too much risk involved in the development process, making it
impossible for us to submit a sound proposal to OGC.

Nevertheless, I hope that OGC will be able to find sponsors in
another OWS round that are willing to support the development of a
reference implementation in open source based on the ISO profile.
The same is true for an implementation based on the North American
profile currently under development. I am not sure if you have
received any submissions for an open source ebRIM based solution?
If not, that might be another candidate once the issues currently
surrounding the ebRIM profile have been resolved.

I hope you understand our considerations and that you find this
information useful as input in the future planning of OWS test beds.

Looking forward to hearing from you, kind regards,

Jeroen


Jeroen Ticheler
FAO-UN
SDRN - Room F817
Viale delle Terme di Caracalla, 00100 Rome - Italy
Tel/Fax: +39 06 570 56041/53369
http://www.fao.org/geonetwork
http://metart.fao.org

George Percivall
Open Geospatial Consortium
http://www.opengeospatial.org/
E-mail: percivall@anonymised.com
Voice: 1+301-560-6439



This communication, including attachments, is for the exclusive use
of addressee(s). If you are not the intended recipient, any use,
copying, disclosure, dissemination or distribution is strictly
prohibited. If you are not the intended recipient, please notify the
sender immediately by return email and delete this communication and
destroy all copies.



Hi Jeroen,

An awful lot of this stuff is basic services stuff. It all depends on how you bind to the persistence layer underneath whether you can use existing opensource implementations.

I wouldn't think it's too hard to build a Lucene-based dataStore under the GeoTools framework, for example.

I'm more interested in the actual capabilities of the planned catalogue to do "catalogue things". Our experience of semantic interoperability implementation suggests strongly that it's the formal relationships between registered objects that contain the exploitable semantics, i.e. you want to know that a particular WMS service exposes a particular data set via a particular layer (relationships between three objects, and you may only need to search against one for discovery). A layer supports a CRS (another relationship).

When you connect to real data (via a WFS) you need a lot more catalogued/registered information, and the relationships are even more critical.

So, the "information architecture" supported becomes critical as soon as you think about metadata to drive service discovery or data access.
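
A rough sketch of what such an information architecture means in practice: a registry that stores typed associations between registered objects, so that discovery can start from any one of them. Everything here is hypothetical (the Registry class, the object ids, and the association kinds "offers", "portrays" and "supports"); it is not taken from ebRIM or from GeoNetwork, and only shows the service/layer/data set/CRS example from above.

```python
from collections import namedtuple

# A typed, directed relationship between two registered objects.
Association = namedtuple("Association", "source kind target")


class Registry:
    """Minimal registry of objects and typed associations (hypothetical)."""

    def __init__(self):
        self.objects = {}        # object id -> object type
        self.associations = []   # list of Association tuples

    def register(self, obj_id, obj_type):
        self.objects[obj_id] = obj_type

    def relate(self, source, kind, target):
        # Both ends must already be registered objects.
        for end in (source, target):
            if end not in self.objects:
                raise KeyError("unregistered object: %s" % end)
        self.associations.append(Association(source, kind, target))

    def targets(self, source, kind):
        return [a.target for a in self.associations
                if a.source == source and a.kind == kind]

    def sources(self, kind, target):
        return [a.source for a in self.associations
                if a.kind == kind and a.target == target]


def services_for_dataset(registry, dataset_id):
    """Start from a data set and follow associations back to (service, layer) pairs."""
    pairs = []
    for layer in registry.sources("portrays", dataset_id):
        for service in registry.sources("offers", layer):
            pairs.append((service, layer))
    return pairs


reg = Registry()
reg.register("wms:example", "Service")
reg.register("layer:soils", "Layer")
reg.register("ds:soil-map", "Dataset")
reg.register("crs:EPSG:4326", "CRS")
reg.relate("wms:example", "offers", "layer:soils")
reg.relate("layer:soils", "portrays", "ds:soil-map")
reg.relate("layer:soils", "supports", "crs:EPSG:4326")

bindings = services_for_dataset(reg, "ds:soil-map")
```

The search only needs to match the data set; the service binding falls out of the relationships rather than out of the data set's own metadata record.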

I'm not sure how I'll find an engagement that lets me work on this in detail with you, but I'd be grateful if you'd bear me in mind if you are looking into these issues in a strategic way.

Regards
Rob Atkinson


Hi Rob,
Thanks for the reaction! See below:

On May 12, 2006, at 1:41 AM, Rob Atkinson wrote:

Hi Jeroen,

An awful lot of this stuff is basic services stuff. It all depends on how you bind to the persistence layer underneath whether you can use existing opensource implementations.

Sure, that's what we need as the basis I think, basic working services. We will have to see how the persistence layer should evolve, but I think we have a fairly solid and flexible one so far.

I wouldn't think it's too hard to build a Lucene-based dataStore under the GeoTools framework, for example.

We've been carefully looking at GeoTools, as I really felt we shouldn't do things in isolation if possible. A couple of issues have kept us from going with GeoTools so far. One is that there was nothing CSW- or metadata-related in there yet, just placeholders. Another is that in GeoTools everything is available in the form of objects that map XML structures, while we use DOM objects as the basis for manipulating XML (using JDOM). Andrea will be happy to correct my wording where needed ;-)

I'm more interested in the actual capabilities of the planned catalogue to do "catalogue things". Our experience of semantic interoperability implementation suggests strongly that it's the formal relationships between registered objects that contain the exploitable semantics, i.e. you want to know that a particular WMS service exposes a particular data set via a particular layer (relationships between three objects, and you may only need to search against one for discovery). A layer supports a CRS (another relationship).

When you connect to real data (via a WFS) you need a lot more catalogued/registered information, and the relationships are even more critical.

So, the "information architecture" supported becomes critical as soon as you think about metadata to drive service discovery or data access.

I'm not sure how I'll find an engagement that lets me work on this in detail with you, but I'd be grateful if you'd bear me in mind if you are looking into these issues in a strategic way.

It would be really great to have you put in ideas and strategies!! I'm very much open to have that kind of input and support. I'm sure many others involved in the project would agree. Our goal is really to develop a quality catalog application while at the same time come out with a user friendly front end to deal with these complex things. This means that not all is perfect from the start, but that the evolution of the software goes hand in hand with what we know works in the catalog "arena". CSW 2 is only in its infancy I think. Having a catalog that already deals with much of the basic things our users require is however a big benefit up to now. We can only improve from here.

I'm thinking about the most appropriate strategy for the future regarding system design. We want to move GeoNetwork opensource into the OSGEO foundation when possible. At that point we can also benefit from the collabnet (!?) functions provided through OSGEO.

Thanks again,
Jeroen

Regards
Rob Atkinson

I wouldn't think it's too hard to build a Lucene-based dataStore under the GeoTools framework, for example.

We've been carefully looking at GeoTools, as I really felt we shouldn't do things in isolation if possible. A couple of issues have kept us from going with GeoTools so far. One is that there was nothing CSW- or metadata-related in there yet, just placeholders. Another is that in GeoTools everything is available in the form of objects that map XML structures, while we use DOM objects as the basis for manipulating XML (using JDOM). Andrea will be happy to correct my wording where needed ;-)

Understood. GeoServer went to a lot of effort to use SAX so they could stream large amounts of data. It's probably reasonably simple to create a SAX interface to a DOM in-memory model, or to create a DOM output
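
For what it's worth, a SAX interface over an in-memory DOM can be sketched in a few lines. The example below uses Python's standard library (xml.dom.minidom and xml.sax) purely for illustration; it is not JDOM or GeoServer code, the function name dom_to_sax and the EventLog handler are made up for this sketch, and namespaces, comments and processing instructions are ignored.

```python
from xml.dom import Node, minidom
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


def dom_to_sax(node, handler):
    """Walk an in-memory DOM tree and replay it as SAX events on the handler."""
    if node.nodeType == Node.DOCUMENT_NODE:
        handler.startDocument()
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endDocument()
    elif node.nodeType == Node.ELEMENT_NODE:
        # Copy the element's attributes into the structure SAX handlers expect.
        attrs = AttributesImpl(dict(node.attributes.items()))
        handler.startElement(node.tagName, attrs)
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endElement(node.tagName)
    elif node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        handler.characters(node.data)
    # Other node types (comments, processing instructions) are skipped here.


class EventLog(ContentHandler):
    """Toy SAX consumer that records the events it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("chars", content))


doc = minidom.parseString('<record id="1"><title>Soils</title></record>')
log = EventLog()
dom_to_sax(doc, log)
```

Any existing SAX consumer can be plugged in where EventLog is used, which is the point of the adapter: the DOM-based code does not change, but its output can be streamed.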

I'm more interested in the actual capabilities of the planned catalogue to do "catalogue things". Our experience of semantic interoperability implementation suggests strongly that it's the formal relationships between registered objects that contain the exploitable semantics, i.e. you want to know that a particular WMS service exposes a particular data set via a particular layer (relationships between three objects, and you may only need to search against one for discovery). A layer supports a CRS (another relationship).

When you connect to real data (via a WFS) you need a lot more catalogued/registered information, and the relationships are even more critical.

So, the "information architecture" supported becomes critical as soon as you think about metadata to drive service discovery or data access.

I'm not sure how I'll find an engagement that lets me work on this in detail with you, but I'd be grateful if you'd bear me in mind if you are looking into these issues in a strategic way.

It would be really great to have you put in ideas and strategies!! I'm very much open to have that kind of input and support. I'm sure many others involved in the project would agree. Our goal is really to develop a quality catalog application while at the same time come out with a user friendly front end to deal with these complex things.

Totally agree here - IMHO we will achieve a degree of usability and usefulness only when we achieve minimisation of maintenance effort through modularity - and this means having a client and service (open source reference implementations particularly) that agree on the modular breakdown. Metadata creation thus involves a lot of discovery of reusable metadata components, so we have the need for quite a powerful content discovery and management capability.

This means that not all is perfect from the start, but that the evolution of the software goes hand in hand with what we know works in the catalog "arena". CSW 2 is only in its infancy I think. Having a catalog that already deals with much of the basic things our users require is however a big benefit up to now. We can only improve from here.

I'm personally reasonably agnostic about the catalog protocol. I believe the information model and governance structures are critical, and we can probably support alternative protocols easily (CSW, Z39.50, ebXML, UDDI etc).

I'm thinking about the most appropriate strategy for the future regarding system design. We want to move GeoNetwork opensource into the OSGEO foundation when possible. At that point we can also benefit from the collabnet (!?) functions provided through OSGEO.

Sounds exciting. I think the bit I can contribute to best is having "extended" experience using catalogues to bind to vocabulary and WFS data access services in a variety of Use Cases, a strong theoretical understanding of the ISO abstract models and computing issues and having been focussed on understanding the role of governance in SDI management and semantic interoperability for the last few years.

Rob

Thanks again,
Jeroen

Regards
Rob Atkinson