[GeoNetwork-devel] CFV: Vote to change harvesting schedule

Dear PSC members,

Please check the proposal http://trac.osgeo.org/geonetwork/wiki/proposals/HarvestingSchedule for voting.

The current harvester schedule only allows defining an interval at which the harvesters run periodically. The main disadvantage is that this interval is relative to the time the harvester is activated, so it is not possible to run a harvester at a specific hour.

This proposal changes the harvester schedule to work like the Lucene Index Optimizer schedule, allowing the user to define the hour at which the harvesters first run and an interval for rescheduling them (from 1 hour to 1 week).
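To make the difference concrete, here is a minimal sketch (plain java.time, not GeoNetwork code; the class and method names are hypothetical) of how run times could be computed under the current relative interval and under the proposed start-hour-plus-interval scheme. It assumes that under the current scheduler run 0 happens at activation, and that under the proposal the first run is at the chosen hour, today or tomorrow if that hour has already passed.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical illustration only; not taken from GeoNetwork.
class HarvestScheduleSketch {
    static final Duration MIN_INTERVAL = Duration.ofHours(1);
    static final Duration MAX_INTERVAL = Duration.ofDays(7);

    // Current behaviour (assumed: run 0 happens at activation). The time of day
    // of every run depends on when the harvester was switched on.
    static LocalDateTime runRelative(LocalDateTime activatedAt, Duration interval, int n) {
        return activatedAt.plus(interval.multipliedBy(n));
    }

    // Proposed behaviour: run 0 is at startHour (today, or tomorrow if that hour
    // has already passed), and run n follows n intervals later.
    static LocalDateTime runAnchored(LocalDateTime activatedAt, LocalTime startHour,
                                     Duration interval, int n) {
        if (interval.compareTo(MIN_INTERVAL) < 0 || interval.compareTo(MAX_INTERVAL) > 0) {
            throw new IllegalArgumentException("interval must be between 1 hour and 1 week");
        }
        LocalDateTime first = activatedAt.toLocalDate().atTime(startHour);
        if (first.isBefore(activatedAt)) {
            first = first.plusDays(1);
        }
        return first.plus(interval.multipliedBy(n));
    }

    public static void main(String[] args) {
        LocalDateTime activated = LocalDateTime.of(2012, 4, 27, 8, 55);
        Duration daily = Duration.ofDays(1);
        for (int n = 0; n < 3; n++) {
            System.out.println(runRelative(activated, daily, n) + "  vs  "
                    + runAnchored(activated, LocalTime.of(2, 0), daily, n));
        }
    }
}
```

With an activation at 08:55 and a daily interval, the relative scheme keeps running at 08:55, while the anchored scheme runs at 02:00 regardless of when the harvester was switched on.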

A patch will be added at the beginning of next week.

Thanks and regards,

Jose García


GeoCat Bridge for ArcGIS allows instant publishing of data and metadata on GeoServer and GeoNetwork. Visit http://geocat.net for details.


Jose García
GeoCat bv
Veenderweg 13
6721 WD Bennekom
The Netherlands
http://GeoCat.net

Hi Jose,

Is the implementation already done?

Jesse

On Fri, Apr 27, 2012 at 8:55 AM, Jose Garcia <jose.garcia@anonymised.com> wrote:

Dear PSC members,

Please check the proposal
http://trac.osgeo.org/geonetwork/wiki/proposals/HarvestingSchedule for
voting.

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at
http://sourceforge.net/projects/geonetwork

Hi Jesse

I did it for a custom project on 2.6.X, and it needs only a few small changes to
move to trunk.

Regards,
Jose García

On Fri, Apr 27, 2012 at 9:08 AM, Jesse Eichar <jesse.eichar@anonymised.com> wrote:

Hi Jose,

Is the implementation already done?

Jesse

Ok,

I have had some problems with the harvesting scheduler, so on geocat I have
switched to Quartz as the scheduling mechanism. It allows cron-like
definitions of when to run, as well as almost any other scheduling pattern
you can imagine. It looks like your solution is a simplified version of
cron scheduling.

That is fine, but my concern is the database changes. My changes do not
require any changes to the database or any migrations, since they just parse
the "every" field either as an interval or as a cron definition. I was just
working on a proposal for merging the Quartz work into trunk. Do you
think we can work together on this?
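A minimal sketch of how a single "every" value might be classified without any schema change. The format is an assumption made for illustration only: a bare integer is read as an interval in minutes, and a value with 6 or 7 whitespace-separated fields is treated as a Quartz-style cron expression (seconds, minutes, hours, day of month, month, day of week, optional year). All names are hypothetical, and the cron string is only classified here, not validated.

```java
import java.time.Duration;

// Hypothetical sketch; the accepted formats are assumptions, not GeoNetwork's.
class EveryFieldSketch {
    interface Schedule {}
    record Interval(Duration period) implements Schedule {}
    record Cron(String expression) implements Schedule {}

    static Schedule parse(String every) {
        String value = every.trim();
        if (value.matches("\\d+")) {
            // Assumed legacy format: a plain number of minutes.
            long minutes = Long.parseLong(value);
            if (minutes <= 0) {
                throw new IllegalArgumentException("interval must be positive: " + every);
            }
            return new Interval(Duration.ofMinutes(minutes));
        }
        String[] fields = value.split("\\s+");
        if (fields.length == 6 || fields.length == 7) {
            // Only classified here; a real implementation would validate the
            // expression with a cron parser before scheduling it.
            return new Cron(value);
        }
        throw new IllegalArgumentException("not an interval or cron expression: " + every);
    }

    public static void main(String[] args) {
        System.out.println(parse("90"));
        System.out.println(parse("0 0 2 ? * MON-FRI"));
    }
}
```

Because both forms fit in the same text column, existing interval values keep working and no migration is needed, which is the property described above.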

Jesse

On Fri, Apr 27, 2012 at 9:10 AM, Jose Garcia <jose.garcia@anonymised.com> wrote:

Hi Jesse

I did it for a custom project on 2.6.X, and it needs only a few small changes to
move to trunk.

Regards,
Jose García

PS: I am on Skype and IRC if you want to chat.

Jesse

On Fri, Apr 27, 2012 at 9:14 AM, Jesse Eichar <jesse.eichar@anonymised.com> wrote:

Ok,

Hi Jose,

I would really like this option because I harvest from at least 23 nodes
around Australia. During that process I have noticed that if I am
harvesting too many nodes at one time, the system runs out of memory, so I
have to wait for one harvest to finish before I can start another.

It would be very nice to have the option to select many different nodes
to harvest and then run them sequentially, i.e. when the harvesting of
one node finishes, the next one starts, until all the selected nodes
have been harvested.

Have you considered adding options to select multiple harvesting nodes,
harvest those nodes sequentially, and start the harvesting process at a
specific time?
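Sequential harvesting can be sketched with the standard JDK by chaining each harvest onto the previous one, so that a node only starts after the one before it has finished. harvestNode is a hypothetical placeholder for running one harvester; the counters exist only to show that at most one harvest runs at a time.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; harvestNode stands in for running one harvester.
class SequentialHarvestSketch {
    static final AtomicInteger active = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    static void harvestNode(String node, List<String> finished) {
        int now = active.incrementAndGet();
        peak.accumulateAndGet(now, Math::max); // track the highest concurrency seen
        try {
            finished.add(node); // placeholder for the real harvest work
        } finally {
            active.decrementAndGet();
        }
    }

    // Each stage is chained onto the previous one, so a node is only harvested
    // after the harvest of the node before it has completed.
    static List<String> harvestInSequence(List<String> nodes) {
        List<String> finished = new CopyOnWriteArrayList<>();
        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
        for (String node : nodes) {
            chain = chain.thenRunAsync(() -> harvestNode(node, finished));
        }
        chain.join(); // wait for the whole batch
        return finished;
    }

    public static void main(String[] args) {
        System.out.println(harvestInSequence(List.of("node-1", "node-2", "node-3")));
        System.out.println("max concurrent harvests: " + peak.get());
    }
}
```

In this chain a failing harvest would cancel the remaining ones; a real implementation would probably log the failure and continue, for example with handle() or exceptionally(). The whole batch could itself be started at a fixed hour by a scheduler.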

Thanks.

John Hockaday

On Fri, 2012-04-27 at 08:55 +0200, Jose Garcia wrote:

Dear PSC members,

Please check the
proposal http://trac.osgeo.org/geonetwork/wiki/proposals/HarvestingSchedule for voting.

I am not sure if it made it to this list, but I have implemented harvesting in Geocat on top of Quartz scheduling. It allows limiting the number of concurrent threads, offers very detailed ways to control how jobs are rescheduled if too many tasks run at the same time, and provides many methods of controlling exactly when jobs are executed and how they repeat. One of the things we are going to discuss this week is how to combine that with the work that Jose did.
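For comparison, limiting the number of concurrent harvests can be sketched with a fixed-size JDK thread pool. This is not the Quartz API, and the names are hypothetical. Jobs beyond the pool size wait in the queue instead of running together and competing for memory; a pool size of 1 gives strictly sequential harvesting.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-JDK sketch of capping concurrent harvests; not the Quartz API.
class BoundedHarvestSketch {
    record Result(int completed, int peakConcurrency) {}

    static Result runWithLimit(int jobs, int maxThreads) {
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        // Jobs beyond maxThreads wait in the executor's queue.
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        for (int i = 0; i < jobs; i++) {
            pool.submit(() -> {
                int now = active.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // stand-in for one harvester run
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Result(completed.get(), peak.get());
    }

    public static void main(String[] args) {
        System.out.println(runWithLimit(10, 3));
    }
}
```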

jesse

On Tue, May 1, 2012 at 2:50 AM, john.hockaday <john.hockaday@anonymised.com> wrote:

Hi Jose,

I really would like this option because I harvest from at least 23 nodes
around Australia. During that process I have noticed that if I am
harvesting too many nodes at one time the system runs out of memory so I
have to wait for one harvest to finish before I can start another.

Hi Jesse,

I remember seeing your post on the Quartz scheduling. It would be very
nice to include that in the harvesting options along with what Jose is
doing.

Thanks for the prompt reply.

John

On Tue, 2012-05-01 at 07:23 +0200, Jesse Eichar wrote:

I am not sure if it made it to this list, but I have implemented
harvesting in Geocat on Quartz scheduling.

Dear PSC members,

The proposal http://trac.osgeo.org/geonetwork/wiki/proposals/HarvestingSchedule has been updated with Jesse's additions to use the Quartz scheduler for harvesters. The configuration has also been improved, so it is now possible to specify the days of the week on which a harvester should run. Thanks a lot to Jesse for the improvements!
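A minimal sketch of how the next run could be found when a harvester is restricted to certain days of the week at a fixed time of day. It uses plain java.time with hypothetical names and is not the code from the patch.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch, not code from the patch.
class WeekdayScheduleSketch {
    // Returns the first time strictly after "now" that falls on one of the
    // selected days at the given time of day.
    static LocalDateTime nextRun(LocalDateTime now, LocalTime at, Set<DayOfWeek> days) {
        if (days.isEmpty()) {
            throw new IllegalArgumentException("select at least one day");
        }
        // Checking today plus the next 7 days always finds a match, including
        // "same weekday next week" when today's slot has already passed.
        for (int offset = 0; offset <= 7; offset++) {
            LocalDateTime candidate = now.toLocalDate().plusDays(offset).atTime(at);
            if (days.contains(candidate.getDayOfWeek()) && candidate.isAfter(now)) {
                return candidate;
            }
        }
        throw new IllegalStateException("unreachable for a non-empty day set");
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2012, 5, 15, 12, 0); // a Tuesday
        Set<DayOfWeek> days = EnumSet.of(DayOfWeek.MONDAY, DayOfWeek.WEDNESDAY, DayOfWeek.FRIDAY);
        System.out.println(nextRun(now, LocalTime.of(2, 0), days)); // 2012-05-16T02:00
    }
}
```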

A patch to review is provided in http://trac.osgeo.org/geonetwork/ticket/772

We want to create the 2.8 branch today, so please vote on the proposal if you can; it would be very nice to commit this before the branch is created.

Thanks and regards,
Jose García

Jose and Jesse,

I have two questions about this:

(1) It uses the Quartz scheduling library. Why? Does it use any feature that is not already available in the excellent standard Java API (ScheduledFuture, etc.)?
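For reference, the standard API can cover the basic "daily at a fixed hour" case: compute the delay until the next occurrence and pass it as the initial delay to ScheduledExecutorService.scheduleAtFixedRate. The class and helper names are hypothetical. One caveat of this approach is that the 24-hour period is fixed elapsed time, so after a daylight-saving change the runs shift by an hour in local time, whereas a calendar-based trigger such as a cron expression recomputes each fire time.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch using only the JDK; not GeoNetwork or Quartz code.
class PlainJavaScheduleSketch {
    // Time from "now" until the next occurrence of the given time of day.
    static Duration delayUntil(LocalDateTime now, LocalTime at) {
        LocalDateTime next = now.toLocalDate().atTime(at);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    // Runs the job at the given hour, then every 24 hours of elapsed time.
    static ScheduledFuture<?> scheduleDaily(ScheduledExecutorService ses, Runnable job, LocalTime at) {
        long initialDelay = delayUntil(LocalDateTime.now(), at).toMillis();
        return ses.scheduleAtFixedRate(job, initialDelay, TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ScheduledFuture<?> handle = scheduleDaily(ses, () -> System.out.println("harvest"), LocalTime.of(2, 0));
        System.out.println("next harvest in " + handle.getDelay(TimeUnit.MINUTES) + " minutes");
        handle.cancel(false);
        ses.shutdown();
    }
}
```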

(2) With this, you can specify the exact time at which to start a harvester job, no matter when GN was started. But is it still possible to define a schedule interval of, e.g., 26 hours, so that after the first run the harvester runs 2 hours later each day?
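The arithmetic behind the 26-hour example, as a small sketch with hypothetical names: run n is at first + n * 26 h, so the time of day advances by 2 hours per run. A plain cron expression cannot express this, because its hour field refers to hours of the day rather than elapsed time, which is why an interval option would still be needed alongside cron-style schedules.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: successive run times for a fixed elapsed-time interval.
class IntervalDriftSketch {
    static List<LocalDateTime> runs(LocalDateTime first, Duration interval, int count) {
        List<LocalDateTime> times = new ArrayList<>();
        for (int n = 0; n < count; n++) {
            times.add(first.plus(interval.multipliedBy(n))); // first + n * interval
        }
        return times;
    }

    public static void main(String[] args) {
        // 02:00 on the 15th, 04:00 on the 16th, 06:00 on the 17th, 08:00 on the 18th
        runs(LocalDateTime.of(2012, 5, 15, 2, 0), Duration.ofHours(26), 4).forEach(System.out::println);
    }
}
```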

Kind regards
Heikki Doeleman

On Tue, May 15, 2012 at 11:58 AM, Jose Garcia <jose.garcia@…437…> wrote:

A patch to review is provided in http://trac.osgeo.org/geonetwork/ticket/772

On Tue, May 15, 2012 at 12:36 PM, heikki <tropicano@anonymised.com> wrote:

(1) It uses the Quartz scheduling library. Why? Does it use any feature that is not already available in the excellent standard Java API (ScheduledFuture, etc.)?

Some benefits I considered:

  • It has many features, including some that could be useful for clustering. For example, you could configure it so that only one server does the harvesting and multiple servers are not all trying to harvest.
  • It has extremely flexible configuration of when schedules are executed.
  • It has a cron parser.
  • It is easy to externalize the configuration of the scheduler pools (again useful for clustering).
  • Since it is designed for clustered environments, it has solutions for managing the data required by a job even when the job may run on a different instance.
  • It takes into account what should be done when there are not enough threads to handle all of the jobs being requested.
  • It has a nice solution for adding listeners for notification of job status and execution.

There are several other features, but those were the main ones that jumped out at me when I was researching which solution to use.
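The point about not having enough threads is what Quartz calls a misfire. The idea can be sketched in plain Java with two hypothetical policies: run the missed job as soon as a thread is free, or skip it and wait for the next scheduled slot. The enum and method names are illustrative only and are not Quartz's misfire-instruction constants.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical sketch of misfire handling; names are illustrative, not Quartz's.
class MisfireSketch {
    enum Policy { RUN_NOW, SKIP_TO_NEXT }

    // scheduled: the fire time that was due; freeAt: when a worker thread became available.
    static LocalDateTime nextFire(LocalDateTime scheduled, Duration interval,
                                  LocalDateTime freeAt, Policy policy) {
        if (interval.isNegative() || interval.isZero()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        if (!freeAt.isAfter(scheduled)) {
            return scheduled; // a thread was free in time: no misfire
        }
        return switch (policy) {
            case RUN_NOW -> freeAt; // catch up immediately
            case SKIP_TO_NEXT -> {
                // drop the missed run(s) and wait for the next regular slot
                LocalDateTime next = scheduled;
                while (next.isBefore(freeAt)) {
                    next = next.plus(interval);
                }
                yield next;
            }
        };
    }

    public static void main(String[] args) {
        LocalDateTime due = LocalDateTime.of(2012, 5, 15, 2, 0);
        LocalDateTime free = LocalDateTime.of(2012, 5, 15, 4, 30);
        for (Policy p : Policy.values()) {
            System.out.println(p + ": " + nextFire(due, Duration.ofHours(1), free, p));
        }
    }
}
```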

(2) With this, you can specify the exact time at which to start a harvester job, no matter when GN was started. But is it still possible to define a schedule interval of, e.g., 26 hours, so that after the first run the harvester runs 2 hours later each day?

The UI allows this and of course the Quartz backend allows this and more.

Jesse

hi Jesse,

thanks for your answers.

As for Quartz, it would be nice to know if any of its clustering support can actually be used in GeoNetwork’s clustering implementation (where only a JMS provider knows which servers participate in the clustering).

As for the 26-hour interval example, according to Jose this is not possible in the GUI for this proposal (see also the proposal page). This is in contrast to the current GUI in GeoNetwork, which allows it.

Jose says it could be reinstated, though. In my opinion, the possibility to define a user-chosen interval should remain, as I think certain users use such intervals to shift the harvester run time by an hour or two each day.

Kind regards,
Heikki Doeleman

On Tue, May 15, 2012 at 12:50 PM, Jesse Eichar <jesse.eichar@anonymised.com> wrote:

The UI allows this and of course the Quartz backend allows this and more.

On Tue, May 15, 2012 at 1:33 PM, heikki <tropicano@anonymised.com> wrote:

As for the 26-hour interval example, according to Jose this is not possible in the GUI for this proposal (see also the proposal page). This is in contrast to the current GUI in GeoNetwork, which allows it.

Oh, you are correct. I remembered the UI incorrectly, as I didn’t work on it much.

Though, says Jose, it could be re-instated. In my opinion, the possibility to define a user-chosen interval should remain, as I think certain users use such intervals to move the harvester run time up by an hour or two each day.

Kind regards,
Heikki Doeleman

On Tue, May 15, 2012 at 12:50 PM, Jesse Eichar <jesse.eichar@anonymised.com> wrote:

On Tue, May 15, 2012 at 12:36 PM, heikki <tropicano@anonymised.com> wrote:

Jose and Jesse,

I have two questions about this :

(1) it’s using Quartz scheduling library. Why ? Does this use any feature that is not already available in the excellent standard Java API (ScheduledFuture, etc.) ?

Some benefits I considered:

  • It has many features, including some that could be useful for clustering. For example, you could configure it so that only one server does the harvesting, rather than several servers all trying to harvest.
  • It has extremely flexible configuration of when schedules are executed.
  • It has a cron parser.
  • It is easy to externalize the configuration of the scheduler pools (again useful for clustering).
  • Since it is designed for clustered environments, it has solutions for managing the data a job requires even when the job may run on a different instance.
  • It handles the case where there are not enough threads to run all of the requested jobs.
  • It has a nice solution for adding listeners that are notified of job status and execution.

There are several other features, but those were the main ones that jumped out at me when I was researching which solution to use.
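For comparison with the list above, here is a minimal sketch of what the standard JDK API alone provides for the basic case of starting at a chosen hour and then repeating at a fixed interval. `JdkHarvestScheduler` and its methods are hypothetical names used for illustration, not GeoNetwork or Quartz code. Clustering, misfire handling, listeners and cron expressions from the list are exactly what it does not cover.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: plain JDK scheduling, not GeoNetwork or Quartz code.
final class JdkHarvestScheduler {

    private JdkHarvestScheduler() {}

    /** Time from 'now' until the next occurrence of the wall-clock time 'startAt'. */
    static Duration initialDelay(LocalDateTime now, LocalTime startAt) {
        LocalDateTime next = now.toLocalDate().atTime(startAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // today's slot has passed (or is exactly now): use tomorrow's
        }
        return Duration.between(now, next);
    }

    /**
     * Runs 'harvest' first at the next 'startAt' (system default time zone),
     * then repeatedly every 'interval' of elapsed time.
     */
    static ScheduledFuture<?> schedule(ScheduledExecutorService executor, Runnable harvest,
                                       LocalTime startAt, Duration interval) {
        long delayMs = initialDelay(LocalDateTime.now(), startAt).toMillis();
        return executor.scheduleAtFixedRate(
                harvest, delayMs, interval.toMillis(), TimeUnit.MILLISECONDS);
    }
}
```

Usage might look like `JdkHarvestScheduler.schedule(Executors.newSingleThreadScheduledExecutor(), harvester, LocalTime.of(2, 0), Duration.ofHours(24))`, where `harvester` is any `Runnable`. One caveat: `scheduleAtFixedRate` counts elapsed time, so after a daylight-saving change the runs shift by an hour on the wall clock until the job is rescheduled.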

(2) With this, you can specify the exact time at which to start a harvester job, regardless of when GN was started. But is it still possible to define a schedule interval of, e.g., 26 hours, so that after the first run the harvester runs 2 hours later each day?

The UI allows this and of course the Quartz backend allows this and more.
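The 26-hour case from question (2) is simple arithmetic: 26 h is one day plus 2 h, so each run falls 2 hours later on the clock than the previous one. A minimal sketch, using a hypothetical `IntervalDrift` helper that only lists run times and schedules nothing:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only: computes run times, does not schedule anything.
final class IntervalDrift {

    private IntervalDrift() {}

    /** Returns the first 'count' run times of a job first run at 'first' and repeated every 'interval'. */
    static List<LocalDateTime> runTimes(LocalDateTime first, Duration interval, int count) {
        List<LocalDateTime> runs = new ArrayList<>();
        LocalDateTime t = first;
        for (int i = 0; i < count; i++) {
            runs.add(t);
            t = t.plus(interval); // 26 h = 1 day + 2 h, so the clock time moves 2 h later per run
        }
        return runs;
    }
}
```

For a first run at 02:00 on 15 May 2012, `runTimes(LocalDateTime.of(2012, 5, 15, 2, 0), Duration.ofHours(26), 4)` gives 02:00 on 15 May, 04:00 on 16 May, 06:00 on 17 May and 08:00 on 18 May. In Quartz, a `SimpleTrigger` with a fixed repeat interval would produce this pattern, whereas a cron trigger pins runs to fixed clock times.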

Jesse

Hi Jose and Jesse,
A +1 from me for this change.
Cheers,
Jeroen

On 15 May 2012, at 11:58, Jose Garcia wrote:

Dear PSC members,

The proposal http://trac.osgeo.org/geonetwork/wiki/proposals/HarvestingSchedule has been updated with Jesse's additions to use the Quartz scheduler for harvesters. The configuration has also been improved, so it is now possible to specify the days of the week on which a harvester should run. Many thanks to Jesse for the improvements!
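To illustrate how a start time plus a set of weekdays can map onto Quartz's cron syntax (seconds, minutes, hours, day-of-month, month, day-of-week), here is a hypothetical helper. It is an assumption about one possible approach, not the code in the patch on ticket 772.

```java
import java.time.DayOfWeek;
import java.time.format.TextStyle;
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: builds a Quartz-style cron string; not taken from the GeoNetwork patch.
final class HarvestCron {

    private HarvestCron() {}

    /** Cron expression that fires at hour:minute on each of the given days of the week. */
    static String weekly(int hour, int minute, Set<DayOfWeek> days) {
        if (hour < 0 || hour > 23 || minute < 0 || minute > 59) {
            throw new IllegalArgumentException("invalid time " + hour + ":" + minute);
        }
        if (days.isEmpty()) {
            throw new IllegalArgumentException("at least one day is required");
        }
        // EnumSet iterates in declaration order (MONDAY..SUNDAY), giving a stable day list.
        String dayList = EnumSet.copyOf(days).stream()
                .map(d -> d.getDisplayName(TextStyle.SHORT, Locale.ENGLISH).toUpperCase(Locale.ENGLISH))
                .collect(Collectors.joining(","));
        // Quartz requires '?' in the day-of-month field when day-of-week is given.
        return String.format(Locale.ROOT, "0 %d %d ? * %s", minute, hour, dayList);
    }
}
```

The resulting string, e.g. `0 30 2 ? * MON,WED,FRI`, could then be passed to Quartz 2.x via `CronScheduleBuilder.cronSchedule(...)`. Day names are used instead of numbers because Quartz numbers days from 1 = Sunday, unlike `java.time.DayOfWeek`, which starts at Monday.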

A patch to review is provided in http://trac.osgeo.org/geonetwork/ticket/772

We want to create the 2.8 branch today, so please vote on the proposal if you can; it would be very nice to commit this before the branch is created.

Thanks and regards,
Jose García

On Fri, Apr 27, 2012 at 8:55 AM, Jose Garcia <jose.garcia@anonymised.com> wrote:

Dear PSC members,

Please check the proposal http://trac.osgeo.org/geonetwork/wiki/proposals/HarvestingSchedule for voting.

The current harvester schedule only allows defining an interval at which the harvesters run periodically. The main disadvantage is that this interval is relative to the time the harvester is activated, so it is not possible to define a specific hour at which to run it.

This proposal modifies the harvester schedule to be similar to the Lucene Index Optimizer schedule, allowing the user to define the initial hour at which to run the harvesters and an interval at which to reschedule them (from 1 h to 1 week).
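The proposal bounds the reschedule interval to between 1 hour and 1 week. A minimal sketch of how such a constraint might be enforced, assuming a hypothetical `HarvesterSchedule` value class that is not part of GeoNetwork:

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.Objects;

// Hypothetical value class: a start hour plus a bounded repeat interval.
final class HarvesterSchedule {
    static final Duration MIN_INTERVAL = Duration.ofHours(1);
    static final Duration MAX_INTERVAL = Duration.ofDays(7);

    final LocalTime startAt;
    final Duration interval;

    HarvesterSchedule(LocalTime startAt, Duration interval) {
        this.startAt = Objects.requireNonNull(startAt, "startAt");
        Objects.requireNonNull(interval, "interval");
        // Reject intervals outside the proposed range of 1 h to 1 week (both bounds inclusive).
        if (interval.compareTo(MIN_INTERVAL) < 0 || interval.compareTo(MAX_INTERVAL) > 0) {
            throw new IllegalArgumentException("interval must be between 1 h and 1 week: " + interval);
        }
        this.interval = interval;
    }
}
```

A 26-hour interval falls inside this range, so heikki's daily-shift use case would remain expressible under such a bound.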

A patch will be added at the beginning of next week.

Thanks and regards,

Jose García


GeoCat Bridge for ArcGIS allows instant publishing of data and metadata on GeoServer and GeoNetwork. Visit http://geocat.net for details.


Jose García
GeoCat bv
Veenderweg 13
6721 WD Bennekom
The Netherlands
http://GeoCat.net


_______________________________________________
GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

+1 for me.

Thanks Jesse & Jose.

Francois


Jose & Jesse,

+1 from me

Cheers and thanks,
Simon

+1 for me.

Cheers,
Emanuele

--
-------------------------------------------------------
Ing. Emanuele Tajariol
Senior Software Engineer

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584962313
fax: +39 0584962313
mob: +39 3802116282

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://twitter.com/geosolutions_it
-------------------------------------------------------