[GeoNetwork-devel] [GeoNetwork opensource Developer website] #995: Exception thrown in child thread causes GN to stall?

#995: Exception thrown in child thread causes GN to stall?
----------------------+-----------------------------------------------------
Reporter: simonp | Owner: geonetwork-devel@…
     Type: defect | Status: new
Priority: critical | Milestone: v2.8.0 RC0
Component: General | Version: v2.8.0RC0
Keywords: |
----------------------+-----------------------------------------------------
It seems that if an exception is thrown in a child thread (e.g. a harvester,
or when rebuilding thesauri on startup), GN may stall, i.e. become
unresponsive because services hang on dispatch. More research required - has
anyone else noticed this?

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/995>
GeoNetwork opensource Developer website <http://sourceforge.net/projects/geonetwork/>
GeoNetwork opensource is a standards based, Free and Open Source catalog application to manage spatially referenced resources through the web. It provides powerful metadata editing and search functions as well as an embedded interactive web map viewer. This website contains information related to the development of the software.

#995: Exception thrown in child thread causes GN to stall?
-----------------------+----------------------------------------------------
  Reporter: simonp | Owner: geonetwork-devel@…
      Type: defect | Status: closed
  Priority: critical | Milestone: v2.8.0 RC0
Component: General | Version: v2.8.0RC0
Resolution: invalid | Keywords:
-----------------------+----------------------------------------------------
Changes (by simonp):

  * status: new => closed
  * resolution: => invalid

Comment:

Turns out that this is not due to GN and also not due to the exception in
the child thread.

In more detail, the symptoms are that GN would either stall in one service
or become entirely unresponsive (services would dispatch but hang). Running
kill -3 on the Java process id shows stack traces of all threads, which
seemed to be deadlocked during 'parking'. A consistent problem was the
'batch import' service (util.import): its stack traces showed threads
related to dbms and futures trying to 'park'.

It turns out this was happening on just one Linux amd64 machine I had access
to, which was running Ubuntu 10.10 (kernel 2.6.32-41-server) with GN using
an Oracle db. It doesn't occur on other machines I've tested running later
versions of Ubuntu and the Linux kernel (e.g. 12.04 and 3.0.29). The JDK was
1.6 (the update version doesn't seem to matter: I tried all the most recent
ones from 1.6.0_30 onwards).

I was able to work around this by altering util.import to use a
single-threaded approach, so I guess the problem lies not in GN but in the
1.6 JVM or, most likely, in the Linux kernel being used, as it doesn't
happen on later versions of Linux with the 1.6 JVM. I'm closing this as
invalid and describing the symptoms in case anyone comes across anything
similar.
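
For anyone trying to capture the same kind of evidence without shell access, here is a minimal sketch of an in-process equivalent of the kill -3 dump, using the standard java.lang.management API. The class and method names are hypothetical and nothing like this exists in GN as far as this ticket shows. Note that findDeadlockedThreads() only reports lock cycles, so threads merely 'parked' waiting on a future that never completes would probably not be reported by it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of GeoNetwork.
public class ThreadDumpSketch {

    /** Returns a textual dump of all live threads, similar in spirit to kill -3. */
    public static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // true, true: also report locked monitors and ownable synchronizers
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            // ThreadInfo.toString() prints only the top frames of each stack
            sb.append(info.toString());
        }
        return sb.toString();
    }

    /** Returns how many threads the JVM considers deadlocked (0 if none). */
    public static int countDeadlockedThreads() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length; // the API returns null when there is no deadlock
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + countDeadlockedThreads());
        System.out.print(dumpAllThreads());
    }
}
```

If the count stays at 0 while the dump shows many threads in WAITING (parking), that would point to threads waiting on work that never finishes rather than a classic lock cycle.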

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/995#comment:1>

#995: Exception thrown in child thread causes GN to stall?
-----------------------+----------------------------------------------------
  Reporter: simonp | Owner: geonetwork-devel@…
      Type: defect | Status: closed
  Priority: critical | Milestone: v2.8.0 RC0
Component: General | Version: v2.8.0RC0
Resolution: invalid | Keywords:
-----------------------+----------------------------------------------------

Comment(by landry):

I think I've reproduced similar symptoms, but in a different use case, i.e.
a user hitting 'save and validate'.

In certain circumstances, two threads are fighting for data access.

- this one is waiting to do the validation :
"http-9080-9" daemon prio=10 tid=0x000000004204c000 nid=0x5da3 waiting for monitor entry [0x00007fd5349c1000]
  java.lang.Thread.State: BLOCKED (on object monitor)
     at org.fao.geonet.kernel.DataManager.getXSDXmlReport(DataManager.java:926)
     waiting to lock <0x00000000aa0cfcb0> (at org.fao.geonet.kernel.DataManager)
     at org.fao.geonet.kernel.DataManager.doValidate(DataManager.java:1874)
     at org.fao.geonet.services.metadata.AjaxEditUtils.validateMetadataEmbedded(AjaxEditUtils.java:576)
     at org.fao.geonet.services.metadata.Validate.exec(Validate.java:76)

- this one took a lock in updateMetadata/updateFixedInfo, and seems to
never return.

   "http-9080-3" daemon prio=10 tid=0x0000000041f42800 nid=0x5d1c runnable [0x00007fd5359b5000]
      java.lang.Thread.State: RUNNABLE
           at java.lang.Throwable.fillInStackTrace(Native Method)
           - locked <0x00000000a0e15198> (a java.util.ConcurrentModificationException)
           at java.lang.Throwable.<init>(Throwable.java:181)
           at java.lang.Exception.<init>(Exception.java:29)
           at java.lang.RuntimeException.<init>(RuntimeException.java:32)
......
           at net.sf.saxon.instruct.ApplyTemplates.applyTemplates(ApplyTemplates.java:333)
           at net.sf.saxon.Controller.transformDocument(Controller.java:1807)
           at net.sf.saxon.Controller.transform(Controller.java:1621)
           at jeeves.utils.Xml.transform(Xml.java:477)
           at jeeves.utils.Xml.transform(Xml.java:362)
           at org.fao.geonet.kernel.DataManager.updateFixedInfo(DataManager.java:2590)
           at org.fao.geonet.kernel.DataManager.updateMetadata(DataManager.java:1716)
           - locked <0x00000000aa0cfcb0> (a org.fao.geonet.kernel.DataManager)
           at org.fao.geonet.services.metadata.EditUtils.updateContent(EditUtils.java:171)
           at org.fao.geonet.services.metadata.AjaxEditUtils.updateContent(AjaxEditUtils.java:34)
           at org.fao.geonet.services.metadata.Update.exec(Update.java:114)
           at jeeves.server.dispatchers.ServiceInfo.execService(ServiceInfo.java:230)
           at jeeves.server.dispatchers.ServiceInfo.execServices(ServiceInfo.java:139)

At that point, GeoNetwork takes 100% CPU, and a Tomcat restart is
mandatory. I'll dig a bit more to figure out why updateFixedInfo never
returns; this might be a problem in my schemas.
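
To illustrate the pattern in the two traces above (one thread holding the DataManager monitor, the other BLOCKED on it), here is a minimal self-contained sketch. It is not GeoNetwork code: the class name, the plain Object used as the shared monitor, and the latches are all assumptions made for the demonstration. The holder here simply waits on a latch instead of spinning in an XSL transform, so it will not burn CPU the way the real thread does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demonstration, not part of GeoNetwork.
public class MonitorContentionSketch {

    /**
     * One thread takes a monitor and keeps it; a second thread then tries to
     * enter the same monitor. Returns the state observed for the second thread.
     */
    public static Thread.State observeWaiterState() {
        final Object sharedMonitor = new Object(); // stand-in for the DataManager instance
        final CountDownLatch monitorHeld = new CountDownLatch(1);
        final CountDownLatch releaseHolder = new CountDownLatch(1);

        // Plays the role of the thread inside the synchronized update call.
        Thread holder = new Thread(() -> {
            synchronized (sharedMonitor) {
                monitorHeld.countDown();
                try {
                    releaseHolder.await(); // keep the monitor until told to let go
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");

        // Plays the role of the validation thread waiting for the same monitor.
        Thread waiter = new Thread(() -> {
            synchronized (sharedMonitor) { }
        }, "waiter");

        try {
            holder.start();
            monitorHeld.await();
            waiter.start();
            // Poll (bounded to 5 seconds) until the waiter reaches BLOCKED.
            Thread.State state = waiter.getState();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
                state = waiter.getState();
            }
            releaseHolder.countDown(); // in the real case this release never happens
            holder.join();
            waiter.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing threads", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter state: " + observeWaiterState());
    }
}
```

The point of the sketch is only that the BLOCKED thread is a victim: as long as whatever runs under the lock does not return, every other caller of a method synchronized on the same object will queue up behind it.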

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/995#comment:2>

#995: Exception thrown in child thread causes GN to stall?
-----------------------+----------------------------------------------------
  Reporter: simonp | Owner: geonetwork-devel@…
      Type: defect | Status: closed
  Priority: critical | Milestone: v2.8.0 RC0
Component: General | Version: v2.8.0RC0
Resolution: invalid | Keywords:
-----------------------+----------------------------------------------------

Comment(by landry):

I'm not proficient in Java programming, but the full trace of the thread
doing updateFixedInfo looks bizarre:

{{{
"http-9080-3" daemon prio=10 tid=0x0000000041f42800 nid=0x5d1c runnable [0x00007fd5359b5000]
    java.lang.Thread.State: RUNNABLE
         at java.lang.Throwable.fillInStackTrace(Native Method)
         - locked <0x00000000a0703200> (a java.util.ConcurrentModificationException)
         at java.lang.Throwable.<init>(Throwable.java:181)
         at java.lang.Exception.<init>(Exception.java:29)
         at java.lang.RuntimeException.<init>(RuntimeException.java:32)
         at java.util.ConcurrentModificationException.<init>(ConcurrentModificationException.java:57)
         at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
         at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
         at eu.medsea.mimeutil.MimeDetectorRegistry.getMimeTypes(MimeUtil2.java:1033)
         at eu.medsea.mimeutil.MimeUtil2.getMimeTypes(MimeUtil2.java:428)
         at eu.medsea.mimeutil.MimeUtil2.getMimeTypes(MimeUtil2.java:395)
         at eu.medsea.mimeutil.MimeUtil.getMimeTypes(MimeUtil.java:281)
         at org.fao.geonet.util.MimeTypeFinder.detectMimeTypeFile(MimeTypeFinder.java:78)
         at sun.reflect.GeneratedMethodAccessor278.invoke(Unknown Source)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
.....
lots of saxon internals
.....

         at net.sf.saxon.Controller.transformDocument(Controller.java:1807)
         at net.sf.saxon.Controller.transform(Controller.java:1621)
         at jeeves.utils.Xml.transform(Xml.java:477)
         at jeeves.utils.Xml.transform(Xml.java:362)
         at org.fao.geonet.kernel.DataManager.updateFixedInfo(DataManager.java:2590)
         at org.fao.geonet.kernel.DataManager.updateMetadata(DataManager.java:1716)
         - locked <0x00000000aa0cfcb0> (a org.fao.geonet.kernel.DataManager)
         at org.fao.geonet.services.metadata.EditUtils.updateContent(EditUtils.java:171)
         at org.fao.geonet.services.metadata.AjaxEditUtils.updateContent(AjaxEditUtils.java:34)
         at org.fao.geonet.services.metadata.Update.exec(Update.java:114)
}}}

The thread calls into Saxon while holding the lock, then comes back into
GeoNetwork in org.fao.geonet.util.MimeTypeFinder.detectMimeTypeFile. Is that
safe, thread-wise?

It also triggers a ConcurrentModificationException, but I can't figure out
whether it's deadlocking with itself or with the other thread.
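
As background on that exception: java.util.TreeMap is not synchronized, and its iterators are fail-fast, so they throw ConcurrentModificationException when the map is structurally modified during iteration. The sketch below reproduces that in a single thread and contrasts it with ConcurrentSkipListMap, whose iterators are weakly consistent and do not throw. The class, map contents and method names are hypothetical. Whether the registry inside MimeUtil is actually shared between request threads is an assumption that the trace alone does not confirm.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical demonstration, not MimeUtil or GeoNetwork code.
public class TreeMapIterationSketch {

    /** Adds an entry to a TreeMap while iterating its values; true if the iterator fails fast. */
    public static boolean treeMapIterationFails() {
        Map<String, String> registry = new TreeMap<>();
        registry.put("a", "1");
        registry.put("b", "2");
        try {
            for (String value : registry.values()) {
                registry.put("c" + value, "x"); // structural modification mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Same access pattern on a ConcurrentSkipListMap; true if iteration completes without a CME. */
    public static boolean skipListIterationCompletes() {
        Map<String, String> registry = new ConcurrentSkipListMap<>();
        registry.put("a", "1");
        registry.put("b", "2");
        try {
            for (String value : registry.values()) {
                registry.put("0" + value, "x"); // allowed: the iterator is weakly consistent
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("TreeMap iteration fails: " + treeMapIterationFails());
        System.out.println("ConcurrentSkipListMap iteration completes: " + skipListIterationCompletes());
    }
}
```

When the modification comes from another thread instead, the fail-fast check is only best-effort, and an unsynchronized TreeMap can be left in an inconsistent state. That might be one explanation for a thread that stays RUNNABLE at 100% CPU, but it is a guess that would need to be checked against the MimeUtil source.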

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/995#comment:3>

#995: Exception thrown in child thread causes GN to stall?
-----------------------------+----------------------------------------------
  Reporter: simonp | Owner: geonetwork-devel@…
      Type: defect | Status: reopened
  Priority: critical | Milestone: v2.9.0
Component: Catalog server | Version: v2.8.0
Resolution: | Keywords:
-----------------------------+----------------------------------------------
Changes (by landry):

  * status: closed => reopened
  * resolution: invalid =>
  * version: v2.8.0RC0 => v2.8.0
  * component: General => Catalog server
  * milestone: v2.8.0 RC0 => v2.9.0

Comment:

Reopening, since I had that issue again yesterday on 2.8.x. Same
conditions: the xml.metadata.validate service never returns and eats 100%
CPU.

--
Ticket URL: <http://trac.osgeo.org/geonetwork/ticket/995#comment:4>