Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harvesting: Harvesting clients never complete harvesting of updates of at least 2 smaller sites. #4548

Closed
kcondon opened this issue Mar 28, 2018 · 16 comments

Comments

@kcondon
Copy link
Contributor

kcondon commented Mar 28, 2018

This is happening in the new Harvard AWS production architecture.

Harvesting of TDR and Madrono do not complete and appear to hang or pause and never complete according to both the in progress status in the UI and the harvest logs. Harvesting of both these sites works relatively quickly (about 5 minutes or less) on the test environment. Is this a network connectivity issue of some type related to the new architecture?

Update: both clients process 10, and 11 identifiers before halting in 27 and 19 seconds, respectively.
Update 2: though these hanged jobs were run manually, successful initial harvests for tdr on 02-05 and 02-06 were scheduled and logged on both web instances. (Note: this is from pre migration testing and logs were cloned)
Update 3: the harvest from date in 03-26 log is 02-14 but there is no 02-14 harvest log on either instance. (Note: this is because 02-14 harvest occurred pre migration and log was not copied over).
Update4:
The last successful harvest of tdr, according to existing log files on cloud1 &2, was on 02-06-18 and resulted in 0 datasets. Directly requesting list identifiers from that date results in 100 identifier changes (new/delete) and 2 subsequent harvests since 02-06 resulted in getting, 10 (9 new, 1 delete), and 10 (9 new and 1 delete) from 03-26 and 03-27. (Note: the from date may actually be 02-14 because it appears as the from date in the 03-26 log. It does not appear on the AWS servers because harvest occurred pre migration and harvest log was not copied over. The list identifiers command from 02-14 still shows 100 identifiers)
Update 5: the madrono list identifiers also shows 100 identifiers available from last harvest of 02-11-18
https://edatos.consorciomadrono.es/oai?verb=ListIdentifiers&from=2018-02-11&metadataPrefix=oai_ddi&set=openaire_data

Here is the tdr list identifiers list:
https://dataverse.tdl.org/oai?verb=ListIdentifiers&from=2018-02-06&metadataPrefix=oai_ddi&set=TDR

2018-03-28T16:20:27Z
https://dataverse.tdl.org/oai

doi:10.18738/T8/0H1MWK 2018-03-08T08:00:02Z TDR doi:10.18738/T8/0SL6NE 2018-03-18T08:00:02Z TDR doi:10.18738/T8/0VXSGP 2018-03-18T08:00:02Z TDR doi:10.18738/T8/1HXBCV 2018-03-08T08:00:02Z TDR doi:10.18738/T8/1NPB8F 2018-03-18T08:00:02Z TDR doi:10.18738/T8/33NF9Y 2018-03-08T08:00:02Z TDR doi:10.18738/T8/3TTOWS 2018-02-14T08:00:02Z TDR doi:10.18738/T8/4813VQ 2018-03-18T08:00:02Z TDR doi:10.18738/T8/4Z1GAK 2018-02-14T08:00:02Z TDR doi:10.18738/T8/5NTI0C 2018-02-14T08:00:02Z TDR doi:10.18738/T8/5O59LD 2018-03-08T08:00:02Z TDR doi:10.18738/T8/6CO7XD 2018-03-07T08:00:01Z TDR doi:10.18738/T8/7BTM0D 2018-03-13T08:00:02Z TDR doi:10.18738/T8/7CUCM8 2018-03-18T08:00:02Z TDR doi:10.18738/T8/7YRHOS 2018-02-14T08:00:02Z TDR doi:10.18738/T8/8KJA6T 2018-02-14T08:00:02Z TDR doi:10.18738/T8/8RUKOV 2018-03-08T08:00:02Z TDR doi:10.18738/T8/9CC4KZ 2018-03-18T08:00:02Z TDR doi:10.18738/T8/9EBL3A 2018-03-10T08:00:02Z TDR doi:10.18738/T8/9ON0VB 2018-03-18T08:00:02Z TDR doi:10.18738/T8/9UXITA 2018-02-23T08:00:02Z TDR doi:10.18738/T8/9VYDE0 2018-03-08T08:00:02Z TDR doi:10.18738/T8/9XV8WN 2018-03-18T08:00:02Z TDR doi:10.18738/T8/A6DIXF 2018-02-14T08:00:02Z TDR doi:10.18738/T8/A8DMKB 2018-02-14T08:00:02Z TDR doi:10.18738/T8/ACWIR0 2018-03-08T08:00:02Z TDR doi:10.18738/T8/AMBBEZ 2018-02-13T08:00:02Z TDR doi:10.18738/T8/ASJ3RA 2018-03-18T08:00:02Z TDR doi:10.18738/T8/ASU4TG 2018-03-18T08:00:02Z TDR doi:10.18738/T8/BBC71O 2018-02-14T08:00:02Z TDR doi:10.18738/T8/BGPA1O 2018-03-18T08:00:02Z TDR doi:10.18738/T8/BS0G9T 2018-02-14T08:00:02Z TDR doi:10.18738/T8/BXB2BI 2018-02-14T08:00:02Z TDR doi:10.18738/T8/C4EQ3W 2018-02-14T08:00:02Z TDR doi:10.18738/T8/CKRKJJ 2018-02-14T08:00:02Z TDR doi:10.18738/T8/CRBUOB 2018-03-09T08:00:02Z TDR doi:10.18738/T8/CRIWDF 2018-02-14T08:00:02Z TDR doi:10.18738/T8/CRLHCI 2018-03-18T08:00:02Z TDR doi:10.18738/T8/CXFO4Y 2018-03-08T08:00:02Z TDR doi:10.18738/T8/EA4GZC 2018-02-14T08:00:02Z TDR doi:10.18738/T8/ELAIH3 2018-02-28T08:00:02Z TDR doi:10.18738/T8/F1IDM3 2018-03-18T08:00:02Z TDR doi:10.18738/T8/FCZMRZ 2018-03-18T08:00:02Z TDR doi:10.18738/T8/FKTPIP 2018-03-18T08:00:02Z TDR doi:10.18738/T8/GGE2PG 2018-03-18T08:00:02Z TDR doi:10.18738/T8/GVL6ZD 2018-03-18T08:00:02Z TDR doi:10.18738/T8/H2RTCA 2018-03-08T08:00:02Z TDR doi:10.18738/T8/H77VLX 2018-02-14T08:00:02Z TDR doi:10.18738/T8/HJYOT5 2018-02-14T08:00:02Z TDR doi:10.18738/T8/HKHQGP 2018-02-14T08:00:02Z TDR doi:10.18738/T8/IAPEY0 2018-02-13T08:00:02Z TDR doi:10.18738/T8/IQT2DF 2018-03-18T08:00:02Z TDR doi:10.18738/T8/K2JAWP 2018-02-14T08:00:02Z TDR doi:10.18738/T8/K6YBV4 2018-03-28T08:00:02Z TDR doi:10.18738/T8/KGXMXS 2018-02-13T08:00:02Z TDR doi:10.18738/T8/L1RG0K 2018-03-08T08:00:02Z TDR doi:10.18738/T8/LHK7DY 2018-03-08T08:00:02Z TDR doi:10.18738/T8/LWKNHE 2018-03-18T08:00:02Z TDR doi:10.18738/T8/LXRZZ2 2018-03-08T08:00:02Z TDR doi:10.18738/T8/MBRE4N 2018-02-14T08:00:02Z TDR doi:10.18738/T8/N6XE5X 2018-02-14T08:00:02Z TDR doi:10.18738/T8/NBFBQT 2018-03-08T08:00:02Z TDR doi:10.18738/T8/NJJAYV 2018-02-14T08:00:02Z TDR doi:10.18738/T8/NNLYVC 2018-02-23T08:00:02Z TDR doi:10.18738/T8/O3HV9R 2018-03-18T08:00:02Z TDR doi:10.18738/T8/OJX6UE 2018-03-08T08:00:02Z TDR doi:10.18738/T8/OXVTA5 2018-02-14T08:00:02Z TDR doi:10.18738/T8/P66RQH 2018-02-14T08:00:02Z TDR doi:10.18738/T8/PJYL8P 2018-03-08T08:00:02Z TDR doi:10.18738/T8/PLT37J 2018-02-14T08:00:02Z TDR doi:10.18738/T8/PQ8UMS 2018-03-08T08:00:02Z TDR doi:10.18738/T8/QKPPM1 2018-02-14T08:00:02Z TDR doi:10.18738/T8/R6RODU 2018-02-14T08:00:02Z TDR doi:10.18738/T8/RLTN4P 2018-03-14T08:00:02Z TDR doi:10.18738/T8/RREJJF 2018-03-18T08:00:02Z TDR doi:10.18738/T8/RVDCIW 2018-02-14T08:00:02Z TDR doi:10.18738/T8/SHYF4J 2018-03-08T08:00:02Z TDR doi:10.18738/T8/SU8RF3 2018-03-08T08:00:02Z TDR doi:10.18738/T8/SVMBKS 2018-03-18T08:00:02Z TDR doi:10.18738/T8/ULLYCX 2018-03-08T08:00:02Z TDR doi:10.18738/T8/V0W5JE 2018-03-18T08:00:02Z TDR doi:10.18738/T8/V6B9VA 2018-02-09T08:00:02Z TDR doi:10.18738/T8/VCUJQR 2018-03-18T08:00:02Z TDR doi:10.18738/T8/VNO1Q6 2018-03-18T08:00:02Z TDR doi:10.18738/T8/VXATC2 2018-02-14T08:00:02Z TDR doi:10.18738/T8/WE2CRT 2018-03-18T08:00:02Z TDR doi:10.18738/T8/WVJ33R 2018-02-14T08:00:02Z TDR doi:10.18738/T8/XALAED 2018-02-14T08:00:02Z TDR doi:10.18738/T8/Y95FWM 2018-03-02T08:00:02Z TDR doi:10.18738/T8/Y9LQQC 2018-02-23T08:00:01Z TDR doi:10.18738/T8/YGGQ10 2018-02-14T08:00:02Z TDR doi:10.18738/T8/YIRZN9 2018-03-18T08:00:02Z TDR doi:10.18738/T8/YSSPV0 2018-02-26T08:00:02Z TDR doi:10.18738/T8/YUIGHP 2018-02-14T08:00:02Z TDR doi:10.18738/T8/YXFPMD 2018-03-08T08:00:02Z TDR doi:10.18738/T8/Z4WIUV 2018-03-18T08:00:02Z TDR doi:10.18738/T8/ZF8QQZ 2018-02-14T08:00:02Z TDR doi:10.18738/T8/ZHBD9I 2018-03-18T08:00:02Z TDR doi:10.18738/T8/ZKKW1L 2018-03-08T08:00:02Z TDR doi:10.18738/T8/ZQZNQG 2018-03-18T08:00:02Z TDR
@matthew-a-dunlap
Copy link
Contributor

@kcondon Did you see any log entries that may indicate why the harvest is failing?

@kcondon
Copy link
Contributor Author

kcondon commented May 14, 2018

@matthew-a-dunlap Everything I know is in the ticket.

@juancorr
Copy link
Contributor

juancorr commented May 15, 2018 via email

@pdurbin
Copy link
Member

pdurbin commented May 15, 2018

@juancorr sounds good since you're looking at pull request #4664 anyway. Thanks!

@juancorr
Copy link
Contributor

@matthew-a-dunlap and @kcondon, there is a bug in oai requests with deleted (deaccessioned) records. We have a test installation in http://oaimadrono.uned.es:8080 . The view-source:http://oaimadrono.uned.es:8080/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&set=openaire_data requests return:

2018-05-17T08:15:07Zhttps://oaimadrono.uned.es/oai

doi:10.21950/05CZKM2018-05-15T00:00:01Zopenaire_data<oai_dc:dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">dc:titleFederico-Tena World Trade Historical Database : Asia</dc:title>
[..... All ok until first deleted record]
doi:10.21950/1CMNK72018-03-22T14:13:04Zopenaire_data

In first deleted record, xml is finished because Dataverse try insert metadata and obtain an OAIServlet exception:

[2018-05-17T10:15:09.189+0200] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.harvest.server.web.servlet.OAIServlet] [tid: _ThreadID=29 _ThreadName=http-listener-1(3)] [timeMillis: 1526544909189] [levelValue: 900] [[
IO exception in Get; Xrecord: failed to open metadata stream.]]

[2018-05-17T10:15:09.192+0200] [glassfish 4.1] [WARNING] [] [javax.enterprise.web] [tid: _ThreadID=29 _ThreadName=http-listener-1(3)] [timeMillis: 1526544909192] [levelValue: 900] [[
StandardWrapperValve[OAIServlet]: Servlet.service() for servlet OAIServlet threw exception
javax.servlet.ServletException: IO Exception in Get
at edu.harvard.iq.dataverse.harvest.server.web.servlet.OAIServlet.processRequest(OAIServlet.java:263)
at edu.harvard.iq.dataverse.harvest.server.web.servlet.OAIServlet.doGet(OAIServlet.java:223)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1682)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:344)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at org.glassfish.tyrus.servlet.TyrusServletFilter.doFilter(TyrusServletFilter.java:295)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at org.ocpsoft.rewrite.servlet.RewriteFilter.doFilter(RewriteFilter.java:205)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:316)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:160)
at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:734)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:673)
at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:174)
at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:415)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:282)
at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:459)
at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:167)
at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:201)
at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:175)
at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:235)
at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:284)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:201)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:133)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:561)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:117)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:56)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:137)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:565)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:545)
at java.lang.Thread.run(Thread.java:748)
]]

[2018-05-17T10:20:49.024+0200] [glassfish 4.1] [FINE] [] [edu.harvard.iq.dataverse.harvest.server.xoai.XsetRepository] [tid: _ThreadID=31 _ThreadName=http-listener-1(5)] [timeMillis: 1526545249024] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.harvest.server.xoai.XsetRepository] [METHODNAME: supportSets] [[
calling supportSets()]]

These tests have been done in a dataverse v4.8.5 release without source modifications.

@juancorr
Copy link
Contributor

I have checked the ListRecords request with v4.8.6. and develop branchs and I obtain the same exception.

@juancorr
Copy link
Contributor

XrecordDeleted.patch.txt

Here is a patch for the edu/harvard/iq/dataverse/harvest/server/xoai/Xrecord.java file that can repair the bug

@pdurbin
Copy link
Member

pdurbin commented May 18, 2018

@juancorr great! Do you mind making that patch into a pull request?

juancorr added a commit to Consorcio-Madrono/dataverse that referenced this issue May 18, 2018
@juancorr
Copy link
Contributor

@pdurbin , I have created my first pull request to Dataverse . If its correct, Monday I will create the other for the #4664

@pdurbin
Copy link
Member

pdurbin commented May 18, 2018

@juancorr great! Thanks for making pull request #4691! I'll drag it to Code Review at https://waffle.io/IQSS/dataverse

@pdurbin
Copy link
Member

pdurbin commented May 18, 2018

At standup I offered to review the pull request and left a comment at #4691 (review) but I'd like @landreev to take a look. Others too, if they feel like it. While I was in there I noticed that we're still using a deprecated method related to #4683 so I'm wondering if we should address that issue now as well. Here's a screenshot of the deprecated method:

screen shot 2018-05-18 at 12 15 46 pm

@pdurbin pdurbin removed their assignment May 18, 2018
pdurbin added a commit to Consorcio-Madrono/dataverse that referenced this issue May 18, 2018
@pdurbin
Copy link
Member

pdurbin commented May 18, 2018

@landreev and I took a look at the pull request and I added some javadoc in 36837f6 to clarify why the fix works, that header.getStatus() is only non-null when it's indicating "deleted". We agreed to go ahead and move this to QA.

@djbrooke
Copy link
Contributor

Great! Thanks @juancorr for the PR!

@kcondon kcondon self-assigned this May 18, 2018
@kcondon
Copy link
Contributor Author

kcondon commented May 18, 2018

I have tried reproducing the problem before and after the patch but was unable to do so, consistent with my initial testing. I am not sure why but harvest works in the patch so I will merge it. Thanks for the help.

kcondon added a commit that referenced this issue May 18, 2018
@juancorr
Copy link
Contributor

Hi @kcondon , you can reproduce the exception with the next steps. You can start with a Dataverse installation with a Harvester server with less than one hundred dataverses in a set and at least two datasets.

  1. Go to the ListRecords URL in your server. Ex: https://edatos.consorciomadrono.es/oai?verb=ListRecords&metadataPrefix=oai_dc&set=openaire_data
  2. Select a dataset from the oai response and deaccession it.
  3. Execute the "Run Export" action from the "Manage Harvesting Server" page
  4. Go to the ListRecords URL again.

Last ListRecords page ends with a not closed label and not with
You will have the javax.servlet.ServletException in the server.log traces too

@kcondon
Copy link
Contributor Author

kcondon commented May 21, 2018

Thanks @juancorr, I was able to reproduce the bug and saw that it was fixed using your instructions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants