Harvesting: Harvesting clients never complete harvesting of updates of at least 2 smaller sites. #4548
@kcondon Did you see any log entries that may indicate why the harvest is failing?
@matthew-a-dunlap Everything I know is in the ticket.
Hi @matthew-a-dunlap and @kcondon,
In Madrono (e-cienciaDatos) there was a problem with deleted (deaccessioned) records: OAI requests failed with a null pointer exception. We had made some changes to comply with OpenAIRE requirements, and we do not know whether we introduced the bug or whether it is part of the original Dataverse code. We fixed the bug a month ago.
Tomorrow we will install the Dataverse development branch with the new OpenAIRE Dataverse code to test it, and we will check whether the null pointer exception occurs without our code changes.
Juan Corrales
On 2018-05-14 20:02 GMT+02:00, Kevin Condon wrote: "@matthew-a-dunlap Everything I know is in the ticket."
@matthew-a-dunlap and @kcondon, there is a bug in OAI requests involving deleted (deaccessioned) records. We have a test installation at http://oaimadrono.uned.es:8080. The request http://oaimadrono.uned.es:8080/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&set=openaire_data returns:
2018-05-17T08:15:07Z https://oaimadrono.uned.es/oai
doi:10.21950/05CZKM 2018-05-15T00:00:01Z openaire_data <oai_dc:dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:title>Federico-Tena World Trade Historical Database : Asia</dc:title>
[..... All OK until the first deleted record]
doi:10.21950/1CMNK7 2018-03-22T14:13:04Z openaire_data
At the first deleted record the XML output stops, because Dataverse tries to insert metadata and gets an exception in OAIServlet:
[2018-05-17T10:15:09.189+0200] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.harvest.server.web.servlet.OAIServlet] [tid: _ThreadID=29 _ThreadName=http-listener-1(3)] [timeMillis: 1526544909189] [levelValue: 900] [[
[2018-05-17T10:15:09.192+0200] [glassfish 4.1] [WARNING] [] [javax.enterprise.web] [tid: _ThreadID=29 _ThreadName=http-listener-1(3)] [timeMillis: 1526544909192] [levelValue: 900] [[
[2018-05-17T10:20:49.024+0200] [glassfish 4.1] [FINE] [] [edu.harvard.iq.dataverse.harvest.server.xoai.XsetRepository] [tid: _ThreadID=31 _ThreadName=http-listener-1(5)] [timeMillis: 1526545249024] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.harvest.server.xoai.XsetRepository] [METHODNAME: supportSets] [[
These tests were done on a Dataverse v4.8.5 release without source modifications.
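The OAI-PMH 2.0 specification says that a record whose header carries status="deleted" must not include a metadata part. Here is a minimal sketch of that rule using a hypothetical, simplified renderer. OaiRecordSketch and renderRecord are illustrative names only. They are not Dataverse's Xrecord class or the XOAI API, and this is not necessarily what the Xrecord.java patch does.

```java
import java.util.Optional;

// Hypothetical, simplified OAI-PMH record renderer used only to illustrate the rule
// for deleted records; it is not Dataverse's Xrecord class or the XOAI API.
final class OaiRecordSketch {
    private OaiRecordSketch() {}

    // Minimal XML escaping for element text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String renderRecord(String identifier, String datestamp, String setSpec,
                               boolean deleted, Optional<String> metadataXml) {
        StringBuilder sb = new StringBuilder("<record><header");
        if (deleted) {
            sb.append(" status=\"deleted\"");
        }
        sb.append("><identifier>").append(escape(identifier)).append("</identifier>")
          .append("<datestamp>").append(escape(datestamp)).append("</datestamp>")
          .append("<setSpec>").append(escape(setSpec)).append("</setSpec></header>");
        if (!deleted) {
            // Live records must have metadata; fail before anything is written for this record.
            String md = metadataXml.orElseThrow(
                    () -> new IllegalArgumentException("no metadata for " + identifier));
            sb.append("<metadata>").append(md).append("</metadata>");
        }
        // Deleted records end right after the header: no metadata is looked up or written.
        sb.append("</record>");
        return sb.toString();
    }
}
```

The metadata lookup sits entirely inside the `!deleted` branch, so a deaccessioned dataset never reaches code that expects a metadata export to exist.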
I have checked the ListRecords request with v4.8.6 and the develop branch, and I get the same exception.
Here is a patch for the edu/harvard/iq/dataverse/harvest/server/xoai/Xrecord.java file that can fix the bug.
@juancorr great! Would you mind turning that patch into a pull request?
@juancorr great! Thanks for making pull request #4691! I'll drag it to Code Review at https://waffle.io/IQSS/dataverse
At standup I offered to review the pull request and left a review comment on #4691, but I'd like @landreev to take a look as well. Others too, if they feel like it. While I was in there I noticed that we're still using a deprecated method related to #4683, so I'm wondering whether we should address that issue now as well. Here's a screenshot of the deprecated method:
Great! Thanks @juancorr for the PR!
I tried reproducing the problem before and after the patch but was unable to, consistent with my initial testing. I am not sure why, but harvesting works with the patch, so I will merge it. Thanks for the help.
Hi @kcondon, you can reproduce the exception with the following steps. Start with a Dataverse installation running a harvesting server with fewer than one hundred dataverses in a set and at least two datasets.
The last ListRecords page ends with an unclosed tag instead of the expected closing tag.
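A page cut off partway through a record is not well-formed XML, so a harvesting client or a regression test can catch this failure mode by parsing every page strictly. Below is a minimal Java sketch that uses the JDK's built-in DOM parser. OaiPageCheck and isWellFormed are hypothetical names. A real client would also check the parsed page for OAI-PMH error elements.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical client-side sanity check: a ListRecords page that stops mid-record
// (an unclosed tag at the end) is not well-formed, so a strict parse fails.
final class OaiPageCheck {
    private OaiPageCheck() {}

    static boolean isWellFormed(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors and rethrows fatal ones;
            // it also replaces the default handler that prints "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // truncated output, unclosed tags, and other well-formedness errors
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A check like this only shows that a page is broken, not why. The server log is still needed to find the exception behind the truncation.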
Thanks @juancorr, I was able to reproduce the bug and saw that it was fixed using your instructions.
This is happening in the new Harvard AWS production architecture.
Harvests of TDR and Madrono appear to hang or pause and never complete, according to both the in-progress status in the UI and the harvest logs. Harvesting either site works relatively quickly (about 5 minutes or less) in the test environment. Is this some kind of network connectivity issue related to the new architecture?
Update: the two clients process 10 and 11 identifiers, respectively, before halting after 27 and 19 seconds.
Update 2: although these hung jobs were run manually, successful initial harvests of TDR on 02-05 and 02-06 were scheduled and logged on both web instances. (Note: this is from pre-migration testing, and the logs were cloned.)
Update 3: the harvest "from" date in the 03-26 log is 02-14, but there is no 02-14 harvest log on either instance. (Note: this is because the 02-14 harvest occurred before the migration and its log was not copied over.)
Update 4:
The last successful harvest of TDR, according to the existing log files on cloud1 and cloud2, was on 02-06-18 and retrieved 0 datasets. Requesting ListIdentifiers directly from that date returns 100 identifier changes (new or deleted), while the 2 subsequent harvests since 02-06, on 03-26 and 03-27, each retrieved only 10 (9 new, 1 deleted). (Note: the "from" date may actually be 02-14, because it appears as the from date in the 03-26 log. That harvest does not appear on the AWS servers because it ran before the migration and its harvest log was not copied over. A ListIdentifiers request from 02-14 still shows 100 identifiers.)
Update 5: the Madrono ListIdentifiers response also shows 100 identifiers available since the last harvest, on 02-11-18:
https://edatos.consorciomadrono.es/oai?verb=ListIdentifiers&from=2018-02-11&metadataPrefix=oai_ddi&set=openaire_data
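When a harvest stops partway, it can help to page through ListIdentifiers by hand and compare. Under OAI-PMH 2.0, the first request carries metadataPrefix and the optional from and set arguments. Each later page is requested with verb plus the resumptionToken from the previous response, and an absent or empty resumptionToken marks the end of the list. Below is a minimal Java sketch that assumes Java 10+ for the URLEncoder.encode(String, Charset) overload. ListIdentifiersPaging and its methods are hypothetical names, and the HTTP fetching itself is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper for OAI-PMH 2.0 ListIdentifiers paging (no HTTP code included).
final class ListIdentifiersPaging {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    private ListIdentifiersPaging() {}

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // First page: verb plus the selective arguments from, metadataPrefix and set.
    static String firstPageUrl(String baseUrl, String from, String metadataPrefix, String set) {
        return baseUrl + "?verb=ListIdentifiers&from=" + enc(from)
                + "&metadataPrefix=" + enc(metadataPrefix) + "&set=" + enc(set);
    }

    // Later pages: resumptionToken is an exclusive argument, so only verb accompanies it.
    static String nextPageUrl(String baseUrl, String token) {
        return baseUrl + "?verb=ListIdentifiers&resumptionToken=" + enc(token);
    }

    // Token for the next page, or empty when the list is complete
    // (no resumptionToken element, or an empty one on the last page).
    static Optional<String> resumptionToken(String pageXml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(
                    new ByteArrayInputStream(pageXml.getBytes(StandardCharsets.UTF_8)));
            NodeList tokens = doc.getElementsByTagNameNS(OAI_NS, "resumptionToken");
            if (tokens.getLength() == 0) {
                return Optional.empty();
            }
            String token = tokens.item(0).getTextContent().trim();
            return token.isEmpty() ? Optional.empty() : Optional.of(token);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("unparseable OAI-PMH page", e);
        }
    }
}
```

For example, firstPageUrl("https://edatos.consorciomadrono.es/oai", "2018-02-11", "oai_ddi", "openaire_data") rebuilds the Madrono request above exactly, because none of those argument values need percent-encoding.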
Here is the TDR ListIdentifiers list (identifier, datestamp, set), from https://dataverse.tdl.org/oai?verb=ListIdentifiers&from=2018-02-06&metadataPrefix=oai_ddi&set=TDR:
doi:10.18738/T8/0H1MWK 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/0SL6NE 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/0VXSGP 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/1HXBCV 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/1NPB8F 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/33NF9Y 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/3TTOWS 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/4813VQ 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/4Z1GAK 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/5NTI0C 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/5O59LD 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/6CO7XD 2018-03-07T08:00:01Z TDR
doi:10.18738/T8/7BTM0D 2018-03-13T08:00:02Z TDR
doi:10.18738/T8/7CUCM8 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/7YRHOS 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/8KJA6T 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/8RUKOV 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/9CC4KZ 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/9EBL3A 2018-03-10T08:00:02Z TDR
doi:10.18738/T8/9ON0VB 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/9UXITA 2018-02-23T08:00:02Z TDR
doi:10.18738/T8/9VYDE0 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/9XV8WN 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/A6DIXF 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/A8DMKB 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/ACWIR0 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/AMBBEZ 2018-02-13T08:00:02Z TDR
doi:10.18738/T8/ASJ3RA 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/ASU4TG 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/BBC71O 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/BGPA1O 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/BS0G9T 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/BXB2BI 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/C4EQ3W 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/CKRKJJ 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/CRBUOB 2018-03-09T08:00:02Z TDR
doi:10.18738/T8/CRIWDF 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/CRLHCI 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/CXFO4Y 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/EA4GZC 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/ELAIH3 2018-02-28T08:00:02Z TDR
doi:10.18738/T8/F1IDM3 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/FCZMRZ 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/FKTPIP 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/GGE2PG 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/GVL6ZD 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/H2RTCA 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/H77VLX 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/HJYOT5 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/HKHQGP 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/IAPEY0 2018-02-13T08:00:02Z TDR
doi:10.18738/T8/IQT2DF 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/K2JAWP 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/K6YBV4 2018-03-28T08:00:02Z TDR
doi:10.18738/T8/KGXMXS 2018-02-13T08:00:02Z TDR
doi:10.18738/T8/L1RG0K 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/LHK7DY 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/LWKNHE 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/LXRZZ2 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/MBRE4N 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/N6XE5X 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/NBFBQT 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/NJJAYV 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/NNLYVC 2018-02-23T08:00:02Z TDR
doi:10.18738/T8/O3HV9R 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/OJX6UE 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/OXVTA5 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/P66RQH 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/PJYL8P 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/PLT37J 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/PQ8UMS 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/QKPPM1 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/R6RODU 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/RLTN4P 2018-03-14T08:00:02Z TDR
doi:10.18738/T8/RREJJF 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/RVDCIW 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/SHYF4J 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/SU8RF3 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/SVMBKS 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/ULLYCX 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/V0W5JE 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/V6B9VA 2018-02-09T08:00:02Z TDR
doi:10.18738/T8/VCUJQR 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/VNO1Q6 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/VXATC2 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/WE2CRT 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/WVJ33R 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/XALAED 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/Y95FWM 2018-03-02T08:00:02Z TDR
doi:10.18738/T8/Y9LQQC 2018-02-23T08:00:01Z TDR
doi:10.18738/T8/YGGQ10 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/YIRZN9 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/YSSPV0 2018-02-26T08:00:02Z TDR
doi:10.18738/T8/YUIGHP 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/YXFPMD 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/Z4WIUV 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/ZF8QQZ 2018-02-14T08:00:02Z TDR
doi:10.18738/T8/ZHBD9I 2018-03-18T08:00:02Z TDR
doi:10.18738/T8/ZKKW1L 2018-03-08T08:00:02Z TDR
doi:10.18738/T8/ZQZNQG 2018-03-18T08:00:02Z TDR
Response date: 2018-03-28T16:20:27Z; base URL: https://dataverse.tdl.org/oai
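One way to compare a listing like the TDR one with the harvest-log counts (10 identifiers: 9 new, 1 deleted) is to count the headers on each ListIdentifiers page and note how many carry status="deleted". Below is a minimal Java sketch that assumes the standard OAI-PMH 2.0 namespace. HeaderTally is a hypothetical name. It counts a single page only, so a full comparison would also follow resumptionToken pages.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: tallies OAI-PMH headers on one ListIdentifiers (or ListRecords) page.
final class HeaderTally {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    private HeaderTally() {}

    // Returns {total headers, headers with status="deleted"} for one response page.
    static int[] count(String pageXml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            NodeList headers = f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pageXml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagNameNS(OAI_NS, "header");
            int deleted = 0;
            for (int i = 0; i < headers.getLength(); i++) {
                Element h = (Element) headers.item(i);
                // status is an unprefixed attribute; it is absent for live records.
                if ("deleted".equals(h.getAttribute("status"))) {
                    deleted++;
                }
            }
            return new int[] {headers.getLength(), deleted};
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("unparseable OAI-PMH page", e);
        }
    }
}
```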