Harvesting OAI with URL identifier fail #5245

Closed
tcoupin opened this issue Oct 30, 2018 · 7 comments

Comments

@tcoupin
Member

tcoupin commented Oct 30, 2018

When an OAI server uses a URL identifier (http://doi.org/...), Dataverse fails to parse the globalId.

Log:

[2018-10-30T10:22:14.276+0000] [glassfish 4.1] [FINE] [] [edu.harvard.iq.dataverse.api.imports.ImportServiceBean] [tid: _ThreadID=155 _ThreadName=__ejb-thread-pool7] [timeMillis: 1540894934276] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.api.imports.ImportServiceBean] [METHODNAME: doImportHarvestedDataset] [[
  importing DC /tmp/meta1606584145857252274.tmp]]

[2018-10-30T10:22:14.278+0000] [glassfish 4.1] [FINE] [] [edu.harvard.iq.dataverse.api.imports.ImportGenericServiceBean] [tid: _ThreadID=155 _ThreadName=__ejb-thread-pool7] [timeMillis: 1540894934278] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.api.imports.ImportGenericServiceBean] [METHODNAME: processXMLElement] [[
  entering processXMLElement; (:)]]

[2018-10-30T10:22:14.286+0000] [glassfish 4.1] [FINE] [] [edu.harvard.iq.dataverse.api.imports.ImportGenericServiceBean] [tid: _ThreadID=155 _ThreadName=__ejb-thread-pool7] [timeMillis: 1540894934286] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.api.imports.ImportGenericServiceBean] [METHODNAME: processOAIDCxml] [[
  Imported identifier: https://doi.org/10.5072/FK2/YIX6QZ]]

[2018-10-30T10:22:14.287+0000] [glassfish 4.1] [FINE] [] [edu.harvard.iq.dataverse.api.imports.ImportGenericServiceBean] [tid: _ThreadID=155 _ThreadName=__ejb-thread-pool7] [timeMillis: 1540894934287] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.api.imports.ImportGenericServiceBean] [METHODNAME: reassignIdentifierAsGlobalId] [[
  Processing DOI identifier formatted as a resolver URL: https://doi.org/10.5072/FK2/YIX6QZ]]

....

Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -10
        at java.lang.String.substring(String.java:1967)
        at edu.harvard.iq.dataverse.api.imports.ImportGenericServiceBean.reassignIdentifierAsGlobalId(ImportGenericServiceBean.java:434)
        at edu.harvard.iq.dataverse.api.imports.ImportGenericServiceBean.processOAIDCxml(ImportGenericServiceBean.java:225)
        at sun.reflect.GeneratedMethodAccessor1075.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:1081)
        at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:1153)
        at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4786)
        at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:656)

PR to come...

@tcoupin
Member Author

tcoupin commented Jan 4, 2019

Happy new year!

Is any work still needed?

@poikilotherm
Contributor

Maybe @pdurbin could move this to Code Review again? The failing IT tests seem to have been worked out recently...

@pdurbin
Member

pdurbin commented Jan 4, 2019

@tcoupin hi! Before I move this to code review, can you please merge the latest from "develop" into your branch? We released 4.10. Thanks!

@pdurbin
Member

pdurbin commented Jan 4, 2019

@tcoupin on second thought, I went ahead and moved this issue to code review but if you could please merge the latest from develop into your branch soon it would be appreciated.

@landreev
Contributor

landreev commented Jan 4, 2019

I see, this was introduced in May - when .lastIndexOf('/') was changed to .indexOf('/') in line 362... Which I'm assuming was done to fix another problem... OK, so these are all consequences of switching to treating the DOI "shoulder" as part of the identifier...
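For readers following along, here is a minimal sketch of how a first-slash search can go wrong on a resolver URL. It is not the Dataverse code: the class and method names (`ResolverUrlSketch`, `naiveSplit`, `prefixAwareSplit`) are hypothetical, and the prefix `https://doi.org/` is simply taken from the identifier in the log above. The first '/' in the full URL sits inside `https://` at index 6, while the authority starts at index 16, so `substring(16, 6)` is asked for a range of length 6 - 16 = -10, which would be consistent with the exception in the trace.

```java
import java.util.Arrays;

// Hypothetical illustration only; not taken from ImportGenericServiceBean.
class ResolverUrlSketch {
    // Assumed resolver prefix, copied from the identifier in the log.
    static final String DOI_RESOLVER_URL = "https://doi.org/";

    // Naive split: searches for the first '/' in the whole string.
    // For a resolver URL that is the first slash of "https://" (index 6),
    // which lies before the end of the prefix (index 16).
    static String[] naiveSplit(String url) {
        int start = DOI_RESOLVER_URL.length();
        int sep = url.indexOf('/');
        return new String[] { "doi", url.substring(start, sep), url.substring(sep + 1) };
    }

    // Prefix-aware split: only looks for the separator after the prefix.
    static String[] prefixAwareSplit(String url) {
        if (!url.startsWith(DOI_RESOLVER_URL)) {
            throw new IllegalArgumentException("not a DOI resolver URL: " + url);
        }
        int start = DOI_RESOLVER_URL.length();
        int sep = url.indexOf('/', start);
        if (sep < 0) {
            throw new IllegalArgumentException("no authority/identifier separator: " + url);
        }
        // protocol, authority, identifier (the shoulder stays in the identifier)
        return new String[] { "doi", url.substring(start, sep), url.substring(sep + 1) };
    }

    public static void main(String[] args) {
        String url = "https://doi.org/10.5072/FK2/YIX6QZ";
        try {
            naiveSplit(url);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("naive split failed: " + e.getMessage());
        }
        // prints [doi, 10.5072, FK2/YIX6QZ]
        System.out.println(Arrays.toString(prefixAwareSplit(url)));
    }
}
```

The exact exception message depends on the Java version, so the sketch only relies on the exception type. Whether the PR takes this approach or a different one is not stated in this thread.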

@landreev
Contributor

landreev commented Jan 4, 2019

OK, this PR fixes the problem at hand, so I'm moving this along as soon as the branch is merged w/ develop.
But, looking at the rest of the code as it is now, it appears that we now assume that there can never be a '/' separator in the authority part of the identifier... I'm wondering if this is a safe assumption (with handles especially?).
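To make the assumption concrete, the sketch below splits the same string at the first and at the last '/'. The class and method names are hypothetical and the input is the DOI from the log; none of this is Dataverse code. Splitting at the first '/' keeps the shoulder in the identifier, and it is only correct as long as the authority itself never contains a '/'.

```java
import java.util.Arrays;

// Hypothetical illustration of the two separator choices discussed above.
class SeparatorChoiceSketch {
    // Split at the first '/': everything after it is the identifier.
    static String[] splitAtFirst(String s) {
        int i = s.indexOf('/');
        if (i < 0) throw new IllegalArgumentException("no '/' in: " + s);
        return new String[] { s.substring(0, i), s.substring(i + 1) };
    }

    // Split at the last '/': everything before it is the authority.
    static String[] splitAtLast(String s) {
        int i = s.lastIndexOf('/');
        if (i < 0) throw new IllegalArgumentException("no '/' in: " + s);
        return new String[] { s.substring(0, i), s.substring(i + 1) };
    }

    public static void main(String[] args) {
        String doi = "10.5072/FK2/YIX6QZ";
        // prints [10.5072, FK2/YIX6QZ]
        System.out.println(Arrays.toString(splitAtFirst(doi)));
        // prints [10.5072/FK2, YIX6QZ]
        System.out.println(Arrays.toString(splitAtLast(doi)));
    }
}
```

With a single '/' in the input both methods agree, so the choice only matters for identifiers with a shoulder or for an authority that contains a separator.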

@tcoupin
Member Author

tcoupin commented Jan 7, 2019

kcondon self-assigned this Jan 7, 2019
kcondon added a commit that referenced this issue Jan 7, 2019
#5245 - fix global id url parsing when parsing OAI
kcondon closed this as completed Jan 7, 2019
kcondon removed the Status: QA label Jan 7, 2019
6 participants