Possible error in oai-pmh xml output file (dc elements namespace and uses) #3368

Closed
rmo-cdsp opened this issue Sep 21, 2016 · 9 comments

@rmo-cdsp
Contributor

rmo-cdsp commented Sep 21, 2016

Hello.

First, I should say that I'm not an OAI-PMH specialist; I'm simply reporting a problem that one of our Dataverse users has.
This user wanted to register our Dataverse with a harvesting tool, Isidore, which uses the OAI-PMH protocol. But the Isidore team reported a problem with the XML output of Dataverse and gave us this URL as an example:
https://catalogues.cdsp.sciences-po.fr/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=doi:10.21410/dshs_2016/SBRPPQ (catalogues.cdsp.sciences-po.fr is a new French Dataverse, ours ;) )

As you can see in the oai_dc element, there is a namespace declaration 'xmlns:dc="http://purl.org/dc/elements/1.1/"'. Multiple dc elements are then used. The problem is that some of the elements written in the XML file, such as dc:isReferencedBy, dc:modified and dc:issued, are not part of the dc/elements namespace but of the dc/terms namespace. So, as the Isidore team said, the data does not seem to follow the DCMI metadata terms index.

I'm not sure I'm being clear here, and I'm not very confident about all of this myself, but when you look at the index of terms at http://purl.org/dc/elements/1.1/ (which redirects to DCMI), it seems logical.

I'm available if you need more information :)
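
The namespace mismatch described in this report can be checked mechanically. The following is a minimal sketch, not anything taken from Dataverse or Isidore: it parses an oai_dc record and lists every element that is bound to the http://purl.org/dc/elements/1.1/ namespace but is not one of the fifteen elements that namespace defines. The function name `find_invalid_dc_elements` and the `SAMPLE` record are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements defined in the /elements/1.1/ namespace.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def find_invalid_dc_elements(xml_text):
    """Return the sorted local names used under DC_NS that are not simple DC elements."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    invalid = set()
    for elem in root.iter():
        # ElementTree expands prefixes to "{namespace-uri}localname".
        if elem.tag.startswith(prefix):
            local = elem.tag[len(prefix):]
            if local not in DC_ELEMENTS:
                invalid.add(local)
    return sorted(invalid)


# Hypothetical, trimmed-down record for illustration only (not the real output).
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example dataset</dc:title>
  <dc:isReferencedBy>Some publication</dc:isReferencedBy>
  <dc:modified>2016-09-21</dc:modified>
  <dc:issued>2016-09-21</dc:issued>
</oai_dc:dc>"""

print(find_invalid_dc_elements(SAMPLE))
```

Run against the sample, the check reports the three element names mentioned in the report, since they belong to the dc/terms namespace rather than dc/elements.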

@rmo-cdsp rmo-cdsp changed the title Possible error in oai-pmh xml output file (dc elements declaration) Possible error in oai-pmh xml output file (dc elements namespace and uses) Sep 21, 2016
@djbrooke
Contributor

@landreev - this may be relevant to the work you're doing in #3307

@pdurbin
Member

pdurbin commented Sep 26, 2016

At http://irclog.iq.harvard.edu/dataverse/2016-09-23#i_41932 @rmo-cdsp mentioned that he thinks it's related.

@landreev
Contributor

Hi,
Sorry I didn't comment sooner.
You are correct, we are dumping "Extended DC"/DCTERMS fields into "plain DC" records.
We should of course skip these fields when we produce records under the original, http://purl.org/dc/elements/1.1/ schema.

I will incorporate this fix into the next release, as part of the OAI work I'm doing under #3307.

@djbrooke
Contributor

Hey @rmo-cdsp - I'm going to close this out as it will be handled as part of another issue. Please follow along with #3307 - @landreev has some code available and it would be great if you could add comments or questions in the issue.

Cheers!

@pdurbin
Member

pdurbin commented Oct 19, 2016

@rmo-cdsp specifically, please keep an eye on pull request #3409.

@djbrooke
Contributor

Hi @rmo-cdsp - oops. I closed this one prematurely. @landreev is not going to handle this in #3409 after all. I'm going to reopen this and move this issue along with the correct PR, #3378. That PR is currently in code review, so this should be addressed soon.

Apologies for the confusion!

@landreev
Contributor

landreev commented Nov 2, 2016

@kcondon Kevin, I'm passing this to QA.
This is super straightforward:
a) What we are giving to users as Dublin Core on the dataset page, under the "metadata" tab - NO CHANGES. This is the "extended DC"/aka DCTERMS, same schema, same fields.
b) The "DC classic", "simple DC": we are not serving it to the users under the metadata tab on the page. But you can explicitly ask for it using the metadata API call (as oai_dc). This is what the OAI standard specifies as the required, non-optional metadata format. This should still be using the same schema in the header as before, but will now have only the original fields, and no fancy new fields, such as "isReferencedBy".
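
The distinction between the two export kinds in (a) and (b) can be sketched as a simple filter. This is only an illustration under stated assumptions, not the Dataverse implementation (which is Java): the function `fields_for_export`, the `(name, value)` field representation, and the `"dcterms"` label are hypothetical; only the `oai_dc` prefix and the fifteen simple DC element names come from the standards.

```python
# The fifteen elements of "simple DC" (http://purl.org/dc/elements/1.1/).
SIMPLE_DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def fields_for_export(fields, kind):
    """Select which (name, value) pairs to emit for a given export kind.

    kind == "oai_dc": keep only the original simple DC fields.
    Any other kind (for example the hypothetical "dcterms" label):
    keep everything, since the extended schema allows the newer terms.
    """
    if kind == "oai_dc":
        return [(name, value) for name, value in fields
                if name in SIMPLE_DC_ELEMENTS]
    return list(fields)


# Hypothetical metadata for illustration only.
record = [
    ("title", "Example dataset"),
    ("creator", "Doe, Jane"),
    ("isReferencedBy", "Some publication"),
    ("modified", "2016-09-21"),
]

print(fields_for_export(record, "oai_dc"))
print(fields_for_export(record, "dcterms"))
```

With this split, the oai_dc output keeps the same schema header as before and simply omits fields such as isReferencedBy, while the extended export is unchanged.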

pdurbin pushed a commit that referenced this issue Nov 4, 2016
Made a distinct method for OAI-DC export kind. Related to #3368
@pdurbin pdurbin mentioned this issue Nov 4, 2016
@kcondon kcondon closed this as completed Nov 4, 2016
@kcondon kcondon removed the Status: QA label Nov 4, 2016
@pdurbin
Member

pdurbin commented Nov 14, 2016

The status of this issue isn't very clear and @rmo-cdsp is asking about it and the pull request at http://irclog.iq.harvard.edu/dataverse/2016-11-14#i_44995

Originally @rmo-cdsp created pull request #3378 but it got behind "develop" and was difficult to test.

I created pull request #3458 based on the commit by @rmo-cdsp (I used git cherry-pick) and this pull request was tested and merged into the "develop" branch. Since the next release is 4.6, I'm adding that milestone to this issue.

While investigating this, I tried https://waffle.io/IQSS/dataverse?search=3368 but couldn't find this issue. I expected to find this issue in the "Done" column but maybe it's archived or something? @djbrooke would probably know more about this.

Thank you @rmo-cdsp for your commit! Long story short, your fix will be in the next release, which is 4.6. You can read more about our future releases at http://dataverse.org/goals-roadmap-and-releases

@pdurbin pdurbin added this to the 4.6 - File Replace milestone Nov 14, 2016
@djbrooke
Contributor

Yes, it was archived in Waffle. This happens automatically after 7 days in the "Done" column or it can be done manually.
