Enrich JATS being exposed through OAI #11012

Open
1 of 9 tasks
jalperin opened this issue Mar 1, 2025 · 4 comments
Labels
Community:2:Priority Any issue that has been identified through research or feedback as a major community priority.

Comments

@jalperin
Member

jalperin commented Mar 1, 2025

Description:
Now that the JATS Plugin is being used to expose metadata through OAI, it is an opportunity to capture richer metadata. This metadata will be used for indexing by sources like OpenAlex and Dimensions when journals do not use DOIs, so it is crucial to get it right.

Metadata enrichments (in order of priority):

  • Confirm that RORs are included with affiliation (I was told they are, but need to confirm)
  • Mark Corresponding Author. I believe OJS always has at least one author as corresponding. Can be coded as follows (adding <xref> to <contrib> tag for that author and the <author-notes> with the <corresp> tag). Note that <corresp> is allowed to have any string in there, but since we don't have a place to enter this in OJS, we can just populate it with the email as shown.
<contrib-group>
    <contrib contrib-type="author">
        <name>
            <surname>Doe</surname>
            <given-names>John</given-names>
        </name>
        <xref ref-type="corresp" rid="cor1">*</xref>
    </contrib>
</contrib-group>

<author-notes>
    <corresp id="cor1"><email>[email protected]</email></corresp>
</author-notes>
  • Include more article dates of the history of an article (submission, first decision, acceptance). See comment from Mike on spec.
  • Funding (see comment for details)
  • Data availability statement. Can be included if collected, as per (#8290). It can be added inside the <back>, in its own <sec> (before <ref-list>), as follows. Contents inside the <p> can be what is pulled up with getLocalizedData('dataAvailability'), with any links placed in <ext-link> tags (but this is less important; you can just put it all inside a <p> as plaintext).
...
<sec sec-type="data-availability">
<title>Data Availability</title>
<p>All data for replication can be found online at: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.7910/DVN/OCZNVY">https://doi.org/10.7910/DVN/OCZNVY</ext-link></p>
</sec>
...
  • Can remove contrib-group of editors, which seems to be filling in all Editor users, not necessarily the real editorial board, nor the editor users who participated in a submission
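
For the data availability item above, the plaintext fallback (no <ext-link> markup) might look roughly like the sketch below. The statement text is a placeholder I made up, standing in for whatever the journal collected; the placement inside <back> before <ref-list> follows the description in the list.

```xml
<!-- Sketch only: the statement text is a placeholder for whatever the journal collected. -->
<back>
    <sec sec-type="data-availability">
        <title>Data Availability</title>
        <p>The data supporting this article are available from the corresponding author on request.</p>
    </sec>
    <ref-list>
        ...
    </ref-list>
</back>
```

Any URL in the statement would stay as plain text in this variant; wrapping it in <ext-link> is a refinement that can come later.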

For Future:

  • Figure out if we can add the number of reviewers of an article, perhaps as a custom tag?
  • Indicate if an article has been peer-reviewed. Currently this is placed in article type. A bit complicated, may need to defer. See comment with details.
  • Consider bringing back <contrib-group> for editors if they are being authenticated (as with the PFL)
jalperin added the Community:2:Priority label Mar 1, 2025
@asmecher
Member

asmecher commented Mar 1, 2025

RORs are supported only starting with 3.5, including in the JATS. They look like this:

https://gist.github.com/asmecher/adeaf41253bb0b65965b69b762537d8e/96f9e43da8d93e4262c6386cacd4001faf78d9e8#file-gistfile1-txt-L106..L111

(...with refs in the contributor list.)

RORs are, and will continue to be, optional. Plaintext affiliations will continue to be common.
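
For readers who can't open the gist, here is a rough sketch of the general JATS pattern for an affiliation with and without a ROR. This is not the actual OJS 3.5 output: the institution name and ROR ID are made-up placeholders, and the lowercase institution-id-type="ror" value is an assumption based on the JATS4R affiliation recommendation as I understand it.

```xml
<!-- Sketch only: not the actual OJS 3.5 output. Names and the ROR ID are made-up placeholders. -->
<contrib-group>
    <contrib contrib-type="author">
        <name>
            <surname>Doe</surname>
            <given-names>John</given-names>
        </name>
        <xref ref-type="aff" rid="aff1"/>
    </contrib>
    <!-- Affiliation carrying a ROR ID -->
    <aff id="aff1">
        <institution-wrap>
            <institution>Example University</institution>
            <institution-id institution-id-type="ror">https://ror.org/00example00</institution-id>
        </institution-wrap>
    </aff>
    <!-- Plaintext affiliation, for when no ROR was selected -->
    <aff id="aff2">Example Institute of Technology</aff>
</contrib-group>
```

Because RORs stay optional, anything consuming the OAI feed would need to handle both forms.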

@AhemNason

Ok, so, if I were making a wishlist of stuff we needed to be pushing downstream, it probably wouldn't include peer-review information just because I think you can't rely on the system knowing everything and I think peer review is in crisis anyway. But, don't call me a deciding vote on that. But, if we do include it, there's precedent in the Crossref Schema for this sort of thing (we could also include it in our Crossref deposits... we do not currently).

Article type does make sense, I think. But I think what's confusing to users here is probably that there's two places to enter them. One is the COAR list for OpenAIRE (good, yes, 100%) and the other is a write-in field (less good, but I know Erudit use it).

I think the key pieces for open scholarly infrastructure and emerging needs in this space include:

  • funder ID
    • right now this is contingent on the funding plugin... increasingly in the narrative of publicly funded research, there's a lot of "how do we know where funded research ended up" stuff. Our funder plugin uses the Crossref funder registry which will eventually be replaced with ROR (for funding organizations, you see), but also does not require a Crossref membership to use... it might not be a bad idea to just have this metadata be stock?
  • grant ID
    • ditto
  • and relational metadata (preprint link, dataset link)
  • ROR
  • ORCID
  • DOI
  • publication/submission dates
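
To make the identifier items in this list concrete, here is a minimal sketch of where a DOI, an ORCID and a preprint link could sit in <article-meta>. The DOI values are placeholders, the ORCID is a placeholder iD, and the related-article-type="preprint" value is an assumption based on the JATS4R preprint guidance as I recall it. Dataset links are left out because I'm less sure of the preferred tagging.

```xml
<!-- Sketch only: all identifier values are placeholders, and the attribute choices are assumptions. -->
<article-meta>
    <article-id pub-id-type="doi">10.5555/example.2025.001</article-id>
    ...
    <contrib-group>
        <contrib contrib-type="author">
            <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-1825-0097</contrib-id>
            <name>
                <surname>Doe</surname>
                <given-names>John</given-names>
            </name>
        </contrib>
    </contrib-group>
    ...
    <!-- Link from the published article back to its preprint -->
    <related-article related-article-type="preprint" ext-link-type="doi" xlink:href="10.5555/example.preprint.001"/>
    ...
</article-meta>
```

The xlink prefix has to be declared on the root <article> element for the xlink:href attribute to be valid.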

I don't especially feel we need to be too worried about:

  • references
    • references is complicated. I think, for one, it just hugely expands the size of the data we're pushing around here and I'm not convinced the load is worth what we'd get at this juncture. We have no "cited by" stuff like Crossref does. We're not tracking citations. The field is totally optional in publication metadata, and these references don't get structured unless the reference linking plugin is enabled, which does use a Crossref API.

@jalperin
Member Author

jalperin commented Mar 3, 2025

For info, here is how the PFL Plugin figures out number of reviewers.
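
If that count were exposed in the JATS, one possible shape is a <custom-meta> entry. This is only a sketch: "reviewer-count" is a hypothetical meta-name I'm inventing for illustration, not a term from JATS4R or any taxonomy, and the value is a placeholder.

```xml
<!-- Sketch only: "reviewer-count" is a hypothetical meta-name, and the value is a placeholder. -->
<article-meta>
    ...
    <custom-meta-group>
        <custom-meta>
            <meta-name>reviewer-count</meta-name>
            <meta-value>2</meta-value>
        </custom-meta>
    </custom-meta-group>
</article-meta>
```

A downstream indexer would only understand this if the meta-name were agreed on and documented somewhere.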

@AhemNason

Alright, just following up on what I think is easy to expose via OAI-PMH with additions to the JATS Template at this late juncture. Let's work through it in order from chillest to least chill and then I'll defer to Juan for the next piece.

References (Chillest)

These already exist in the JATS Template and are just a flat dump of references by line. This is fine!

Recommendation: Include in exposed OAI

Dates

Proposal for additional date metadata related to publication. At minimum, acceptance and publication (these can be sent to Crossref as well). The latest version of any record should exist at <pub-date>, but other dates related to the publication exist under <pub-history>.

https://jats.nlm.nih.gov/publishing/tag-library/1.4/element/pub-date.html

When a new publication format is produced, any previous <pub-date> elements should be described as <event>s inside <pub-history>. This leaves the <pub-date> as the single source of the latest date of publication.

https://jats.nlm.nih.gov/publishing/tag-library/1.4/element/pub-history.html

...
<article-meta>
 ...
 <pub-history>
  <event event-type="received">
   <event-desc>Received: <date date-type="received" iso-8601-date="2017-09-12">
    <day>12</day><month>September</month><year>2017</year></date></event-desc>
  </event>

  <event event-type="accepted">
   <event-desc>Accepted: <date date-type="accepted" iso-8601-date="2018-05-26">
    <day>26</day><month>May</month><year>2018</year></date></event-desc>
  </event>

<!-- for any older versions of a published article -->
  <event event-type="pub">
   <event-desc>Version of Record published: <pub-date date-type="pub" iso-8601-date="2018-06-13">
    <day>13</day><month>June</month><year>2018</year></pub-date> (version 2)</event-desc>
  </event>
 </pub-history>
</article-meta>
...

event-type attribute is controlled: https://jats.nlm.nih.gov/publishing/tag-library/1.4/attribute/event-type.html

Funder ID

With the provision that this would be from 3.5 and up, Funder ID and Grant ID should be added if we can swing it. You can find the JATS language here: https://jats.nlm.nih.gov/publishing/tag-library/1.4/element/funding-group.html They have examples of both with and without implementation with the Crossref funder registry.

With:

<article-meta>
 ...
 <funding-group specific-use="crossref">
  <award-group>
   <funding-source id="gs1" country="US">
    <institution-wrap>
     <institution>National Institutes of Health</institution>
     <institution-id institution-id-type="doi"
       vocab="open-funder-registry"
       vocab-identifier="10.13039/open_funder_registry">10.13039/100000002</institution-id>
    </institution-wrap>
   </funding-source>
   <award-id>GM18458</award-id>
  </award-group>

  <award-group>
   <funding-source id="gs2" country="US">
    <institution-wrap>
     <institution>National Science Foundation</institution>
     <institution-id institution-id-type="doi"
       vocab="open-funder-registry"
       vocab-identifier="10.13039/open_funder_registry">10.13039/100000001</institution-id>
    </institution-wrap>
   </funding-source>
   <award-id>DMS-0204674</award-id>
   <award-id>DMS-0244638</award-id>
  </award-group>
 </funding-group>
 ...
</article-meta>

Without:

<funding-group>
 <award-group id="award01">
  <funding-source>Humanities and Social Sciences, 
   Ministry of Education of China</funding-source>
  <award-id>12YJA740081</award-id>
 </award-group>
</funding-group>

This is actually instructive. If the funding plugin has all the shorthand identifiers, country metadata, institution ID from the registry, that's great. If all we populate is <funding-source> and <award-id>, basically everything in the "without" category, that's an ok first step. I don't know everything the funding plugin generates.

Peer Review - Least Chill, by a mile

JATS4R recommendations around peer review metadata do exist and it would be possible to incorporate them in a JATS record if there were time/will. The questions are more about whether this would be sustainable long term and how likely we are to codify peer review taxonomies. It's important to temper expectations, though, because JATS4R is very much intended to specifically support the availability of open peer review documents and the spec requires access to these materials with their own identifiers.

But, there is a provision to get at least some peer review-related metadata using custom-meta-group: https://jats4r.niso.org/peer-review-materials/#parent-article-document

Most of JATS4R is about publishing peer reviews themselves in open peer review, but this specific piece of the document is about forwarding peer review information in the parent article. It leverages this OSF taxonomy for peer-review terms: https://osf.io/aynr5 So, a sample in a journal article might look like this:

<custom-meta-group>
    <custom-meta>
        <meta-name>peer-review-identity-transparency</meta-name>
        <meta-value>All identities visible</meta-value>
     </custom-meta>
    <custom-meta>
        <meta-name>reviewer-interacts-with</meta-name>
        <meta-value>Other reviewer(s)</meta-value>
     </custom-meta>
    <custom-meta>
        <meta-name>review-information-published</meta-name>
        <meta-value>Review reports reviewer opt in</meta-value>
     </custom-meta>
    <custom-meta>
         <meta-name>post-publication-commenting</meta-name>
         <meta-value>Open</meta-value>
    </custom-meta>
</custom-meta-group>
</article-meta>

Enormous caveats here that this would need to match the OSF taxonomy, but you could cherry-pick the primary stages of peer review or display all stages of peer review here. I suspect this would require a bit of an overhaul on how we're storing peer review stage metadata in the database. That's a guess. Here's one more example of this for an article pushed from a preprint service to a publisher and then accepted with minor revisions.

<custom-meta-group>
<custom-meta>
    <meta-name>peer-review-stage</meta-name>
    <meta-value>pre-publication</meta-value>
</custom-meta>
<custom-meta>
    <meta-name>transfer</meta-name>
    <meta-value>yes</meta-value>
</custom-meta>
<custom-meta>
    <meta-name>transferred-from</meta-name>
    <meta-value>biorxiv</meta-value>
</custom-meta>
<custom-meta>
    <meta-name>peer-review-recommendation</meta-name>
    <meta-value>minor-revision</meta-value>
</custom-meta>
</custom-meta-group>

I think this has potential, but I don't think it would be easy to accommodate quickly. This is my review! Of the things! My vote is for whatever is the lowest hanging fruit, but I do think funder metadata would be big to expose via JATS OAI (and would position us well, I think), and references look easy.
