Commit a7a2985

Merge branch 'develop' into 9309-oai-earliest-date
Resolved merge conflict with #9310 in OAIServlet (#9309)
2 parents: 8a4ec02 + dea8eb5

42 files changed: +1732 −911 lines
@@ -0,0 +1,3 @@
The Schema.org metadata export and the Schema.org metadata embedded in dataset pages have been updated to improve compliance with Schema.org's schema and Google's recommendations.

Backward compatibility: the "citation"/"text" key has been replaced by a "citation"/"name" key.
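
To see the new output for a dataset, the standard metadata export API can be used (a sketch; the demo server and DOI below are illustrative):

    # A sketch: fetch the Schema.org export and inspect the "citation" entries,
    # which now carry a "name" key where a "text" key used to be.
    curl "https://demo.dataverse.org/api/datasets/export?exporter=schema.org&persistentId=doi:10.5072/FK2/J8SJZB"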
+10
@@ -0,0 +1,10 @@
### Default Values for Database Connections Fixed

A regression introduced in Dataverse release 5.3 may have affected you: the announced default values for the database connection never actually worked.

With the update to Payara 5.2022.3 it was possible to introduce working defaults. The documentation has been changed accordingly.

Together with this change, you can now enable advanced connection pool configurations useful for debugging and monitoring. Of particular interest may be `sslmode=require`. See the docs for details.
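
As a sketch of what enabling such a parameter might look like (the `dataverse.db.parameters` option name should be verified against the updated installation guide before use):

    # A sketch, to be checked against the installation guide: pass extra
    # connection parameters such as sslmode=require via a JVM option.
    ./asadmin create-jvm-options '-Ddataverse.db.parameters=sslmode=require'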
@@ -0,0 +1,14 @@
Datasets that are part of linked dataverse collections will now be displayed in their linking dataverse collections. In order to fix the display of collections that have already been linked, you must re-index the linked collections. This query will provide a list of commands to re-index the affected collections:

    select 'curl http://localhost:8080/api/admin/index/dataverses/'
    || tmp.dvid from (select distinct dataverse_id as dvid
    from dataverselinkingdataverse) as tmp

The result of the query will be a list of re-index commands such as:

    curl http://localhost:8080/api/admin/index/dataverses/633

where '633' is the id of the linked collection.
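
One way to run the whole procedure in a single step is sketched below; the database name `dvndb` and the `postgres` system user are assumptions to adjust for your installation:

    # A sketch: generate the re-index commands with psql and execute each one.
    # Review the echoed commands first if you prefer a manual run.
    sudo -u postgres psql dvndb -tA \
      -c "select 'curl http://localhost:8080/api/admin/index/dataverses/' || dataverse_id from (select distinct dataverse_id from dataverselinkingdataverse) as tmp" \
      | while read -r cmd; do echo "$cmd"; $cmd; done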
@@ -0,0 +1,3 @@
### Support for cleaning up files in datasets' storage

Experimental feature: leftover files stored in the Dataset storage location that are not in the file list of that Dataset, but are named following the Dataverse technical convention for dataset files, can be removed with the new native API call [Cleanup storage of a Dataset](https://guides.dataverse.org/en/latest/api/native-api.html#cleanup-storage-api).

doc/sphinx-guides/source/admin/harvestclients.rst

+2
@@ -21,6 +21,8 @@ Clients are managed on the "Harvesting Clients" page accessible via the :doc:`da
The process of creating a new, or editing an existing client, is largely self-explanatory. It is split into logical steps, in a way that allows the user to go back and correct the entries made earlier. The process is interactive and guidance text is provided. For example, the user is required to enter the URL of the remote OAI server. When they click *Next*, the application will try to establish a connection to the server in order to verify that it is working, and to obtain the information about the sets of metadata records and the metadata formats it supports. The choices offered to the user on the next page will be based on this extra information. If the application fails to establish a connection to the remote archive at the address specified, or if an invalid response is received, the user is given an opportunity to check and correct the URL they entered.

Note that as of 5.13, a new entry, "Custom HTTP Header", has been added to Step 1 of the Create or Edit form. This optional field can be used to configure the client with a specific HTTP header to be added to every OAI request. This accommodates a (rare) use case where the remote server may require a special token of some kind in order to offer content not available to other clients. Most OAI servers offer the same publicly available content to all clients, so few admins will have a use for this feature. It is, however, on the very first Step 1 screen in case the OAI server requires this token even for the "ListSets" and "ListMetadataFormats" requests, which are needed for Step 2 of creating or editing a client. Multiple headers can be supplied separated by `\\n` - actual "backslash" and "n" characters, not a single "new line" character.

How to Stop a Harvesting Run in Progress
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/sphinx-guides/source/api/native-api.rst

+109 −5
@@ -730,21 +730,20 @@ The fully expanded example above (without environment variables) looks like this

  curl -H "X-Dataverse-key:$API_TOKEN" https://demo.dataverse.org/api/datasets/:persistentId/versions/:draft?persistentId=doi:10.5072/FK2/J8SJZB

-|CORS| Show the dataset whose id is passed:
+|CORS| Show the dataset whose database id is passed:

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
-  export ID=408730
+  export ID=24

  curl $SERVER_URL/api/datasets/$ID

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

-  curl https://demo.dataverse.org/api/datasets/408730
+  curl https://demo.dataverse.org/api/datasets/24

The dataset id can be extracted from the response retrieved from the API which uses the persistent identifier (``/api/datasets/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER``).
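
As a sketch, that id can be pulled out of the persistent-identifier response with ``jq`` (assuming the standard native API response envelope):

.. code-block:: bash

  # A sketch: extract the dataset database id from the persistentId lookup.
  curl -s "https://demo.dataverse.org/api/datasets/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB" | jq '.data.id'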

@@ -1513,6 +1512,38 @@ The fully expanded example above (without environment variables) looks like this
curl -H X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST https://demo.dataverse.org/api/datasets/:persistentId/add?persistentId=doi:10.5072/FK2/J8SJZB -F 'jsonData={"description":"A remote image.","storageIdentifier":"trsa://themes/custom/qdr/images/CoreTrustSeal-logo-transparent.png","checksumType":"MD5","md5Hash":"509ef88afa907eaf2c17c1c8d8fde77e","label":"testlogo.png","fileName":"testlogo.png","mimeType":"image/png"}'

.. _cleanup-storage-api:

Cleanup storage of a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is an experimental feature and should be tested on your system before using it in production.
Also, make sure that your backups are up to date before using this on production servers.
It is advised to first call this method with the ``dryrun`` parameter set to ``true`` before actually deleting the files.
This will allow you to manually inspect the files that would be deleted if that parameter is set to ``false`` or is omitted (a list of the files that would be deleted is provided in the response).

If your Dataverse installation has been configured to support direct uploads, or in some other situations,
you could end up with some files in the storage of a dataset that are not linked to that dataset directly. Most commonly, this
happens when an upload fails in the middle of a transfer: for example, if a user does a UI direct upload and leaves the page without hitting cancel or save,
Dataverse doesn't know about the files and doesn't clean them up. Similarly, in the direct upload API, if the final /addFiles call isn't made, the files are abandoned.

All the files stored in the Dataset storage location that are not in the file list of that Dataset (and follow the naming pattern of the dataset files) can be removed, as shown in the example below.

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_ID=doi:10.5072/FK2/J8SJZB
  export DRYRUN=true

  curl -H "X-Dataverse-key: $API_TOKEN" -X GET "$SERVER_URL/api/datasets/:persistentId/cleanStorage?persistentId=$PERSISTENT_ID&dryrun=$DRYRUN"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/datasets/:persistentId/cleanStorage?persistentId=doi:10.5072/FK2/J8SJZB&dryrun=true"
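
Once the list of files reported by the dry run has been reviewed, the same request with ``dryrun=false`` (or with the parameter omitted) performs the actual deletion. A minimal sketch:

.. code-block:: bash

  # A sketch: re-run the cleanup without the dry-run safeguard to delete the
  # leftover files reported in the previous response.
  curl -H "X-Dataverse-key: $API_TOKEN" -X GET "$SERVER_URL/api/datasets/:persistentId/cleanStorage?persistentId=$PERSISTENT_ID&dryrun=false"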
Adding Files To a Dataset via Other Tools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -2060,6 +2091,77 @@ The response is a JSON object described in the :doc:`/api/external-tools` sectio
Files
-----

Get JSON Representation of a File
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note:: Files can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.

Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB*:

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

  curl -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/files/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://demo.dataverse.org/api/files/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB

You may get the draft version of an unpublished file if you pass an API token with view draft permissions:

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

  curl -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/files/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://demo.dataverse.org/api/files/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB

|CORS| Show the file whose database id is passed:

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
  export ID=408730

  curl $SERVER_URL/api/files/$ID

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl https://demo.dataverse.org/api/files/408730

You may get the draft version of a published file if you pass an API token with view draft permissions and use the draft path parameter:

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

  curl -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/files/:persistentId/draft/?persistentId=$PERSISTENT_IDENTIFIER

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://demo.dataverse.org/api/files/:persistentId/draft/?persistentId=doi:10.5072/FK2/J8SJZB

The file id can be extracted from the response retrieved from the API which uses the persistent identifier (``/api/datasets/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER``).
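
As a sketch, the file ids can be pulled out of that dataset response with ``jq`` (the path below assumes the standard native API response envelope and the latest version of the dataset):

.. code-block:: bash

  # A sketch: list the database ids of all files in the latest dataset version.
  curl -s "https://demo.dataverse.org/api/datasets/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB" \
    | jq '.data.latestVersion.files[].dataFile.id'
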
Adding Files
~~~~~~~~~~~~

@@ -3339,7 +3441,8 @@ The following optional fields are supported:
- archiveDescription: What the name suggests. If not supplied, will default to "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data."
- set: The OAI set on the remote server. If not supplied, will default to none, i.e., "harvest everything".
- style: Defaults to "default" - a generic OAI archive. (Make sure to use "dataverse" when configuring harvesting from another Dataverse installation).
- customHeaders: This can be used to configure this client with a specific HTTP header that will be added to every OAI request. This is to accommodate a use case where the remote server requires this header to supply some form of a token in order to offer some content not available to other clients. See the example below. Multiple headers can be supplied separated by `\\n` - actual "backslash" and "n" characters, not a single "new line" character.

Generally, the API will accept the output of the GET version of the API for an existing client as valid input, but some fields will be ignored. For example, as of writing this there is no way to configure a harvesting schedule via this API.

An example JSON file would look like this::
@@ -3351,6 +3454,7 @@ An example JSON file would look like this::
    "archiveUrl": "https://zenodo.org",
    "archiveDescription": "Harvested from the LMOPS collection of the Zenodo repository. By clicking on this dataset, you will be redirected to Zenodo.",
    "metadataFormat": "oai_dc",
    "customHeaders": "x-oai-api-key: xxxyyyzzz",
    "set": "user-lmops"
  }
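
Such a JSON file can be used to create the client via the harvesting clients API. A sketch, assuming a superuser API token; the nickname ``zenodo-lmops`` and the file name ``harvestclient.json`` are illustrative:

.. code-block:: bash

  # A sketch: create a harvesting client from the JSON above, saved locally
  # as harvestclient.json. Requires a superuser API token.
  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org

  curl -H "X-Dataverse-key:$API_TOKEN" -X POST -H "Content-Type: application/json" \
    "$SERVER_URL/api/harvest/clients/zenodo-lmops" --upload-file harvestclient.json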

doc/sphinx-guides/source/developers/documentation.rst

+2
@@ -22,6 +22,8 @@ That's it! Thank you for your contribution! Your pull request will be added manu

Please see https://github.com/IQSS/dataverse/pull/5857 for an example of a quick fix that was merged (the "Files changed" tab shows how a typo was fixed).

Preview your documentation changes, which will be built automatically as part of your pull request in GitHub. The build will show up as a check entitled `docs/readthedocs.org:dataverse-guide — Read the Docs build succeeded!`. For example, this PR built to https://dataverse-guide--9249.org.readthedocs.build/en/9249/.

If you would like to read more about the Dataverse Project's use of GitHub, please see the :doc:`version-control` section. For bug fixes and features we request that you create an issue before making a pull request, but this is not at all necessary for quick fixes to the documentation.

.. _admin: https://github.com/IQSS/dataverse/tree/develop/doc/sphinx-guides/source/admin

doc/sphinx-guides/source/index.rst

+1 −1
@@ -6,7 +6,7 @@
Dataverse Documentation v. |version|
====================================

-These documentation guides are for the |version| version of Dataverse. To find guides belonging to previous versions, :ref:`guides_versions` has a list of all available versions.
+These documentation guides are for the |version| version of Dataverse. To find guides belonging to previous or future versions, :ref:`guides_versions` has a list of all available versions.

.. toctree::
   :glob:
