diff --git a/README.md b/README.md
index 2bdc0e8edde..f52a6e20f83 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@ Dataverse is a trademark of President and Fellows of Harvard College and is regi
 [![Dataverse Project logo](src/main/webapp/resources/images/dataverseproject_logo.jpg?raw=true "Dataverse Project")](http://dataverse.org)
 
 [![API Test Status](https://jenkins.dataverse.org/buildStatus/icon?job=IQSS-dataverse-develop&subject=API%20Test%20Status)](https://jenkins.dataverse.org/job/IQSS-dataverse-develop/)
+[![API Test Coverage](https://img.shields.io/jenkins/coverage/jacoco?jobUrl=https%3A%2F%2Fjenkins.dataverse.org%2Fjob%2FIQSS-dataverse-develop&label=API%20Test%20Coverage)](https://jenkins.dataverse.org/job/IQSS-dataverse-develop/)
 [![Unit Test Status](https://img.shields.io/travis/IQSS/dataverse?label=Unit%20Test%20Status)](https://travis-ci.org/IQSS/dataverse)
 [![Unit Test Coverage](https://img.shields.io/coveralls/github/IQSS/dataverse?label=Unit%20Test%20Coverage)](https://coveralls.io/github/IQSS/dataverse?branch=develop)
diff --git a/conf/solr/7.7.2/schema.xml b/conf/solr/7.7.2/schema.xml
index 6c377d2e92c..da40a8e99fa 100644
--- a/conf/solr/7.7.2/schema.xml
+++ b/conf/solr/7.7.2/schema.xml
@@ -293,7 +293,7 @@
-
+
diff --git a/doc/release-notes/4.19-release-notes.md b/doc/release-notes/4.19-release-notes.md
index 56977f5b277..70c8711582c 100644
--- a/doc/release-notes/4.19-release-notes.md
+++ b/doc/release-notes/4.19-release-notes.md
@@ -83,7 +83,7 @@ Additional fields are now available via the Search API, mostly related to inform
 ## Complete List of Changes
 
-For the complete list of code changes in this release, see the 4.19 milestone in Github.
+For the complete list of code changes in this release, see the 4.19 milestone in GitHub.
 
 For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
diff --git a/doc/release-notes/6485-multiple-stores.md b/doc/release-notes/6485-multiple-stores.md
new file mode 100644
index 00000000000..ea2d224d612
--- /dev/null
+++ b/doc/release-notes/6485-multiple-stores.md
@@ -0,0 +1,36 @@
+# Multiple Store Support
+Dataverse can now be configured to store files in more than one place at the same time (multiple file, s3, and/or swift stores).
+
+General information about this capability can be found in the Configuration Guide - File Storage section.
+
+**Upgrade Information:**
+
+**Existing installations will need to make configuration changes to adopt this version, regardless of whether additional stores are to be added or not.**
+
+Multi-store support requires that each store be assigned a label, id, and type - see the documentation for a more complete explanation. For an existing store, the recommended upgrade path is to assign the store id based on its type, i.e. a 'file' store would get the id 'file' and an 's3' store would get the id 's3'.
+
+With this choice, no manual changes to datafile 'storageidentifier' entries are needed in the database. (If you do not name your existing store using this convention, you will need to edit the database to maintain access to existing files!)
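+
+As a quick sanity check before upgrading, you can list which store prefixes your existing files use. This is a hypothetical example, not an official upgrade step - it assumes the standard 'dvobject' table and a database named 'dvndb':
+
+    # counts files by storage identifier prefix, e.g. 'file://' or 's3://'
+    psql dvndb -c "SELECT substring(storageidentifier from '^[a-z0-9]+://') AS prefix, count(*) FROM dvobject WHERE dtype = 'DataFile' GROUP BY prefix;"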
+
+The following set of commands to change the Glassfish JVM options will adapt an existing file or s3 store for this upgrade:
+
+For a file store:
+
+    ./asadmin create-jvm-options "\-Ddataverse.files.file.type=file"
+    ./asadmin create-jvm-options "\-Ddataverse.files.file.label=file"
+    ./asadmin create-jvm-options "\-Ddataverse.files.file.directory=<your directory>"
+
+For an s3 store:
+
+    ./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3"
+    ./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3"
+    ./asadmin delete-jvm-options "-Ddataverse.files.s3-bucket-name=<your_bucket_name>"
+    ./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=<your_bucket_name>"
+
+Any additional S3 options you have set will need to be replaced as well, following the pattern in the last two lines above - delete the option that includes a '-' after 's3' and create the same option with the '-' replaced by a '.', using the same value you currently have configured.
+
+Once these options are set, restarting the Glassfish service is all that is needed to complete the change.
+
+Note that the "\-Ddataverse.files.directory" option, if defined, continues to control where temporary files are stored (in the /temp subdirectory of that directory), independent of the location of any 'file' store defined above.
diff --git a/doc/release-notes/6510-duplicate-datafiles-and-datatables.md b/doc/release-notes/6510-duplicate-datafiles-and-datatables.md
new file mode 100644
index 00000000000..18ac58860d8
--- /dev/null
+++ b/doc/release-notes/6510-duplicate-datafiles-and-datatables.md
@@ -0,0 +1,22 @@
+We recently discovered two *potential* data integrity issues in
+Dataverse databases. One manifests itself as duplicate DataFile
+objects created for the same uploaded file (https://github.com/IQSS/dataverse/issues/6522); the other as duplicate
+DataTable (tabular metadata) objects linked to the same
+DataFile (https://github.com/IQSS/dataverse/issues/6510). These issues impacted approximately 0.03% of datasets in Harvard's Dataverse.
+
+To see if any datasets in your installation have been impacted by these data integrity issues, we've provided a diagnostic script here:
+
+https://github.com/IQSS/dataverse/raw/develop/scripts/issues/6510/check_datafiles_6522_6510.sh
+
+The script relies on the PostgreSQL utility psql to access the
+database. You will need to edit the credentials at the top of the script
+to match your database configuration.
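+
+For example (illustrative only - the download method and credentials will vary by site), the script can be fetched and run like this:
+
+    wget https://github.com/IQSS/dataverse/raw/develop/scripts/issues/6510/check_datafiles_6522_6510.sh
+    # edit the database credentials at the top of the script first
+    bash ./check_datafiles_6522_6510.sh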
+
+If neither of the two issues is present in your database, you will see
+the messages "... no duplicate DataFile objects in your database" and "no
+tabular files affected by this issue in your database".
+
+If either or both kinds of duplicates are detected, the script will
+provide further instructions. Please send us the output it produces;
+we will then assist you in resolving the issues in your
+database.
diff --git a/doc/release-notes/6522-datafile-duplicates.md b/doc/release-notes/6522-datafile-duplicates.md
deleted file mode 100644
index 39abb49cd69..00000000000
--- a/doc/release-notes/6522-datafile-duplicates.md
+++ /dev/null
@@ -1,27 +0,0 @@
-In this Dataverse release, we are adding a database constraint to
-prevent duplicate DataFile objects pointing to the same physical file
-from being created.
-
-Before this release can be deployed, your database must be checked
-for any such duplicates that may already exist. If present,
-the duplicates will need to be deleted, and the integrity of the
-stored physical files verified.
-
-(We have notified the community about this issue ahead of the release,
-so you may have already addressed it. In this case, please disregard
-this release note)
-
-Please run the diagnostic script provided at
-https://github.com/IQSS/dataverse/raw/develop/scripts/issues/6522/find_duplicates.sh.
-The script relies on the PostgreSQL utility `psql` to access the
-database. You will need to edit the credentials at the top of the script
-to match your database configuration.
-
-If this issue is not present in your database, you will see a message
-`... no duplicate dvObjects in your database. Your installation is
-ready to be upgraded to Dataverse 4.20`.
-
-If duplicates are detected, it will provide further instructions. We
-will need you to send us the produced output. We will then assist you
-in resolving this problem in your database.
-
diff --git a/doc/release-notes/6644-role-name-change.md b/doc/release-notes/6644-role-name-change.md
new file mode 100644
index 00000000000..cc4df2fee75
--- /dev/null
+++ b/doc/release-notes/6644-role-name-change.md
@@ -0,0 +1 @@
+Note for integrators - the role alias has changed, so anything that was hard-coded to "editor" instead of "contributor" will need to be updated.
\ No newline at end of file
diff --git a/doc/release-notes/6711-coverage-badge b/doc/release-notes/6711-coverage-badge
new file mode 100644
index 00000000000..fc90da68742
--- /dev/null
+++ b/doc/release-notes/6711-coverage-badge
@@ -0,0 +1,3 @@
+Integration Test Coverage Reporting
+
+API-based integration tests are run every time a branch is merged to develop, and the percentage of code covered by these integration tests is now shown on a badge at the bottom of the README.md file that serves as the homepage of the Dataverse GitHub repository.
\ No newline at end of file
diff --git a/doc/release-notes/6725-analytics-bug.md b/doc/release-notes/6725-analytics-bug.md
new file mode 100644
index 00000000000..bbb703fcb58
--- /dev/null
+++ b/doc/release-notes/6725-analytics-bug.md
@@ -0,0 +1,3 @@
+# Google Analytics Download Tracking Bug
+
+The button tracking capability discussed in the installation guide (http://guides.dataverse.org/en/4.20/installation/config.html#id88) relies on an analytics-code.html file that must be configured using the :WebAnalyticsCode setting. The example file provided in the installation guide is no longer compatible with recent Dataverse releases (>v4.16). Installations using this feature should update their analytics-code.html file by following the installation instructions using the updated example file. (Alternatively, sites can modify their existing files to include the one-line change made in the example file at line 120.)
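+
+For reference, the file is activated by pointing the :WebAnalyticsCode setting at it; this is a hypothetical example (your path may differ) based on the installation guide:
+
+    curl -X PUT -d '/var/www/dataverse/branding/analytics-code.html' http://localhost:8080/api/admin/settings/:WebAnalyticsCode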
\ No newline at end of file
diff --git a/doc/sphinx-guides/source/_static/installation/files/var/www/dataverse/branding/analytics-code.html b/doc/sphinx-guides/source/_static/installation/files/var/www/dataverse/branding/analytics-code.html
index 4e6a01f2d5d..ca703dddf11 100644
--- a/doc/sphinx-guides/source/_static/installation/files/var/www/dataverse/branding/analytics-code.html
+++ b/doc/sphinx-guides/source/_static/installation/files/var/www/dataverse/branding/analytics-code.html
@@ -117,7 +117,7 @@
       var row = target.parents('tr')[0];
       if(row != null) {
         //finds the file id/DOI in the Dataset page
-        label = $(row).find('td.col-file-metadata > a').attr('href');
+        label = $(row).find('div.file-metadata-block > a').attr('href');
       } else {
         //finds the file id/DOI in the file page
         label = $('#fileForm').attr('action');
diff --git a/doc/sphinx-guides/source/admin/dataverses-datasets.rst b/doc/sphinx-guides/source/admin/dataverses-datasets.rst
index e542dee2d83..a4bea9f53e7 100644
--- a/doc/sphinx-guides/source/admin/dataverses-datasets.rst
+++ b/doc/sphinx-guides/source/admin/dataverses-datasets.rst
@@ -38,7 +38,27 @@ Add Dataverse RoleAssignments to Child Dataverses
 
 Recursively assigns the users and groups having a role(s), that are in the set configured to be inheritable via the :InheritParentRoleAssignments setting, on a specified dataverse to have the same role assignments on all of the dataverses that have been created within it. The response indicates success or failure and lists the individuals/groups and dataverses involved in the update. Only accessible to superusers. ::
 
-    curl -H "X-Dataverse-key: $API_TOKEN" http://$SERVER/api/admin/dataverse/$dataverse-alias//addRoleAssignmentsToChildren
+    curl -H "X-Dataverse-key: $API_TOKEN" http://$SERVER/api/admin/dataverse/$dataverse-alias/addRoleAssignmentsToChildren
+
+Configure a Dataverse to store all new files in a specific file store
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To direct new files (uploaded when datasets are created or edited) for all datasets in a given dataverse to a specific store, the store can be specified via the API as shown below, or by editing the 'General Information' for the dataverse on the Dataverse page. Only accessible to superusers.
+::
+
+    curl -H "X-Dataverse-key: $API_TOKEN" -X PUT -d $storageDriverLabel http://$SERVER/api/admin/dataverse/$dataverse-alias/storageDriver
+
+The current driver can be seen using::
+
+    curl -H "X-Dataverse-key: $API_TOKEN" http://$SERVER/api/admin/dataverse/$dataverse-alias/storageDriver
+
+and can be reset to the default store with::
+
+    curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE http://$SERVER/api/admin/dataverse/$dataverse-alias/storageDriver
+
+The available drivers can be listed with::
+
+    curl -H "X-Dataverse-key: $API_TOKEN" http://$SERVER/api/admin/storageDrivers
+
 Datasets
 --------
 
diff --git a/doc/sphinx-guides/source/api/native-api.rst b/doc/sphinx-guides/source/api/native-api.rst
index bc296841581..fac4957bfb3 100644
--- a/doc/sphinx-guides/source/api/native-api.rst
+++ b/doc/sphinx-guides/source/api/native-api.rst
@@ -977,19 +977,18 @@ The fully expanded example above (without environment variables) looks like this
 
   curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/datasets/24/versions/:draft
 
-Set Citation Date Field for a Dataset
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Set Citation Date Field Type for a Dataset
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Sets the dataset field type to be used as the citation date for the given dataset (if the dataset does not include the dataset field type, the default logic is used). The name of the dataset field type should be sent in the body of the request.
-To revert to the default logic, use ``:publicationDate`` as the ``$DATASET_FIELD_TYPE_NAME``.
-Note that the dataset field used has to be a date field.
+Sets the dataset citation date field type for a given dataset. ``:publicationDate`` is the default.
+Note that the dataset citation date field type must be a date field.
 
 .. code-block:: bash
 
   export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
   export SERVER_URL=https://demo.dataverse.org
   export ID=24
-  export DATASET_FIELD_TYPE_NAME=:publicationDate
+  export DATASET_FIELD_TYPE_NAME=:dateOfDeposit
 
   curl -H "X-Dataverse-key: $API_TOKEN" -X PUT $SERVER_URL/api/datasets/$ID/citationdate --data "$DATASET_FIELD_TYPE_NAME"
 
@@ -997,12 +996,12 @@ The fully expanded example above (without environment variables) looks like this
 
 .. code-block:: bash
 
-  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT https://demo.dataverse.org/api/datasets/24/citationdate --data ":publicationDate"
+  curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT https://demo.dataverse.org/api/datasets/24/citationdate --data ":dateOfDeposit"
 
-Revert Citation Date Field to Default for Dataset
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Revert Citation Date Field Type to Default for Dataset
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Restores the default logic of the field type to be used as the citation date. Same as ``PUT`` with ``:publicationDate`` body:
+Restores the default citation date field type, ``:publicationDate``, for a given dataset.
 
 .. code-block:: bash
 
@@ -1812,11 +1811,39 @@ Redetect File Type
 
 Dataverse uses a variety of methods for determining file types (MIME types or content types) and these methods (listed below) are updated periodically. If you have files that have an unknown file type, you can have Dataverse attempt to redetect the file type.
 
-When using the curl command below, you can pass ``dryRun=true`` if you don't want any changes to be saved to the database. Change this to ``dryRun=false`` (or omit it) to save the change. In the example below, the file is identified by database id "42".
+When using the curl command below, you can pass ``dryRun=true`` if you don't want any changes to be saved to the database. Change this to ``dryRun=false`` (or omit it) to save the change.
+
+A curl example using an ``ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export ID=24
+
+  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$ID/redetect?dryRun=true"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/24/redetect?dryRun=true"
+
+A curl example using a ``PERSISTENT_ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+
+  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/redetect?persistentId=$PERSISTENT_ID&dryRun=true"
 
-``export FILE_ID=42``
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
 
-``curl -H "X-Dataverse-key:$API_TOKEN" -X POST $SERVER_URL/api/files/$FILE_ID/redetect?dryRun=true``
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/redetect?persistentId=doi:10.5072/FK2/AAA000&dryRun=true"
 
 Currently the following methods are used to detect file types:
 
@@ -1827,39 +1854,162 @@ Currently the following methods are used to detect file types:
 Replacing Files
 ~~~~~~~~~~~~~~~
 
-Replace an existing file where ``id`` is the database id of the file to replace or ``pid`` is the persistent id (DOI or Handle) of the file. Requires the ``file`` to be passed as well as a ``jsonString`` expressing the new metadata. Note that metadata such as description, directoryLabel (File Path) and tags are not carried over from the file being replaced::
+Replace an existing file where ``ID`` is the database id of the file to replace or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires the ``file`` to be passed as well as a ``jsonString`` expressing the new metadata. Note that metadata such as description, directoryLabel (File Path) and tags are not carried over from the file being replaced.
+
+A curl example using an ``ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export ID=24
+
+  curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@file.extension' -F 'jsonData={json}' $SERVER_URL/api/files/$ID/replace
 
-    POST -F 'file=@file.extension' -F 'jsonData={json}' http://$SERVER/api/files/{id}/metadata?key={apiKey}
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST -F 'file=@data.tsv' \
+    -F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}' \
+    https://demo.dataverse.org/api/files/24/replace
+
+A curl example using a ``PERSISTENT_ID``
 
-Example::
+.. code-block:: bash
 
-    curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@data.tsv' \
-      -F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}'\
-      "https://demo.dataverse.org/api/files/$FILE_ID/replace"
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+
+  curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@file.extension' -F 'jsonData={json}' \
+    "$SERVER_URL/api/files/:persistentId/replace?persistentId=$PERSISTENT_ID"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST -F 'file=@data.tsv' \
+    -F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}' \
+    "https://demo.dataverse.org/api/files/:persistentId/replace?persistentId=doi:10.5072/FK2/AAA000"
 
 Getting File Metadata
 ~~~~~~~~~~~~~~~~~~~~~
 
-Provides a json representation of the file metadata for an existing file where ``id`` is the database id of the file to replace or ``pid`` is the persistent id (DOI or Handle) of the file::
+Provides a JSON representation of the file metadata for an existing file where ``ID`` is the database id of the file to get metadata from or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file.
+
+A curl example using an ``ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export ID=24
 
-    GET http://$SERVER/api/files/{id}/metadata
+  curl $SERVER_URL/api/files/$ID/metadata
 
-The current draft can also be viewed if you have permissions and pass your ``apiKey``::
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
 
-    GET http://$SERVER/api/files/{id}/metadata/draft?key={apiKey}
+  curl https://demo.dataverse.org/api/files/24/metadata
+
+A curl example using a ``PERSISTENT_ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+
+  curl "$SERVER_URL/api/files/:persistentId/metadata?persistentId=$PERSISTENT_ID"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl "https://demo.dataverse.org/api/files/:persistentId/metadata?persistentId=doi:10.5072/FK2/AAA000"
+
+The current draft can also be viewed if you have permissions and pass your API token.
+
+A curl example using an ``ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export ID=24
+
+  curl -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/files/$ID/metadata/draft
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://demo.dataverse.org/api/files/24/metadata/draft
+
+A curl example using a ``PERSISTENT_ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+
+  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/metadata/draft?persistentId=$PERSISTENT_ID"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
 
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/metadata/draft?persistentId=doi:10.5072/FK2/AAA000"
 
 Note: The ``id`` returned in the json response is the id of the file metadata version.
 
 Updating File Metadata
 ~~~~~~~~~~~~~~~~~~~~~~
 
-Updates the file metadata for an existing file where ``id`` is the database id of the file to replace or ``pid`` is the persistent id (DOI or Handle) of the file. Requires a ``jsonString`` expressing the new metadata. No metadata from the previous version of this file will be persisted, so if you want to update a specific field first get the json with the above command and alter the fields you want::
+Updates the file metadata for an existing file where ``ID`` is the database id of the file to update or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires a ``jsonString`` expressing the new metadata. No metadata from the previous version of this file will be persisted, so if you want to update a specific field, first get the json with the above command and alter the fields you want.
 
-    POST -F 'jsonData={json}' http://$SERVER/api/files/{id}/metadata?key={apiKey}
+A curl example using an ``ID``
 
-Example::
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export ID=24
 
-    curl -H "X-Dataverse-key:{apiKey}" -X POST -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' 'http://localhost:8080/api/files/264/metadata'
+  curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
+    -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' \
+    $SERVER_URL/api/files/$ID/metadata
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
+    -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' \
+    https://demo.dataverse.org/api/files/24/metadata
+
+A curl example using a ``PERSISTENT_ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+
+  curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
+    -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' \
+    "$SERVER_URL/api/files/:persistentId/metadata?persistentId=$PERSISTENT_ID"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
+    -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' \
+    "https://demo.dataverse.org/api/files/:persistentId/metadata?persistentId=doi:10.5072/FK2/AAA000"
 
 Also note that dataFileTags are not versioned and changes to these will update the published version of the file.
 
@@ -1868,45 +2018,249 @@ Also note that dataFileTags are not versioned and changes to these will update t
 Editing Variable Level Metadata
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Updates variable level metadata using ddi xml ``$file``, where ``$id`` is file id::
+Updates variable level metadata using a DDI XML file ``FILE``, where ``ID`` is the file id.
 
-    PUT https://$SERVER/api/edit/$id --upload-file $file
+A curl example using an ``ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export ID=24
+  export FILE=dct.xml
+
+  curl -H "X-Dataverse-key:$API_TOKEN" -X PUT $SERVER_URL/api/edit/$ID --upload-file $FILE
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
 
-Example: ``curl -H "X-Dataverse-key:$API_TOKEN" -X PUT http://localhost:8080/api/edit/95 --upload-file dct.xml``
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT https://demo.dataverse.org/api/edit/24 --upload-file dct.xml
 
 You can download :download:`dct.xml <../../../../src/test/resources/xml/dct.xml>` from the example above to see what the XML looks like.
 
 Provenance
 ~~~~~~~~~~
 
-Get Provenance JSON for an uploaded file::
+Get Provenance JSON for an uploaded file
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-    GET http://$SERVER/api/files/{id}/prov-json?key=$apiKey
+A curl example using an ``ID``
 
-Get Provenance Description for an uploaded file::
+.. code-block:: bash
 
-    GET http://$SERVER/api/files/{id}/prov-freeform?key=$apiKey
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export ID=24
 
-Create/Update Provenance JSON and provide related entity name for an uploaded file::
+  curl -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/files/$ID/prov-json
 
-    POST http://$SERVER/api/files/{id}/prov-json?key=$apiKey&entityName=$entity -H "Content-type:application/json" --upload-file $filePath
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://demo.dataverse.org/api/files/24/prov-json
+
+A curl example using a ``PERSISTENT_ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+
+  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/prov-json?persistentId=$PERSISTENT_ID"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/prov-json?persistentId=doi:10.5072/FK2/AAA000"
 
-Create/Update Provenance Description for an uploaded file. Requires a JSON file with the description connected to a key named "text"::
+Get Provenance Description for an uploaded file
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A curl example using an ``ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export ID=24
+
+  curl -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/files/$ID/prov-freeform
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://demo.dataverse.org/api/files/24/prov-freeform
+
+A curl example using a ``PERSISTENT_ID``
+
+.. code-block:: bash
 
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+
+  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/prov-freeform?persistentId=$PERSISTENT_ID"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/prov-freeform?persistentId=doi:10.5072/FK2/AAA000"
+
+Create/Update Provenance JSON and provide related entity name for an uploaded file
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A curl example using an ``ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export ID=24
+  export ENTITY_NAME="..."
+  export FILE_PATH=provenance.json
+
+  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$ID/prov-json?entityName=$ENTITY_NAME" -H "Content-type:application/json" --upload-file $FILE_PATH
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/24/prov-json?entityName=..." -H "Content-type:application/json" --upload-file provenance.json
+
+A curl example using a ``PERSISTENT_ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+  export ENTITY_NAME="..."
+  export FILE_PATH=provenance.json
+
+  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/prov-json?persistentId=$PERSISTENT_ID&entityName=$ENTITY_NAME" -H "Content-type:application/json" --upload-file $FILE_PATH
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
 
-    POST http://$SERVER/api/files/{id}/prov-freeform?key=$apiKey -H "Content-type:application/json" --upload-file $filePath
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/prov-json?persistentId=doi:10.5072/FK2/AAA000&entityName=..." -H "Content-type:application/json" --upload-file provenance.json
 
+Create/Update Provenance Description for an uploaded file
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Requires a JSON file with the description connected to a key named "text".
+
+A curl example using an ``ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export ID=24
+  export FILE_PATH=provenance.json
+
+  curl -H "X-Dataverse-key:$API_TOKEN" -X POST $SERVER_URL/api/files/$ID/prov-freeform -H "Content-type:application/json" --upload-file $FILE_PATH
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST https://demo.dataverse.org/api/files/24/prov-freeform -H "Content-type:application/json" --upload-file provenance.json
+
+A curl example using a ``PERSISTENT_ID``
+
+.. code-block:: bash
 
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+  export FILE_PATH=provenance.json
+
+  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/prov-freeform?persistentId=$PERSISTENT_ID" -H "Content-type:application/json" --upload-file $FILE_PATH
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/prov-freeform?persistentId=doi:10.5072/FK2/AAA000" -H "Content-type:application/json" --upload-file provenance.json
+
+Delete Provenance JSON for an uploaded file
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A curl example using an ``ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export ID=24
+
+  curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE $SERVER_URL/api/files/$ID/prov-json
 
-    DELETE http://$SERVER/api/files/{id}/prov-json?key=$apiKey
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/files/24/prov-json
+
+A curl example using a ``PERSISTENT_ID``
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+
+  curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE "$SERVER_URL/api/files/:persistentId/prov-json?persistentId=$PERSISTENT_ID"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/files/:persistentId/prov-json?persistentId=doi:10.5072/FK2/AAA000"
 
 Datafile Integrity
 ~~~~~~~~~~~~~~~~~~
 
-Starting the release 4.10 the size of the saved original file (for an ingested tabular datafile) is stored in the database. The following API will retrieve and permanently store the sizes for any already existing saved originals::
+Starting with release 4.10, the size of the saved original file (for an ingested tabular datafile) is stored in the database. The following API will retrieve and permanently store the sizes for any already existing saved originals:
 
-    GET http://$SERVER/api/admin/datafiles/integrity/fixmissingoriginalsizes{?limit=N}
+.. code-block:: bash
+
+  export SERVER_URL=https://localhost
+
+  curl $SERVER_URL/api/admin/datafiles/integrity/fixmissingoriginalsizes
+
+with limit parameter:
+
+.. code-block:: bash
+
+  export SERVER_URL=https://localhost
+  export LIMIT=10
+
+  curl "$SERVER_URL/api/admin/datafiles/integrity/fixmissingoriginalsizes?limit=$LIMIT"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl https://localhost/api/admin/datafiles/integrity/fixmissingoriginalsizes
+
+with limit parameter:
+
+.. code-block:: bash
+
+  curl "https://localhost/api/admin/datafiles/integrity/fixmissingoriginalsizes?limit=10"
 
 Note the optional "limit" parameter. Without it, the API will attempt to populate the sizes for all the saved originals that don't have them in the database yet. Otherwise it will do so for the first N such datafiles.
 
+By default, the admin API calls are blocked and can only be called from localhost. See more details in :ref:`:BlockedApiEndpoints <:BlockedApiEndpoints>` and :ref:`:BlockedApiPolicy <:BlockedApiPolicy>` settings in :doc:`/installation/config`.
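+
+For example (a hypothetical sketch - see the Configuration section of the Installation Guide for the supported values), the blocked-endpoint policy can be set via the settings API:
+
+.. code-block:: bash
+
+  # restrict /api/admin calls to localhost only
+  curl -X PUT -d localhost-only http://localhost:8080/api/admin/settings/:BlockedApiPolicy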
+
 Users Token Management
 ----------------------
 
diff --git a/doc/sphinx-guides/source/api/search.rst b/doc/sphinx-guides/source/api/search.rst
index baae7820de8..25bad2b8091 100755
--- a/doc/sphinx-guides/source/api/search.rst
+++ b/doc/sphinx-guides/source/api/search.rst
@@ -116,6 +116,7 @@ https://demo.dataverse.org/api/search?q=trees
                 "Astronomy and Astrophysics",
                 "Other"
             ],
+            "fileCount":3,
             "versionId":1260,
             "versionState":"RELEASED",
             "majorVersion":3,
@@ -291,6 +292,7 @@ The above example ``fq=publicationStatus:Published`` retrieves only "RELEASED" v
             "subjects": [
                 "Medicine, Health and Life Sciences"
             ],
+            "fileCount":6,
             "versionId": 53001,
             "versionState": "RELEASED",
             "majorVersion": 1,
@@ -323,6 +325,7 @@ The above example ``fq=publicationStatus:Published`` retrieves only "RELEASED" v
             "subjects": [
                 "Medicine, Health and Life Sciences"
            ],
+            "fileCount":9,
             "versionId": 53444,
             "versionState": "RELEASED",
             "majorVersion": 1,
diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst
index 37a794e804e..bb16dd9133d 100644
--- a/doc/sphinx-guides/source/developers/big-data-support.rst
+++ b/doc/sphinx-guides/source/developers/big-data-support.rst
@@ -18,7 +18,7 @@ Install a DCM
 
 Installation instructions can be found at https://github.com/sbgrid/data-capture-module/blob/master/doc/installation.md. Note that shared storage (posix or AWS S3) between Dataverse and your DCM is required. You cannot use a DCM with Swift at this point in time.
 
-.. FIXME: Explain what ``dataverse.files.dcm-s3-bucket-name`` is for and what it has to do with ``dataverse.files.s3-bucket-name``.
+.. FIXME: Explain what ``dataverse.files.dcm-s3-bucket-name`` is for and what it has to do with ``dataverse.files.s3.bucket-name``.
 
 Once you have installed a DCM, you will need to configure two database settings on the Dataverse side. These settings are documented in the :doc:`/installation/config` section of the Installation Guide:
 
@@ -100,6 +100,7 @@ Optional steps for setting up the S3 Docker DCM Variant
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 - Before: the default bucket for DCM to hold files in S3 is named test-dcm. It is coded into `post_upload_s3.bash` (line 30). Change to a different bucket if needed.
+- Also note: with the new support for multiple file stores in Dataverse, the DCM requires a store with id="s3" and will only work with this store.
 - Add AWS bucket info to dcmsrv
 - Add AWS credentials to ``~/.aws/credentials``
@@ -115,6 +116,9 @@ Optional steps for setting up the S3 Docker DCM Variant
   - ``cd /opt/glassfish4/bin/``
   - ``./asadmin delete-jvm-options "\-Ddataverse.files.storage-driver-id=file"``
   - ``./asadmin create-jvm-options "\-Ddataverse.files.storage-driver-id=s3"``
+  - ``./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3"``
+  - ``./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3"``
+
 - Add AWS bucket info to Dataverse
 - Add AWS credentials to ``~/.aws/credentials``
@@ -132,7 +136,7 @@ Optional steps for setting up the S3 Docker DCM Variant
 
 - S3 bucket for Dataverse
 
-  - ``/usr/local/glassfish4/glassfish/bin/asadmin create-jvm-options "-Ddataverse.files.s3-bucket-name=iqsstestdcmbucket"``
+  - ``/usr/local/glassfish4/glassfish/bin/asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=iqsstestdcmbucket"``
 
 - S3 bucket for DCM (as Dataverse needs to do the copy over)
 
diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst
index 74c11dde5d1..bfa66b97eb1 100644
--- a/doc/sphinx-guides/source/installation/config.rst
+++ b/doc/sphinx-guides/source/installation/config.rst
@@ -215,10 +215,46 @@ As for the "Remote only" authentication mode, it means that:
 
 - ``:DefaultAuthProvider`` has been set to use the desired authentication provider
 - The "builtin" authentication provider has been disabled (:ref:`api-toggle-auth-provider`). Note that disabling the "builtin" authentication provider means that the API endpoint for converting an account from a remote auth provider will not work. Converting directly from one remote authentication provider to another (i.e. from GitHub to Google) is not supported. Conversion from remote is always to "builtin". Then the user initiates a conversion from "builtin" to remote. Note that longer term, the plan is to permit multiple login options to the same Dataverse account per https://github.com/IQSS/dataverse/issues/3487 (so all this talk of conversion will be moot) but for now users can only use a single login option, as explained in the :doc:`/user/account` section of the User Guide. In short, "remote only" might work for you if you only plan to use a single remote authentication provider such that no conversion between remote authentication providers will be necessary.
 
-File Storage: Local Filesystem vs. Swift vs. S3
------------------------------------------------
+File Storage: Using a Local Filesystem and/or Swift and/or S3 object stores
+----------------------------------------------------------------------------
 
-By default, a Dataverse installation stores data files (files uploaded by end users) on the filesystem at ``/usr/local/glassfish4/glassfish/domains/domain1/files`` but this path can vary based on answers you gave to the installer (see the :ref:`dataverse-installer` section of the Installation Guide) or afterward by reconfiguring the ``dataverse.files.directory`` JVM option described below.
+By default, a Dataverse installation stores all data files (files uploaded by end users) on the filesystem at ``/usr/local/glassfish4/glassfish/domains/domain1/files``. This path can vary based on answers you gave to the installer (see the :ref:`dataverse-installer` section of the Installation Guide) or afterward by reconfiguring the ``dataverse.files.directory`` JVM option described below.
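+
+As a hypothetical example (your Glassfish path and directories will differ), the option can be inspected and changed with asadmin, followed by a Glassfish restart:
+
+.. code-block:: bash
+
+  # show the currently configured files directory, if any
+  /usr/local/glassfish4/bin/asadmin list-jvm-options | grep dataverse.files.directory
+
+  # replace it (delete the old value first so the options don't conflict)
+  /usr/local/glassfish4/bin/asadmin delete-jvm-options "\-Ddataverse.files.directory=/usr/local/glassfish4/glassfish/domains/domain1/files"
+  /usr/local/glassfish4/bin/asadmin create-jvm-options "\-Ddataverse.files.directory=/mnt/dvn/files"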
+
+Dataverse can alternatively store files in a Swift or S3-compatible object store, and can now be configured to support multiple stores at once. With a multi-store configuration, the location for new files can be controlled on a per-dataverse basis.
+
+The following sections describe how to set up various types of stores and how to configure multiple stores.
+
+Multi-store Basics
+++++++++++++++++++
+
+To support multiple stores, Dataverse now requires an id, type, and label for each store (even for a single-store configuration). These are configured by defining two required JVM options:
+
+.. code-block:: none
+
+    ./asadmin $ASADMIN_OPTS create-jvm-options "\-Ddataverse.files.<id>.type=<type>"
+    ./asadmin $ASADMIN_OPTS create-jvm-options "\-Ddataverse.files.<id>.label=<label>"
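+
+For example, a single file store using the id 'file' (the convention recommended in the release notes for existing installations) would be configured as:
+
+.. code-block:: none
+
+    ./asadmin create-jvm-options "\-Ddataverse.files.file.type=file"
+    ./asadmin create-jvm-options "\-Ddataverse.files.file.label=file"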