Support for multiple contact points/publishers #307

Closed
hcvdwerf opened this issue Sep 26, 2024 · 11 comments · Fixed by #317

@hcvdwerf
Contributor

hcvdwerf commented Sep 26, 2024

DCAT-AP 3.0 allows a dataset to have multiple contact points: the dcat:contactPoint property can reference more than one contact point, so the DCAT -> CKAN conversion must be able to keep all of them.

The test data (`contact_point_multiple_urls.ttl`) and the test I use:

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .

<https://health-ri.sandbox.semlab-leiden.nl/dataset/d7129d28-b72a-437f-8db0-4f0258dd3c25>
  a dcat:Resource, dcat:Dataset;
  <http://www.w3.org/2000/01/rdf-schema#label> "Example";
  dcterms:title "Example";
  <https://w3id.org/fdp/fdp-o#metadataIssued> "2023-09-05T12:00:36.276171042Z"^^xsd:dateTime;
  <https://w3id.org/fdp/fdp-o#metadataModified> "2024-05-02T13:01:35.716385359Z"^^xsd:dateTime;
  dcterms:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0>;
  dcterms:description "This is an example description.";
  dcat:contactPoint <https://orcid.org/0000-0002-9095-9201>, <https://orcid.org/0000-0003-2558-7496> .
```

```python
from pathlib import Path

import pytest
from rdflib import Graph

# FairDataPointRecordToPackageConverter and TEST_DATA_DIRECTORY are provided by the
# FAIR Data Point harvester extension under test and its test module (imports not shown).


@pytest.mark.ckan_config("ckan.plugins", "scheming_datasets")
@pytest.mark.usefixtures("with_plugins")
def test_profile_contact_point_multiple_uris():
    fdp_record_to_package = FairDataPointRecordToPackageConverter(profile="fairdatapoint_dcat_ap")
    # Load the Turtle test file and re-serialize it as the harvested record payload
    data = Graph().parse(Path(TEST_DATA_DIRECTORY, "contact_point_multiple_urls.ttl")).serialize()
    actual = fdp_record_to_package.record_to_package(
        guid="https://health-ri.sandbox.semlab-leiden.nl/catalog/e3faf7ad-050c-475f-8ce4-da7e2faa5cd0;"
             "dataset=https://health-ri.sandbox.semlab-leiden.nl/dataset/d7129d28-b72a-437f-8db0-4f0258dd3c25",
        record=data)
    expected = {
        'extras': [],
        'title': 'Example',
        'notes': 'This is an example description.',
        # Both dcat:contactPoint values are expected, not only one of them
        'contact': [
            {
                'uri': 'https://orcid.org/0000-0002-9095-9201'
            },
            {
                'uri': 'https://orcid.org/0000-0003-2558-7496'
            }
        ],
        'license_id': '',
        'resources': [],
        'tags': [],
        'uri': 'https://health-ri.sandbox.semlab-leiden.nl/dataset/d7129d28-b72a-437f-8db0-4f0258dd3c25'
    }
    assert actual == expected
```
@hcvdwerf
Contributor Author

The same applies to publishers.

@hcvdwerf changed the title from "Support for multiple contact points" to "Support for multiple contact points/publishers" on Sep 26, 2024
@amercader
Member

@hcvdwerf The handling of multiple contact points is done via the repeating_subfields preset. This is already working in the euro_dcat_ap_3 and euro_dcat_ap_2 + euro_dcat_ap_scheming profiles. It allows you to define multiple contact points:

[Screenshot, 2024-10-01: the CKAN "Create Dataset" form with multiple contact points]

In your example, you would only provide the URI field.
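For reference, a minimal sketch of the value shape a repeating-subfields `contact` field holds when only the URI subfield is filled in; it matches the `expected` dict in the test earlier in this issue. The helper name `contacts_from_uris` is hypothetical and not part of ckanext-dcat:

```python
def contacts_from_uris(uris):
    """Build a repeating-subfields `contact` value with only the URI subfield set.

    Hypothetical helper, for illustration only.
    """
    # One dict per contact point; subfields other than "uri" are simply omitted
    return [{"uri": uri} for uri in uris]


# Dataset dict shape matching the `expected` value in the test above
dataset_dict = {
    "title": "Example",
    "contact": contacts_from_uris([
        "https://orcid.org/0000-0002-9095-9201",
        "https://orcid.org/0000-0003-2558-7496",
    ]),
}
```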

As for publishers, DCAT-AP 3 only allows one instance (0..1), but if you want to support multiple publishers, you can remove the `repeating_once: true` setting from the field's schema definition.

I'm not keen on extending support for multiple instances to the legacy harvesters using contact_* extras as this would add complexity and we want to encourage users to migrate to scheming-based profiles.

Let me know if this makes sense.

@hcvdwerf
Contributor Author

hcvdwerf commented Oct 1, 2024

I think the issue arises during harvesting, not when entering data. If you refer to this pull request, you'll notice that the _publisher function does not return a list.

I’ve run multiple tests with various contact points, but only one is returned. Also, the repeating_once: true property is not defined in the DCAT extension itself.

For reference, you can see a class full of tests involving multiple contact points here: test_profiles.py.

@Markus92
Contributor

Markus92 commented Oct 4, 2024

There's definitely a bug in the harvester: even though there is a for-loop, it only returns a single dict, because the dict is overwritten on every iteration. The changes in this PR make it return a list of dicts, which we understood to be the intended behavior.
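A simplified sketch of the pattern described above (not the actual ckanext-dcat code): triples are plain tuples instead of an rdflib graph, and the names `objects`, `parse_contacts_buggy` and `parse_contacts_fixed` are hypothetical. The buggy version rebinds one dict on every iteration, so only the last contact survives; the fixed version appends one dict per contact point to a list.

```python
DCAT_CONTACT_POINT = "http://www.w3.org/ns/dcat#contactPoint"

# Simplified stand-in for an RDF graph: a list of (subject, predicate, object) tuples.
# Unlike this list, a real rdflib Graph does not guarantee any iteration order.
TRIPLES = [
    ("ex:dataset", DCAT_CONTACT_POINT, "https://orcid.org/0000-0002-9095-9201"),
    ("ex:dataset", DCAT_CONTACT_POINT, "https://orcid.org/0000-0003-2558-7496"),
]


def objects(triples, subject, predicate):
    """Yield every object linked to `subject` via `predicate`."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            yield o


def parse_contacts_buggy(triples, dataset):
    contact = {}
    for uri in objects(triples, dataset, DCAT_CONTACT_POINT):
        # Bug: each iteration replaces the dict built in the previous one
        contact = {"uri": uri}
    return contact


def parse_contacts_fixed(triples, dataset):
    contacts = []
    for uri in objects(triples, dataset, DCAT_CONTACT_POINT):
        # Fix: keep one dict per contact point
        contacts.append({"uri": uri})
    return contacts
```

With the two contact points above, `parse_contacts_buggy` keeps only the second URI, while `parse_contacts_fixed` returns both.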

@amercader If you'd like the legacy fields to remain as they are, we can modify the PR so that it only ever returns one element in that case (publisher/contact). The only drawback is that you would have no control over which publisher/contact is kept when the input is multi-valued, because graphs are inherently unordered: the order in which they are looped over is effectively random (it could even depend on the memory allocator in use).
We use the scheming ourselves too, no legacy stuff.
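If the legacy `contact_*` extras were limited to a single value, one way to make the choice reproducible despite the unordered graph would be to pick by a stable key, for example the smallest URI. This is a sketch of that idea, not what the PR does; the function name `legacy_contact_extras` and the `contact_uri` extra key are assumptions for illustration.

```python
def legacy_contact_extras(contacts):
    """Collapse a list of contact dicts into a single-valued legacy extra.

    Hypothetical helper: choosing by URI makes the selected contact independent
    of the (arbitrary) order in which the RDF graph was iterated.
    """
    if not contacts:
        return []
    # Deterministic choice: the contact with the lexicographically smallest URI
    chosen = min(contacts, key=lambda contact: contact.get("uri", ""))
    # CKAN extras are a list of {"key": ..., "value": ...} dicts
    return [{"key": "contact_uri", "value": chosen.get("uri", "")}]
```

Whatever order the contacts arrive in, the same one is kept, at the cost of silently dropping the others.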

Mark,
SW Engineer at Health-RI

@amercader
Member

@Markus92 @hcvdwerf let me have a closer look next week, but I agree that it seems like a bug in the harvester. It shouldn't matter whether a dataset was harvested or created via the UI or API: all use the same DCAT -> CKAN parser.
We probably need to review the harvester to make sure scheming-based datasets store all fields properly.

> Also, the repeating_once: true property is not defined in the DCAT extension itself.

This is just used here to hide the "Add more" button in the form for this particular field, but datasets created via the API or the harvester can have multiple publishers (or will be able to, once we fix this bug).

@hcvdwerf
Contributor Author

hcvdwerf commented Oct 21, 2024

@amercader any update from your side? Tnx in advance!

@amercader
Member

@hcvdwerf Sorry for the delay, I finally got time to work on this. I encountered the same issue while working on #314 and I'm hoping to have a generic fix for this today. I'll ping you when it's ready so you can test whether it fixes your issue.

@amercader
Member

@hcvdwerf @Markus92 let me know if #317 works for your issue

@hcvdwerf
Contributor Author

hcvdwerf commented Oct 28, 2024

@amercader I have looked at the code and done some tests, and so far it is working. For example:
[Screenshot, 2024-10-28]

@hcvdwerf
Contributor Author

@amercader When are you planning to do a release?

@amercader
Member

This week or the next one, once the DCAT-US and multilingual PRs are merged.
