Dataset relations POC #335

Draft
wants to merge 2 commits into
base: master

amercader
Member

This change implements a potential approach for handling relations between datasets, as discussed in #331. It explores the "store relations in dataset metadata" approach rather than the "store relations in a separate db" one.

Right now it mainly consists of two parts:

  • A dataset_relation preset that extends the multiple_text one to add a custom validator and, eventually, custom form/display snippets. The validator checks that each item in the field is either a valid dataset id or a URI. The form widget would allow choosing existing datasets in the site or pasting a URI.
  • Changes in the base profiles to handle certain fields like has_version or source. When serializing, if the value is a URI it is left unchanged, but if it's a dataset id, a dataset URI is generated for it.
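
The validator check described above can be sketched roughly as follows. This is a minimal, standalone illustration, not the code in this PR: `EXISTING_DATASET_IDS`, `is_uri`, `is_existing_dataset_id` and `validate_dataset_relations` are hypothetical names, the set of ids stands in for a real catalog lookup, and "URI" is simplified to "absolute URL with a scheme and a host" (so URNs would be rejected here).

```python
import re
from urllib.parse import urlparse

# Hypothetical stand-in for a catalog lookup; a real validator would
# ask the site whether the dataset exists.
EXISTING_DATASET_IDS = {"fd2c4eaf-0bc3-48d1-8d3f-58e73f8c674d"}

# Assumption: dataset ids are lowercase UUIDs.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def is_uri(value):
    """Return True if the value parses as an absolute URL (scheme + host)."""
    parsed = urlparse(value)
    return bool(parsed.scheme and parsed.netloc)


def is_existing_dataset_id(value):
    """Return True if the value looks like a UUID and is a known dataset id."""
    return bool(UUID_RE.match(value)) and value in EXISTING_DATASET_IDS


def validate_dataset_relations(values):
    """Return the items that are neither a URI nor an existing dataset id."""
    return [
        value
        for value in values
        if not (is_uri(value) or is_existing_dataset_id(value))
    ]
```

An empty result means every item in the field passed the check; anything returned would be reported back to the user as invalid.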

This allows us to go from the current serialization where we just dump whatever there is in e.g. the has_version field:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://localhost:5015/dataset/bdde0f9f-6311-4770-a3b8-e946c822d629> a dcat:Dataset ;
    dct:description "sadas" ;
    dct:hasVersion <http://some.uri.somewhere.else>,
        "fd2c4eaf-0bc3-48d1-8d3f-58e73f8c674d" ;
    dct:identifier "bdde0f9f-6311-4770-a3b8-e946c822d629" .

To one that exposes actual URIs for the other datasets in the catalog:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://localhost:5015/dataset/bdde0f9f-6311-4770-a3b8-e946c822d629> a dcat:Dataset ;
    dct:description "sadas" ;
    dct:hasVersion <http://localhost:5015/dataset/fd2c4eaf-0bc3-48d1-8d3f-58e73f8c674d>,
        <http://some.uri.somewhere.else> ;
    dct:identifier "bdde0f9f-6311-4770-a3b8-e946c822d629" .
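
The mapping from stored values to the URIs in the second serialization could look something like the sketch below. It assumes dataset URIs follow the `<site_url>/dataset/<id>` pattern seen in the examples; `relation_to_uri` is a hypothetical helper and the actual profiles may build dataset URIs differently.

```python
from urllib.parse import urlparse


def relation_to_uri(value, site_url):
    """Return the value unchanged if it is already a URI, otherwise
    treat it as a dataset id and build a dataset URI for it.

    Assumption: dataset URIs have the form <site_url>/dataset/<id>.
    """
    parsed = urlparse(value)
    if parsed.scheme and parsed.netloc:
        return value
    return "{}/dataset/{}".format(site_url.rstrip("/"), value)


# Values as stored in the has_version field in the example above
has_version = [
    "http://some.uri.somewhere.else",
    "fd2c4eaf-0bc3-48d1-8d3f-58e73f8c674d",
]
uris = [relation_to_uri(v, "http://localhost:5015") for v in has_version]
```

Each resulting string would then be emitted as a URI reference rather than a literal when the graph is built.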

Other things that are missing:

  • Avoiding stale relations, e.g. what happens when a dataset involved in a relation is deleted? This could be handled with an after_dataset_deleted hook that fires a background job, which uses the search to find datasets that reference the deleted dataset id in a relation and patches them to remove that relation. (This would require indexing the relation field values as lists.)
  • If desirable, another background job could create the inverse relations (e.g. is_version_of) although DCAT-AP 3 explicitly says that these are not necessary.
  • Form widget: it would use the search to get a list of datasets matching a query (finding the one you want might be tricky in sites with many datasets), or let you paste a dataset page URL or an external URI.
  • Display widget: here we'll have the common problem that we are just storing an id but we probably want a title and a name in the template to generate a link. We then either need to call package_show on each page request or store this info somehow at index time, but then it can get stale if the title is updated.
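
As a rough illustration of the stale-relation cleanup mentioned in the first item, the core of such a background job might reduce to the logic below. This is only a sketch over in-memory dicts: `RELATION_FIELDS` and `find_stale_relation_patches` are hypothetical names, and in a real site the datasets would presumably come from a search query and the patches be applied through an action such as package_patch.

```python
# Assumption: these are the fields that use the dataset_relation preset.
RELATION_FIELDS = ("has_version", "source")


def find_stale_relation_patches(datasets, deleted_id):
    """Return {dataset_id: {field: cleaned_values}} for every dataset
    that references deleted_id in one of its relation fields."""
    patches = {}
    for dataset in datasets:
        changes = {}
        for field in RELATION_FIELDS:
            values = dataset.get(field) or []
            if deleted_id in values:
                # Keep every other relation, drop only the deleted id
                changes[field] = [v for v in values if v != deleted_id]
        if changes:
            patches[dataset["id"]] = changes
    return patches


catalog = [
    {"id": "a", "has_version": ["http://some.uri.somewhere.else", "b"]},
    {"id": "c", "source": ["b"], "has_version": []},
    {"id": "d", "has_version": ["e"]},
]
patches = find_stale_relation_patches(catalog, "b")
```

Only datasets that actually reference the deleted id end up in `patches`, so untouched datasets would not need to be re-saved or re-indexed.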
