Fix TIME namespace definition to use DCAT recommendation #344

Open
wants to merge 2 commits into
base: master

@mjanez mjanez commented Mar 4, 2025

Description

Context

This PR addresses an issue with namespace definitions while working on implementing profiles according to the contributing guidelines. We've been developing custom profiles for the Spanish application profiles - both the current NTI-RISP (based on DCAT) and the future Spanish profile based on DCAT-AP.

Although this can be handled within the harvester, following DCAT's recommendation to use the more common namespace would be beneficial, especially since many publishers rely on ckanext-dcat to serialize their RDF for the national catalog.

Problem

During development, we identified an issue with the TIME namespace definition in base.py. The current implementation defines TIME without the trailing hash (#):

TIME = Namespace("http://www.w3.org/2006/time")

However, the correct URI according to W3C specifications and DCAT-AP should include the trailing hash:

TIME = Namespace("http://www.w3.org/2006/time#")

Impact

This incorrect namespace definition causes properties like time:years, time:days, etc. to be serialized with incorrect URIs:

  • Current (incorrect): http://www.w3.org/2006/timeyears
  • Expected (correct): http://www.w3.org/2006/time#years

This breaks interoperability with standards-compliant harvesters, in particular for dct:accrualPeriodicity structures that are federated with portals such as datos.gob.es.
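
As a quick illustration of why the missing hash matters: RDF namespace helpers such as rdflib's Namespace build a term URI by appending the local name directly to the namespace string, without inserting any separator. The sketch below mimics that behaviour with plain strings so it runs without extra dependencies; the `Ns` class is a hypothetical stand-in, not part of ckanext-dcat or rdflib.

```python
# Hypothetical stand-in for an RDF namespace helper: a term URI is simply
# the namespace string followed by the local name, with no separator added.
class Ns(str):
    def term(self, local_name: str) -> str:
        return self + local_name


TIME_WITHOUT_HASH = Ns("http://www.w3.org/2006/time")
TIME_WITH_HASH = Ns("http://www.w3.org/2006/time#")

broken_years = TIME_WITHOUT_HASH.term("years")
correct_years = TIME_WITH_HASH.term("years")

print(broken_years)   # http://www.w3.org/2006/timeyears
print(correct_years)  # http://www.w3.org/2006/time#years
```

The two results are different URIs, so a consumer looking for the second one will not match the first.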

Example of the issue

We found datasets that weren't federating correctly with datos.gob.es. Examining the RDF revealed that years/days properties weren't being properly recognized:

Incorrect (won't federate):

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:adms="http://www.w3.org/ns/adms#"
  xmlns:hydra="http://www.w3.org/ns/hydra/core#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:locn="http://www.w3.org/ns/locn#"
  xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
  xmlns:schema="http://schema.org/"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:dct="http://purl.org/dc/terms/"
  xmlns:odrs="http://schema.theodi.org/odrs#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:time="http://www.w3.org/2006/time"
  xmlns:cnt="http://www.w3.org/2011/content#"
>
...

<dcat:Dataset rdf:about="https://example.org/catalogo/dataset/54336a93-2478-44fc-bb78-696c77cff5c2">
  <dct:accrualPeriodicity>
    <dct:Frequency rdf:nodeID="Ne107337f7f944ee0b67a3813d529b04f">
      <rdf:value>
        <time:DurationDescription rdf:nodeID="N5a50f9faffc542649ba2ba516a14a837">
          <rdfs:label>2 años</rdfs:label>
          <time:years rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">2</time:years>
        </time:DurationDescription>
      </rdf:value>
    </dct:Frequency>
  </dct:accrualPeriodicity>

...

Here, time:years resolves to http://www.w3.org/2006/timeyears instead of http://www.w3.org/2006/time#years.
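
The element looks identical in both serializations, so the difference is easy to miss. A minimal sketch of how the prefix binding changes the resulting URI, using only the standard library: the RDF/XML is a trimmed version of the example above, and `element_uris` is a hypothetical helper. ElementTree is not an RDF parser; it is used here only to resolve the prefix, and the namespace and local name are then joined the way RDF/XML does.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the example; only the time prefix binding varies.
RDF_TEMPLATE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:time="{time_ns}">
  <time:DurationDescription>
    <time:years rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">2</time:years>
  </time:DurationDescription>
</rdf:RDF>"""


def element_uris(time_ns: str) -> list:
    """Return the URI of every element in the snippet for a given binding."""
    root = ET.fromstring(RDF_TEMPLATE.format(time_ns=time_ns))
    uris = []
    for element in root.iter():
        # ElementTree stores tags as "{namespace}local"; RDF/XML joins the
        # two parts directly to form the URI.
        namespace, local = element.tag[1:].split("}")
        uris.append(namespace + local)
    return uris


without_hash = element_uris("http://www.w3.org/2006/time")
with_hash = element_uris("http://www.w3.org/2006/time#")

print(without_hash[-1])  # http://www.w3.org/2006/timeyears
print(with_hash[-1])     # http://www.w3.org/2006/time#years
```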

Correctly federated example:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:adms="http://www.w3.org/ns/adms#"
  xmlns:hydra="http://www.w3.org/ns/hydra/core#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:locn="http://www.w3.org/ns/locn#"
  xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
  xmlns:schema="http://schema.org/"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:dct="http://purl.org/dc/terms/"
  xmlns:odrs="http://schema.theodi.org/odrs#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:time="http://www.w3.org/2006/time#"
  xmlns:cnt="http://www.w3.org/2011/content#"
>
...

<dcat:Dataset rdf:about="https://example.org/catalogo/dataset/54336a93-2478-44fc-bb78-696c77cff5c2">
  <dct:accrualPeriodicity>
    <dct:Frequency rdf:nodeID="Ne107337f7f944ee0b67a3813d529b04f">
      <rdf:value>
        <time:DurationDescription rdf:nodeID="N5a50f9faffc542649ba2ba516a14a837">
          <rdfs:label>2 años</rdfs:label>
          <time:years rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">2</time:years>
        </time:DurationDescription>
      </rdf:value>
    </dct:Frequency>
  </dct:accrualPeriodicity>

...

Solution

This PR updates the namespace definition in base.py to use the correct URI with the trailing hash, as specified in DCAT:

@@ -23,7 +23,7 @@
 VCARD = Namespace("http://www.w3.org/2006/vcard/ns#")
 FOAF = Namespace("http://xmlns.com/foaf/0.1/")
 SCHEMA = Namespace("http://schema.org/")
-TIME = Namespace("http://www.w3.org/2006/time")
+TIME = Namespace("http://www.w3.org/2006/time#")
 LOCN = Namespace("http://www.w3.org/ns/locn#")
 GSP = Namespace("http://www.opengis.net/ont/geosparql#")
 OWL = Namespace("http://www.w3.org/2002/07/owl#")

The namespace is also updated in the tests.

Testing & Verification

We've verified this fix by manually applying it to our local installation and confirming that datasets with temporal properties using the TIME ontology (e.g., dct:accrualPeriodicity) are now correctly harvested by external systems.

Tests also pass.

References

mjanez added a commit to mjanez/ckanext-schemingdcat that referenced this pull request Mar 4, 2025
@amercader
Member

@mjanez thanks for the detailed report.

The fix looks good, but I'm curious to know where you are getting serializations with the incorrect format http://www.w3.org/2006/timeyears. Is that in ckanext-dcat's own DCAT serializations, or is it in external systems consuming the serializations generated by CKAN?
I'm just trying to work out whether we need fixes in other parts of the extension.

@mjanez
Author

mjanez commented Mar 13, 2025

Thanks @amercader

Good point, you're right that the example I gave with time:years might be confusing since it’s not from the vanilla ckanext-dcat.

The context here is an external system that consumes the RDF serialization provided by a custom DCAT ES profile based on ckanext-dcat, which inherited the TIME namespace from ckanext-dcat.

The specific case serializes the RDF with time periods using time:DurationDescription instead of the URIRefOrLiteral serialization, which wasn’t compatible with the “current” application profile (pre-DCAT-AP), producing something like this:

dct:accrualPeriodicity 
[ 
    a dct:Frequency; 
    rdf:value 
    [ 
        a time:DurationDescription; 
        rdfs:label "{time-interval}"; 
        time:{period} {n}. 
    ]; 
    rdfs:label "Every {time-interval}". 
]; 

The issue is that the harvesting extension defines its own TIME namespace, and it did have the #:

https://github.com/ctt-gob-es/datos.gob.es/blob/30c4a0d97356e0caf948aff2bb74790f4885c67f/ckan/ckanext-dge-harvest/ckanext/dge_harvest/profiles.py#L54

So, when a publisher generates RDF that doesn’t match the same TIME namespace declaration, it fails and throws an error. Because of the parsing method's quirks, it never parsed the update frequency.

Probably the only thing to check would be if all namespaces are used canonically, which is probably the case since this is the first bug of this kind we've encountered. But I can check.
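
A rough sketch of how such a check could look: it flags namespace URIs that do not end in "#" or "/". The dictionary only covers the definitions visible in the diff above, with TIME set to its value before this PR, and `missing_separator` is a hypothetical helper. This is a heuristic that points at candidates to review; the authoritative URI is whatever each vocabulary's specification declares.

```python
# Namespace definitions copied from the diff context; TIME uses the value
# from before this PR so the check has something to report.
NAMESPACES = {
    "VCARD": "http://www.w3.org/2006/vcard/ns#",
    "FOAF": "http://xmlns.com/foaf/0.1/",
    "SCHEMA": "http://schema.org/",
    "TIME": "http://www.w3.org/2006/time",
    "LOCN": "http://www.w3.org/ns/locn#",
    "GSP": "http://www.opengis.net/ont/geosparql#",
    "OWL": "http://www.w3.org/2002/07/owl#",
}


def missing_separator(namespaces):
    """Return the names whose URI lacks a trailing '#' or '/'."""
    return sorted(
        name for name, uri in namespaces.items()
        if not uri.endswith(("#", "/"))
    )


suspicious = missing_separator(NAMESPACES)
print(suspicious)  # ['TIME']
```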
