Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration from ClosedNamespace to DeclaredNamespace #1074

Merged
merged 31 commits into from
Jul 14, 2021
Merged

Migration from ClosedNamespace to DeclaredNamespace #1074

merged 31 commits into from
Jul 14, 2021

Conversation

hsolbrig
Copy link
Contributor

Instead of ClosedNamespace which throws errors, moved to (auto-generated) DeclaredNamespace that includes type hints for all members and (optionally) raises runtime warnings.

This passes all of the unit tests locally (in PyCharm) but I had to change 2 import statements (from .manifest and from .earl) to from manifest and from earl. If this breaks anything they can be changed back - test_nquads_w3c.py, test_dawg.py, test_nt_w3c.py, test_trig_w3c.py, test_turtle_w3c.py.

Passes all unit tests, but did have to remove '.' from manifest and earl imports.  May have to restore...
@hsolbrig
Copy link
Contributor Author

Goodness - looks like I've still got some work to do...

@tgbugs
Copy link
Contributor

tgbugs commented May 27, 2020

General question about caps vs lowercase. I prefer to match the way the namespace is conventionally serialized in a ttl file (e.g. I perfer the rdfs namespace to be lowercase). This seems to be at odds with the rdflib conventions, but I wonder whether it might help when searching codebases for namespaces if they matched what a user would see in the serialized form (among other things).

Copy link
Contributor

@tgbugs tgbugs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pulling all of this together. In general it looks good and I like the approach. I don't entirely understand all of the decisions that were made so have left some questions/comments. I also can't seem to find the code that is used to generate the new files, did I just miss it?

The main concern I have right now is that it seems that the types and default behavior of some of the existing of the namespaces has changed in ways that are not necessary and that are likely to create additional work for anyone who happens to be using them (e.g. the RDF namespace is no longer closed and no longer a ClosedNamespace).

for c in cls.mro() if issubclass(c, DefinedNamespace))


class DefinedNamespace(metaclass=DefinedNamespaceMeta):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has ClosedNamespace not been a subclass of Namespace all this time? Should ClosedNamespace and DefinedNamespace be subclasses of Namespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't touched ClosedNamespace, so yes.

DefinedNamespace's behavior is sufficiently different that it is easy (and safer, imo) to implement it as a wrapper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in response to the generator question above, yes - it was built using https://github.com/hsolbrig/definednamespace -- once (if) the above changes are accepted in rdflib then I can do a cleanup step. It is broken right at the moment and needs a good deal more documentation.

It leads to a question, however. I could drop the generator into the tools directory if it made sense or leave it as a separate package. Will be a bit 'o a pita to get all the unit tests and the like migrated so it may be a little while...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think getting it into the primary tools directory of this repository is a good idea. A more extreme view would be that it is preferable for any code generation components to be packaged with the core, and thus that inclusion be a perquisite for merging this PR, but that seems silly to me since the generation code can be merged at its own pace. Let's see what others think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having the generator in tools is the way to go: use is decoupled.

@@ -53,7 +53,6 @@
"BNode",
"Literal",
"Variable",
"Namespace",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be removed from the rdflib (python) * namespace, it will break a bunch of existing codebases. Was there a reason why it was removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch -- my bad in the conversion. I was wondering why (surprise) I broke a bunch of existing codebases.

@@ -539,9 +306,9 @@ def __init__(self, graph):
for p, n in self.namespaces(): # self.bind is not always called
insert_trie(self.__trie, str(n))
self.bind("xml", "http://www.w3.org/XML/1998/namespace")
self.bind("rdf", RDF)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import cycle. It is a weak spot in the code, but I'd have to enhance the auto-generator to get around it...

"TIME",
"VOID",
"XMLNS",
"XSD",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why where these removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not have been. Probably a cut and paste error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, they weren't. namespace.py no longer exists -- instead becoming namespace/__init__.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I see. However, the classes were lifted to modules, so anyone who imported rdflib.namespace.TIME will get a nasty surprise when TIME is suddenly a module rather than a class. I suspect that this means that the files will need to be renamed to avoid collisions in the rdflib.namespace (python) namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it should work as expected, as the module is named _TIME and it is lifted in rdflib.namespace. In general, if there are nasty surprises waiting, shouldn't there be tests for them? I'd been running under the assumption that if something matters there is (or will be) a unit test for it -- is this not the case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies I must have brainfarted hard on the file names or been fooled by the fact that github didn't update them in the interface. I see that this was already addressed by 91015c1, so before I made the previous comment, and addresses exactly the nasty surprise that I was imagining. Whether there were explicit tests for the class names, I suspect that there were quite a few implicit tests simply by virtue of having to import them at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup - one of the take-home lessons I've learned from this push is that it is far more important to review changes to unit tests than to the code base itself. I (stupidly) made several changes to the unit tests to make them match the new code, initially hiding the "nasty surprises". Hopefully I have undone all of those and can somewhat confidently assert that the set of nasty surprises in the unit tests have been covered.

from rdflib.namespace import DefinedNamespace, Namespace


class RDF(DefinedNamespace):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this still match the behavior of CloseNamespace or is there a reason for the change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can mimic the behavior of ClosedNamespace by setting RDF._fail = True. There is definitely a reason for the change -- there are times when warnings are needed or desired. Now all of the namespaces behave the same way -- RDF, SKOS, SDO, etc. W/ auto complete and (if you get it wrong) runtime warnings. You can also turn warnings off by setting RDF._warn = False (or SKOS._warn, etc.)

from rdflib.term import URIRef


class RDFVOC(RDF):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should this be the other way around in order to preserve the existing behavior of the RDF namespace, so RDF would have _fail = True and RDF??? would be the one with _fail = False. Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - if you feel deeply about having your software die (vs. warn), you can set RDF._fail to True. If you feel deeply that everyone needs/wants to endure this behavior, we can set RDF._fail to True as the default and we can override it if needs be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the RDFVOC is the extended RDF that is used in the RDF XML parser. I had to make it behave like a ClosedNamespace because the parser actually depended (well, the TEST cases depended) on ClosedNamespace behavior. Was easier to do this way than try to blend Defined and Closed namespaces.

At some point we definitely need to do a bit 'o performance testing on these to see whether I've injected any unexpected slowness...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: failure for everyone. I don't have enough data to say what a sane default should be in the ideal case, but changing the existing default behavior is something that should be carefully considered, and definitely documented in the change log. That said, if the decision is to make the change so that all namespaces have consistent default behavior (which I think is a good idea), then 6.0.0 seems like an ideal time to do it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have the first consequence of this just here: #1374

hsolbrig added 3 commits May 26, 2020 22:31
For some reason the sequence of processing is different in Travis than in PyCharm.  Changing the module names eliminates any possibility of confusion.
Also fixed a number of warnings, whines and general unhappiness in the build process.
Note - the doctest in `rdflib/__init__.py` fails 50% of the time because there isn't a definite order in the turtle serialization
Also, we still have a mystery issue of why `Namespace` isn't included in `from rdflib import *`
@coveralls
Copy link

coveralls commented May 27, 2020

Coverage Status

Coverage increased (+4.5%) to 80.253% when pulling 7a41f34 on hsolbrig:master into 037ea51 on RDFLib:master.

@hsolbrig
Copy link
Contributor Author

The remaining build error is the randomness in TTL serialization. I'm thinking that it is time to correct that issue -- will try to get at it later this week.

@@ -84,6 +97,9 @@
]

import sys
from rdflib.namespace import XMLNS, SOSA

assert sys.version_info >= (2, 7, 0), "rdflib requires Python 2.7 or higher"
Copy link
Contributor

@tgbugs tgbugs May 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this check is needed since 6.0.0 will drop 2.7 support and because any unsupported versions should be caught before this assertion is ever reached.

edit: oop I realize I'm in rdflib/__init__.py, but I think it still holds that this is still not needed.

@tgbugs
Copy link
Contributor

tgbugs commented May 27, 2020

Re: randomness in the ttl serialization: if you can fix it by tweaking the tests it will probably be easier, there are some serious sources of randomness in the default serializer that are hard to squash (my own ttlser library was originally created to eliminate them).

@nicholascar
Copy link
Member

Hi @hsolbrig @tgbugs, We - maintainers - really like this work and apologise for taking so long to get back to you. We appreciate moving all the individual Namespaces into a dir with all the ontology annotations in them - this will be a good learning resource.

Can this PR please be rebased to current master and then we'll re-review for merging if you are both happy with it? There are a number of other Namespace-related issues we'd like to fix but I think we should have this merged first and then this, and the other fixes, can come through in 6.0.0.

@hsolbrig
Copy link
Contributor Author

I'll catch this in the next couple of days.

@edmondchuc
Copy link
Contributor

This PR by @hsolbrig migrates ClosedNamespace to DefinedNamespace. By default, it does not throw errors for terms that are not in the namespace unless the _fail bool attribute is set. The namespace.py module no longer exists as it has been moved to a namespace package. The existing namespaces in RDFLib were auto-generated using https://github.com/hsolbrig/definednamespace which now provide type annotations and some documentation.

Other files were touched to use RDFVOC instead of RDF. Essentially RDFVOC is just RDF but with _fail attribute set to True.

I have to look a bit more at the modifications made to the tests.

@aucampia if you're willing, I'd love to get your input on this PR as well, since it's related to namespaces.

@nicholascar
Copy link
Member

@hsolbrig we are back on this one! Also, note that we're merging rdflib-jsonld into rdflib now too. This should eventually make RDFlib much more JSON-capable.

These changes are all feeding into a push this week to make an RDFlib 6.0.0 release which contains a huge number of improvements over last year's 5.0. including some breaking changes (dropped Python versions, real improvements).

I hope these updates re-invigorate your interest in RDFlib. I'd love to see us get to that magic land of more verifiable namespace use through these improvements and, ultimately, within-IDE namespace prompting facilitated by your additions of type hinting.

@@ -204,6 +204,9 @@ def __ge__(self, other):
if r:
return True
return self == other

def startswith(self, prefix, start = ..., end = ...) -> bool:
return str(self).startswith(str(prefix))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Args start and end are not being passed on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, why are we overriding this?

@edmondchuc
Copy link
Contributor

edmondchuc commented Jul 8, 2021

The new DefinedNamespace class does not preserve the previous ClosedNamespace class' uri attribute. This is a breaking change. E.g. see https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/sparql/operators.py#L39 within RDFLib itself. Users of RDFLib could also be using the uri attribute in existing code too.

Expected result with a ClosedNamespace.

>>> from rdflib.namespace import RDF
>>> RDF.uri
rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#')

Result of the new DefinedNamespace.

>>> from rdflib.namespace import RDF
>>> RDF.uri
<stdin>:1: UserWarning: Code: uri is not defined in namespace RDF
rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#uri')

We need to decide if we are dropping this functionality or not.

@@ -97,17 +100,23 @@ def test_n32(self):
in n3
)

@property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was @property decorator added to avoid running this particular test? Commenting it out currently results in a failed test.

add_not_in_namespace("firstName"),
URIRef("http://xmlns.com/foaf/0.1/firstName"),
)
warn("DeclaredNamespace does not address deprecated properties")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change

warn("DeclaredNamespace does not address deprecated properties")

to

warn("DefinedNamespace does not address deprecated properties")

def test_closed_namespace(self):
"""Tests terms both in an out of the ClosedNamespace FOAF"""

def add_not_in_namespace(s):
return FOAF[s]

# a blatantly non-existent FOAF property
self.assertRaises(KeyError, add_not_in_namespace, "blah")
self.assertWarnsRegex(UserWarning, add_not_in_namespace("blah"), 'UserWarning: Code: blah is not defined in namespace FOAF')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to the below code to fix the test.

        with self.assertWarnsRegex(UserWarning, 'Code: blah is not defined in namespace FOAF'):
            add_not_in_namespace("blah")

@edmondchuc
Copy link
Contributor

I've fixed up a few things, conflicts etc. Still need to decide whether we want to preserve or drop support for uri on DefinedNamespaces.

@edmondchuc edmondchuc removed this from the rdflib 6.0.0 release milestone Jul 9, 2021
@edmondchuc
Copy link
Contributor

@nicholascar are you able to assign the remaining namespaces in https://github.com/RDFLib/rdflib/projects/2 to either the ClosedNamespace behaviour or the default DefinedNamespace behaviour please?

@nicholascar
Copy link
Member

are you able to assign the remaining namespaces

Done

# Conflicts:
#	rdflib/namespace/__init__.py
#	rdflib/plugins/sparql/operators.py
#	test/test_namespace.py
@edmondchuc
Copy link
Contributor

Changing some namespaces to behave like ClosedNamespace has broken some tests. Tests are using terms that are not in their namespace.

@nicholascar
Copy link
Member

@edmondchuc one of the fails here is that the file namespace/__init__.py uses examples containing owl:seeAlso but this is not the correct property: should be rdfs:seeAlso. Should be:

.. code-block:: pycon

    >>> from rdflib.namespace import Namespace
    >>> RDFS = Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#")

Fully qualified URIs in the namespace can be constructed either by attribute
or by dictionary access on Namespace instances:

.. code-block:: pycon

    >>> rdfs.seeAlso
    rdflib.term.URIRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#seeAlso")
    >>> rdfs['seeAlso']
    rdflib.term.URIRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#seeAlso")

@nicholascar
Copy link
Member

nicholascar commented Jul 14, 2021

Perhaps we need to relax the fail for XSD, so I've moved it over in here https://github.com/RDFLib/rdflib/projects/2 to Default DefinedNamespace. That will clear up another set of failures.

@nicholascar
Copy link
Member

Good to have caught the bad examples. For any we can’t fix with the XSD change, just convert the examples to using example.com namespace

@edmondchuc
Copy link
Contributor

@nicholascar tests passing.

@nicholascar nicholascar self-requested a review July 14, 2021 20:34
@nicholascar nicholascar merged commit c62a573 into RDFLib:master Jul 14, 2021
mwatts15 added a commit to mwatts15/rdflib that referenced this pull request Aug 5, 2021
- Also, deleting _RDFNamespace since it is not used in rdflib since
  RDFLib#1074
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants