-
Notifications
You must be signed in to change notification settings - Fork 571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migration from ClosedNamespace to DeclaredNamespace #1074
Conversation
Passes all unit tests, but did have to remove '.' from manifest and earl imports. May have to restore...
Goodness - looks like I've still got some work to do... |
Fix a few odd's and ends warnings as well
General question about caps vs lowercase. I prefer to match the way the namespace is conventionally serialized in a ttl file (e.g. I perfer the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pulling all of this together. In general it looks good and I like the approach. I don't entirely understand all of the decisions that were made so have left some questions/comments. I also can't seem to find the code that is used to generate the new files, did I just miss it?
The main concern I have right now is that it seems that the types and default behavior of some of the existing of the namespaces has changed in ways that are not necessary and that are likely to create additional work for anyone who happens to be using them (e.g. the RDF namespace is no longer closed and no longer a ClosedNamespace).
for c in cls.mro() if issubclass(c, DefinedNamespace)) | ||
|
||
|
||
class DefinedNamespace(metaclass=DefinedNamespaceMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has ClosedNamespace not been a subclass of Namespace all this time? Should ClosedNamespace and DefinedNamespace be subclasses of Namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't touched ClosedNamespace, so yes.
DefinedNamespace's behavior is sufficiently different that it is easy (and safer, imo) to implement it as a wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, in response to the generator question above, yes - it was built using https://github.com/hsolbrig/definednamespace -- once (if) the above changes are accepted in rdflib then I can do a cleanup step. It is broken right at the moment and needs a good deal more documentation.
It leads to a question, however. I could drop the generator into the tools directory if it made sense or leave it as a separate package. Will be a bit 'o a pita to get all the unit tests and the like migrated so it may be a little while...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think getting it into the primary tools directory of this repository is a good idea. A more extreme view would be that it is preferable for any code generation components to be packaged with the core, and thus that inclusion be a perquisite for merging this PR, but that seems silly to me since the generation code can be merged at its own pace. Let's see what others think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having the generator in tools is the way to go: use is decoupled.
rdflib/__init__.py
Outdated
@@ -53,7 +53,6 @@ | |||
"BNode", | |||
"Literal", | |||
"Variable", | |||
"Namespace", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be removed from the rdflib (python) *
namespace, it will break a bunch of existing codebases. Was there a reason why it was removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch -- my bad in the conversion. I was wondering why (surprise) I broke a bunch of existing codebases.
@@ -539,9 +306,9 @@ def __init__(self, graph): | |||
for p, n in self.namespaces(): # self.bind is not always called | |||
insert_trie(self.__trie, str(n)) | |||
self.bind("xml", "http://www.w3.org/XML/1998/namespace") | |||
self.bind("rdf", RDF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the change here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import cycle. It is a weak spot in the code, but I'd have to enhance the auto-generator to get around it...
"TIME", | ||
"VOID", | ||
"XMLNS", | ||
"XSD", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why where these removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not have been. Probably a cut and paste error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, they weren't. namespace.py
no longer exists -- instead becoming namespace/__init__.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I see. However, the classes were lifted to modules, so anyone who imported rdflib.namespace.TIME
will get a nasty surprise when TIME
is suddenly a module rather than a class. I suspect that this means that the files will need to be renamed to avoid collisions in the rdflib.namespace
(python) namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it should work as expected, as the module is named _TIME and it is lifted in rdflib.namespace. In general, if there are nasty surprises waiting, shouldn't there be tests for them? I'd been running under the assumption that if something matters there is (or will be) a unit test for it -- is this not the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies I must have brainfarted hard on the file names or been fooled by the fact that github didn't update them in the interface. I see that this was already addressed by 91015c1, so before I made the previous comment, and addresses exactly the nasty surprise that I was imagining. Whether there were explicit tests for the class names, I suspect that there were quite a few implicit tests simply by virtue of having to import them at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup - one of the take-home lessons I've learned from this push is that it is far more important to review changes to unit tests than to the code base itself. I (stupidly) made several changes to the unit tests to make them match the new code, initially hiding the "nasty surprises". Hopefully I have undone all of those and can somewhat confidently assert that the set of nasty surprises in the unit tests have been covered.
rdflib/namespace/RDF.py
Outdated
from rdflib.namespace import DefinedNamespace, Namespace | ||
|
||
|
||
class RDF(DefinedNamespace): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this still match the behavior of CloseNamespace or is there a reason for the change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can mimic the behavior of ClosedNamespace by setting RDF._fail = True. There is definitely a reason for the change -- there are times when warnings are needed or desired. Now all of the namespaces behave the same way -- RDF, SKOS, SDO, etc. W/ auto complete and (if you get it wrong) runtime warnings. You can also turn warnings off by setting RDF._warn = False (or SKOS._warn, etc.)
from rdflib.term import URIRef | ||
|
||
|
||
class RDFVOC(RDF): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should this be the other way around in order to preserve the existing behavior of the RDF namespace, so RDF would have _fail = True
and RDF???
would be the one with _fail = False
. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - if you feel deeply about having your software die (vs. warn), you can set RDF._fail to True. If you feel deeply that everyone needs/wants to endure this behavior, we can set RDF._fail to True as the default and we can override it if needs be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the RDFVOC is the extended RDF that is used in the RDF XML parser. I had to make it behave like a ClosedNamespace because the parser actually depended (well, the TEST cases depended) on ClosedNamespace behavior. Was easier to do this way than try to blend Defined and Closed namespaces.
At some point we definitely need to do a bit 'o performance testing on these to see whether I've injected any unexpected slowness...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: failure for everyone. I don't have enough data to say what a sane default should be in the ideal case, but changing the existing default behavior is something that should be carefully considered, and definitely documented in the change log. That said, if the decision is to make the change so that all namespaces have consistent default behavior (which I think is a good idea), then 6.0.0 seems like an ideal time to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have the first consequence of this just here: #1374
For some reason the sequence of processing is different in Travis than in PyCharm. Changing the module names eliminates any possibility of confusion. Also fixed a number of warnings, whines and general unhappiness in the build process.
Note - the doctest in `rdflib/__init__.py` fails 50% of the time because there isn't a definite order in the turtle serialization Also, we still have a mystery issue of why `Namespace` isn't included in `from rdflib import *`
The remaining build error is the randomness in TTL serialization. I'm thinking that it is time to correct that issue -- will try to get at it later this week. |
rdflib/__init__.py
Outdated
@@ -84,6 +97,9 @@ | |||
] | |||
|
|||
import sys | |||
from rdflib.namespace import XMLNS, SOSA | |||
|
|||
assert sys.version_info >= (2, 7, 0), "rdflib requires Python 2.7 or higher" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this check is needed since 6.0.0 will drop 2.7 support and because any unsupported versions should be caught before this assertion is ever reached.
edit: oop I realize I'm in rdflib/__init__.py
, but I think it still holds that this is still not needed.
Re: randomness in the ttl serialization: if you can fix it by tweaking the tests it will probably be easier, there are some serious sources of randomness in the default serializer that are hard to squash (my own ttlser library was originally created to eliminate them). |
Hi @hsolbrig @tgbugs, We - maintainers - really like this work and apologise for taking so long to get back to you. We appreciate moving all the individual Namespaces into a dir with all the ontology annotations in them - this will be a good learning resource. Can this PR please be rebased to current master and then we'll re-review for merging if you are both happy with it? There are a number of other Namespace-related issues we'd like to fix but I think we should have this merged first and then this, and the other fixes, can come through in 6.0.0. |
I'll catch this in the next couple of days. |
This PR by @hsolbrig migrates Other files were touched to use I have to look a bit more at the modifications made to the tests. @aucampia if you're willing, I'd love to get your input on this PR as well, since it's related to namespaces. |
@hsolbrig we are back on this one! Also, note that we're merging rdflib-jsonld into rdflib now too. This should eventually make RDFlib much more JSON-capable. These changes are all feeding into a push this week to make an RDFlib 6.0.0 release which contains a huge number of improvements over last year's 5.0. including some breaking changes (dropped Python versions, real improvements). I hope these updates re-invigorate your interest in RDFlib. I'd love to see us get to that magic land of more verifiable namespace use through these improvements and, ultimately, within-IDE namespace prompting facilitated by your additions of type hinting. |
@@ -204,6 +204,9 @@ def __ge__(self, other): | |||
if r: | |||
return True | |||
return self == other | |||
|
|||
def startswith(self, prefix, start = ..., end = ...) -> bool: | |||
return str(self).startswith(str(prefix)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Args start
and end
are not being passed on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, why are we overriding this?
The new Expected result with a >>> from rdflib.namespace import RDF
>>> RDF.uri
rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#') Result of the new >>> from rdflib.namespace import RDF
>>> RDF.uri
<stdin>:1: UserWarning: Code: uri is not defined in namespace RDF
rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#uri') We need to decide if we are dropping this functionality or not. |
test/test_namespace.py
Outdated
@@ -97,17 +100,23 @@ def test_n32(self): | |||
in n3 | |||
) | |||
|
|||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was @property
decorator added to avoid running this particular test? Commenting it out currently results in a failed test.
test/test_namespace.py
Outdated
add_not_in_namespace("firstName"), | ||
URIRef("http://xmlns.com/foaf/0.1/firstName"), | ||
) | ||
warn("DeclaredNamespace does not address deprecated properties") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change
warn("DeclaredNamespace does not address deprecated properties")
to
warn("DefinedNamespace does not address deprecated properties")
test/test_namespace.py
Outdated
def test_closed_namespace(self): | ||
"""Tests terms both in an out of the ClosedNamespace FOAF""" | ||
|
||
def add_not_in_namespace(s): | ||
return FOAF[s] | ||
|
||
# a blatantly non-existent FOAF property | ||
self.assertRaises(KeyError, add_not_in_namespace, "blah") | ||
self.assertWarnsRegex(UserWarning, add_not_in_namespace("blah"), 'UserWarning: Code: blah is not defined in namespace FOAF') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to the below code to fix the test.
with self.assertWarnsRegex(UserWarning, 'Code: blah is not defined in namespace FOAF'):
add_not_in_namespace("blah")
I've fixed up a few things, conflicts etc. Still need to decide whether we want to preserve or drop support for |
@nicholascar are you able to assign the remaining namespaces in https://github.com/RDFLib/rdflib/projects/2 to either the ClosedNamespace behaviour or the default DefinedNamespace behaviour please? |
Done |
# Conflicts: # rdflib/namespace/__init__.py # rdflib/plugins/sparql/operators.py # test/test_namespace.py
Changing some namespaces to behave like ClosedNamespace has broken some tests. Tests are using terms that are not in their namespace. |
@edmondchuc one of the fails here is that the file
|
Perhaps we need to relax the fail for XSD, so I've moved it over in here https://github.com/RDFLib/rdflib/projects/2 to Default DefinedNamespace. That will clear up another set of failures. |
Good to have caught the bad examples. For any we can’t fix with the XSD change, just convert the examples to using |
@nicholascar tests passing. |
- Also, deleting _RDFNamespace since it is not used in rdflib since RDFLib#1074
Instead of ClosedNamespace which throws errors, moved to (auto-generated) DeclaredNamespace that includes type hints for all members and (optionally) raises runtime warnings.
This passes all of the unit tests locally (in PyCharm) but I had to change 2 import statements (
from .manifest
andfrom .earl
) tofrom manifest
andfrom earl
. If this breaks anything they can be changed back - test_nquads_w3c.py, test_dawg.py, test_nt_w3c.py, test_trig_w3c.py, test_turtle_w3c.py.