-
Notifications
You must be signed in to change notification settings - Fork 571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove singly-used alias obviated by IdentifiedNode #1695
Conversation
ContextNode did not appear anywhere else in the code base. Signed-off-by: Alex Nelson <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, change looks good, but I am a bit worried if someone has already used ContextNode, as this will affect them.
Understood. I didn't know that alias was hanging around in |
Think best that @nicholascar give some input here, I would be slightly more comfortable waiting for the major release, but not sure it is that critical.
I think the current approach is to just leave the pull requests and keep rebasing/merging them until we are ready for them, if we have permission from the branch creator to make changes to it then it works best that way. |
I'm happy to await input. I have the "Allow edits" box checked, so hopefully you can make changes. |
We are planning a 7.0.0 release in the next few months, but the aims of that are to tidy up Perhaps it's sensible to issue a 6.2.0 for this level of change? We can merge into master for this: it's ok to have master a bit different to the latest release. |
My concern is, if someone did this their code will break at runtime: from rdflib.graph import ContextNode
def foo(context_node: ContextNode) -> None:
pass I would say there is a low probability that this was done, but it is possible none the less. I don't have problem with this going into master, I am a bit concerned, but not that much, if it causes any real problems then we can always issue a patch release to put it back, so I'm fine with putting it in master for |
I'd appreciate some early guidelines on the degree of type rigour that 7.0 is planned to impose on the API. atm, these tests succeed: import pytest
from rdflib import XSD
from rdflib.graph import ContextNode, Graph
from rdflib.term import IdentifiedNode, Literal, Node, URIRef
def test_literal_as_graph_identifier():
g = Graph(identifier=Literal("mygraph", datatype=XSD.string))
assert isinstance(g.identifier, Node)
with pytest.raises(AssertionError):
assert isinstance(g.identifier, IdentifiedNode)
with pytest.raises(AssertionError):
assert type(g.identifier) is ContextNode
def test_string_as_graph_identifier():
g = Graph(identifier="mygraph")
assert isinstance(g.identifier, str)
assert isinstance(g.identifier, IdentifiedNode)
with pytest.raises(AssertionError):
assert type(g.identifier) is ContextNode |
rdflib/graph.py
Outdated
ContextNode = Union[BNode, URIRef] | ||
DatasetQuad = Tuple[Node, URIRef, Node, Optional[ContextNode]] | ||
# Type aliase to make unpacking what's going on a little more human friendly | ||
DatasetQuad = Tuple[Node, URIRef, Node, Optional[IdentifiedNode]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an issue with the extant typing of ContextNode
which this change understandably doesn't address. The Graph identifier is (correctly) type-hinted as Optional[Union[Node, str]]
and Optional[IdentifiedNode]
doesn't capture the str
component.
Meh, I forgot that Graph also accepts a Literal as an identifier, also not captured by IdentifiedNode
.
An all-inclusive ContextNode
would be Union[IdentifiedNode, Literal, str, None]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this last point is correct: any of those things can be used for a context node, so we may have a case here to now remove ContextNode
as I don't think it can be obviated by IdentifiedNode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I certainly missed this use case ContextNode
was supporting, apparently incompletely. I suppose the question now is, is ContextNode
intended to be a symbol meant for export and re-consumption, or is it internal to the module? If internal, would it be better suited as being named _ContextNode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it should be ContextNode[Any]
😄 ...
g1 = Graph(identifier="⚠️")
logger.debug(f"{g1.identifier}")
g2 = Graph(identifier=Literal("foo"))
logger.debug(f"{g2.identifier}")
g3 = Graph(identifier=BNode())
logger.debug(f"{g3.identifier}")
g4 = Graph(identifier=URIRef("urn:example:context-1"))
logger.debug(f"{g4.identifier}")
g5 = Graph(identifier=FOAF.name)
logger.debug(f"{g5.identifier}")
g6 = Graph(identifier=Namespace("http://rdflib.net"))
logger.debug(f"{g6.identifier}")
g7 = Graph(identifier=Variable("xyzzy"))
logger.debug(f"{g7.identifier}")
# At least this is dereferenced, phew!
g8 = Graph(identifier=g1)
logger.debug(f"{g8.identifier}")
# lol
g9 = Graph(identifier=(tarek, likes, pizza))
logger.debug(f"{g9.identifier}")
⚠️
foo
N487b9bd4146e44e8b039eadca3a36ec8
urn:example:context-1
http://xmlns.com/foaf/0.1/name
http://rdflib.net
xyzzy
N3b7d314ec941418d9d0f0f6301846796
(rdflib.term.URIRef('urn:example:tarek'),
rdflib.term.URIRef('urn:example:likes'),
rdflib.term.URIRef('urn:example:pizza'))
PASSED [100%]
Anything that returns a str
from repr
is valid, so most of the successes are unsurprising. (For clarity, "tarek" etc are URIRefs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meh, I forgot that Graph also accepts a Literal as an identifier, also not captured by
IdentifiedNode
.
It is important to note, this type alias is only used in one place, which is the return value for Dataset.__iter__
so it should be correct for that, and for this reason it probably should not be a type alias, or at the very least not one outside of the class. At the very least a type alias on module level should be used in more than one class/function in that module before it is clear that it makes sense for it to be an alias, as there may be cases where something looks like a ContextNode and needs to have Literal
, str
, Node
and None
- and other places where IdentifiedNode
really is the most appropriate.
Also Literal is not in the value space of graph identifiers/names according to RDF 1.1:
An RDF dataset is a collection of RDF graphs, and comprises:
So if Graph accepts it as an identifier, it probably should not, and the right thing is to narrow the type there to match the correct return type for Dataset.__iter__
- if Dataset.__iter__
does return something that is contrary to the spec I think we should at least understand where things go awry also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an issue with the extant typing of
ContextNode
which this change understandably doesn't address. The Graph identifier is (correctly) type-hinted asOptional[Union[Node, str]]
andOptional[IdentifiedNode]
doesn't capture thestr
component
as noted elsewhere, Graph will convert any identifier that is not a Node to a URIRef:
Lines 342 to 345 in 4eba1a0
self.__identifier: Node | |
self.__identifier = identifier or BNode() # type: ignore[assignment] | |
if not isinstance(self.__identifier, Node): | |
self.__identifier = URIRef(self.__identifier) # type: ignore[unreachable] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it should be
ContextNode[Any]
smile ...
This may be the case for Graph constructor, but all of these values are converted to URIRef if they are not URIRefs:
$ pipx run --spec git+https://github.com/RDFLib/rdflib.git@master#egg=rdflib python
⚠️ python is already on your PATH and installed at /usr/bin/python. Downloading and running anyway.
Python 3.10.2 (main, Jan 17 2022, 00:00:00) [GCC 11.2.1 20211203 (Red Hat 11.2.1-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import rdflib
>>> rdflib.Graph(identifier=(1,2,3)).identifier
rdflib.term.URIRef('(1, 2, 3)')
And even though it does work in some sense (I'm actually not sure how valid it will be as a IRI, but maybe it is entirely valid), I still think there is an open question as to whether we want to advertise that anything should be passible into it, and certainly there are some things that do fail and it seems tuple works for reasons that were not quite intended:
>>> rdflib.Graph(identifier=3).identifier
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/graph.py", line 345, in __init__
self.__identifier = URIRef(self.__identifier) # type: ignore[unreachable]
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/term.py", line 273, in __new__
if not _is_valid_uri(value):
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/term.py", line 96, in _is_valid_uri
if c in uri:
TypeError: argument of type 'int' is not iterable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And note that even though things don't throw an exception if someone does not adhere to type annotations, I think the point of type annotations should be to tell people what the correct type is to pass in, so that they can spot bugs in their code, if for some reason things still mostly work in one place when they pass in wrong values, it may still break down later on, and this is basically also declaring our contract with users of the library, if we say we will accept tuples as input to Graph identifier, just because they work, then we have to bear the consequences of this everywhere and maintain this, while if we say that we only accept strings, if someone then passes something else, and it works, well fine, but if something breaks further down the line at least we can tell them that they should be passing in a string instead of a tuple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't even need to keep str
, you are correct, all the sane values resolve to URIRef. I worked up a more thorough check:
import pytest
from rdflib import Namespace, FOAF
from rdflib.graph import Dataset, Graph
from rdflib.term import (
BNode,
Literal,
RDFLibGenid,
URIRef,
Node,
IdentifiedNode,
Variable,
)
tarek = URIRef("urn:example:tarek")
likes = URIRef("urn:example:likes")
pizza = URIRef("urn:example:pizza")
def test_graph_identifier_as_string():
g = Graph(identifier="xxx")
assert type(g.identifier) is URIRef # rdflib.term.URIRef('xxx')
assert issubclass(type(g.identifier), Node)
assert issubclass(type(g.identifier), IdentifiedNode)
def test_graph_identifier_as_literal():
g = Graph(identifier=Literal("xxx"))
assert type(g.identifier) is Literal # rdflib.term.Literal('xxx')
assert issubclass(type(g.identifier), Node)
assert not issubclass(type(g.identifier), IdentifiedNode)
def test_graph_identifier_as_bnode():
g = Graph(identifier=BNode())
assert type(g.identifier) is BNode # rdflib.term.BNode('Ndc8ac01941254b299a66c31a5b49cdd3')
assert issubclass(type(g.identifier), Node)
assert issubclass(type(g.identifier), IdentifiedNode)
def test_graph_identifier_as_namespace():
g = Graph(identifier=(tarek, likes, pizza))
assert type(g.identifier) is URIRef # rdflib.term.URIRef("(rdflib.term.URIRef('urn:example:tarek'),
# rdflib.term.URIRef('urn:example:likes'),
# rdflib.term.URIRef('urn:example:pizza'))")
assert issubclass(type(g.identifier), Node)
assert issubclass(type(g.identifier), IdentifiedNode)
def test_graph_identifier_as_graph():
g = Graph()
g = Graph(identifier=g)
assert type(g.identifier) is BNode # rdflib.term.BNode('N666b78c69a3d4544a079ab0919a84dda')
assert issubclass(type(g.identifier), Node)
assert issubclass(type(g.identifier), IdentifiedNode)
def test_graph_identifier_as_genid():
g = Graph()
g = Graph(identifier=RDFLibGenid("xxx"))
assert type(g.identifier) is RDFLibGenid # RDFLibGenid('xxx')
assert issubclass(type(g.identifier), Node)
assert issubclass(type(g.identifier), IdentifiedNode)
def test_graph_identifier_as_identifiednode():
g = Graph()
g = Graph(identifier=IdentifiedNode("xxx"))
assert type(g.identifier) is IdentifiedNode # 'xxx'
assert issubclass(type(g.identifier), Node)
assert issubclass(type(g.identifier), IdentifiedNode)
def test_graph_identifier_as_variable():
g = Graph()
g = Graph(identifier=Variable("x"))
assert type(g.identifier) is Variable # rdflib.term.Variable('x')
assert issubclass(type(g.identifier), Node)
assert not issubclass(type(g.identifier), IdentifiedNode)
Sorry all, I've effectively ruined this PR be reintroducing I don't really know where to go from here... @aucampia @ajnelson-nist can you please re-review? |
My intent for the PR wasn't to eliminate |
rdflib/graph.py
Outdated
# Type aliases to make unpacking what's going on a little more human friendly | ||
ContextNode = Union[BNode, URIRef] | ||
# Type aliase to make unpacking what's going on a little more human friendly | ||
ContextNode = Union[IdentifiedNode, Literal, str, None] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only place that ContextNode
is used is in DatasetQuad
, and there it is optional, so the None
here is not needed, but it also does not hurt really.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we should re-evaluate the idea here, I would say at the very least we should consider making these private, i.e. _ContextNode
, _DatasetQuad
- if we use the same type aliases many times in many places we can look at making them public, but even then I would say they should rather go into rdflib.typing
so we can reserve the names in here if we later on want to make them abstract base classes or real classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the problem here is, this is used in a single place, and in that place ContextNode will never be a string, but always be a Node
I think:
Lines 2036 to 2041 in 4eba1a0
def quads(self, quad): | |
for s, p, o, c in super(Dataset, self).quads(quad): | |
if c.identifier == self.default_context: | |
yield s, p, o, None | |
else: | |
yield s, p, o, c.identifier |
Because if it is not a node it will be converted to URIRef here:
Lines 342 to 345 in 4eba1a0
self.__identifier: Node | |
self.__identifier = identifier or BNode() # type: ignore[assignment] | |
if not isinstance(self.__identifier, Node): | |
self.__identifier = URIRef(self.__identifier) # type: ignore[unreachable] |
And actually it should also never be anything other than the Default Graph (in this case identified by none), an IRI or a blank node:
https://www.w3.org/TR/rdf11-concepts/#section-dataset
An RDF dataset is a collection of RDF graphs, and comprises:
I think the best options here are:
I somewhat favor option 2, as I think this is the more correct thing to do. We don't want to advertise to people that they should be passing values in as graph identifiers that are not inline with RDF 1.1 - and if they still want to they can use type ignores. |
Also change Graph/Context identifier annotations to match RDF spec. Graph/Context identifiers should be URIRef or BNode - in RDFLib these are now the only extant subclasses of IdentifiedNode, so I'm using that.
I have made a PR to change this branch accordingly: ajnelson-nist#13 Again other options may make sense, but I think in all cases we should eliminate |
Appologies for accidental closure, I think part of suggestion 2 is being done: |
…etQuad Change `Dataset.__iter__` to match RDF spec
Thanks for the update, @aucampia ! My contributions to the rest of the conversations in this thread would basically await consequences of type signature review. So, I'm happy with whatever makes incremental progress for now. If the type aliases are retained, I agree with making them underscore-prefixed private names. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redid review of latest iteration of this change, I think this is optimal for now, aliases would be great to simplify the type annotations but ideally they should be accompanied with a couple of uses and remain private unless they are in a different module. I think we will get there soon though.
ContextNode did not appear anywhere else in the code base.
This PR has no associated Issue.
Proposed Changes
ContextNode
from/rdflib/graph.py
.IdentifiedNode
into/rdflib/graph.py
.