-
Notifications
You must be signed in to change notification settings - Fork 571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change some more internal usages of ConjunctiveGraph to Dataset to silence warnings #2867
Conversation
…() instance to parse into, instead of using ConjunctiveGraph. Add better handling of parsing into the correct default-graph for destination ConjuctiveGraphs or Datasets that already have their own default_graph defined.
…in the named graph with name DATASET_DEFAULT_GRAPH_ID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @ashleysommer
I know the big wall of text in the original comment is hard to follow. The concepts are difficult, and the changes are hard to describe. Simply changing ConjunctiveGraph to Dataset is exposing these hidden bugs in unexpected areas, thankfully our test suit seems to be catching them. This issue is going to get worse as we approach full removal of ConjunctiveGraph, before it gets better. The connection between the change in nquads parser, and the failing JSON-LD serializer tests, is because those tests read in the test data from the test suite on disk in nquads format. When the nquads parser changed from parsing using ConjunctiveGraph to parsing using Dataset, it begain using the Dataset's default_graph (DATASET_DEFAULT_GRAPH_ID), as the Named default graph, and that exposed the bug in JSON-LD where it could not handle that (seems like it was simply never tested to serialize a Dataset instance that uses the original default_graph.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, looks good to me.
This was originally a very simple patch to change the use of
ConjuctiveGraph
toDataset
in the hextuples and nquads graph parsers.The motivation was to silence the deprecation warnings that are emitted when simply parsing a nquads file or hextuples file, even if you were not personally using a ConjunctiveGraph, the warnings were being emitted for the use of
ConjunctiveGraph
within the parsers.One of the key differences between ConjuctiveGraph and Dataset is how the
default_graph
works. When creating a newConjunctiveGraph
(cg
), you can pass anidentifier
that becomescg.identifier
and also becomes the identifier of the default_graph. It uses a BNode as the identifier for the default graph if you don't pass one. A named graph with BNode as identifier is actually considered an "unnamed graph".On the other hand, you cannot pass an
identifier
to aDataset
. The Dataset's internal identifier is always a new BNode, (that isn't even used for anything_. The identifier ofds.default_graph
is the constantDATASET_DEFAULT_GRAPH_ID
("urn:x-rdflib:default"
). So the difference is ConjunctiveGraph by default has an Unnamed graph as its default_graph, and Dataset has a Named Graph (with a URI) as its default_graph.Changing
ConjunctiveGraph
toDataset
in the Hextuples and NQuads parsers inadvertently exposed an underlying existing bug in both parsers. If the destination graph (aka, the "sink") already had a Named default_graph assigned (like in the case of every Dataset), the parser would ignore that and parse into a new default_graph, causing one extra graph in the resulting dataset than you were expecting. That was fixed in this PR.Another key difference between ConjuctiveGraph and Dataset is that
ConjunctiveGraph
always setsdefault_union
to False unless you force it True, butDataset
is easily selectable, with the defaultFalse
.So for compatibility sake, we set
default_union
to True on the usage of Dataset in these two parsers.Finally, after those changes were made I started getting failing tests in the seemingly unrelated JSON-LD Serializer. This was caused by the change of nquads to use
Dataset
in the parser. Specifically, it was an "Unnamed default-graph" vs "Named default-graph" issue. There was an existing bug in the JSON-LD serializer that treated all named-graphs as "non-default" graphs, where it wraps them in a@graph{}
wrapper and adds an@id
with the named-graph identifier. All "Unnamed" graphs were treated as "default-graph" and were added to the top-level context in the JSON-LD file. The problem is it was treating the Dataset named graph with nameDATASET_DEFAULT_GRAPH_ID
as a "non-default" graph because it has a URI as its identifier. This was triggering a lot of JSON-LD serializer tests to fail because the expected output was different.So this PR fixes that issue in the JSON-LD serializer too, so it now treats Dataset default_graph as an un-named graph.
Note, this is directly related to #2445 which still needs to be fixed.