Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change some more internal usages of ConjunctiveGraph to Dataset to silence warnings #2867

Merged
merged 3 commits into from
Aug 1, 2024

Conversation

ashleysommer
Copy link
Contributor

@ashleysommer ashleysommer commented Aug 1, 2024

This was originally a very simple patch to change the use of ConjuctiveGraph to Dataset in the hextuples and nquads graph parsers.

The motivation was to silence the deprecation warnings that are emitted when simply parsing a nquads file or hextuples file, even if you were not personally using a ConjunctiveGraph, the warnings were being emitted for the use of ConjunctiveGraph within the parsers.

One of the key differences between ConjuctiveGraph and Dataset is how the default_graph works. When creating a new ConjunctiveGraph (cg), you can pass an identifier that becomes cg.identifier and also becomes the identifier of the default_graph. It uses a BNode as the identifier for the default graph if you don't pass one. A named graph with BNode as identifier is actually considered an "unnamed graph".

On the other hand, you cannot pass an identifier to a Dataset. The Dataset's internal identifier is always a new BNode, (that isn't even used for anything_. The identifier of ds.default_graph is the constant DATASET_DEFAULT_GRAPH_ID ("urn:x-rdflib:default"). So the difference is ConjunctiveGraph by default has an Unnamed graph as its default_graph, and Dataset has a Named Graph (with a URI) as its default_graph.

Changing ConjunctiveGraph to Dataset in the Hextuples and NQuads parsers inadvertently exposed an underlying existing bug in both parsers. If the destination graph (aka, the "sink") already had a Named default_graph assigned (like in the case of every Dataset), the parser would ignore that and parse into a new default_graph, causing one extra graph in the resulting dataset than you were expecting. That was fixed in this PR.

Another key difference between ConjuctiveGraph and Dataset is that ConjunctiveGraph always sets default_union to False unless you force it True, but Dataset is easily selectable, with the default False.
So for compatibility sake, we set default_union to True on the usage of Dataset in these two parsers.

Finally, after those changes were made I started getting failing tests in the seemingly unrelated JSON-LD Serializer. This was caused by the change of nquads to use Dataset in the parser. Specifically, it was an "Unnamed default-graph" vs "Named default-graph" issue. There was an existing bug in the JSON-LD serializer that treated all named-graphs as "non-default" graphs, where it wraps them in a @graph{} wrapper and adds an @id with the named-graph identifier. All "Unnamed" graphs were treated as "default-graph" and were added to the top-level context in the JSON-LD file. The problem is it was treating the Dataset named graph with name DATASET_DEFAULT_GRAPH_ID as a "non-default" graph because it has a URI as its identifier. This was triggering a lot of JSON-LD serializer tests to fail because the expected output was different.

So this PR fixes that issue in the JSON-LD serializer too, so it now treats Dataset default_graph as an un-named graph.
Note, this is directly related to #2445 which still needs to be fixed.

…() instance to parse into, instead of using ConjunctiveGraph.

Add better handling of parsing into the correct default-graph for destination ConjuctiveGraphs or Datasets that already have their own default_graph defined.
…in the named graph with name DATASET_DEFAULT_GRAPH_ID
@ashleysommer ashleysommer changed the title Internal conjunctive graph Change some more internal usages of ConjunctiveGraph to Dataset set silence warnings Aug 1, 2024
@coveralls
Copy link

coveralls commented Aug 1, 2024

Coverage Status

coverage: 90.64% (-0.003%) from 90.643%
when pulling 509c73c on internal_conjunctive_graph
into 0e7695c on main.

@ashleysommer ashleysommer changed the title Change some more internal usages of ConjunctiveGraph to Dataset set silence warnings Change some more internal usages of ConjunctiveGraph to Dataset to silence warnings Aug 1, 2024
Copy link
Member

@nicholascar nicholascar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @ashleysommer

@ashleysommer
Copy link
Contributor Author

ashleysommer commented Aug 1, 2024

I know the big wall of text in the original comment is hard to follow. The concepts are difficult, and the changes are hard to describe. ConjuctiveGraph vs Dataset usage is resulting in hidden bugs around the use of default_union, default_context, and identifier that are treated differently on Dataset than on ConjunctiveGraph.

Simply changing ConjunctiveGraph to Dataset is exposing these hidden bugs in unexpected areas, thankfully our test suit seems to be catching them. This issue is going to get worse as we approach full removal of ConjunctiveGraph, before it gets better.

The connection between the change in nquads parser, and the failing JSON-LD serializer tests, is because those tests read in the test data from the test suite on disk in nquads format. When the nquads parser changed from parsing using ConjunctiveGraph to parsing using Dataset, it begain using the Dataset's default_graph (DATASET_DEFAULT_GRAPH_ID), as the Named default graph, and that exposed the bug in JSON-LD where it could not handle that (seems like it was simply never tested to serialize a Dataset instance that uses the original default_graph.)

Copy link
Contributor

@edmondchuc edmondchuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks good to me.

@ashleysommer ashleysommer merged commit fd10835 into main Aug 1, 2024
22 checks passed
@ashleysommer ashleysommer deleted the internal_conjunctive_graph branch August 1, 2024 06:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants