Change some more internal usages of ConjunctiveGraph to Dataset to silence warnings #2867

ashleysommer · 2024-08-01T04:20:00Z

This was originally a very simple patch to change the use of ConjuctiveGraph to Dataset in the hextuples and nquads graph parsers.

The motivation was to silence the deprecation warnings that are emitted when simply parsing a nquads file or hextuples file, even if you were not personally using a ConjunctiveGraph, the warnings were being emitted for the use of ConjunctiveGraph within the parsers.

One of the key differences between ConjuctiveGraph and Dataset is how the default_graph works. When creating a new ConjunctiveGraph (cg), you can pass an identifier that becomes cg.identifier and also becomes the identifier of the default_graph. It uses a BNode as the identifier for the default graph if you don't pass one. A named graph with BNode as identifier is actually considered an "unnamed graph".

On the other hand, you cannot pass an identifier to a Dataset. The Dataset's internal identifier is always a new BNode, (that isn't even used for anything_. The identifier of ds.default_graph is the constant DATASET_DEFAULT_GRAPH_ID ("urn:x-rdflib:default"). So the difference is ConjunctiveGraph by default has an Unnamed graph as its default_graph, and Dataset has a Named Graph (with a URI) as its default_graph.

Changing ConjunctiveGraph to Dataset in the Hextuples and NQuads parsers inadvertently exposed an underlying existing bug in both parsers. If the destination graph (aka, the "sink") already had a Named default_graph assigned (like in the case of every Dataset), the parser would ignore that and parse into a new default_graph, causing one extra graph in the resulting dataset than you were expecting. That was fixed in this PR.

Another key difference between ConjuctiveGraph and Dataset is that ConjunctiveGraph always sets default_union to False unless you force it True, but Dataset is easily selectable, with the default False.
So for compatibility sake, we set default_union to True on the usage of Dataset in these two parsers.

Finally, after those changes were made I started getting failing tests in the seemingly unrelated JSON-LD Serializer. This was caused by the change of nquads to use Dataset in the parser. Specifically, it was an "Unnamed default-graph" vs "Named default-graph" issue. There was an existing bug in the JSON-LD serializer that treated all named-graphs as "non-default" graphs, where it wraps them in a @graph{} wrapper and adds an @id with the named-graph identifier. All "Unnamed" graphs were treated as "default-graph" and were added to the top-level context in the JSON-LD file. The problem is it was treating the Dataset named graph with name DATASET_DEFAULT_GRAPH_ID as a "non-default" graph because it has a URI as its identifier. This was triggering a lot of JSON-LD serializer tests to fail because the expected output was different.

So this PR fixes that issue in the JSON-LD serializer too, so it now treats Dataset default_graph as an un-named graph.
Note, this is directly related to #2445 which still needs to be fixed.

…() instance to parse into, instead of using ConjunctiveGraph. Add better handling of parsing into the correct default-graph for destination ConjuctiveGraphs or Datasets that already have their own default_graph defined.

…in the named graph with name DATASET_DEFAULT_GRAPH_ID

coveralls · 2024-08-01T04:27:26Z

coverage: 90.64% (-0.003%) from 90.643%
when pulling 509c73c on internal_conjunctive_graph
into 0e7695c on main.

nicholascar

Great work @ashleysommer

ashleysommer · 2024-08-01T05:47:40Z

I know the big wall of text in the original comment is hard to follow. The concepts are difficult, and the changes are hard to describe. ConjuctiveGraph vs Dataset usage is resulting in hidden bugs around the use of default_union, default_context, and identifier that are treated differently on Dataset than on ConjunctiveGraph.

Simply changing ConjunctiveGraph to Dataset is exposing these hidden bugs in unexpected areas, thankfully our test suit seems to be catching them. This issue is going to get worse as we approach full removal of ConjunctiveGraph, before it gets better.

The connection between the change in nquads parser, and the failing JSON-LD serializer tests, is because those tests read in the test data from the test suite on disk in nquads format. When the nquads parser changed from parsing using ConjunctiveGraph to parsing using Dataset, it begain using the Dataset's default_graph (DATASET_DEFAULT_GRAPH_ID), as the Named default graph, and that exposed the bug in JSON-LD where it could not handle that (seems like it was simply never tested to serialize a Dataset instance that uses the original default_graph.)

edmondchuc

Nice, looks good to me.

ashleysommer added 2 commits August 1, 2024 13:44

Fix JSON-LD serializing when the source is a Dataset and triples are …

eb121af

…in the named graph with name DATASET_DEFAULT_GRAPH_ID

ashleysommer changed the title ~~Internal conjunctive graph~~ Change some more internal usages of ConjunctiveGraph to Dataset set silence warnings Aug 1, 2024

ashleysommer requested review from nicholascar and edmondchuc August 1, 2024 04:20

ashleysommer changed the title ~~Change some more internal usages of ConjunctiveGraph to Dataset set silence warnings~~ Change some more internal usages of ConjunctiveGraph to Dataset to silence warnings Aug 1, 2024

nicholascar approved these changes Aug 1, 2024

View reviewed changes

Merge branch 'main' into internal_conjunctive_graph

509c73c

edmondchuc approved these changes Aug 1, 2024

View reviewed changes

ashleysommer merged commit fd10835 into main Aug 1, 2024
22 checks passed

ashleysommer deleted the internal_conjunctive_graph branch August 1, 2024 06:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change some more internal usages of ConjunctiveGraph to Dataset to silence warnings #2867

Change some more internal usages of ConjunctiveGraph to Dataset to silence warnings #2867

ashleysommer commented Aug 1, 2024 •

edited

Loading

coveralls commented Aug 1, 2024 •

edited

Loading

nicholascar left a comment

ashleysommer commented Aug 1, 2024 •

edited

Loading

edmondchuc left a comment

Change some more internal usages of ConjunctiveGraph to Dataset to silence warnings #2867

Change some more internal usages of ConjunctiveGraph to Dataset to silence warnings #2867

Conversation

ashleysommer commented Aug 1, 2024 • edited Loading

coveralls commented Aug 1, 2024 • edited Loading

nicholascar left a comment

Choose a reason for hiding this comment

ashleysommer commented Aug 1, 2024 • edited Loading

edmondchuc left a comment

Choose a reason for hiding this comment

ashleysommer commented Aug 1, 2024 •

edited

Loading

coveralls commented Aug 1, 2024 •

edited

Loading

ashleysommer commented Aug 1, 2024 •

edited

Loading