Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug: duplicate rST hyperlink targets don't appear to emit a warning during project processing? #13383

Open
jayaddison opened this issue Feb 22, 2025 · 10 comments · May be fixed by #13400
Labels

Comments

@jayaddison
Copy link
Contributor

Describe the bug

This is observed from a build of the Linux kernel documentation in Arch Linux; relevant diffoscope output can be found here: https://reproducible.archlinux.org/api/v0/builds/740203/diffoscope

It seems that sometimes the hyperlink href (destination) included in the HTML output for a document may resolve to a different document, despite a local declaration of the target (in the RST source).

This is visible as a diff in the output quickly-build-trimmed-linux.html HTML file as:

│ ├── usr/lib/modules/6.13.3-arch1-1/build/Documentation/admin-guide/quickly-build-trimmed-linux.html
│ │┄ 'html2text' not available in path.
│ │ @@ -1172,15 +1172,15 @@
│ │  guide would defeat its purpose, as without such a focus you would need dozens or
│ │  hundreds of constructs along the lines of ‘in case you are having <insert
│ │  machine or distro>, you at this point have to do <this and that>
│ │  <instead|additionally>’. Each of which would make the text longer, more
│ │  complicated, and harder to follow.</p>
│ │  <p>That being said: this of course is a balancing act. Hence, if you think an
│ │  additional use-case is worth describing, suggest it to the maintainers of this
│ │ -document, as <a class="reference internal" href="#submit-improvements"><span class="std std-ref">described above</span></a>.</p>
│ │ +document, as <a class="reference internal" href="verify-bugs-and-bisect-regressions.html#submit-improvements"><span class="std std-ref">described above</span></a>.</p>
│ │  </section>
│ │  </section>
│ │  </section>
│ │  
│ │  
│ │            </div>

One characteristic of the problem link is that it is declared locally within quickly-build-trimmed-linux.rst -- but also, separately within the verify-bugs-and-bisect-regressions.rst file that the referencing link sometimes mistakenly resolves to.

I believe this problem was previously identified as one of I think three non-determinism problems reported in #6714 (we fixed one of those, another remains open, and the item I'm reporting again here seems untracked).

A workaround could be to remove the ambiguity in the kernel documentation, and that may be straightforward to do. Even so, I think this may be a bug in Sphinx.

cc @bmwiedemann
cc @kpcyrd

How to Reproduce

TBD

Environment Information

Platform:              linux; (Arch Linux; precise version TBD)
Python version:        3.13.2 (precise version TBD)
Python implementation: CPython
Sphinx version:        8.1.3
Docutils version:      0.21.2
Jinja2 version:        3.1.5
Pygments version:      2.19.1

Sphinx extensions

extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include',
              'kfigure', 'sphinx.ext.ifconfig', 'automarkup',
              'maintainers_include', 'sphinx.ext.autosectionlabel',
              'kernel_abi', 'kernel_feat', 'translations']

Additional context

No response

@jayaddison
Copy link
Contributor Author

I'm beginning to think that this is not a bug in Sphinx (nor docutils), because I'm not sure that there are any scoping rules for link targets in reStructuredText.

@jayaddison
Copy link
Contributor Author

(...but even if so, I suppose we could warn about situations where duplicate link target definitions are encountered)

@jayaddison
Copy link
Contributor Author

I think this may be a false alarm. We already have code to detect and warn about declaration of duplicate labels:

if name in self.labels:
logger.warning(__('duplicate label %s, other instance in %s'),
name, env.doc2path(self.labels[name][0]),
location=node)

I don't know why that message doesn't appear in the build output from that package yet, though.

@jayaddison
Copy link
Contributor Author

Docutils nodes that contain refuri are ignored during detection of duplicate labels; that logic goes back to 910e488 (quite some time ago, in Y2008). This appears to be the case for the hyperlink target nodes.

@jayaddison
Copy link
Contributor Author

Basically I think the fix here is to disambiguate the names of these hyperlink targets in the Linux kernel documentation. But it surprises me that we don't warn about it in Sphinx.

@jayaddison jayaddison changed the title HTML: ambiguous href resolution for locally-declared link targets that also exist elsewhere Possible bug: duplicate rST hyperlink targets don't appear to emit a warning during project processing? Feb 22, 2025
@jayaddison
Copy link
Contributor Author

A minimal reproducer for this is:

index.rst

index
=====

link to `something <duplicate>`_.

.. _duplicate:

index-testing

another.rst

another
=======

link to `something <duplicate>`_.

.. _duplicate:

testing

conf.py

# empty; no custom configuration required

@jayaddison
Copy link
Contributor Author

I believe this problem was previously identified as one of I think three non-determinism problems reported in #6714 (we fixed one of those, another remains open, and the item I'm reporting again here seems untracked).

@khanxmetu I'd like to apologize for failing to recognize your report of this problem earlier; I believe it is the same issue you described at: #6714 (comment)

I was slightly distracted by travel and my mind was focused narrowly on the toctree problem at that time, but even so I should have paused to read each update in more detail.

@electric-coder
Copy link

electric-coder commented Feb 24, 2025

I'm beginning to think that this is not a bug in Sphinx (nor docutils), because I'm not sure that there are any scoping rules for link targets in reStructuredText.

There's a difference between:

  1. Using a bare reST Internal hyperlink target or
  2. using that same internal hyperlink target together with Sphinx's :ref: role as explained in Cross-referencing arbitrary locations.

because I'm not sure that there are any scoping rules for link targets in reStructuredText.

In case 1. the Internal hyperlink target is meant to be linked within the same page, per reST spec notice the emphasis on "within a document":

They provide an end point allowing a hyperlink to connect one place to another within a document.

In case 2. it's Sphinx's :ref: role itself that should be responsible for issuing a warning if there is a duplicate target.

The MRE in the above post is correct because the :ref: role isn't used, so it's pure reST and having duplicate names for the internal hyperlink target isn't a problem because they necessarily refer to links within the same .rst document.

(At least from experience that's how I read the difference between the reST spec and how Sphinx's :ref: role builds on it, I didn't check the linked issue's code.)

@jayaddison
Copy link
Contributor Author

Thank you @electric-coder. If I understand your explanation and the referenced docutils documentation correctly, we should consider this behaviour a bug -- because it's invalid for those internal hyperlink targets to resolve from other documents.

@electric-coder
Copy link

electric-coder commented Feb 25, 2025

we should consider this behaviour a bug (...)

because it's invalid for those internal hyperlink targets to resolve from other documents.

I'm not sure if it is a bug, the reST spec says: "within a document" but extension of the behavior by Sphinx (to make the links work cross-document if there aren't multiple targets) doesn't seem invalid... Maybe it can be considered an undocumented Sphinx feature extending the basic reST spec of internal hyperlink targets (cross-reference functionalities are also an extension building on basic reST) - but this begs the question: the "internal" in "internal hyperlink targets" would the be internal to what? (Internal to the whole "docs" doesn't make much sense, or does it?!)

If it is, indeed, undefined and not a bug the community should arbitrate what to consider it. What would be the reasons for or against it? Maybe enforcing use of :ref: would contribute to not muddy the waters between Sphinx/reST and contribute to less ambiguous practices over some loss in syntactic conciseness?!

I honestly don't know so at this point I'll defer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants