[linkcheck] PDF anchor (...pdf#anchor) leads to 'utf-8' codec can't decode byte ...
#11041
Comments
I have the same issue when linking to [...]. Circumvented by removing the [...]. Quite some irony that linking to the Unicode standard runs into a UTF-8 issue :-)
I'm not sure what to suggest as a remedy for this, but from investigating why it occurs: this problem is due to the anchor-checking mechanism expecting to parse HTML content. When anchor-checking is enabled and content with a binary header is retrieved from a URI with an anchor fragment ([...]), [...].

In other words: it's not exactly the hash fragment in the URI that causes the problem, but it is a requirement for the problem to occur -- because for links without anchors, we don't need to read the HTTP response content, only the status line.

I think the most difficult decision for a fix is: what do we do for non-HTML formats like PDF when anchor-checking is enabled, and the hyperlink contains a fragment? Is it better to consider the result working (despite not checking for existence of a matching anchor destination), ignored/unchecked (informationally wasteful given that we have made a network request), or do something else?
Self-nitpick: and sometimes response headers, I suppose - for example to handle rate-limiting.
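To make the trade-off above concrete, here is a minimal sketch of one possible option: only look for the anchor when the response declares an HTML media type, and report anything else as unchecked. This is not Sphinx's actual implementation; `check_anchor`, `AnchorFinder`, `HTML_TYPES`, and the returned status strings are hypothetical names chosen for illustration, and the sketch assumes the body and `Content-Type` header have already been fetched.

```python
from html.parser import HTMLParser

# Assumption: these are the media types worth parsing for anchors.
HTML_TYPES = {"text/html", "application/xhtml+xml"}


class AnchorFinder(HTMLParser):
    """Hypothetical helper: records whether an id/name equal to `anchor` appears."""

    def __init__(self, anchor):
        super().__init__()
        self.anchor = anchor
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("id", "name") and value == self.anchor:
                self.found = True


def check_anchor(body: bytes, content_type: str, anchor: str) -> str:
    """Return 'working', 'broken', or 'unchecked' (illustrative labels only)."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in HTML_TYPES:
        # Binary formats such as PDF are never decoded as text here.
        return "unchecked"
    finder = AnchorFinder(anchor)
    # errors="replace" keeps a stray invalid byte from aborting the check.
    finder.feed(body.decode("utf-8", errors="replace"))
    return "working" if finder.found else "broken"


pdf_body = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3"  # typical binary PDF header bytes
print(check_anchor(pdf_body, "application/pdf", "anchor"))
print(check_anchor(b'<h1 id="intro">Intro</h1>', "text/html; charset=utf-8", "intro"))
```

Whether "unchecked" or "working" is the better result for the PDF case is exactly the open question from the comment; the sketch only shows that the decode error can be avoided by deciding before the body is parsed.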
Describe the bug
Related to #7694.
Note that the query symbol ? is not required when using an anchor (i.e., #fragment).
How to Reproduce
index.rst:
$ sphinx-build -b linkcheck . build
Output:
Environment Information