-
Notifications
You must be signed in to change notification settings - Fork 605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(fetch): properly redirect non-ascii location header url #2971
fix(fetch): properly redirect non-ascii location header url #2971
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is correct spec-wise
@anonrig do you know why this is needed? |
cc @lemire |
Please don't merge it until I have some time to review this tomorrow. |
I am confused |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KhafraDev This is intended, and the output is correct. Look at the example below:
> let u = new URL('http://127.0.0.1:51619/redirect')
undefined
> u.pathname = '/ì\x95\x88ë\x85\x95'
'/ì\x95\x88ë\x85\x95'
> u
URL {
href: 'http://127.0.0.1:51619/%C3%AC%C2%95%C2%88%C3%AB%C2%85%C2%95',
origin: 'http://127.0.0.1:51619',
protocol: 'http:',
username: '',
password: '',
host: '127.0.0.1:51619',
hostname: '127.0.0.1',
port: '51619',
pathname: '/%C3%AC%C2%95%C2%88%C3%AB%C2%85%C2%95',
search: '',
searchParams: URLSearchParams {},
hash: ''
}
URL specification path state (https://url.spec.whatwg.org/#path-state) says that:
UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.
You can replicate the same behavior on Safari as well. |
lib/web/fetch/util.js
Outdated
@@ -44,7 +44,8 @@ function responseLocationURL (response, requestFragment) { | |||
// 3. If location is a header value, then set location to the result of | |||
// parsing location with response’s URL. | |||
if (location !== null && isValidHeaderValue(location)) { | |||
location = new URL(location, responseURL(response)) | |||
// Make sure location is properly encoded. | |||
location = new URL(Buffer.from(location, 'ascii').toString('utf8'), responseURL(response)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems strange to me: Buffer.from(location, 'ascii').toString('utf8')
. I suggest at least some comments to explain what the intention is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "parsing location with response URL" mean precisely? I interpret it is a pathname setter...
The spec links "parsing" to https://url.spec.whatwg.org/#concept-url-parser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lemire
This issue arises when the location header's address is not encoded as with encodeURI
, but is responded in binary form.
For example, while the result of encodeURIComponent('안녕')
is %EC%95%88%EB%85%95
, in binary form, this appears as \xEC\x95\x88\xEB\x85\x95
, same as the outcome of Buffer.from('안녕', 'utf8').toString('binary')
.
What I am attempting in this code is to normalize characters that exist outside the ASCII range into UTF-8 form and URL encode them, to ensure behavior consistent with major web browsers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KhafraDev Yes. Parsing with the base will work the same in ada, test here : ada-url/ada#611
This seems like quite a bad issue if it's causing pages not to load. How does Safari handle loading the google webstore in this case (example)? |
I've added comments to make the code clearer and revised it to be more concise. Please take a look. |
*/ | ||
function normalizeBinaryStringToUtf8 (value) { | ||
return Buffer.from(value, 'binary').toString('utf8') | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let b = Buffer.from(value, 'binary');
if(Buffer.isAscii(b)) { return value; } // skip new string alloc.
return b.toString('utf8'); // must alloc a new string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to call Buffer.from()
. Buffer.isAscii(b)
is a static function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anonrig We get at the core of the issue here. The value
is a string not a buffer. At some point it was wrongly interpreted as latin1 (maybe in http
) and so we must now convert it to a buffer and then back to a string.
I think that this function should receive a Buffer and then the Buffer could be interpreted as UTF-8 and we'd be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is far beyond the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The irony is that we seem to try to purposefully (in http) prevent people from sending UTF-8 content as UTF-8... and then we go back and say "no, it is really UTF-8".
Much technical debt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@anonrig PTAL |
This relates to...
Rationale
The fetch function behaves differently from major web browsers(chrome, safari, firefox etc), and even CURL (with
-L
option).Here is a minimal reproducible URL I made: https://non-ascii-location-header.sys.workers.dev/redirect
Worker code
By accessing the address at https://non-ascii-location-header.sys.workers.dev/redirect directly
or https://non-ascii-location-header.sys.workers.dev/ first and then making a fetch request to the /redirect path using the developer tools leads to reaching https://non-ascii-location-header.sys.workers.dev/%EC%95%88%EB%85%95
which shows
pathname /%EC%95%88%EB%85%95
as a result.However, using undici fetch function does not behave in the same way.
This issue can also be observed at chrome web store: https://chromewebstore.google.com/detail/gbgmenmdglilmbmemagekpeaodajbeei
I do not have an in-depth understanding of the fetch standard, so if this PR request violates the standard in any way, please feel free to close it.
Thank you.
Changes
Features
Bug Fixes
Breaking Changes and Deprecations
Status