-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
http: Use GURL as HTTP URL parser utility #11670
Conversation
Signed-off-by: Dhi Aurrahman <[email protected]>
Signed-off-by: Dhi Aurrahman <[email protected]>
Signed-off-by: Dhi Aurrahman <[email protected]>
Signed-off-by: Dhi Aurrahman <[email protected]>
Signed-off-by: Dhi Aurrahman <[email protected]>
Signed-off-by: Dhi Aurrahman <[email protected]>
-#if defined(__clang__) && __has_attribute(uninitialized) | ||
+#if defined(__clang__) | ||
+#if defined(__has_attribute) | ||
+#if __has_attribute(uninitialized) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we put /Za
when compiling on Windows, MSVC's C4067 warning is treated as an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it interesting given that googletest itself supports msvc, but does not support clang-cl.
cc_library( | ||
name = "strings", | ||
srcs = [ | ||
- "string16.cc", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string16
doesn't make sense to use on 2-byte wchar_t systems (e.g. Windows).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I trust this patch is operating entirely on ISO-8859-1 with utf-8 enablement? Should be no wide char anything, this isn't java.
"string_piece.h", | ||
"string_piece_forward.h", | ||
"string_util.h", | ||
- "string_util_posix.h", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I'm still not super happy with the size of this patch. @wrowe @sunjayBhatia do you have any thoughts on what we can do to avoid having to hand craft this in Envoy? If @dio merges, would you folks be able to followup with upstream to eliminate this patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put this review on full pause until we have 11107 merged, and rebased? That will pick up //test/... and we'll be in a better position to give this our full attention, since regressions are taking up most of our free cycles for the past days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. I leave it to @htuch to decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can wait for #11107.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So today we should see 11107 merged. We did go ahead this week and build dio's branch locally with //test/... on Windows, and didn't observe any significant or outright failures, although there are several compilation emits that should be addressed before we merge this to master. This will become apparent to all when it is rebuilt after 11107.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make things very clear, we are patching the deliberate exclusion of one particular architecture, which served no purpose and discouraged development by any interested contributor. I don't know how a patch to revert such a silly commit would be received at the upstream community, so we can make no promises how long a patch will have to live at envoy, @htuch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#11107 is merged and this PR has picked it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. I'm wondering if we can get some agreement on the long term ownership of this patch. From my perspective, a patch this size is not sustainable in Envoy and that can't be the long term strategy; this will impact our velocity and add friction to upgrading to new versions of GURL in response to security issues. OTOH, I'm eager to move forward and we can accept a patch in the interim.
@dio would it make sense to just try upstream this and see what happens?
@wrowe would you folks be able to take over this patch and long term efforts to shrink it down via tactical upstreaming if this can't be easily done?
tags = ["requires-rtti"], | ||
visibility = ["//visibility:private"], | ||
deps = [":headers"], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@htuch: This is an autoconf dependency, can we just use foreign_cc
to avoid having to hand-maintain a bespoke BUILD file for the project?
@dio: I'm testing it, but it seems the build time is significantly increasing.
@htuch: Interesting, I wonder if it's doing something like building tests or additional libraries. Might be possible to control with some autoconf flags...
@dio: I believe I have turned off everything (that makes sense, like examples, tests, even tools). And it is tricky to force using msvc (when using configure it tends to use mingw and it is failed at configure stage, and I don't think we want to mix mingw and msvc).
Ref: #10599 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, what is an autoconf dependency? If one is introduced by this patch, that will break the contribution of all four engineers working on the Windows port by breaking master. The one I am familiar with is the opentracing dependency which we have excluded for now, but that only breaks tracing extensions. Breaking http/1 seems unwise. Otherwise the basic idea of this bazel BUILD hack looks entirely reasonable, and there is no expectation it would be fixed upstream, because we are cherry-picking a project for subcomponents that don't have any independent project management and don't appear to be supported stand-alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with this BUILD (as it isn't a huge patch). In terms of being an autoconf dependency, I'm basically saying it should build under rules_foreign_cc
, which knows how to build autoconf.
gperftools
is an autoconf dep today that we build this way, see https://github.com/envoyproxy/envoy/blob/master/bazel/foreign_cc/BUILD#L13.
Are you saying in the future, we can't take any external dependency that is autoconf based?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, autoconf is gnu tooling and while it runs in tandem the pseudo-unix cygwin toolchain there is no Windows port, and it's likely to be a thorn in the side of other odd SDKs on mobile and such.
Most every project building on autoconf offers an alternative mechanism for some windows build, however well thought out and supported, or not. external/org_unicode_icuuc/icu4c/source/allinone/ is the Windows build logic by the icu4c maintainers.
Most projects are moving in the direction of supporting CMake for a much simpler cross-platform experience, even as they retain autoconf. This makes the build agnostic of the flavor of Visual Studio and even msvc vs llvm clang-cl.
For Envoy, maintaining a bazel BUILD file for this project makes perfect sense, since the lib objects all match the compiler and link choices of this project. We need to evaluate what the autoconf results produce in terms of source file toggles and changes, even if we avoid these long-term.
For upstream, propose a CMakeFiles.txt to cover their existing autoconf determinations. Our rules_foreign_cc support in bazel is much better with CMake than less-predictable autoconf output. This probably results in the most flexibility to the maintainers and would be our simplest solution short of this short-term BUILD maintenance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wrowe can you propose some amendment to https://github.com/envoyproxy/envoy/blob/master/bazel/EXTERNAL_DEPS.md based on this?
I don't think it sounds unreasonable, but we should take this through a review process and ensure folks are aligned, as it's a major project restriction.
How come gperftools isn't impacting you? Don't you use tcmalloc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tcmalloc is disabled on Windows if you look at .bazelrc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gperftools as of 2015 was being ported to Windows using the tcmalloc_minimal implementation. We have have documented this exclusion for Windows both in .bazelrc and in bazel/foreign_cc/BUILD.
Until bazel test //test/... is largely building clean with all currently tagged "fails_on_windows", gperftools hasn't been and won't be a particularly high priority. The same is true for opentracing and all tracing extensions on Windows.
THE major difference between all that and this proposed enhancement is that the envoyproxy is really nothing without a url parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but I would like to get this down in writing, so that we can ensure everyone is on-board and don't relitigate this each time we add a new dependency. Can you do the PR to https://github.com/envoyproxy/envoy/blob/master/bazel/EXTERNAL_DEPS.md arguing to remove autoconf in rules_foreign_cc for Windows reasons for all deps that are core across platforms?
bool Url::initializeForConnect(GURL&& url) { | ||
// CONNECT requests can only contain "hostname:port" | ||
// https://github.com/nodejs/http-parser/blob/d9275da4650fd1133ddc96480df32a9efe4b059b/http_parser.c#L2503-L2506. | ||
if (!url.is_valid() || url.IsStandard()) { | ||
return false; | ||
} | ||
|
||
const auto& parsed = url.parsed_for_possibly_invalid_spec(); | ||
// The parsed.scheme contains the URL's hostname (stored by GURL). While host and port have -1 | ||
// as its length. | ||
if (parsed.scheme.len <= 0 || parsed.host.len > 0 || parsed.port.len > 0) { | ||
return false; | ||
} | ||
|
||
host_and_port_ = url.possibly_invalid_spec(); | ||
const auto& parts = StringUtil::splitToken(host_and_port_, ":", /*keep_empty_string=*/true, | ||
/*trim_whitespace=*/false); | ||
if (parts.size() != 2 || static_cast<size_t>(parsed.scheme.len) != parts.at(0).size() || | ||
!validPortForConnect(parts.at(1))) { | ||
return false; | ||
} | ||
|
||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alyssawilk I'm not sure if this is the right thing to do for making sure CONNECT
works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm a bit confused why EXPECT_FALSE(url.initialize("foo.com:65536", is_connect)); fails. Could be I'm insufficiently caffeinated though :-P
Also as long as I'm looking, do we think this should be runtime guarded, even with a shorter duration? I could imagine this having behavioral changes in weird corner cases though ideally it should be good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
65536 is greater than std::numeric_limits<uint16_t>::max()
. Not in 1-65535 valid port range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: Runtime guard. SGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, gotcha. maybe comments there but yeah, otherwise this looks fine.
If folks prefer we could also split out connect validation into its own class since it's a weird one-off but I suspect it'll be useful to have a consistent API and this how the old class worked so LGTM!
Signed-off-by: Dhi Aurrahman <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks for reviving. My main concern is just around the size of the manual patch for GURL and ICUUC. If there is any way or plan to reduce it, it would be really awesome. Otherwise I'm happy to ship and iterate.
"string_piece.h", | ||
"string_piece_forward.h", | ||
"string_util.h", | ||
- "string_util_posix.h", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I'm still not super happy with the size of this patch. @wrowe @sunjayBhatia do you have any thoughts on what we can do to avoid having to hand craft this in Envoy? If @dio merges, would you folks be able to followup with upstream to eliminate this patch?
@htuch after this is merged, I'll try to play around with the |
Signed-off-by: Dhi Aurrahman <[email protected]>
Signed-off-by: Dhi Aurrahman <[email protected]>
Signed-off-by: Dhi Aurrahman <[email protected]>
/azp run envoy-presubmit |
Azure Pipelines successfully started running 1 pipeline(s). |
Merging master to pick up #11107. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo the patch/ICUUC threads. Excited to see this ready to land..
"string_piece.h", | ||
"string_piece_forward.h", | ||
"string_util.h", | ||
- "string_util_posix.h", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. I'm wondering if we can get some agreement on the long term ownership of this patch. From my perspective, a patch this size is not sustainable in Envoy and that can't be the long term strategy; this will impact our velocity and add friction to upgrading to new versions of GURL in response to security issues. OTOH, I'm eager to move forward and we can accept a patch in the interim.
@dio would it make sense to just try upstream this and see what happens?
@wrowe would you folks be able to take over this patch and long term efforts to shrink it down via tactical upstreaming if this can't be easily done?
tags = ["requires-rtti"], | ||
visibility = ["//visibility:private"], | ||
deps = [":headers"], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but I would like to get this down in writing, so that we can ensure everyone is on-board and don't relitigate this each time we add a new dependency. Can you do the PR to https://github.com/envoyproxy/envoy/blob/master/bazel/EXTERNAL_DEPS.md arguing to remove autoconf in rules_foreign_cc for Windows reasons for all deps that are core across platforms?
Do you mean to submit it to the Chromium project? I can try. Probably, we need @alyssawilk's perspective and guidance on how to do that properly. |
@dio yep. I think @alyssawilk and @danzh2010 probably could both give some advice on how to approach. Doesn't Chromium build for Windows anyway? I'm pretty sure I remember running Chrome on Windows last time I used it :-P |
@htuch haha, yeah. However only clang-cl is supported on Windows, see https://crbug.com/988071. We're doing MSVC. I'm a bit scared. 😶 |
@dio maybe just start by opening a Chromium issue and see if it's not too controversial? |
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
I'm trying to join the discussion here: https://bugs.chromium.org/p/chromium/issues/detail?id=1053958. |
Signed-off-by: Dhi Aurrahman <[email protected]>
Thanks, @htuch! Updated the patch with TODO. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! Epic.
org_unicode_icuuc = dict( | ||
strip_prefix = "icu-release-64-2", | ||
sha256 = "524960ac99d086cdb6988d2a92fc163436fd3c6ec0a84c475c6382fbf989be05", | ||
urls = ["https://github.com/unicode-org/icu/archive/release-64-2.tar.gz"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dio is there any reason you picked this April 2019 rather than the latest https://github.com/unicode-org/icu/releases/tag/release-67-1 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BUILD file is inspired from Tensorflow project as known to be stable. Will try to update along the way. Thanks for the reminder! https://github.com/tensorflow/tensorflow/blob/master/third_party/icu/workspace.bzl. Tracked in here. #12015
After envoyproxy#11670, the CSRF filter started failing for us. This change fixes 3 issues that were uncovered after moving to gURL for parsing URLs: 1) the hostAndPort() utility method, in the CSRF filter, was returning a string view of a stack variable. 2) the Origin header always includes the scheme, so let's ensure this is illustrated in tests (which were missing this and passing due to relaxed checks). 3) the Url::initialize method expects an absolute URL, something that the CSRF filter wasn't complying with. Signed-off-by: Raul Gutierrez Segales <[email protected]>
After #11670, the CSRF filter started failing for us. This change fixes 3 issues that were uncovered after moving to gURL for parsing URLs: 1) the hostAndPort() utility method, in the CSRF filter, was returning a string view of a stack variable. 2) the Origin header always includes the scheme, so let's ensure this is illustrated in tests (which were missing this and passing due to relaxed checks). 3) the Url::initialize method expects an absolute URL, something that the CSRF filter wasn't complying with. Signed-off-by: Raul Gutierrez Segales <[email protected]>
After envoyproxy#11670, the CSRF filter started failing for us. This change fixes 3 issues that were uncovered after moving to gURL for parsing URLs: 1) the hostAndPort() utility method, in the CSRF filter, was returning a string view of a stack variable. 2) the Origin header always includes the scheme, so let's ensure this is illustrated in tests (which were missing this and passing due to relaxed checks). 3) the Url::initialize method expects an absolute URL, something that the CSRF filter wasn't complying with. Signed-off-by: Raul Gutierrez Segales <[email protected]> Signed-off-by: Kevin Baichoo <[email protected]>
This replaces http_parser's URL parser with GURL. Risk Level: Medium Testing: Unit Docs Changes: N/A Release Notes: N/A Signed-off-by: Dhi Aurrahman <[email protected]> Signed-off-by: scheler <[email protected]>
After envoyproxy#11670, the CSRF filter started failing for us. This change fixes 3 issues that were uncovered after moving to gURL for parsing URLs: 1) the hostAndPort() utility method, in the CSRF filter, was returning a string view of a stack variable. 2) the Origin header always includes the scheme, so let's ensure this is illustrated in tests (which were missing this and passing due to relaxed checks). 3) the Url::initialize method expects an absolute URL, something that the CSRF filter wasn't complying with. Signed-off-by: Raul Gutierrez Segales <[email protected]> Signed-off-by: scheler <[email protected]>
Commit Message: http: Use GURL as HTTP URL parser utility
Additional Description: This replaces
http_parser
's URL parser with GURL.Risk Level: Medium?
Testing: Unit
Docs Changes: N/A
Release Notes: N/A
Signed-off-by: Dhi Aurrahman [email protected]