-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trace HTTPClient request execution #320
base: main
Are you sure you want to change the base?
Conversation
Can one of the admins verify this patch? |
4 similar comments
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
I chatted with @ktoso earlier to discuss the manual context propagation, and we agreed that we probably shouldn't deprecate the "old" API accepting a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So since technically we're 0.1 and something may change... how do we want to tackle adoption here.
I was thinking to kick off a branch like tracing
for now, so we can polish up there and once we're all confident merge into mainline? We could also tag those tracing releases, they'd follow normal releases e.g. 1.2.2-tracing.
I don't really expect anything breaking in the core APIs but the open telemetry support which we may want to use here could still fluctuate a little bit until they're final hmmm...
Package.swift
Outdated
], | ||
targets: [ | ||
.target( | ||
name: "AsyncHTTPClient", | ||
dependencies: ["NIO", "NIOHTTP1", "NIOSSL", "NIOConcurrencyHelpers", "NIOHTTPCompression", | ||
"NIOFoundationCompat", "NIOTransportServices", "Logging"] | ||
"NIOFoundationCompat", "NIOTransportServices", "Logging", "Instrumentation"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we right away go with Tracing
and do the full thing in a single PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's my intention. I've added a checklist to the PR including creating a Span. I first wanted to get the instrumentation part down and then continue with tracing, but all inside this PR.
047fbb0
to
87085d9
Compare
@swift-server-bot add to whitelist |
I'd like to punt this to a side-branch for iterative development if we can. |
Sure, sounds like a good approach. I can change the target branch once it's created. |
I've opened up the |
@ktoso The CI seems to fail because the Baggage repo cannot be cloned through the Git URL. Should we pin Tracing to 0.1.1 here in order to get the fix? (apple/swift-distributed-tracing/pull/25) |
No, we need to tag a 0.1.1, I'll do that in a moment. |
0.1.1. tagged, please depend on that. Thanks Cory for the development branch, sounds good 👍 |
87085d9
to
ae7268d
Compare
@swift-server-bot test this please |
Can drafts get CI validation? 🤔 |
Yes, they can: I think the CI isn't targeting that branch at the moment. |
ae7268d
to
329522c
Compare
329522c
to
d68cb8f
Compare
e6e48ef
to
5e7ddf1
Compare
@@ -30,18 +32,20 @@ extension HTTPClient { | |||
public func execute( | |||
_ request: HTTPClientRequest, | |||
deadline: NIODeadline, | |||
logger: Logger? = nil | |||
logger: Logger? = nil, | |||
context: ServiceContext? = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a parameter is technically breaking, so we'll need one more variant of this function with the original signature and @_disfavoredOverload
and a deprecation warning, calling out to this new method.
@@ -55,12 +59,14 @@ extension HTTPClient { | |||
public func execute( | |||
_ request: HTTPClientRequest, | |||
timeout: TimeAmount, | |||
logger: Logger? = nil | |||
logger: Logger? = nil, | |||
context: ServiceContext? = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here re re-adding the previous variant for backwards compatibility.
var request = request | ||
request.head.headers.propagate(span.context) | ||
span.updateAttributes { attributes in | ||
attributes["http.request.method"] = request.head.method.rawValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also attach the path as a separate attribute, or is url.full
the only place it's expected to show up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Path may be nice tbh... it's not a "standardized" convention but I could see it be useful... Up to y'all if we want to duplicate; url.full seems to be the "expected one" to set otel wise...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
url.path
seems to be recommended for the server span only here: https://opentelemetry.io/docs/specs/semconv/http/http-spans/#http-server-semantic-conventions? But here it's simply listed, without that server guidance: https://opentelemetry.io/docs/specs/semconv/attributes-registry/url/#url-path
I guess wouldn't hurt to add it, but I guess the client is expected not to have the full URL broken out into components, but the server does.
I'm happy to keep it as-is, with just url.full
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it's fine to just include url.full
for now. That being said, something I haven't yet implemented in the PR is redaction as recommended (although experimental
) in OTel SemConv:
https://opentelemetry.io/docs/specs/semconv/http/http-spans/#http-client
[4]
url.full
: For network calls, URL usually has scheme://host[:port][path][?query][#fragment] format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.
url.full
MUST NOT contain credentials passed via URL in form of https://username:[email protected]/. In such case username and password SHOULD be redacted and attribute’s value SHOULD be https://REDACTED:[email protected]/.
Sensitive content provided inurl.full
SHOULD be scrubbed when instrumentations can identify it.
Experimental Query string values for the following keys SHOULD be redacted by default and replaced by the value REDACTED:
AWSAccessKeyId / Signature / sig / X-Goog-Signature
This list is subject to change over time.
When a query string value is redacted, the query string key SHOULD still be preserved, e.g. https://www.example.com/path?color=blue&sig=REDACTED.
What do you think about adding these? I'm a bit torn but I don't think we'd want to include these specific keys in AsyncHTTPClient
. Perhaps we could skip adding them for now and only redact the basic auth parameters, and tackle the other redaction in swift-otel/swift-otel-semantic-conventions and eventually add a dependency on it to AsyncHTTPClient
.
} catch { | ||
span.setStatus(.init(code: .error)) | ||
span.attributes["error.type"] = "\(type(of: error))" | ||
throw error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean the span doesn't include the error description, and it's up to user code to attach it higher up the stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comes back to the entire "can we log errors" (not really since no idea what they contain) discussion we were having elsewhere recently. 😅
The otel guidance is the same tbh: https://opentelemetry.io/docs/specs/semconv/general/recording-errors/#recording-errors-on-spans but you'll notice it gets hand wavy with it with exception messages -- Java influenced wording there;
SHOULD set the span status code to Error
SHOULD set the error.type attribute
SHOULD set the span status description when it has additional information about the error which is not expected to contain sensitive details and aligns with Span Status Description definition.
It’s NOT RECOMMENDED to duplicate status code or error.type in span status description.
When the operation fails with an exception, the span status description SHOULD be set to the exception message.
So that leaves us back at step 1 where we need to decide what is safe -- the "exception message" is not a thing in Swift after all, and "(error)" may or may not be safe...
Unless we're able to confirm all errors that can be thrown here are "safe to be logged" and only from inside the HTTP infra and don't include request details it doesn't seem like we should log the "message" (that being the "(error)" in swift). I'm not sure if we're 100% confident about it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure, but I wouldn't want this PR to be blocked on that, I'm happy with the first iteration not including the message here.
}, | ||
onCancel: { | ||
cancelHandler.cancel(reason: .taskCanceled) | ||
return try await withSpan(request.head.method.rawValue, context: context, ofKind: .client) { span in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM on using just the method as low cardinality name (matches otel well) 👍
}, | ||
onCancel: { | ||
cancelHandler.cancel(reason: .taskCanceled) | ||
return try await withSpan(request.head.method.rawValue, context: context, ofKind: .client) { span in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the Tracer
be a property of the client, to allow for easier unit testing? And just default it to the current process-bootstrapped tracer in the initializer, unless it's overridden?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do that, yeah. Elsewhere we just use InstrumentationSystem.bootstrapInternal
in unit tests which also works fine but I'd be happy either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless there are downsides, I prefer the explicit dependency injection. Easy enough to debug and reason about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me as long as we don't expose this initializer publicly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the downside of adding it to the public initializer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It goes against the way Swift Distributed Tracing usually works, where a single tracer is chosen for the system and used transparently everywhere. Having said that, Logging also allows passing specific Logger
s around and still use LoggingSystem
. I'm curious to hear what @ktoso thinks about this.
Motivation:
Context Propagation
In order to instrument distributed systems, metadata such as trace ids
must be propagated across network boundaries through HTTP headers.
As HTTPClient operates at one such boundary, it should take care of
injecting metadata into HTTP headers automatically using the configured
instrument.
Built-in tracing
Furthermore,
HTTPClient
should create aSpan
for executed requestsunder the hood, so that users benefit from tracing effortlessly.
Modifications:
Span
for executed HTTP requestResult:
AsyncHTTPClient
automatically creates a Distributed Tracing span per requestAsyncHTTPClient
propagates service context to server, making server spans children of client spans