Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace HTTPClient request execution #320

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

slashmo
Copy link

@slashmo slashmo commented Dec 5, 2020

Motivation:

Context Propagation

In order to instrument distributed systems, metadata such as trace ids
must be propagated across network boundaries through HTTP headers.
As HTTPClient operates at one such boundary, it should take care of
injecting metadata into HTTP headers automatically using the configured
instrument.

Built-in tracing

Furthermore, HTTPClient should create a Span for executed requests
under the hood, so that users benefit from tracing effortlessly.

Modifications:

  • Propagate service context to server via HTTP request headers
  • Create Span for executed HTTP request

Result:

  • AsyncHTTPClient automatically creates a Distributed Tracing span per request
  • AsyncHTTPClient propagates service context to server, making server spans children of client spans

@swift-server-bot
Copy link

Can one of the admins verify this patch?

4 similar comments
@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

@slashmo
Copy link
Author

slashmo commented Dec 5, 2020

I chatted with @ktoso earlier to discuss the manual context propagation, and we agreed that we probably shouldn't deprecate the "old" API accepting a Logger for each request overload, as we don't want to push users too much into the direction of manual context passing because that's ideally not necessary once the mentioned language changes have been made: https://github.com/apple/swift-distributed-tracing#important-note-on-adoption

@slashmo slashmo marked this pull request as draft December 5, 2020 17:11
Copy link
Contributor

@ktoso ktoso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So since technically we're 0.1 and something may change... how do we want to tackle adoption here.

I was thinking to kick off a branch like tracing for now, so we can polish up there and once we're all confident merge into mainline? We could also tag those tracing releases, they'd follow normal releases e.g. 1.2.2-tracing.

I don't really expect anything breaking in the core APIs but the open telemetry support which we may want to use here could still fluctuate a little bit until they're final hmmm...

Package.swift Outdated
],
targets: [
.target(
name: "AsyncHTTPClient",
dependencies: ["NIO", "NIOHTTP1", "NIOSSL", "NIOConcurrencyHelpers", "NIOHTTPCompression",
"NIOFoundationCompat", "NIOTransportServices", "Logging"]
"NIOFoundationCompat", "NIOTransportServices", "Logging", "Instrumentation"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we right away go with Tracing and do the full thing in a single PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my intention. I've added a checklist to the PR including creating a Span. I first wanted to get the instrumentation part down and then continue with tracing, but all inside this PR.

@slashmo slashmo force-pushed the feature/tracing branch 2 times, most recently from 047fbb0 to 87085d9 Compare December 7, 2020 17:05
@Lukasa
Copy link
Collaborator

Lukasa commented Dec 8, 2020

@swift-server-bot add to whitelist

@Lukasa
Copy link
Collaborator

Lukasa commented Dec 8, 2020

I'd like to punt this to a side-branch for iterative development if we can.

@slashmo
Copy link
Author

slashmo commented Dec 8, 2020

I'd like to punt this to a side-branch for iterative development if we can.
@Lukasa

Sure, sounds like a good approach. I can change the target branch once it's created.

@Lukasa
Copy link
Collaborator

Lukasa commented Dec 8, 2020

I've opened up the tracing-development branch.

@slashmo slashmo changed the base branch from main to tracing-development December 8, 2020 09:21
@slashmo
Copy link
Author

slashmo commented Dec 8, 2020

@ktoso The CI seems to fail because the Baggage repo cannot be cloned through the Git URL. Should we pin Tracing to 0.1.1 here in order to get the fix? (apple/swift-distributed-tracing/pull/25)

@ktoso
Copy link
Contributor

ktoso commented Dec 8, 2020

No, we need to tag a 0.1.1, I'll do that in a moment.

@ktoso
Copy link
Contributor

ktoso commented Dec 8, 2020

0.1.1. tagged, please depend on that.

Thanks Cory for the development branch, sounds good 👍

@ktoso
Copy link
Contributor

ktoso commented Dec 8, 2020

@swift-server-bot test this please

@ktoso
Copy link
Contributor

ktoso commented Dec 8, 2020

Can drafts get CI validation? 🤔

@Lukasa
Copy link
Collaborator

Lukasa commented Dec 8, 2020

Yes, they can: I think the CI isn't targeting that branch at the moment.

@slashmo slashmo changed the base branch from tracing-development to main December 30, 2024 21:32
@@ -30,18 +32,20 @@ extension HTTPClient {
public func execute(
_ request: HTTPClientRequest,
deadline: NIODeadline,
logger: Logger? = nil
logger: Logger? = nil,
context: ServiceContext? = nil

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a parameter is technically breaking, so we'll need one more variant of this function with the original signature and @_disfavoredOverload and a deprecation warning, calling out to this new method.

@@ -55,12 +59,14 @@ extension HTTPClient {
public func execute(
_ request: HTTPClientRequest,
timeout: TimeAmount,
logger: Logger? = nil
logger: Logger? = nil,
context: ServiceContext? = nil

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here re re-adding the previous variant for backwards compatibility.

var request = request
request.head.headers.propagate(span.context)
span.updateAttributes { attributes in
attributes["http.request.method"] = request.head.method.rawValue

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also attach the path as a separate attribute, or is url.full the only place it's expected to show up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path may be nice tbh... it's not a "standardized" convention but I could see it be useful... Up to y'all if we want to duplicate; url.full seems to be the "expected one" to set otel wise...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url.path seems to be recommended for the server span only here: https://opentelemetry.io/docs/specs/semconv/http/http-spans/#http-server-semantic-conventions? But here it's simply listed, without that server guidance: https://opentelemetry.io/docs/specs/semconv/attributes-registry/url/#url-path

I guess wouldn't hurt to add it, but I guess the client is expected not to have the full URL broken out into components, but the server does.

I'm happy to keep it as-is, with just url.full.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it's fine to just include url.full for now. That being said, something I haven't yet implemented in the PR is redaction as recommended (although experimental) in OTel SemConv:

https://opentelemetry.io/docs/specs/semconv/http/http-spans/#http-client

[4] url.full: For network calls, URL usually has scheme://host[:port][path][?query][#fragment] format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.

url.full MUST NOT contain credentials passed via URL in form of https://username:[email protected]/. In such case username and password SHOULD be redacted and attribute’s value SHOULD be https://REDACTED:[email protected]/.
Sensitive content provided in url.full SHOULD be scrubbed when instrumentations can identify it.

Experimental Query string values for the following keys SHOULD be redacted by default and replaced by the value REDACTED:
AWSAccessKeyId / Signature / sig / X-Goog-Signature
This list is subject to change over time.

When a query string value is redacted, the query string key SHOULD still be preserved, e.g. https://www.example.com/path?color=blue&sig=REDACTED.

What do you think about adding these? I'm a bit torn but I don't think we'd want to include these specific keys in AsyncHTTPClient. Perhaps we could skip adding them for now and only redact the basic auth parameters, and tackle the other redaction in swift-otel/swift-otel-semantic-conventions and eventually add a dependency on it to AsyncHTTPClient.

} catch {
span.setStatus(.init(code: .error))
span.attributes["error.type"] = "\(type(of: error))"
throw error

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean the span doesn't include the error description, and it's up to user code to attach it higher up the stack?

Copy link
Contributor

@ktoso ktoso Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comes back to the entire "can we log errors" (not really since no idea what they contain) discussion we were having elsewhere recently. 😅

The otel guidance is the same tbh: https://opentelemetry.io/docs/specs/semconv/general/recording-errors/#recording-errors-on-spans but you'll notice it gets hand wavy with it with exception messages -- Java influenced wording there;

SHOULD set the span status code to Error

SHOULD set the error.type attribute

SHOULD set the span status description when it has additional information about the error which is not expected to contain sensitive details and aligns with Span Status Description definition.

It’s NOT RECOMMENDED to duplicate status code or error.type in span status description.

When the operation fails with an exception, the span status description SHOULD be set to the exception message.

So that leaves us back at step 1 where we need to decide what is safe -- the "exception message" is not a thing in Swift after all, and "(error)" may or may not be safe...

Unless we're able to confirm all errors that can be thrown here are "safe to be logged" and only from inside the HTTP infra and don't include request details it doesn't seem like we should log the "message" (that being the "(error)" in swift). I'm not sure if we're 100% confident about it here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure, but I wouldn't want this PR to be blocked on that, I'm happy with the first iteration not including the message here.

},
onCancel: {
cancelHandler.cancel(reason: .taskCanceled)
return try await withSpan(request.head.method.rawValue, context: context, ofKind: .client) { span in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on using just the method as low cardinality name (matches otel well) 👍

},
onCancel: {
cancelHandler.cancel(reason: .taskCanceled)
return try await withSpan(request.head.method.rawValue, context: context, ofKind: .client) { span in

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the Tracer be a property of the client, to allow for easier unit testing? And just default it to the current process-bootstrapped tracer in the initializer, unless it's overridden?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do that, yeah. Elsewhere we just use InstrumentationSystem.bootstrapInternal in unit tests which also works fine but I'd be happy either way.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless there are downsides, I prefer the explicit dependency injection. Easy enough to debug and reason about.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me as long as we don't expose this initializer publicly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the downside of adding it to the public initializer?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It goes against the way Swift Distributed Tracing usually works, where a single tracer is chosen for the system and used transparently everywhere. Having said that, Logging also allows passing specific Loggers around and still use LoggingSystem. I'm curious to hear what @ktoso thinks about this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants