Adaptive sampling does not work with OTEL SDKs #6700

Open
yurishkuro opened this issue Feb 9, 2025 · 0 comments

Problem

The adaptive sampling engine relies on two span tags, sampler.type and sampler.param, which legacy Jaeger SDKs attach to spans. It uses them to determine the sampling probability that was applied and whether a trace was sampled by the lower-bound rate limiter. OpenTelemetry SDKs do not emit these tags, so the engine can neither verify that the adaptive sampling rates are actually being applied nor distinguish traces that were sampled by the lower-bound rate limiter.
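To make the dependency on tags concrete, here is a minimal Python sketch of the kind of check described above. It is not the engine's actual code: the function name `classify_root_span` and the tag values `"probabilistic"` and `"lowerbound"` are assumptions made for illustration. Only the tag keys sampler.type and sampler.param come from this issue.

```python
from typing import Mapping, Optional, Tuple

# Assumed tag values for illustration; only the tag *keys* appear in the issue.
PROBABILISTIC = "probabilistic"
LOWER_BOUND = "lowerbound"


def classify_root_span(tags: Mapping[str, object]) -> Tuple[str, Optional[float]]:
    """Return (kind, probability) derived from a root span's tags.

    kind is "probabilistic", "lowerbound", or "unknown". A span without
    usable sampler tags (for example, one emitted by an OTEL SDK) is "unknown".
    """
    sampler_type = tags.get("sampler.type")
    if sampler_type not in (PROBABILISTIC, LOWER_BOUND):
        return "unknown", None
    try:
        probability = float(tags.get("sampler.param"))
    except (TypeError, ValueError):
        # Missing or non-numeric sampler.param: nothing to validate against.
        return "unknown", None
    if not 0.0 <= probability <= 1.0:
        return "unknown", None
    return sampler_type, probability


# A span from a legacy Jaeger SDK carries both tags.
print(classify_root_span({"sampler.type": "probabilistic", "sampler.param": "0.01"}))
# A span from an OTEL SDK carries neither, so the engine has nothing to inspect.
print(classify_root_span({"http.method": "GET"}))
```

With tagged spans, the engine can read the applied probability directly; with untagged spans, every root span falls into the "unknown" bucket.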

Previous Approach and Rationale

The engine used these span tags for two primary reasons: (1) to learn the actual sampling probability used by the SDK and (2) to ignore traces sampled via the lower-bound rate limiter, since their probability was not accurately reflected in the tags. This validation was implemented as a safeguard. The concern was that if the SDK was configured with a non-adaptive sampler at a very low probability, the engine would keep raising the recommended sampling probability, eventually reaching 100%, without ever observing a corresponding increase in trace volume. If the SDK were later switched to respect adaptive sampling, a large spike in trace volume could occur. However, this spike would be temporary: the engine would react and adjust the sampling rate accordingly, although it might take a few minutes.
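The runaway scenario can be illustrated with a small simulation. This is only a sketch under stated assumptions: the proportional update rule, the function name `simulate_unobserved_feedback`, and all numbers are hypothetical and do not reproduce the engine's real calculation. The point is only that the recommended probability climbs to 100% when the SDK ignores it.

```python
from typing import List


def simulate_unobserved_feedback(
    traffic_qps: float,
    fixed_sdk_probability: float,
    target_qps: float,
    rounds: int,
    initial_probability: float = 0.001,
) -> List[float]:
    """Return the recommended probability after each round.

    The SDK samples at fixed_sdk_probability and ignores the recommendation,
    so the observed throughput never changes. The update rule (scale by
    target / observed, capped at 1.0) is a simplifying assumption.
    """
    recommended = initial_probability
    history = []
    for _ in range(rounds):
        # Observed throughput depends only on the SDK's fixed sampler.
        observed_qps = traffic_qps * fixed_sdk_probability
        if observed_qps > 0:
            recommended = min(1.0, recommended * target_qps / observed_qps)
        history.append(recommended)
    return history


history = simulate_unobserved_feedback(
    traffic_qps=1000.0, fixed_sdk_probability=0.0001, target_qps=1.0, rounds=5
)
print(history)
# If the SDK starts honoring the recommendation now, the sampled volume
# jumps to traffic_qps * history[-1] until the engine corrects it.
print(1000.0 * history[-1])
```

In this toy setup the target is 1 trace per second, but an SDK that suddenly adopts the final recommendation would briefly send the full 1000 traces per second.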

Potential Solution

The lack of span tags prevents this validation, but the validation is not strictly critical. It was primarily a preventative measure against the unlikely scenario described above: a low-probability, non-adaptive sampler being used in the SDK and then replaced with a sampler that respects the probability from Jaeger.

Ideally the span tags would be present, but their absence doesn't fundamentally break the system. The temporary spike is a risk, but a manageable one. Users of OTEL SDKs can be required to opt in to this risk until OTEL SDKs are able to send the tags.
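One way to picture such an opt-in is a setting that tells the engine to treat untagged root spans as if they were sampled at the recommended probability. The sketch below is hypothetical: the flag `trust_untagged_spans`, the function `probability_for_throughput`, and the tag values are invented for illustration and are not existing Jaeger options.

```python
from typing import Mapping, Optional


def probability_for_throughput(
    tags: Mapping[str, object],
    recommended_probability: float,
    trust_untagged_spans: bool = False,  # hypothetical opt-in flag
) -> Optional[float]:
    """Return the probability to attribute to a root span, or None to skip it."""
    sampler_type = tags.get("sampler.type")
    if sampler_type == "lowerbound":
        # Lower-bound samples are ignored, as in the tag-based validation.
        return None
    if sampler_type == "probabilistic":
        try:
            return float(tags["sampler.param"])
        except (KeyError, TypeError, ValueError):
            return None
    if sampler_type is None and trust_untagged_spans:
        # Opt-in: assume the SDK applied the probability served to it.
        return recommended_probability
    return None


otel_span_tags = {"http.method": "GET"}
print(probability_for_throughput(otel_span_tags, 0.05))
print(probability_for_throughput(otel_span_tags, 0.05, trust_untagged_spans=True))
```

With the flag off, untagged spans are skipped, as they effectively are today. With it on, the operator accepts that the engine cannot verify what the SDK actually did.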
