AWS S3 behavior changes when using the auto instrumentation #1609
Comments
Tagging component owners @carolabadeer @blumamir
I think this is a P1 bug. I'll assign that priority for now and the code owner can update it if required.
Hey, we ran into this too; it's not isolated to S3. It seems to affect requests that are retried too.

Kinesis without retry: […]
Kinesis with retry: […]

This is using these versions […] from this https://github.com/baselime/lambda-node-opentelemetry. I am looking into a fix and, if it's helpful, I can post anything that I find here.
Howdy, any update on this? It would be good to at least get some triage info from the maintainers if possible... Would upgrading to AWS API v3 fix this? Thank you!
I ran into this issue during our migration to OpenTelemetry as well; the problem is with instrumentation-http and not the AWS SDK instrumentation specifically. The […]

Regarding the specific issue that we ran into with the AWS SDK making calls to S3, we have a bucket located in […]. The solution is simple enough: replace the value of […].
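For illustration, a minimal sketch of that header-copy idea as calling code could apply it; `injectWithoutMutating` is an illustrative helper name, and the approach is assumed from the diff posted later in this thread, not an API of any package.

```js
// Illustrative helper (not part of any OpenTelemetry package): inject the
// trace context into a copy of the headers so the object the caller passed
// to http.request()/http.get() is never mutated.
const { context, propagation } = require('@opentelemetry/api');

function injectWithoutMutating(options) {
  const headers = Object.assign({}, options.headers); // clone, don't mutate
  propagation.inject(context.active(), headers);
  return Object.assign({}, options, { headers });
}
```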
@ten-ans hey, I also followed up on this bug on the AWS end. I was surprised that it only affected the v2 SDK and not v3. I found that the v2 SDK caches the signature it adds and does not recompute it on retries. I raised an issue here and got a simpler workaround: `AWS.Signers.V4.prototype.unsignableHeaders.push('traceparent');`, which I have tested and confirmed works. Hope this helps!
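For reference, a minimal sketch of applying that workaround in application code, assuming the AWS SDK for JavaScript v2; the region and bucket names are illustrative.

```js
const AWS = require('aws-sdk');

// Mark 'traceparent' as unsignable so the header injected by the HTTP
// instrumentation is left out of the SigV4 signature; a different traceparent
// on a retried request then no longer invalidates the signature.
AWS.Signers.V4.prototype.unsignableHeaders.push('traceparent');

const s3 = new AWS.S3({ region: 'eu-west-1' }); // illustrative region
s3.listObjectsV2({ Bucket: 'my-example-bucket' }, (err, data) => {
  if (err) console.error('S3 call failed:', err.code);
  else console.log('S3 call succeeded, %d objects', data.Contents.length);
});
```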
Thanks @ten-ans and @Ankcorn for your thorough investigation of this issue! @Ankcorn, from reading through the issue you opened in the AWS SDK repo, it seems that they confirmed it is an issue with the AWS SDK v2 (and does not persist in v3). Is the suggested workaround a change that can be added in an application, or an instrumentation change?
Hey @carolabadeer, thanks for taking a look at this :)

aws/aws-sdk-js#4473

The PR to add traceparent has not been looked at yet. I also don't know if it totally solves the problem: AFAIK, Lambda runtimes would still pull in an old and incompatible version of the aws-sdk, which would cause some confusion for new OTel users.

We might be able to fix this by adding the `AWS.Signers.V4.prototype.unsignableHeaders.push('traceparent')` workaround to the AWS SDK instrumentation. Thoughts?
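For illustration, a hedged sketch of what applying that workaround inside the AWS SDK instrumentation might look like; the function name and wiring are hypothetical, not the actual instrumentation internals.

```js
// Hypothetical patch step: when the instrumentation patches aws-sdk v2, it
// could also register 'traceparent' as unsignable, so a retried request with a
// re-injected traceparent keeps a valid SigV4 signature. Guarded so it is a
// no-op for SDK builds that do not expose Signers.V4.
function markTraceparentUnsignable(AWS) {
  const unsignable =
    AWS && AWS.Signers && AWS.Signers.V4 && AWS.Signers.V4.prototype.unsignableHeaders;
  if (Array.isArray(unsignable) && !unsignable.includes('traceparent')) {
    unsignable.push('traceparent');
  }
}
```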
Hey @mhausenblas, is there someone from the AWS SDK team we can coordinate with? @martinkuba is going to spend some time here too.
The issue -- as @ten-ans pointed out above (#1609 (comment)) -- is that the HttpInstrumentation modifies the headers object that calling code passes in, e.g. the `{ Foo: 'Bar' }` object in:

```js
http.get(someUrl, { headers: { Foo: 'Bar' } }, (res) => { ... });
```

Often, calling code won't notice this change. However, aws-sdk code that retries keeps a reference to that headers object, so this is what happens: the injected `traceparent` header ends up in the retried request while the SigV4 signature the SDK sends was computed for different header values, and AWS rejects the retry with a 403 signature error.

This can be seen in this example:

```js
const { NodeSDK, tracing } = require('@opentelemetry/sdk-node');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');

const sdk = new NodeSDK({
  serviceName: 'make-http-req',
  spanProcessor: new tracing.SimpleSpanProcessor(new tracing.ConsoleSpanExporter()),
  instrumentations: [
    new HttpInstrumentation(),
  ]
})
sdk.start();

const http = require('http');

const myHeaders = {
  Foo: 'Bar'
}
http.get('http://localhost:8200/', { headers: myHeaders }, (res) => {
  console.log('response: %s %s', res.statusCode, res.headers);
  res.resume();
  res.on('end', () => {
    console.log('response end');
  });
});
console.log('http.get() called: was myHeaders mutated? %s', myHeaders);
```

Running that shows that `myHeaders` was indeed mutated: a `traceparent` header has been injected into it.
I got lucky here, because I hit this before in Elastic's APM agent. A longer write-up is here: elastic/apm-agent-nodejs#2134

It makes sense why: as noted in open-telemetry/opentelemetry-js#3922, open-telemetry/opentelemetry-js@4978743 changed the behaviour. The header normalization from before that commit was indirectly creating a clone of the headers object.

possible fix

An alternative to #1813 would be something like this -- again, as @ten-ans suggested above.

```diff
diff --git a/experimental/packages/opentelemetry-instrumentation-http/src/http.ts b/experimental/packages/opentelemetry-instrumentation-http/src/http.ts
index 9422bbc9..780f5c32 100644
--- a/experimental/packages/opentelemetry-instrumentation-http/src/http.ts
+++ b/experimental/packages/opentelemetry-instrumentation-http/src/http.ts
@@ -672,6 +672,10 @@ export class HttpInstrumentation extends InstrumentationBase<Http> {
     if (!optionsParsed.headers) {
       optionsParsed.headers = {};
+    } else {
+      // Make a copy of the headers object to avoid mutating an object the
+      // caller might have a reference to.
+      optionsParsed.headers = Object.assign({}, optionsParsed.headers);
     }
     propagation.inject(requestContext, optionsParsed.headers);
```

@martinkuba Thoughts? Should I make a PR for that?
I've added a PR for the alternative fix here: open-telemetry/opentelemetry-js#4346
fix(instrumentation-http): do not mutate given headers object for outgoing http requests (#4346). Fixes: open-telemetry/opentelemetry-js-contrib#1609
To be clear, this should be fixed with the next release of […].
What version of OpenTelemetry are you using?
What version of Node are you using?
v16.20.0
What did you do?
When using the recommended Node.js auto-instrumentation setup found here: https://opentelemetry.io/docs/instrumentation/js/getting-started/nodejs/#setup, together with a global S3 configuration, the auto instrumentation causes an error to be thrown when AWS_REGION is set to a region that is not us-east-1 or aws-global. The requests succeed when instrumentation.js is not required.
Reproduction here: https://github.com/zdelagrange/otel-nodejs-aws-bug
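For context, a hedged sketch of the kind of code that triggers the failure, assuming aws-sdk v2 with a global configuration; the bucket, key, and region are illustrative, and the linked repository remains the authoritative reproduction.

```js
// Run with the auto instrumentation loaded, e.g.:
//   node --require ./instrumentation.js repro.js
// and AWS_REGION set to a region other than us-east-1 or aws-global.
const AWS = require('aws-sdk');

AWS.config.update({ region: process.env.AWS_REGION }); // e.g. 'eu-west-2'

const s3 = new AWS.S3();
s3.getObject({ Bucket: 'my-example-bucket', Key: 'example.txt' }, (err, data) => {
  if (err) {
    // With the instrumentation loaded, retried/redirected requests fail with a
    // 403 signature error; without it, the same call succeeds.
    console.error('S3 call failed:', err.statusCode, err.code);
  } else {
    console.log('S3 call succeeded: %d bytes', data.Body.length);
  }
});
```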
What did you expect to see?
I expected the S3 functionality not to change when using the auto instrumentation.
What did you see instead?
S3 functionality changed when using the auto instrumentation, causing a 403/failure to be returned where a success is returned without the auto instrumentation:
Additional context
Reproduction here: https://github.com/zdelagrange/otel-nodejs-aws-bug