Upgrade Storage Backends to V2 Storage API #6458
Comments
…o v1 (#6485)

## Which problem is this PR solving?
- Resolves #6480

## Description of the changes
- This PR implements a reverse adapter (`SpanReader`) that wraps a native v2 storage interface (`tracestore.Reader`) and downgrades it to implement the v1 storage interface (`spanstore.Reader`).
- The reverse adapter was integrated with the v1 query service. This code path will only get executed once we start upgrading the existing storage implementations to implement the new `tracestore.Reader` interface as a part of #6458

## How was this change tested?
- CI
- Added new unit tests

## Checklist
- [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Mahad Zaryab <[email protected]>
Hello @mahadzaryab1. I am Devaansh Kumar and I am interested in applying for this project under LFX'25. I have some prior open source experience, having participated in GSoC'24 under Kubernetes. I have a few questions:
Thank you
@yurishkuro I had one more question. Do you expect to move the above storages completely to v2, or to have both v1 and v2 present side by side and give the user a choice of which API version to use?
Move completely. We're about to upgrade the write pipeline to use v2 storage API anyway, so we won't need v1 API in the future.
@yurishkuro I would love to work on this project in LFX 2025!
@yurishkuro @mahadzaryab1 I have gone through the READ path, where we process two types of requests: HTTP requests and gRPC requests. We have set up two processes to handle read requests. Let's focus on HTTP: The UI sends a request to the HTTP server, and the
For the full request flow now, trace data undergoes two conversions:
Questions:
It is already changed according to query service v2. See these files: https://github.com/jaegertracing/jaeger/blob/main/cmd/query/app/apiv3/http_gateway.go and https://github.com/jaegertracing/jaeger/blob/main/cmd/query/app/apiv3/grpc_handler.go
@mahadzaryab1 @yurishkuro I have a question regarding this issue. Let's take an example of upgrading
Elasticsearch has already implemented a method for writing traces into storage.
@zzzk1 Thanks for this! So are you saying we should take inspiration from this and implement our own method? I don't think we can use this directly in Jaeger. And for memory we have to discuss the implementation, because that is Jaeger-specific!
Yes, The
@zzzk1 no, it's not. Yes, both these handlers operate on querysvc/v1, but internally that service is already instantiated with v2 storage and it just downgrades it to v1 if necessary jaeger/cmd/query/app/querysvc/query_service.go Lines 57 to 63 in 1ac7d6d
@yurishkuro I understand that we want to get rid of the conversion from the OTEL model, but do we need to change the ID which is stored in the database? If yes, how can we ensure backward compatibility? Or should we move forward by converting the OTEL span ID to a Jaeger-specific ID?
The IDs are compatible, and besides Jaeger treats them as completely opaque (same as OTEL).
@yurishkuro Is there any specific reason why we have implemented the tenant struct's services set as a map with the value section being an empty struct instead of a simpler array? Similarly, the operations map's value part is a map whose value portion is an empty struct.
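For readers unfamiliar with the pattern being asked about: a Go map with empty-struct values is the conventional way to model a set. The sketch below uses a hypothetical `tenant` type (not the actual `Tenant` struct in Jaeger's memory storage) to show what this gives compared to a slice: membership checks by key, automatic deduplication on insert, and values that occupy no memory.

```go
package main

import (
	"fmt"
	"sort"
)

// tenant is a hypothetical, simplified stand-in for a per-tenant index.
type tenant struct {
	services   map[string]struct{}            // set of service names
	operations map[string]map[string]struct{} // service -> set of operation names
}

func newTenant() *tenant {
	return &tenant{
		services:   make(map[string]struct{}),
		operations: make(map[string]map[string]struct{}),
	}
}

// record registers a service/operation pair; repeated calls are no-ops
// because map keys are unique.
func (t *tenant) record(service, operation string) {
	t.services[service] = struct{}{}
	ops, ok := t.operations[service]
	if !ok {
		ops = make(map[string]struct{})
		t.operations[service] = ops
	}
	ops[operation] = struct{}{}
}

// hasService is a single map lookup; with a slice it would be a linear scan.
func (t *tenant) hasService(service string) bool {
	_, ok := t.services[service]
	return ok
}

// serviceNames returns the set as a sorted slice for stable output.
func (t *tenant) serviceNames() []string {
	names := make([]string, 0, len(t.services))
	for name := range t.services {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	t := newTenant()
	t.record("frontend", "GET /")
	t.record("frontend", "GET /") // duplicate, ignored
	t.record("backend", "query")
	fmt.Println(t.serviceNames(), t.hasService("frontend"))
}
```

With a slice, every span write would need a scan to avoid storing the same service or operation name twice.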
@yurishkuro The changes in memory storage are not expected to be backward compatible, right? I can only imagine restarting the storage after the update, which will eventually lead to loss of old data (because it's in-memory). Am I right? Or is there any backup mechanism?
@Manik2708 being backwards compatible has nothing to do with persistence / backup mechanism. Memory storage is transient, it's a cache. Upgrading it to v2 is just changing its API, not its features.
@yurishkuro I am interested in this opportunity. I am interested in both this and the UI. Can I add an application for both?
Actually the Tenant structure looks like this: jaeger/internal/storage/v1/memory/memory.go Lines 32 to 41 in 316cbe3
To change the API we need to change the Jaeger-specific models to OTEL-specific models. Or am I misunderstanding? Do we just need to change the APIs, then convert the OTEL models to Jaeger models and keep the store itself unchanged?
Tenant struct is not part of the API. The API is https://github.com/jaegertracing/jaeger/blob/main/internal/storage/v1/api/spanstore/interface.go to be changed to https://github.com/jaegertracing/jaeger/tree/main/internal/storage/v2/api/tracestore
@yurishkuro but for that wouldn't we also have to change how the storage backend stores traces, for instance the memory backend stores traces with the
If this is the wrong path, do we instead want to change how the models such as
Yes, we want to change the internal implementation. It's up to the storage backend in what format it stores the data, as long as the external API is OTLP.
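As a rough illustration of this separation, the sketch below keeps a backend-private record format behind an API that accepts and returns an OTLP-like payload. `Span`, `Traces`, `record`, and `memStore` are hypothetical names chosen for this example; the real v2 API works with OTLP trace data, and an actual backend would pick its own internal layout.

```go
package main

import "fmt"

// Span and Traces stand in for the payload exchanged through the external API.
type Span struct {
	TraceID string
	SpanID  string
	Name    string
}

type Traces struct{ Spans []Span }

// record is the backend's private storage format; callers never see it.
type record struct {
	spanID string
	name   string
}

// memStore is a hypothetical in-memory backend indexed by trace ID.
type memStore struct {
	byTrace map[string][]record
}

func newMemStore() *memStore {
	return &memStore{byTrace: make(map[string][]record)}
}

// WriteTraces accepts a whole batch in the external format and converts
// each span to the internal record at the storage boundary.
func (s *memStore) WriteTraces(td Traces) {
	for _, sp := range td.Spans {
		s.byTrace[sp.TraceID] = append(s.byTrace[sp.TraceID],
			record{spanID: sp.SpanID, name: sp.Name})
	}
}

// GetTrace converts internal records back to the external format.
func (s *memStore) GetTrace(traceID string) (Traces, bool) {
	recs, ok := s.byTrace[traceID]
	if !ok {
		return Traces{}, false
	}
	out := Traces{Spans: make([]Span, 0, len(recs))}
	for _, r := range recs {
		out.Spans = append(out.Spans, Span{TraceID: traceID, SpanID: r.spanID, Name: r.name})
	}
	return out, true
}

func main() {
	s := newMemStore()
	s.WriteTraces(Traces{Spans: []Span{
		{TraceID: "t1", SpanID: "1", Name: "parent"},
		{TraceID: "t1", SpanID: "2", Name: "child"},
	}})
	got, ok := s.GetTrace("t1")
	fmt.Println(len(got.Spans), ok)
}
```

Because the conversion happens only at the boundary, the internal layout can change later without affecting callers.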
@yurishkuro I have a doubt in the design of the v2 APIs, currently we are returning traces in this form:
but shouldn't it be:
Why do I think so? I have been reading about iterators in Go, and the reason iterators are considered better than slices is that we need not wait for the whole slice: we can receive and process each trace as it arrives and then do whatever we want with it (for example, stream it directly to the client). So if the iterator yields a chunk as a slice, what is the point of using an iterator? As far as I can tell, there will always be a single chunk, and before returning this iterator we still have to wait for the whole slice to be completed (that is, for all the traces). In conclusion, I think that using an iterator of slices defeats the purpose of an iterator (I may be wrong, please correct me if so). Please see this
@Manik2708 I may be wrong here, but I think it's because a trace might span over multiple otlp
Maybe you are right! That's why I wrote about processing the traces. GetTraces takes multiple trace IDs (earlier it was a single trace ID). This confused me because a trace ID is unique, so I wondered why we would need multiple traces for a single trace ID. But now I suspect it is possible if the distinguishing factor is not the ID but only the resource/service.
The method returns a slice in each iteration because the API is optimized for storage communication, and the storage may find it more efficient to return a single payload with multiple traces, e.g., one iteration with 10 traces.
## Which problem is this PR solving?
Fixes a part of #6458

## Description of the changes
- The JsonSpanWriter accepts the JSON span instead of the Jaeger span, which makes it reusable in v2 storage APIs

## How was this change tested?
- Unit Tests

## Checklist
- [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Manik2708 <[email protected]>
…make them reusable for v2 APIs (#6828)

## Which problem is this PR solving?
Fixes a part of: #6458

## Description of the changes
- Refactoring of SpanReader to make it reusable for v2 APIs

## How was this change tested?
- Unit Tests

## Checklist
- [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Manik2708 <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Co-authored-by: Yuri Shkuro <[email protected]>
… for v2 APIs (#6831)

## Which problem is this PR solving?
Fixes a part of: #6458

## Description of the changes
- Refactoring of FindTraceIDs

## How was this change tested?
- Unit Tests

## Checklist
- [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Manik2708 <[email protected]>
## Which problem is this PR solving?
Fixes a part of #6458

## Description of the changes
- As discussed in the comment #6845 (comment), legacy trace id is moved to feature gate

## How was this change tested?
- Unit and Integration tests

## Checklist
- [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Manik2708 <[email protected]>
This is a project proposed as part of LFX Mentorship term #6470 ⬅ read this first.
Background
Jaeger is an open-source, distributed tracing platform designed to monitor and troubleshoot microservices-based systems. A critical component of Jaeger is its storage backends, where traces captured by Jaeger are persisted for querying.
Currently, Jaeger uses a v1 Storage API, which operates on a data model specific to Jaeger. Each storage backend implements this API, requiring transformations between Jaeger's proprietary model and the OpenTelemetry Protocol (OTLP) data model, which is now the industry standard.
As part of #5079, Jaeger has introduced the more efficient v2 Storage API, which natively supports the OpenTelemetry data model (OTLP) and allows batching of writes and streaming of results. This effort is part of a broader alignment with the OpenTelemetry Collector framework, tracked under #4843.
Objective
Upgrade Jaeger storage backends to natively implement the v2 Storage API.
The chosen storage backend should be upgraded to fully implement the v2 Storage API in place. For a rough idea of how to upgrade from the v1 model to the OTLP data model, take a look at the PRs in the following issues that do a similar upgrade for other components of Jaeger:
Desired Outcomes
Upgrade Memory and Elasticsearch backends
We prioritize these two backends because they are the most frequently used with Jaeger, and upgrading them paves the way for upgrading other backends.
Testing
Bonus: Upgrade Other Backends
If time permits, upgrade Badger and Cassandra storage backends.
Risks / Open Questions