Upgrade Storage Backends to V2 Storage API #6458

Open · 5 tasks
mahadzaryab1 opened this issue Jan 1, 2025 · 27 comments

@mahadzaryab1 (Collaborator) commented Jan 1, 2025

This is a project proposed as part of the LFX Mentorship term; see #6470 (read this first).

Background

Jaeger is an open-source, distributed tracing platform designed to monitor and troubleshoot microservices-based systems. A critical component of Jaeger is its storage backends, where traces captured by Jaeger are persisted for querying.

Currently, Jaeger uses a v1 Storage API, which operates on a data model specific to Jaeger. Each storage backend implements this API, requiring transformations between Jaeger's proprietary model and the OpenTelemetry Protocol (OTLP) data model, which is now the industry standard.

As part of #5079, Jaeger has introduced the more efficient v2 Storage API, which natively supports the OpenTelemetry data model (OTLP) and allows batching of writes and streaming of results. This effort is part of a broader alignment with the OpenTelemetry Collector framework, tracked under #4843.
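For orientation, here is a minimal sketch of the two v2 interfaces this paragraph describes. It is abridged and partly assumed: the authoritative definitions live under internal/storage/v2/api/tracestore in the repo, the real method sets are larger, and GetTraceParams is simplified here.

```go
package sketch

import (
	"context"
	"iter"

	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// GetTraceParams is simplified here; the real struct carries more fields.
type GetTraceParams struct {
	TraceID pcommon.TraceID
}

// Writer accepts whole OTLP batches, which enables bulk writes.
type Writer interface {
	WriteTraces(ctx context.Context, td ptrace.Traces) error
}

// Reader streams results back via an iterator instead of one big slice.
type Reader interface {
	GetTraces(ctx context.Context, traceIDs ...GetTraceParams) iter.Seq2[[]ptrace.Traces, error]
}
```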

Objective

Upgrade Jaeger storage backends to natively implement the v2 Storage API.

  • Memory
  • Elasticsearch / OpenSearch
  • Badger
  • Cassandra
  • gRPC / Remote

The chosen storage backend should be upgraded to fully implement the v2 Storage API in place. For a rough idea of how to upgrade from the v1 model to the OTLP data model, take a look at the PRs in the following issues that do a similar upgrade for other components of Jaeger:

Desired Outcomes

Upgrade Memory and Elasticsearch backends

We prioritize these two backends as they are the most frequently used with Jaeger, and upgrading them paves the way for upgrading the others.

Testing

  • The storage implementations should be fully covered by unit tests
  • There are already integration tests for the storage backends, so all of them should pass without modification

Bonus: Upgrade Other Backends

If time permits, upgrade Badger and Cassandra storage backends.

Risks / Open Questions

  • The v2 storage API does not distinguish between primary and archive storage, but v1 does. The ultimate plan is to remove the archive distinction from the v1 implementation as well; that effort is tracked in Phase out the distinction between primary and archive storage #6065. We should think about how to handle upgrades for backends that implement archive storage in v1 while that removal is in progress. One option is to simply ignore the archive part of those storages until the aforementioned issue is resolved, if that is possible.
mahadzaryab1 added a commit that referenced this issue Jan 5, 2025
…o v1 (#6485)

## Which problem is this PR solving?
- Resolves #6480

## Description of the changes
- This PR implements a reverse adapter (`SpanReader`) that wraps a
native v2 storage interface (`tracestore.Reader`) and downgrades it to
implement the v1 storage interface (`spanstore.Reader`).
- The reverse adapter was integrated with the v1 query service. This
code path will only get executed once we start upgrading the existing
storage implementations to implement the new `tracestore.Reader`
interface as a part of
#6458

## How was this change tested?
- CI 
- Added new unit tests

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Mahad Zaryab <[email protected]>
@Devaansh-Kumar

Hello @mahadzaryab1. I am Devaansh Kumar and I am interested in applying for this project under LFX'25. I have some prior open-source experience, having participated in GSoC'24 under Kubernetes.

I have a few questions:

  1. What was the reason behind removing the distinction between primary and archive storage in the v2 API? It seems intuitive to me to have both in v2.
  2. For contributors new to Jaeger’s codebase, are there specific areas or smaller tasks you’d recommend starting with to get familiarized?

Thank you

@yurishkuro (Member)

@Devaansh-Kumar

  1. Phase out the distinction between primary and archive storage #6065
  2. https://www.jaegertracing.io/get-involved/

@Devaansh-Kumar

@yurishkuro I had one more question. Do you expect to move the above storages completely to v2, or to have both v1 and v2 present side by side and give the user a choice of which API version to use?

@yurishkuro (Member)

Move completely. We're about to upgrade the write pipeline to use the v2 storage API anyway, so we won't need the v1 API in the future.

@yp969803

@yurishkuro I would love to work on this project in LFX 2025!

@zhengkezhou1 (Contributor) commented Feb 8, 2025

@yurishkuro @mahadzaryab1 I have gone through the READ path, where we process two types of requests: HTTP requests and gRPC requests. We have set up two paths to handle read requests.

Let's focus on HTTP: the UI sends a request to the HTTP server, and the http_handler routes it to a method, which then calls the queryservice to retrieve the required data. Currently, we are still using v1 methods and models. For example, in http_handler.search, we use model.Trace and qs.spanReader.FindTraces instead of ptrace.Traces and qs.traceReader.FindTraces. The retrieved model is then serialized into a JSON object.

So in the full request flow today, trace data undergoes two conversions:
Database model → Jaeger v1 model → JSON object

Questions:

  1. Is it necessary to refactor http_handler and grpc_handler to v2 before working on this issue?
  2. Based on Jaeger v2 being built on the OTEL Collector, the read path allows us to convert the OTEL model directly into a JSON object. Can we achieve the same for our backend storages? I found that the following storages might support this approach: Elasticsearch, OpenSearch, Kafka, ClickHouse, Cassandra.

@Manik2708 (Contributor)

> Is it necessary to refactor http_handler and grpc_handler to v2 before working on this issue?

This has already been changed to follow the v2 query service. See these files: https://github.com/jaegertracing/jaeger/blob/main/cmd/query/app/apiv3/http_gateway.go and https://github.com/jaegertracing/jaeger/blob/main/cmd/query/app/apiv3/grpc_handler.go

@Manik2708 (Contributor)

@mahadzaryab1 @yurishkuro I have a doubt regarding this issue. Let's take the example of upgrading the Writer from v1 to v2. The v1 Writer writes only one span, whereas the v2 Writer takes whole traces. So while implementing the writer for Elasticsearch, what is expected? Do we have to use Elasticsearch's bulk APIs or something else? Please see https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html, because we are not storing traces but spans. A similar question for memory storage: in memory, do we only need to iterate the traces and store the spans, or something like batch writes?

@zhengkezhou1 (Contributor)

> @mahadzaryab1 @yurishkuro I have a doubt regarding this issue. Let's take the example of upgrading the Writer from v1 to v2. […]

Elasticsearch already implemented a method for writing traces into storage.

@Manik2708 (Contributor)

@zzzk1 Thanks for this! So are you saying to take inspiration from this and implement our own method? Because I don't think we can use this directly in Jaeger. And for memory we have to discuss the implementation, because that one is Jaeger-specific!

@zhengkezhou1 (Contributor) commented Feb 9, 2025

> @zzzk1 Thanks for this! So are you saying to take inspiration from this and implement our own method? […]

Yes. The pushTraceData method demonstrates how to send traces to Elasticsearch (ES). In Jaeger v2, built on the OpenTelemetry (OTEL) Collector, data is first collected from the pipeline and then written to storage efficiently (without model conversion) using the v2 Writer interface via the WriteTraces(ctx context.Context, td ptrace.Traces) method.
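To make the memory-storage half of the question concrete, here is a minimal sketch of what a v2-native WriteTraces could look like for an in-memory store. All names here (Store, NewStore, spansByTrace) are hypothetical illustrations, not Jaeger's actual implementation:

```go
package memory

import (
	"context"
	"sync"

	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// Store is a hypothetical v2-native in-memory store keyed by OTLP trace ID.
type Store struct {
	sync.Mutex
	spansByTrace map[pcommon.TraceID][]ptrace.Span
}

func NewStore() *Store {
	return &Store{spansByTrace: map[pcommon.TraceID][]ptrace.Span{}}
}

// WriteTraces walks the OTLP batch and indexes each span under its trace ID.
func (s *Store) WriteTraces(_ context.Context, td ptrace.Traces) error {
	s.Lock()
	defer s.Unlock()
	resources := td.ResourceSpans()
	for i := 0; i < resources.Len(); i++ {
		scopes := resources.At(i).ScopeSpans()
		for j := 0; j < scopes.Len(); j++ {
			spans := scopes.At(j).Spans()
			for k := 0; k < spans.Len(); k++ {
				span := spans.At(k)
				// Production code would deep-copy the span; pdata values
				// share the batch's backing memory.
				s.spansByTrace[span.TraceID()] = append(s.spansByTrace[span.TraceID()], span)
			}
		}
	}
	return nil
}
```

Whether to key by trace ID like this or to store whole ptrace.Traces batches is exactly the kind of choice the v2 API leaves to each backend.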

@yurishkuro (Member) commented Feb 9, 2025

> Is it necessary to refactor http_handler and grpc_handler to v2 before working on this issue?

@zzzk1 No, it's not. Yes, both of these handlers operate on querysvc/v1, but internally that service is already instantiated with v2 storage and just downgrades it to v1 if necessary:

```go
func NewQueryService(traceReader tracestore.Reader, dependencyReader depstore.Reader, options QueryServiceOptions) *QueryService {
	spanReader, ok := v1adapter.GetV1Reader(traceReader)
	if !ok {
		// if the spanstore.Reader is not available, downgrade the native
		// tracestore.Reader to a spanstore.Reader
		spanReader = v1adapter.NewSpanReader(traceReader)
	}
```

@Manik2708 (Contributor)

@yurishkuro I understand that we want to get rid of the conversion from the OTEL model, but do we need to change the ID that is stored in the database? If yes, how can we ensure backward compatibility? Or should we move forward by converting the OTEL span ID to a Jaeger-specific ID?

@yurishkuro (Member)

The IDs are compatible, and besides, Jaeger treats them as completely opaque (same as OTEL).
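To illustrate why the IDs line up: a v1 model.TraceID is a pair of uint64 halves, and an OTLP trace ID is the same value as 16 opaque bytes. A sketch of the mapping, assuming big-endian packing with the high half first (the helper name here is hypothetical; Jaeger's v1adapter package contains the real equivalents):

```go
package main

import (
	"encoding/binary"
	"fmt"

	"go.opentelemetry.io/collector/pdata/pcommon"
)

// v1TraceID mirrors Jaeger's model.TraceID: two uint64 halves.
type v1TraceID struct {
	High, Low uint64
}

// toOTLP packs the two halves into the 16-byte OTLP trace ID, high half first.
func toOTLP(id v1TraceID) pcommon.TraceID {
	var b [16]byte
	binary.BigEndian.PutUint64(b[:8], id.High)
	binary.BigEndian.PutUint64(b[8:], id.Low)
	return pcommon.TraceID(b)
}

func main() {
	fmt.Println(toOTLP(v1TraceID{High: 1, Low: 2}))
}
```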

@adityachopra29 (Contributor) commented Feb 15, 2025

@yurishkuro Is there any specific reason why the Tenant struct's services set is implemented as a map whose values are empty structs, instead of a simpler array? Similarly, the operations map's values are themselves maps whose values are empty structs.

// Tenant is an in-memory store of traces for a single tenant

@yurishkuro (Member)

map[string]struct{} is a common way to model a set.
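A quick illustration of the pattern and why it beats a slice for this use:

```go
package main

import "fmt"

func main() {
	// map[string]struct{} gives O(1) membership tests and automatic
	// de-duplication; the empty struct{} value occupies zero bytes.
	services := map[string]struct{}{}
	services["frontend"] = struct{}{}
	services["frontend"] = struct{}{} // idempotent: still a single entry

	_, seen := services["frontend"]
	fmt.Println(len(services), seen) // prints: 1 true
}
```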

@Manik2708 (Contributor)

@yurishkuro The changes in memory storage are not expected to be backward compatible, right? I can only imagine restarting the storage after the update, which will eventually lead to loss of old data (because it's in-memory). Am I right? Or is there any backup mechanism?

@yurishkuro (Member)

@Manik2708 Being backwards compatible has nothing to do with persistence / backup mechanisms. Memory storage is transient; it's a cache. Upgrading it to v2 is just changing its API, not its features.

@metabiswadeep

@yurishkuro I am interested in this opportunity, both this project and the UI one. Can I submit an application for both?

@Manik2708 (Contributor)

> @Manik2708 Being backwards compatible has nothing to do with persistence / backup mechanisms. Memory storage is transient; it's a cache. Upgrading it to v2 is just changing its API, not its features.

Actually, the Tenant structure looks like this:

```go
type Tenant struct {
	sync.RWMutex
	ids        []*model.TraceID
	traces     map[model.TraceID]*model.Trace
	services   map[string]struct{}
	operations map[string]map[spanstore.Operation]struct{}
	deduper    adjuster.Adjuster
	config     Configuration
	index      int
}
```

For changing the API we need to change the Jaeger-specific models to OTEL-specific models. Or am I understanding this wrong? Do we just need to change the APIs and then convert the OTEL models to Jaeger models, keeping the store intact and the same?

@yurishkuro (Member)

The Tenant struct is not part of the API. The API is https://github.com/jaegertracing/jaeger/blob/main/internal/storage/v1/api/spanstore/interface.go, which is to be changed to https://github.com/jaegertracing/jaeger/tree/main/internal/storage/v2/api/tracestore
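For contrast, an abridged sketch of the two Writer shapes involved in that change; a stub type stands in for the Jaeger model here, and the linked files remain the authoritative definitions:

```go
package sketch

import (
	"context"

	"go.opentelemetry.io/collector/pdata/ptrace"
)

// Span stands in for Jaeger's v1 model.Span in this sketch.
type Span struct{}

// V1Writer mirrors spanstore.Writer: one Jaeger-model span per call.
type V1Writer interface {
	WriteSpan(ctx context.Context, span *Span) error
}

// V2Writer mirrors tracestore.Writer: one whole OTLP batch per call,
// with no Jaeger-model conversion in between.
type V2Writer interface {
	WriteTraces(ctx context.Context, td ptrace.Traces) error
}
```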

@ary82 (Contributor) commented Feb 16, 2025

> The Tenant struct is not part of the API. The API is https://github.com/jaegertracing/jaeger/blob/main/internal/storage/v1/api/spanstore/interface.go, which is to be changed to https://github.com/jaegertracing/jaeger/tree/main/internal/storage/v2/api/tracestore

@yurishkuro But for that, wouldn't we also have to change how the storage backend stores traces? For instance, the memory backend stores traces in the Tenant.traces field, which is currently map[model.TraceID]*model.Trace; for OTLP we could change this to something like map[pcommon.TraceID][]*ptrace.Span.

If this is the wrong path, do we instead want to change how models such as model.Trace are generated, and have the OTLP []ptrace.Span (and also other OTLP data such as resource attributes and scope tags) be contained by model.Trace?

@yurishkuro (Member)

Yes, we want to change the internal implementation. It's up to the storage backend what format it stores the data in, as long as the external API is OTLP.
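As one hypothetical illustration of such an internal change for the memory backend (none of these names are the actual implementation), the Tenant layout could become OTLP-keyed while keeping its set-style indexes:

```go
package memory

import (
	"sync"

	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// Tenant is a hypothetical v2-native layout; field names are illustrative.
// The external API speaks OTLP, while the set-style indexes stay as before.
type Tenant struct {
	sync.RWMutex
	ids        []pcommon.TraceID
	traces     map[pcommon.TraceID][]ptrace.Span
	services   map[string]struct{}
	operations map[string]map[string]struct{} // service name -> operation names
}
```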

@Manik2708 (Contributor)

@yurishkuro I have a doubt about the design of the v2 APIs. Currently we return traces in this form:

```go
GetTraces(ctx context.Context, traceIDs ...GetTraceParams) iter.Seq2[[]ptrace.Traces, error]
```

but shouldn't it be:

```go
// Returning an iterator of traces rather than an iterator of slices of traces
GetTraces(ctx context.Context, traceIDs ...GetTraceParams) iter.Seq2[ptrace.Traces, error]
```

Why do I think so? I have been reading about iterators in Go for a while, and the reason iterators are considered better than slices is that we don't have to wait for the whole slice: we can get and process each trace as it arrives and do whatever we want with it (for example, stream it directly to the client). If the iterator yields a whole slice as one chunk, what is the point of using an iterator? As far as I can tell there will only ever be a single chunk, and to yield it we still have to wait for the whole slice to be assembled (that is, for all the traces). So I think using an iterator of slices defeats the purpose of an iterator (I may be wrong, please correct me if so). Please see this:

```go
// Current approach
for traces, err := range GetTraces(ctx, params) {
	// Do something with the slice
}

// My proposed approach
for trace, err := range GetTraces(ctx, params) {
	// Here a single trace is a chunk. Either we can stream this trace
	// directly to the client or process it further.
}
```

@ary82 (Contributor) commented Feb 17, 2025

> but shouldn't it be:
>
> GetTraces(ctx context.Context, traceIDs ...GetTraceParams) iter.Seq2[ptrace.Traces, error]

@Manik2708 I may be wrong here, but I think it's because a single trace might span multiple OTLP ptrace.Traces objects. A ptrace.Traces contains ptrace.Spans grouped under a particular Resource/Service, but a trace can cross multiple services, so multiple ptrace.Traces may be required for one complete trace.
Let's wait for confirmation though.

@Manik2708 (Contributor)

Maybe you are right! That's why I wrote "processing the traces". GetTraces takes multiple trace IDs (earlier it was a single trace ID). This confused me, because a trace ID is unique, so I wondered why we would need multiple traces for a single trace ID. But now I suspect that is possible if the distinguishing factor is not the ID but only the resource/service.

@yurishkuro (Member)

The method returns a slice in each iteration because the API is optimized for storage communication, and the storage may find it more efficient to return a single payload with multiple traces, e.g. one iteration with 10 traces.
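A short consumer-side sketch of that contract, mirroring the interface shape quoted earlier in the thread; consumeAll and process are hypothetical placeholders:

```go
package sketch

import (
	"context"
	"iter"

	"go.opentelemetry.io/collector/pdata/ptrace"
)

// Reader and GetTraceParams mirror the shapes quoted earlier in this thread.
type GetTraceParams struct{}

type Reader interface {
	GetTraces(ctx context.Context, traceIDs ...GetTraceParams) iter.Seq2[[]ptrace.Traces, error]
}

// consumeAll drains the iterator; each yielded slice may carry several
// traces, e.g. one storage round-trip that returns 10 traces at once.
func consumeAll(ctx context.Context, reader Reader, params ...GetTraceParams) error {
	for batch, err := range reader.GetTraces(ctx, params...) {
		if err != nil {
			return err
		}
		for _, td := range batch {
			process(td) // placeholder: stream to the client, aggregate, etc.
		}
	}
	return nil
}

func process(td ptrace.Traces) {
	_ = td // placeholder for real handling
}
```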

github-merge-queue bot pushed a commit that referenced this issue Mar 6, 2025
## Which problem is this PR solving?
Fixes a part of #6458 

## Description of the changes
- The JsonSpanWriter accepts the JSON span instead of the Jaeger span, which
makes it reusable in v2 storage APIs

## How was this change tested?
- Unit Tests

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Manik2708 <[email protected]>
github-merge-queue bot pushed a commit that referenced this issue Mar 9, 2025
…make them reusable for v2 APIs (#6828)

## Which problem is this PR solving?
Fixes a part of: #6458 

## Description of the changes
- Refactored SpanReader to make it reusable for v2 APIs

## How was this change tested?
- Unit Tests

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Manik2708 <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Co-authored-by: Yuri Shkuro <[email protected]>
yurishkuro pushed a commit that referenced this issue Mar 11, 2025
… for v2 APIs (#6831)

## Which problem is this PR solving?
Fixes a part of: #6458

## Description of the changes
- Refactoring of FindTraceIDs

## How was this change tested?
- Unit Tests

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Manik2708 <[email protected]>
github-merge-queue bot pushed a commit that referenced this issue Mar 14, 2025
## Which problem is this PR solving?
Fixes a part of #6458 

## Description of the changes
- As discussed in the comment on #6845 (comment), the legacy trace ID
handling is moved behind a feature gate

## How was this change tested?
- Unit and Integration tests

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Manik2708 <[email protected]>