[feat] Store traces in ClickHouse based on Jaeger V2 #6725

Draft: zhengkezhou1 wants to merge 4 commits into main from write-path-for-clickhouse

zhengkezhou1
Contributor

@zhengkezhou1 zhengkezhou1 commented Feb 13, 2025

Which problem is this PR solving?

Design doc: Jaeger V2: Support for ClickHouse as Storage Backend
Part of #5058

Description of the changes

  • Introduce the ClickHouse clients ch-go and clickhouse-go v2, used for writing and reading traces
  • Provide test container environment for ClickHouse

How was this change tested?

  • unit tests & integration tests

Checklist

@zhengkezhou1 zhengkezhou1 force-pushed the write-path-for-clickhouse branch from a65794d to bacbf97 Compare February 13, 2025 13:51

codecov bot commented Feb 13, 2025

Codecov Report

Attention: Patch coverage is 93.87097% with 38 lines in your changes missing coverage. Please review.

Project coverage is 95.91%. Comparing base (253bd53) to head (e62882d).

Files with missing lines | Patch % | Lines
internal/storage/v2/clickhouse/factory.go | 77.77% | 8 Missing and 4 partials ⚠️
internal/storage/v2/clickhouse/schema/schema.go | 83.01% | 6 Missing and 3 partials ⚠️
internal/storage/v2/clickhouse/wrapper/wrapper.go | 86.76% | 6 Missing and 3 partials ⚠️
internal/storage/v2/clickhouse/trace/reader.go | 88.88% | 6 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6725      +/-   ##
==========================================
- Coverage   96.05%   95.91%   -0.14%     
==========================================
  Files         366      376      +10     
  Lines       20750    21370     +620     
==========================================
+ Hits        19932    20498     +566     
- Misses        624      667      +43     
- Partials      194      205      +11     
Flag | Coverage Δ
badger_v1 | 9.18% <0.00%> (-0.65%) ⬇️
badger_v2 | 1.98% <0.00%> (-0.01%) ⬇️
cassandra-4.x-v1-manual | 13.89% <0.00%> (-0.94%) ⬇️
cassandra-4.x-v2-auto | 1.97% <0.00%> (-0.01%) ⬇️
cassandra-4.x-v2-manual | 1.97% <0.00%> (-0.01%) ⬇️
cassandra-5.x-v1-manual | 13.89% <0.00%> (-0.94%) ⬇️
cassandra-5.x-v2-auto | 1.97% <0.00%> (-0.01%) ⬇️
cassandra-5.x-v2-manual | 1.97% <0.00%> (-0.01%) ⬇️
clickhouse-25.x | 5.27% <69.19%> (?)
elasticsearch-6.x-v1 | 0.18% <0.00%> (-19.27%) ⬇️
elasticsearch-7.x-v1 | 0.18% <0.00%> (-19.35%) ⬇️
elasticsearch-8.x-v1 | 0.18% <0.00%> (-19.52%) ⬇️
elasticsearch-8.x-v2 | 1.98% <0.00%> (-0.01%) ⬇️
grpc_v1 | 10.16% <0.00%> (-0.71%) ⬇️
grpc_v2 | 7.92% <0.00%> (-0.01%) ⬇️
kafka-3.x-v1 | 9.46% <0.00%> (-0.67%) ⬇️
kafka-3.x-v2 | 1.98% <0.00%> (-0.01%) ⬇️
memory_v2 | 1.98% <0.00%> (-0.01%) ⬇️
opensearch-1.x-v1 | 18.45% <0.00%> (-1.14%) ⬇️
opensearch-2.x-v1 | 18.45% <0.00%> (-1.14%) ⬇️
opensearch-2.x-v2 | 1.98% <0.00%> (-0.01%) ⬇️
tailsampling-processor | 0.48% <0.00%> (-0.01%) ⬇️
unittests | 94.44% <78.54%> (-0.49%) ⬇️


@zhengkezhou1 zhengkezhou1 force-pushed the write-path-for-clickhouse branch from bacbf97 to 21c3fab Compare February 13, 2025 14:30
@zhengkezhou1 zhengkezhou1 changed the title from "[feat][storage]: add write path for ClickHouse based on Jaeger V2" to "[feat][storage] add write path for ClickHouse based on Jaeger V2" Feb 13, 2025
@zhengkezhou1 zhengkezhou1 force-pushed the write-path-for-clickhouse branch 5 times, most recently from a16370b to 72d91cc Compare February 14, 2025 07:24
@zhengkezhou1
Contributor Author

zhengkezhou1 commented Feb 14, 2025

@yurishkuro The ClickHouse integration test is not working in CI. Is there anything I might have missed?

@zhengkezhou1 zhengkezhou1 force-pushed the write-path-for-clickhouse branch 2 times, most recently from 51b3f8f to b8915ef Compare February 14, 2025 10:52
@zhengkezhou1 zhengkezhou1 marked this pull request as ready for review February 14, 2025 11:00
@zhengkezhou1 zhengkezhou1 requested a review from a team as a code owner February 14, 2025 11:00
@dosubot dosubot bot added area/storage docker Pull requests that update Docker code v2 labels Feb 14, 2025
@zhengkezhou1 zhengkezhou1 force-pushed the write-path-for-clickhouse branch from 572c1a2 to 7718d65 Compare February 15, 2025 11:31
@zhengkezhou1 zhengkezhou1 changed the title from "[feat][storage] add write path for ClickHouse based on Jaeger V2" to "[feat][storage] Store Traces in ClickHouse Based on Jaeger V2" Feb 15, 2025
@zhengkezhou1 zhengkezhou1 force-pushed the write-path-for-clickhouse branch 2 times, most recently from 19bc811 to c408a25 Compare February 15, 2025 12:43
@zhengkezhou1
Contributor Author

I would like to implement all basic features, ensure that the integration tests pass, and then return to address other low-priority tasks.

Member

@yurishkuro yurishkuro left a comment

Before moving code around, I suggest you read and understand the comments, then propose a new directory structure that we can agree on. This will reduce the churn.

@zhengkezhou1
Contributor Author

zhengkezhou1 commented Feb 18, 2025

@yurishkuro The key point is that the ch-go and clickhouse-go clients are designed very differently, which makes it difficult to put them behind a common abstraction. It is therefore best to isolate them and use the minimal configuration file required for initialization, as follows:

schema:
  #auto run DDL script
  auto: true
client:
  database: jaeger
  username: default
  password: default
  #ch-go
  writer:
    address: "127.0.0.1:9200"
    pool:
      max_connection_lifetime: 3600000000000
      max_connection_idle_time: 1800000000000
      #CPU Core number
      min_connections: 4
      #CPU Core number * 2
      max_connections: 8
      health_check_period: 60000000000
  #clickhouse-go
  reader:
    #no cluster just a different field here.
    addresses: ["node00:9200","node01:9200","node02:9200"]
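
The pool durations in this sample are nanosecond counts: 3600000000000 is one hour, 1800000000000 is thirty minutes, and 60000000000 is one minute. A minimal Go sketch of how the writer.pool section might map onto a struct with defaults and validation is shown here. PoolConfig, DefaultPoolConfig, and Validate are hypothetical names used for illustration, not the PR's actual code, and the core-based connection counts only follow the comments in the YAML.

```go
package main

import (
	"errors"
	"runtime"
	"time"
)

// PoolConfig is a hypothetical struct mirroring the writer.pool keys above.
type PoolConfig struct {
	MaxConnectionLifetime time.Duration
	MaxConnectionIdleTime time.Duration
	MinConnections        int
	MaxConnections        int
	HealthCheckPeriod     time.Duration
}

// DefaultPoolConfig returns defaults matching the sample YAML, with the
// connection counts derived from the CPU core count as the YAML comments suggest.
func DefaultPoolConfig() PoolConfig {
	cores := runtime.NumCPU()
	return PoolConfig{
		MaxConnectionLifetime: time.Hour,        // 3600000000000 ns
		MaxConnectionIdleTime: 30 * time.Minute, // 1800000000000 ns
		MinConnections:        cores,
		MaxConnections:        cores * 2,
		HealthCheckPeriod:     time.Minute, // 60000000000 ns
	}
}

// Validate rejects settings that could never produce a usable pool.
func (c PoolConfig) Validate() error {
	if c.MinConnections <= 0 {
		return errors.New("min_connections must be positive")
	}
	if c.MaxConnections < c.MinConnections {
		return errors.New("max_connections must be >= min_connections")
	}
	if c.MaxConnectionLifetime <= 0 || c.MaxConnectionIdleTime <= 0 || c.HealthCheckPeriod <= 0 {
		return errors.New("pool durations must be positive")
	}
	return nil
}
```

Keeping the durations as time.Duration would let the config file accept readable values such as "1h" if the loader supports duration parsing; that is a design option, not something the PR states.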

The directory structure has been adjusted as follows:

internal/storage/v2/clickhouse/

  • client
    • conn: creates a clickhouse-go connection.
    • pool: creates a ch-go connection pool.
  • config: configuration items that correspond to the configuration file.
  • internal: tools required to write traces into the database.
  • wrapper: isolates upper-layer calls from the third-party client implementations.
  • schema: DDL scripts for initialization, plus automatic database initialization.
  • tracestore: implements read and write operations for traces.

@zhengkezhou1
Contributor Author

@yurishkuro In the implementation of GetTraces(ctx context.Context, traceIDs ...GetTraceParams), only the TraceID field of GetTraceParams is used. Are the other fields intended for time-range queries?

@yurishkuro
Member

Code pointer?

@zhengkezhou1
Contributor Author

zhengkezhou1 commented Feb 22, 2025

Code pointer?

ctx, span := s.startSpanForQuery(ctx, "readTrace", querySpanByTraceID)
defer span.End()
span.SetAttributes(attribute.Key("trace_id").String(traceID.String()))
trc, err := s.readTraceInSpan(ctx, traceID)

Only the gRPC backend actually uses all of them:

stream, err := c.readerClient.GetTrace(ctx, &storage_v1.GetTraceRequest{
	TraceID:   query.TraceID,
	StartTime: query.StartTime,
	EndTime:   query.EndTime,
})

Also, do you think the new code structure I proposed is suitable?

@yurishkuro
Member

The timestamps in GetTraces request were introduced on request from 3rd party implementations (e.g. Tempo). None of the internally supported backends need these parameters because of how the db schemas are organized. But it was expected that the timestamps could be useful for ClickHouse.

@Copilot Copilot AI left a comment

PR Overview

This PR introduces ClickHouse as a storage backend for Jaeger traces, including a new clickhouse client implementation, test infrastructure, and CI configuration.

  • Adds GitHub Actions workflows and docker-compose configurations for ClickHouse e2e tests.
  • Implements new client, connection, and pool configurations along with generated mocks and their tests.
  • Updates integration tests to support ClickHouse alongside existing storage backends.

Reviewed Changes

File | Description
.github/workflows/ci-e2e-clickhouse.yml | Adds CI workflow for running ClickHouse integration tests; note a potential variable reference error in the job name.
internal/storage/v2/clickhouse/client/mocks/Conn.go | Generated mock for connection interface; no issues found.
internal/storage/v2/clickhouse/client/mocks/Rows.go | Generated mock for rows interface; no issues found.
internal/storage/integration/clickhouse_test.go | Integration tests for ClickHouse storage functionality.
internal/storage/v2/clickhouse/client/pool/config_test.go | Tests for default pool configuration; variable naming typo observed.
internal/storage/v2/clickhouse/client/conn/config_test.go | Tests for default connection configuration; variable naming typo observed.
internal/storage/v2/clickhouse/config/config.go | Configuration for ClickHouse storage validated and aligned with new client components.
internal/storage/v2/clickhouse/factory.go | Factory initialization for trace writer creation and resource cleanup.
docker-compose/clickhouse/docker-compose.yml | Docker compose file to set up ClickHouse for local/integration testing.
internal/storage/v2/clickhouse/client/conn/config.go | Implementation of connection configuration using the ClickHouse driver.
internal/storage/v2/clickhouse/client/pool/config.go | Pool configuration implementation using the ClickHouse pool driver.
.mockery.yaml | Updated to generate mocks for new client interfaces.
internal/storage/v2/clickhouse/client/client.go | Defines client interfaces for connection, pool, and rows.
internal/storage/integration/package_test.go | Enhancements in leak testing to handle multiple storage backends.
.github/workflows/ci-e2e-all.yml | Updated CI pipeline to include ClickHouse integration tests.

Copilot reviewed 33 out of 33 changed files in this pull request and generated 3 comments.

@zhengkezhou1 zhengkezhou1 force-pushed the write-path-for-clickhouse branch from 295bf44 to f050114 Compare March 1, 2025 07:33
@zhengkezhou1 zhengkezhou1 changed the title from "[feat][storage] Store Traces in ClickHouse Based on Jaeger V2" to "[feat] Store traces in ClickHouse based on Jaeger V2" Mar 1, 2025
@zhengkezhou1
Contributor Author

The current implementation has numerous design flaws. My plan is to first implement the basic functionality: saving traces to the backend, retrieving them, and verifying them through integration testing. Then, I will enhance the implementation and add supplementary test cases.

@zhengkezhou1 zhengkezhou1 force-pushed the write-path-for-clickhouse branch from a1a580a to 167dcf1 Compare March 1, 2025 16:41
Contributor Author

@zhengkezhou1 zhengkezhou1 left a comment

@yurishkuro I have refactored the code structure and implemented basic functions for writing and reading traces, as you suggested.

fail-fast: false
matrix:
  clickhouse-version: ["25.x"]
  create-schema: [manual, auto]
Member

If we can create schema automatically why do we need to support manual?

Contributor Author

Yes, we should use create-schema: [auto] only.

	span.References = []model.SpanRef{}
}
if span.Tags == nil {
	span.Tags = []model.KeyValue{}
Member

Please create a separate PR for this change; there is no need to bundle it here.

Contributor Author

Got it, please see #6798.

Contributor Author

Actually, without this change the test fails because the assertion compares an empty slice with nil. However, after I rebased the branch, the issue seems to be resolved.
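
The failure mode described here comes from Go treating a nil slice and an empty, non-nil slice as different under reflect.DeepEqual, which equality assertions commonly rely on. A minimal sketch, using hypothetical stand-in types (span, keyValue) rather than the Jaeger model package:

```go
package main

import "reflect"

// keyValue and span are hypothetical stand-ins for model.KeyValue and model.Span.
type keyValue struct{ Key, Value string }

type span struct {
	Tags []keyValue
}

// normalize replaces a nil Tags slice with an empty one, mirroring the
// change discussed in this thread.
func normalize(s *span) {
	if s.Tags == nil {
		s.Tags = []keyValue{}
	}
}

// sameTags reports whether two spans have deeply equal Tags.
// A nil slice and an empty non-nil slice are NOT deeply equal.
func sameTags(a, b span) bool {
	return reflect.DeepEqual(a.Tags, b.Tags)
}
```

Comparing span{} with span{Tags: []keyValue{}} through sameTags reports a mismatch until normalize is applied to the first value.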

id: test-execution
run: bash scripts/e2e/clickhouse.sh ${{ matrix.clickhouse-version }}-${{ matrix.create-schema }}
env:
  SKIP_APPLY_SCHEMA: ${{ matrix.create-schema == 'auto' && true || false }}
Contributor Author

Should we simply use SKIP_APPLY_SCHEMA: false here?

@zhengkezhou1 zhengkezhou1 force-pushed the write-path-for-clickhouse branch from 6981216 to 8a35cbc Compare March 4, 2025 14:17
@zhengkezhou1 zhengkezhou1 marked this pull request as ready for review March 4, 2025 14:45
@zhengkezhou1 zhengkezhou1 force-pushed the write-path-for-clickhouse branch from 8a35cbc to a1edda1 Compare March 5, 2025 08:05
fail-fast: false
matrix:
  clickhouse-version: ["25.x"]
  create-schema: [auto]
Member

I meant why have the parameter in the first place? You seem to have copied the Cassandra behavior with manual and auto schema init, but that was a legacy state. If we can always autocreate the schema we don't need to support manual schema creation at all.

Contributor Author

I have updated it and added more unit tests.

@zhengkezhou1 zhengkezhou1 force-pushed the write-path-for-clickhouse branch 8 times, most recently from 05578ad to 9868dd3 Compare March 11, 2025 11:59
Signed-off-by: zzzk1 <[email protected]>
Signed-off-by: zhengkezhou1 <[email protected]>
Signed-off-by: zzzk1 <[email protected]>
Signed-off-by: zhengkezhou1 <[email protected]>
Signed-off-by: zhengkezhou1 [email protected]

Signed-off-by: zhengkezhou1 <[email protected]>
@zhengkezhou1 zhengkezhou1 force-pushed the write-path-for-clickhouse branch from 9868dd3 to e62882d Compare March 11, 2025 12:27
@zhengkezhou1 zhengkezhou1 marked this pull request as draft March 11, 2025 15:58
Contributor Author

@zhengkezhou1 zhengkezhou1 left a comment

@yurishkuro Can you please take a look at my new improvements and ideas?

Comment on lines +15 to +40
type Model struct {
	Timestamp                time.Time
	TraceId                  string
	SpanId                   string
	ParentSpanId             string
	TraceState               string
	SpanName                 string
	SpanKind                 string
	ServiceName              string
	ResourceAttributesKeys   []string `ch:"ResourceAttributes.keys"`
	ResourceAttributesValues []string `ch:"ResourceAttributes.values"`
	ScopeName                string
	ScopeVersion             string
	SpanAttributesKeys       []string `ch:"SpanAttributes.keys"`
	SpanAttributesValues     []string `ch:"SpanAttributes.values"`
	Duration                 uint64
	StatusCode               string
	StatusMessage            string
	EventsTimestamp          []time.Time         `ch:"Events.Timestamp"`
	EventsName               []string            `ch:"Events.Name"`
	EventsAttributes         []map[string]string `ch:"Events.Attributes"`
	LinksTraceId             []string            `ch:"Links.TraceId"`
	LinksSpanId              []string            `ch:"Links.SpanId"`
	LinksTraceState          []string            `ch:"Links.TraceState"`
	LinksAttributes          []map[string]string `ch:"Links.Attributes"`
}
Contributor Author

Should they be split as follows?

type Model struct {
	Trace  Trace
	Span   Span
	Scope  Scope
	Events Events
	Links  Links
}
type Trace struct {
	Timestamp   time.Time
	Id          string
	State       string
	ServiceName string
	Duration    uint64
}

type Span struct {
	Id               string
	ParentId         string
	Name             string
	Kind             string
	AttributesKeys   []string `ch:"SpanAttributes.keys"`
	AttributesValues []string `ch:"SpanAttributes.values"`
	StatusCode       string
	StatusMessage    string
}

type Scope struct {
	Name                     string
	Version                  string
	ResourceAttributesKeys   []string `ch:"ResourceAttributes.keys"`
	ResourceAttributesValues []string `ch:"ResourceAttributes.values"`
}

type Events struct {
	Timestamp  []time.Time         `ch:"Events.Timestamp"`
	Name       []string            `ch:"Events.Name"`
	Attributes []map[string]string `ch:"Events.Attributes"`
}

type Links struct {
	TraceId    []string            `ch:"Links.TraceId"`
	SpanId     []string            `ch:"Links.SpanId"`
	TraceState []string            `ch:"Links.TraceState"`
	Attributes []map[string]string `ch:"Links.Attributes"`
}
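
Whichever layout is chosen, the paired .keys and .values columns have to be zipped back into a map during conversion. A minimal sketch, assuming both arrays are returned in the same order; attrsFromColumns is a hypothetical helper and not part of the PR:

```go
package main

import "fmt"

// attrsFromColumns zips parallel key and value columns (for example
// ResourceAttributes.keys and ResourceAttributes.values) into a map.
// It returns an error when the two columns differ in length.
func attrsFromColumns(keys, values []string) (map[string]string, error) {
	if len(keys) != len(values) {
		return nil, fmt.Errorf("attribute columns differ in length: %d keys, %d values", len(keys), len(values))
	}
	attrs := make(map[string]string, len(keys))
	for i, k := range keys {
		attrs[k] = values[i]
	}
	return attrs, nil
}
```

Checking the lengths up front turns a silently truncated attribute set into an explicit error, which is easier to catch in conversion tests.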

}

// ConvertToTraces convert the db model read from clickhouse to OTel Traces.
func (m Model) ConvertToTraces() (ptrace.Traces, error) {
Contributor Author

Maybe we can perform the conversion based on the split structure above.

"go.opentelemetry.io/collector/pdata/ptrace"
)

func TestConvertToTraces(t *testing.T) {
Contributor Author

Table-driven tests would be useful here.
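
For readers unfamiliar with the pattern, a table-driven test lists inputs and expected outputs in a slice and runs each entry as a subtest. The sketch below is illustrative only: spanKind and spanKindFromString are hypothetical stand-ins for ptrace.SpanKind and the PR's conversion helper, and the string values are assumptions. The test function belongs in a _test.go file and runs with go test.

```go
package main

import "testing"

// spanKind is a hypothetical stand-in for ptrace.SpanKind.
type spanKind int

const (
	spanKindUnspecified spanKind = iota
	spanKindServer
	spanKindClient
)

// spanKindFromString maps a stored string to a span kind.
// The accepted strings are assumptions made for this example.
func spanKindFromString(s string) spanKind {
	switch s {
	case "Server":
		return spanKindServer
	case "Client":
		return spanKindClient
	default:
		return spanKindUnspecified
	}
}

func TestSpanKind(t *testing.T) {
	tests := []struct {
		name  string
		input string
		want  spanKind
	}{
		{name: "server", input: "Server", want: spanKindServer},
		{name: "client", input: "Client", want: spanKindClient},
		{name: "unknown falls back to unspecified", input: "bogus", want: spanKindUnspecified},
		{name: "empty string", input: "", want: spanKindUnspecified},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := spanKindFromString(tt.input); got != tt.want {
				t.Errorf("spanKindFromString(%q) = %d, want %d", tt.input, got, tt.want)
			}
		})
	}
}
```

Adding a new case is then a one-line change to the table, which keeps the four conversion tests flagged in this review short and uniform.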

})
}

func TestConvertLink(t *testing.T) {
Contributor Author

same.

})
}

func TestStatusCode(t *testing.T) {
Contributor Author

same.

})
}

func TestSpanKind(t *testing.T) {
Contributor Author

same.

Labels
area/storage docker Pull requests that update Docker code v2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants