
store: optimized data retrieval from Prometheus #4857

Closed
wants to merge 3 commits

Conversation

GiedriusS
Member

@GiedriusS GiedriusS commented Nov 12, 2021

Changes

Use the new Range/Interval query hints to select only the data that is
really needed: instead of sending one big query spanning min/max time,
divide it into several smaller queries, each requesting a smaller
window of time. Only do this where the step is at least four times
larger than the range, to avoid sending a bunch of queries that together
request the same data. From the benchmarks, I was able to deduce that
the difference must be at least fourfold for the split to make sense. Here is how the
benchmark numbers look:

```
name                                                                  time/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16  7.37ms ± 5%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16  37.5ms ± 5%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16    6.62ms ±14%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16    2.84ms ±15%

name                                                                  alloc/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16  41.3kB ± 1%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16  34.2kB ± 1%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16    41.1kB ± 1%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16    33.0kB ± 0%

name                                                                  allocs/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16     389 ± 0%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16     265 ± 0%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16       389 ± 0%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16       264 ± 0%
```

In the future, this optimization could be applied to other components that have a TSDB as well
such as Receive/Rule.
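As a minimal sketch of the splitting logic described above (the `timerange` struct and function name here are illustrative, modeled on the change to `pkg/store/prometheus.go`; all timestamps are in milliseconds):

```go
package main

import "fmt"

// timerange mirrors the small helper struct used in pkg/store/prometheus.go.
type timerange struct{ mintime, maxtime int64 }

// splitTimeRanges returns the sub-ranges to query. When the range selector
// is at least four times smaller than the step, only the [ts, ts+rng]
// windows that PromQL will actually read are requested; otherwise one big
// [minT, maxT] query is kept.
func splitTimeRanges(minT, maxT, step, rng int64) []timerange {
	timeranges := []timerange{}
	if rng != 0 && rng*4 < step {
		for ts := minT; ts <= maxT; ts += step {
			timeranges = append(timeranges, timerange{ts, ts + rng})
		}
	} else {
		timeranges = []timerange{{minT, maxT}}
	}
	return timeranges
}

func main() {
	// rate(metric[5m]) over 24h with a 2h step: 5m*4 < 2h, so the span
	// is split into one small window per step.
	trs := splitTimeRanges(0, 24*3600*1000, 2*3600*1000, 5*60*1000)
	fmt.Println(len(trs)) // prints 13
}
```

Under these assumptions, a 24h span turns into 13 five-minute windows instead of one 24h remote-read, which is where the sidecar benchmark improvement comes from.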

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Signed-off-by: Giedrius Statkevičius [email protected]

Verification

Running manually with quickstart, tests pass.

@GiedriusS
Member Author

GiedriusS commented Nov 12, 2021

@yeya24 this is the optimization 💪 benchmark numbers look pretty good

Collaborator

@matej-g matej-g left a comment


Looks like a pretty solid gain for a relatively small change, nicely done! Looks good!

@yeya24
Contributor

yeya24 commented Nov 12, 2021

Is there a way to make sure the query result remains the same? Can we run the query compliance suite in CI and make sure the result is consistent with the last release?

```go
timeranges := []timerange{}
// Range must be at least four times smaller than the step otherwise
// this optimization does not make sense. Deduced this from BenchmarkQueryRangeInterval.
if r.Range != 0 && r.Range*4 < r.Step {
```
Contributor


Why should it be 4 times smaller? Could you explain it a little bit more?

Member Author

@GiedriusS GiedriusS Nov 15, 2021


My logic is the following: I ran the benchmark with different conditionals:

  • r.Range < r.Step
  • r.Range*2 < r.Step
  • r.Range*3 < r.Step
  • r.Range*4 < r.Step
    ...

Only with r.Range*4 < r.Step did the benchmarks show an improvement. This is presumably because Prometheus has to perform extra work for each query, i.e. locking/unlocking, hence negating any possible improvement below that threshold. (https://github.com/prometheus/prometheus/blob/f678ae2146a6a54222ee16c382d6a8354473491a/storage/remote/read_handler.go#L191-L245)
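To put rough numbers on that tradeoff (the figures below are illustrative, not taken from the benchmark): splitting replaces one remote-read with one sub-query per step, and each sub-query pays Prometheus's fixed per-query cost, so the range/step ratio has to be small before splitting wins. A quick sketch:

```go
package main

import "fmt"

func main() {
	// Illustrative query shape: a 24h span, 2h step, 5m range selector
	// (all values in milliseconds).
	span := int64(24 * 3600 * 1000)
	step := int64(2 * 3600 * 1000)
	rng := int64(5 * 60 * 1000)

	subQueries := span/step + 1              // one window per step, endpoints inclusive
	fraction := float64(rng) / float64(step) // share of each step actually fetched

	// Each of these sub-queries pays the fixed per-query overhead
	// (locking/unlocking, remote-read setup), which is why a small
	// range/step ratio is needed before the split pays off.
	fmt.Printf("%d sub-queries, each fetching %.1f%% of its step\n", subQueries, fraction*100)
	// prints: 13 sub-queries, each fetching 4.2% of its step
}
```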

Actually, I see now that Prometheus itself supports passing down Range/Step via the queries 🤔 (https://github.com/prometheus/prometheus/blob/f678ae2146a6a54222ee16c382d6a8354473491a/storage/remote/read_handler.go#L216). Let me try and see whether it still shows an improvement with those values simply passed down to Prometheus.

Member Author


Yeah, when you simply pass down the hints, there are no improvements:

```diff
diff --git a/pkg/store/prometheus.go b/pkg/store/prometheus.go
index 64e276f1..7b9a97b1 100644
--- a/pkg/store/prometheus.go
+++ b/pkg/store/prometheus.go
@@ -162,16 +162,7 @@ func (p *PrometheusStore) Series(r *storepb.SeriesRequest, s storepb.Store_Serie
                r.MinTime = availableMinTime
        }

-       timeranges := []timerange{}
-       // Range must be at least four times smaller than the step otherwise
-       // this optimization does not make sense. Deduced this from BenchmarkQueryRangeInterval.
-       if r.Range != 0 && r.Range*4 < r.Step {
-               for ts := r.MinTime; ts <= r.MaxTime; ts += r.Step {
-                       timeranges = append(timeranges, timerange{ts, ts + r.Range})
-               }
-       } else {
-               timeranges = []timerange{{r.MinTime, r.MaxTime}}
-       }
+       timeranges := []timerange{{r.MinTime, r.MaxTime}}

        if r.SkipChunks {
                labelMaps, err := p.client.SeriesInGRPC(s.Context(), p.base, matchers, r.MinTime, r.MaxTime)
@@ -196,7 +187,15 @@ func (p *PrometheusStore) Series(r *storepb.SeriesRequest, s storepb.Store_Serie

        queries := []*prompb.Query{}
        for _, tr := range timeranges {
-               q := &prompb.Query{StartTimestampMs: tr.mintime, EndTimestampMs: tr.maxtime}
+               q := &prompb.Query{StartTimestampMs: tr.mintime,
+                       EndTimestampMs: tr.maxtime,
+                       Hints: &prompb.ReadHints{
+                               StartMs: tr.mintime,
+                               EndMs:   tr.maxtime,
+                               StepMs:  r.Step,
+                               RangeMs: r.Range,
+                       },
+               }
                for _, m := range matchers {
                        pm := &prompb.LabelMatcher{Name: m.Name, Value: m.Value}
```
Result:

```
name                                                                  time/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16  8.08ms ± 8%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16  41.4ms ± 4%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16    8.01ms ± 9%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16    39.7ms ± 3%

name                                                                  alloc/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16  41.3kB ± 0%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16  34.4kB ± 1%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16    41.3kB ± 1%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16    34.5kB ± 0%

name                                                                  allocs/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16     389 ± 0%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16     265 ± 0%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16       389 ± 0%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16       265 ± 0%
```

@GiedriusS
Member Author

> Is there a way to make sure the query result remains the same? Can we run the query compliance suite in CI and make sure the result is consistent with the last release?

I ran it after fixing the compliance suite (#4873):

```
compatibility_test.go:74: ================================================================================
        General query tweaks:
        *  Thanos requires adding "external_labels" to distinguish Prometheus servers, leading to extra labels in query results that need to be stripped before comparing results.
        ================================================================================
        Total: 529 / 529 (100.00%) passed, 0 unsupported
=== RUN   TestPromQLCompliance/sidecar
    compatibility_test.go:82: ================================================================================
        General query tweaks:
        *  Thanos requires adding "external_labels" to distinguish Prometheus servers, leading to extra labels in query results that need to be stripped before comparing results.
        ================================================================================
        Total: 529 / 529 (100.00%) passed, 0 unsupported
```

Member

@bwplotka bwplotka left a comment


Thanks!

I am afraid there are small details that could introduce slight bugs in overall results with this technique. Wonder if we could use https://github.com/GiedriusS/thanos/blob/972c688041ba090c4b9d3b58e9c25b541945f5fc/pkg/testutil/testorbench.go#L26 to ensure the correctness of the result for some cases, but I doubt it will catch all problems.

The potential problems I see:

  • Deduplication might not work correctly for sidecars/Prometheus in HA if we return only chunks of data.
  • We might not return staleness markers correctly.

The amount you take for a range vector is tricky too. Is it always exactly 5m, with no samples before or after, for [5m]? 🤔 cc @brian-brazil

I think we also need at least to document better that this works only for rates. BTW, is this useful only for rates? Why is it only for rates - if I do `count(metric{})` for 24h with a 1h step, I could in theory also fetch just 24 samples, no?

Lots of unknowns, essentially.

Member

@bwplotka bwplotka left a comment


I think at least we need to have some correctness tests with sidecar and such step.

@brian-brazil
Contributor

From a quick peek this looks safe from a correctness standpoint, as the discarded data is never used by PromQL. However I would wonder if the queries this helps with are reasonable queries in the first place. If you're not covering all of the range, then the result of e.g. rate() could be highly misleading as you'd be ignoring at least 75% of data. I'd suggest that it's not worth spending complexity on optimising bad queries.

@yeya24
Contributor

yeya24 commented Dec 12, 2021

@GiedriusS Would you still like to keep working on this, or should we just switch to the pushdown approach?

@stale

stale bot commented Mar 2, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
