
store: optimized data retrieval from Prometheus #4857

Closed
wants to merge 3 commits

Conversation

GiedriusS
Member

@GiedriusS GiedriusS commented Nov 12, 2021

Changes

Use the new Range/Interval query hints to select only the data that is
really needed: instead of sending one big query spanning min/max time,
divide it into several smaller queries, each requesting a smaller
window of time. Only do this where the step is at least four times
larger than the range, to avoid sending a bunch of queries that together
request the same data. From the benchmarks, I was able to deduce that
the difference must be at least fourfold for the split to make sense. Here is how the
benchmark numbers look:

```
name                                                                  time/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16  7.37ms ± 5%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16  37.5ms ± 5%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16    6.62ms ±14%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16    2.84ms ±15%

name                                                                  alloc/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16  41.3kB ± 1%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16  34.2kB ± 1%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16    41.1kB ± 1%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16    33.0kB ± 0%

name                                                                  allocs/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16     389 ± 0%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16     265 ± 0%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16       389 ± 0%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16       264 ± 0%
```

In the future, this optimization could be applied to other components that have a TSDB as well
such as Receive/Rule.
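As a minimal sketch of the splitting logic described above (the `timerange` struct and function name here are illustrative, modeled on the change to `pkg/store/prometheus.go`; all timestamps are in milliseconds):

```go
package main

import "fmt"

// timerange mirrors the small helper struct used in pkg/store/prometheus.go.
type timerange struct{ mintime, maxtime int64 }

// splitTimeRanges returns the sub-ranges to query. When the range selector
// is at least four times smaller than the step, only the [ts, ts+rng]
// windows that PromQL will actually read are requested; otherwise one big
// [minT, maxT] query is kept.
func splitTimeRanges(minT, maxT, step, rng int64) []timerange {
	timeranges := []timerange{}
	if rng != 0 && rng*4 < step {
		for ts := minT; ts <= maxT; ts += step {
			timeranges = append(timeranges, timerange{ts, ts + rng})
		}
	} else {
		timeranges = []timerange{{minT, maxT}}
	}
	return timeranges
}

func main() {
	// rate(metric[5m]) over 24h with a 2h step: 5m*4 < 2h, so the span
	// is split into one small window per step.
	trs := splitTimeRanges(0, 24*3600*1000, 2*3600*1000, 5*60*1000)
	fmt.Println(len(trs)) // prints 13
}
```

Under these assumptions, a 24h span turns into 13 five-minute windows instead of one 24h remote-read, which is where the sidecar benchmark improvement comes from.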

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Signed-off-by: Giedrius Statkevičius [email protected]

Verification

Running manually with quickstart, tests pass.

@GiedriusS
Member Author

GiedriusS commented Nov 12, 2021

@yeya24 this is the optimization 💪 benchmark numbers look pretty good

Collaborator

@matej-g matej-g left a comment


Looks like a pretty solid gain for a relatively small change, nicely done! Looks good!

@yeya24
Contributor

yeya24 commented Nov 12, 2021

Is there a way to make sure the query result remains the same? Can we run the query compliance suite in CI and make sure the result is consistent with the last release?

```go
timeranges := []timerange{}
// Range must be at least four times smaller than the step otherwise
// this optimization does not make sense. Deduced this from BenchmarkQueryRangeInterval.
if r.Range != 0 && r.Range*4 < r.Step {
```
Contributor


Why should it be 4 times smaller? Could you explain it a little bit more?

Member Author

@GiedriusS GiedriusS Nov 15, 2021


My logic is the following: I ran the benchmark with different conditionals:

  • r.Range < r.Step
  • r.Range*2 < r.Step
  • r.Range*3 < r.Step
  • r.Range*4 < r.Step
    ...

Only with r.Range*4 < r.Step did the benchmarks show an improvement. This is presumably because Prometheus has to perform extra work for each query, i.e. locking/unlocking, hence negating any possible improvement below that threshold. (https://github.com/prometheus/prometheus/blob/f678ae2146a6a54222ee16c382d6a8354473491a/storage/remote/read_handler.go#L191-L245)
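To put rough numbers on that tradeoff (the figures below are illustrative, not taken from the benchmark): splitting replaces one remote-read with one sub-query per step, and each sub-query pays Prometheus's fixed per-query cost, so the range/step ratio has to be small before splitting wins. A quick sketch:

```go
package main

import "fmt"

func main() {
	// Illustrative query shape: a 24h span, 2h step, 5m range selector
	// (all values in milliseconds).
	span := int64(24 * 3600 * 1000)
	step := int64(2 * 3600 * 1000)
	rng := int64(5 * 60 * 1000)

	subQueries := span/step + 1              // one window per step, endpoints inclusive
	fraction := float64(rng) / float64(step) // share of each step actually fetched

	// Each of these sub-queries pays the fixed per-query overhead
	// (locking/unlocking, remote-read setup), which is why a small
	// range/step ratio is needed before the split pays off.
	fmt.Printf("%d sub-queries, each fetching %.1f%% of its step\n", subQueries, fraction*100)
	// prints: 13 sub-queries, each fetching 4.2% of its step
}
```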

Actually, I see now that Prometheus itself supports passing down Range/Step via the queries 🤔 (https://github.com/prometheus/prometheus/blob/f678ae2146a6a54222ee16c382d6a8354473491a/storage/remote/read_handler.go#L216). Let me try and see whether it still shows an improvement with those values simply passed down to Prometheus.

Member Author


Yeah, when you simply pass down the hints, there are no improvements:

```diff
diff --git a/pkg/store/prometheus.go b/pkg/store/prometheus.go
index 64e276f1..7b9a97b1 100644
--- a/pkg/store/prometheus.go
+++ b/pkg/store/prometheus.go
@@ -162,16 +162,7 @@ func (p *PrometheusStore) Series(r *storepb.SeriesRequest, s storepb.Store_Serie
                r.MinTime = availableMinTime
        }

-       timeranges := []timerange{}
-       // Range must be at least four times smaller than the step otherwise
-       // this optimization does not make sense. Deduced this from BenchmarkQueryRangeInterval.
-       if r.Range != 0 && r.Range*4 < r.Step {
-               for ts := r.MinTime; ts <= r.MaxTime; ts += r.Step {
-                       timeranges = append(timeranges, timerange{ts, ts + r.Range})
-               }
-       } else {
-               timeranges = []timerange{{r.MinTime, r.MaxTime}}
-       }
+       timeranges := []timerange{{r.MinTime, r.MaxTime}}

        if r.SkipChunks {
                labelMaps, err := p.client.SeriesInGRPC(s.Context(), p.base, matchers, r.MinTime, r.MaxTime)
@@ -196,7 +187,15 @@ func (p *PrometheusStore) Series(r *storepb.SeriesRequest, s storepb.Store_Serie

        queries := []*prompb.Query{}
        for _, tr := range timeranges {
-               q := &prompb.Query{StartTimestampMs: tr.mintime, EndTimestampMs: tr.maxtime}
+               q := &prompb.Query{StartTimestampMs: tr.mintime,
+                       EndTimestampMs: tr.maxtime,
+                       Hints: &prompb.ReadHints{
+                               StartMs: tr.mintime,
+                               EndMs:   tr.maxtime,
+                               StepMs:  r.Step,
+                               RangeMs: r.Range,
+                       },
+               }
                for _, m := range matchers {
                        pm := &prompb.LabelMatcher{Name: m.Name, Value: m.Value}
```
Result:

```
name                                                                  time/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16  8.08ms ± 8%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16  41.4ms ± 4%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16    8.01ms ± 9%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16    39.7ms ± 3%

name                                                                  alloc/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16  41.3kB ± 0%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16  34.4kB ± 1%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16    41.3kB ± 1%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16    34.5kB ± 0%

name                                                                  allocs/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16     389 ± 0%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16     265 ± 0%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16       389 ± 0%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16       265 ± 0%
```

@GiedriusS
Member Author

> Is there a way to make sure the query result remains the same? Can we run the query compliance suite in CI and make sure the result is consistent with the last release?

I ran it after fixing the compliance suite (#4873):

```
compatibility_test.go:74: ================================================================================
        General query tweaks:
        *  Thanos requires adding "external_labels" to distinguish Prometheus servers, leading to extra labels in query results that need to be stripped before comparing results.
        ================================================================================
        Total: 529 / 529 (100.00%) passed, 0 unsupported
=== RUN   TestPromQLCompliance/sidecar
    compatibility_test.go:82: ================================================================================
        General query tweaks:
        *  Thanos requires adding "external_labels" to distinguish Prometheus servers, leading to extra labels in query results that need to be stripped before comparing results.
        ================================================================================
        Total: 529 / 529 (100.00%) passed, 0 unsupported
```

Member

@bwplotka bwplotka left a comment


Thanks!

I am afraid there are small details that could introduce slight bugs in overall results with this technique. Wonder if we could use https://github.com/GiedriusS/thanos/blob/972c688041ba090c4b9d3b58e9c25b541945f5fc/pkg/testutil/testorbench.go#L26 to ensure the correctness of the result for some cases, but I doubt it will catch all problems.

The potential problems I see:

  • Deduplication might not work correctly for sidecars/Prometheus in HA if we return only chunks of data.
  • We might not return staleness markers correctly.

The amount you take for a range vector is tricky too. Is it always exactly 5m, with no samples before or after, for [5m]? 🤔 cc @brian-brazil

I think we also need at least to document better that this works only for rates. BTW, is this useful only for rates? Why is it only for rates - if I do `count(metric{})` for 24h with a 1h step, I could in theory also fetch just 24 samples, no?

Lots of unknowns, essentially.

Member

@bwplotka bwplotka left a comment


I think at least we need to have some correctness tests with sidecar and such step.

@brian-brazil
Contributor

From a quick peek this looks safe from a correctness standpoint, as the discarded data is never used by PromQL. However I would wonder if the queries this helps with are reasonable queries in the first place. If you're not covering all of the range, then the result of e.g. rate() could be highly misleading as you'd be ignoring at least 75% of data. I'd suggest that it's not worth spending complexity on optimising bad queries.

@yeya24
Contributor

yeya24 commented Dec 12, 2021

@GiedriusS Would you still like to keep working on this, or should we just switch to the pushdown approach?

@stale

stale bot commented Mar 2, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
