store: optimized data retrieval from Prometheus #4857
Conversation
Use the new Range/Interval query hints to only select data which is really needed: instead of sending one big query with min/max time, divide it into more, smaller queries, each requesting a smaller amount of time. Only do this when the step is at least four times larger than the range, to avoid sending a bunch of queries that together request the same data. Via the benchmarks I've been able to deduce that the difference must be at least four times as large. Here is what the benchmark numbers look like:

```
name                                                                   time/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16   7.37ms ± 5%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16   37.5ms ± 5%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16     6.62ms ±14%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16     2.84ms ±15%

name                                                                   alloc/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16   41.3kB ± 1%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16   34.2kB ± 1%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16     41.1kB ± 1%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16     33.0kB ± 0%

name                                                                   allocs/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16   389 ± 0%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16   265 ± 0%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16     389 ± 0%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16     264 ± 0%
```

Signed-off-by: Giedrius Statkevičius <[email protected]>
@yeya24 this is the optimization 💪 benchmark numbers look pretty good
Looks like a pretty solid gain for a relatively small change, nicely done! Looks good!
Is there a way to make sure the query result remains the same? Can we run the query compliance suite in CI and make sure the result is consistent with the last release?
```go
timeranges := []timerange{}
// Range must be at least four times smaller than the step otherwise
// this optimization does not make sense. Deduced this from BenchmarkQueryRangeInterval.
if r.Range != 0 && r.Range*4 < r.Step {
```
Why should it be 4 times smaller? Could you explain it a little bit more?
My logic is the following: I ran the benchmark with different conditionals:

- `r.Range < r.Step`
- `r.Range*2 < r.Step`
- `r.Range*3 < r.Step`
- `r.Range*4 < r.Step`
- ...

Only with `r.Range*4 < r.Step` did the benchmarks show an improvement. This is presumably because Prometheus has to perform extra work for each query, i.e. locking/unlocking, hence negating any possible improvement. (https://github.com/prometheus/prometheus/blob/f678ae2146a6a54222ee16c382d6a8354473491a/storage/remote/read_handler.go#L191-L245)
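To make the trade-off concrete, here is a rough back-of-the-envelope sketch; the 24h window, 1h step and the candidate range sizes are made-up illustrative values, not the benchmark's setup:

```go
package main

import "fmt"

func main() {
	const (
		window = int64(24 * 3600 * 1000) // hypothetical 24h query window, in ms
		step   = int64(3600 * 1000)      // hypothetical 1h step, in ms
	)
	// For each candidate range-vector size: how many sub-queries would the
	// split produce, and what fraction of the window do they fetch in total?
	for _, rngMin := range []int64{5, 15, 30, 60} {
		rng := rngMin * 60 * 1000
		subQueries := window/step + 1 // the split loop includes both endpoints
		fetched := float64(subQueries*rng) / float64(window) * 100
		fmt.Printf("range=%2dm: %d sub-queries, ~%3.0f%% of the window fetched, split enabled: %v\n",
			rngMin, subQueries, fetched, rng*4 < step)
	}
}
```

Only the 5m case both clears the 4x threshold and fetches a small fraction of the window; for larger ranges the saved data shrinks while the per-query overhead on the Prometheus side stays the same.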
Actually, I see now that Prometheus itself supports passing down Range/Step via the queries 🤔 (https://github.com/prometheus/prometheus/blob/f678ae2146a6a54222ee16c382d6a8354473491a/storage/remote/read_handler.go#L216). Let me try and see whether it still shows an improvement with those values simply passed down to Prometheus.
Yeah, when you simply pass down hints then there are no improvements:
```diff
diff --git a/pkg/store/prometheus.go b/pkg/store/prometheus.go
index 64e276f1..7b9a97b1 100644
--- a/pkg/store/prometheus.go
+++ b/pkg/store/prometheus.go
@@ -162,16 +162,7 @@ func (p *PrometheusStore) Series(r *storepb.SeriesRequest, s storepb.Store_Serie
r.MinTime = availableMinTime
}
- timeranges := []timerange{}
- // Range must be at least four times smaller than the step otherwise
- // this optimization does not make sense. Deduced this from BenchmarkQueryRangeInterval.
- if r.Range != 0 && r.Range*4 < r.Step {
- for ts := r.MinTime; ts <= r.MaxTime; ts += r.Step {
- timeranges = append(timeranges, timerange{ts, ts + r.Range})
- }
- } else {
- timeranges = []timerange{{r.MinTime, r.MaxTime}}
- }
+ timeranges := []timerange{{r.MinTime, r.MaxTime}}
if r.SkipChunks {
labelMaps, err := p.client.SeriesInGRPC(s.Context(), p.base, matchers, r.MinTime, r.MaxTime)
@@ -196,7 +187,15 @@ func (p *PrometheusStore) Series(r *storepb.SeriesRequest, s storepb.Store_Serie
queries := []*prompb.Query{}
for _, tr := range timeranges {
- q := &prompb.Query{StartTimestampMs: tr.mintime, EndTimestampMs: tr.maxtime}
+			q := &prompb.Query{StartTimestampMs: tr.mintime,
+				EndTimestampMs: tr.maxtime,
+				Hints: &prompb.ReadHints{
+					StartMs: tr.mintime,
+					EndMs:   tr.maxtime,
+					StepMs:  r.Step,
+					RangeMs: r.Range,
+				},
+			}
for _, m := range matchers {
 			pm := &prompb.LabelMatcher{Name: m.Name, Value: m.Value}
```

Result:
```
name                                                                   time/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16   8.08ms ± 8%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16   41.4ms ± 4%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16     8.01ms ± 9%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16     39.7ms ± 3%

name                                                                   alloc/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16   41.3kB ± 0%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16   34.4kB ± 1%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16     41.3kB ± 1%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16     34.5kB ± 0%

name                                                                   allocs/op
QueryRangeInterval/unoptimized_sidecar,_worst_case,_one_big_block-16   389 ± 0%
QueryRangeInterval/unoptimized_sidecar,_small_blocks_with_2h_step-16   265 ± 0%
QueryRangeInterval/optimized_sidecar,_worst_case,_one_big_block-16     389 ± 0%
QueryRangeInterval/optimized_sidecar,_small_blocks_with_2h_step-16     265 ± 0%
```
I ran it after fixing the compliance suite (#4873).
Thanks!
I am afraid there are small details that could introduce slight bugs in the overall results with this technique. I wonder if we could use https://github.com/GiedriusS/thanos/blob/972c688041ba090c4b9d3b58e9c25b541945f5fc/pkg/testutil/testorbench.go#L26 to ensure the correctness of the result for some cases, but I doubt it will catch all problems.
The potential problems I see:
- Deduplication might not work correctly for sidecars/Prometheus in HA if we return only chunks of data.
- We might not return staleness markers correctly.
The amount you take for a range vector is tricky too. Is it always exactly 5m, with no samples before and after, for `[5m]`? 🤔 cc @brian-brazil

I think we also need at least to document better that it works only for `rate`s. BTW, is this useful only for `rate`s? Why is it only for `rate`s? If I do `count(metric{})` over 24h with a 1h step, I could also just fetch only 24 samples in theory, no?

Lots of unknowns essentially.
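To put rough numbers on the `count(metric{})` example above, here is a tiny sketch; the 15s scrape interval and the 5m lookback are hypothetical values I'm assuming purely for illustration:

```go
package main

import "fmt"

func main() {
	const (
		windowHours   = 24
		stepHours     = 1
		scrapeSeconds = 15  // assumed scrape interval
		lookbackSecs  = 300 // assumed 5m PromQL lookback delta
	)
	rawSamples := windowHours * 3600 / scrapeSeconds           // what one big query fetches per series
	evalSteps := windowHours/stepHours + 1                     // evaluation timestamps over the window
	neededInTheory := evalSteps                                 // one sample per step is enough for count()
	lookbackOnly := evalSteps * (lookbackSecs / scrapeSeconds)  // if only each step's lookback window were fetched

	fmt.Printf("raw samples per series over 24h: %d\n", rawSamples)
	fmt.Printf("samples needed in theory:        %d\n", neededInTheory)
	fmt.Printf("samples in 5m lookback windows:  %d\n", lookbackOnly)
}
```

So, at least on paper, the potential saving is not specific to `rate`.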
I think at the very least we need to have some correctness tests with the sidecar and such a step.
From a quick peek this looks safe from a correctness standpoint, as the discarded data is never used by PromQL. However, I would wonder if the queries this helps with are reasonable queries in the first place. If you're not covering all of the range, then the result of e.g.
@GiedriusS Would you still like to keep working on this, or should we just switch to the pushdown approach?
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Changes
Use the new Range/Interval query hints to only select data which is
really needed: instead of sending one big query with min/max time,
divide it into more, smaller queries, each requesting a smaller
amount of time. Only do this when the step is at least four times
larger than the range, to avoid sending a bunch of queries that together
request the same data. Via the benchmarks, I've been able to deduce that
the difference must be at least four times as large to make sense; the
benchmark numbers are shown above.

In the future, this optimization could be applied to other components
that have a TSDB as well, such as Receive/Rule.
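For reference, here is a simplified, standalone sketch of the splitting logic described above. The `timerange` struct and the 4x condition mirror the change in `pkg/store/prometheus.go`; the `splitTimeRanges` helper and the example values are only for illustration, as the real logic is inlined in the store's `Series` method:

```go
package main

import "fmt"

// timerange mirrors the small struct used in pkg/store/prometheus.go.
type timerange struct {
	mintime, maxtime int64 // milliseconds
}

// splitTimeRanges is an illustrative helper: it turns one [minTime, maxTime]
// request into one sub-range per step, each only as wide as the range-vector
// selector, but only when the range is at least four times smaller than the
// step. Otherwise it falls back to the single big range.
func splitTimeRanges(minTime, maxTime, step, rng int64) []timerange {
	if rng != 0 && rng*4 < step {
		trs := []timerange{}
		for ts := minTime; ts <= maxTime; ts += step {
			trs = append(trs, timerange{ts, ts + rng})
		}
		return trs
	}
	return []timerange{{minTime, maxTime}}
}

func main() {
	// e.g. rate(metric[5m]) evaluated over 6h with a 1h step.
	for _, tr := range splitTimeRanges(0, 6*3600*1000, 3600*1000, 5*60*1000) {
		fmt.Println(tr.mintime, tr.maxtime)
	}
}
```

Running it prints the seven 5-minute sub-ranges that would be requested instead of the whole 6h window.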
Signed-off-by: Giedrius Statkevičius [email protected]
Verification
Ran it manually with quickstart; tests pass.