
Streaming Result for Query/QueryRange HTTP API #10040

Open · bwplotka opened this issue Dec 17, 2021 · 6 comments

@bwplotka (Member)

In Thanos we are moving forward with query sharding, parallelization, and pushdown ideas 💪🏽 Recently we pushed a change that can safely push down certain aggregations to Prometheus (thanos-io/thanos#4917).

What the sidecar does now, for certain "safe" functions like min, max, group, min_over_time, and max_over_time, is switch from fetching all series through remote read to the HTTP Query API. We build a PromQL query from the select hints we get from the root Querier, execute it, and then transform the query result into the Thanos gRPC StoreAPI response. (A sketch of that hint-to-PromQL translation follows below.)
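For illustration, a minimal sketch of what such a hint-to-PromQL translation could look like. This is not the actual sidecar code: buildPushdownQuery is a hypothetical helper, only the "safe" functions named above are accepted, and the `without` grouping case is omitted.

```go
package pushdown

import (
	"fmt"
	"strings"

	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/storage"
)

// buildPushdownQuery (hypothetical) turns select hints plus the original
// matchers into a PromQL string for the query/query_range HTTP API.
// Anything it cannot safely express falls back to remote read.
func buildPushdownQuery(hints *storage.SelectHints, matchers []*labels.Matcher) (string, bool) {
	ms := make([]string, 0, len(matchers))
	for _, m := range matchers {
		ms = append(ms, m.String())
	}
	selector := "{" + strings.Join(ms, ",") + "}"

	switch hints.Func {
	case "min", "max", "group":
		grouping := ""
		if hints.By && len(hints.Grouping) > 0 {
			grouping = fmt.Sprintf(" by (%s)", strings.Join(hints.Grouping, ", "))
		}
		return fmt.Sprintf("%s%s(%s)", hints.Func, grouping, selector), true
	case "min_over_time", "max_over_time":
		// hints.Range is in milliseconds, matching the hint timestamps.
		return fmt.Sprintf("%s(%s[%dms])", hints.Func, selector, hints.Range), true
	default:
		return "", false // not safe to push down
	}
}
```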

Now we have an efficiency problem with the Query API, because it's not streamed (something I have been moaning about for a long time 🙈). This means we unnecessarily wait for the full response payload instead of streaming the result series by series, as we do with remote read. AFAIK in most cases PromQL performs calculations series by series, so the first series is fully calculated before the next one; that would fit the server side of this API too.

To solve our use case, there are two options that would improve Prometheus here and bring wins in other areas too:

A) Introduce streaming (per series) for the query and query range Prometheus HTTP APIs. We could do this exactly as we did with remote read: don't stream by default, and opt in via header negotiation to a streamed JSON response, reusing the chunking logic we use with protobuf in the new remote read. It's not straightforward, since the current JSON result format was never designed to be streamed (see the sketch after this list).
B) Allow remote read to perform PromQL using hints. It's not hard to implement, but quite weird, as there would then be a strong overlap between Query/QueryRange and a remote read that can do PromQL. Plus it's easy to make a mistake and pass wrong hints (aggregations/functions that are NOT safe to push down).
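To make A concrete, here is one purely illustrative shape such an API could take: a handler that writes one JSON frame per series and flushes after each, with opt-in negotiation via the Accept header. Nothing here is an agreed format; the content type and frame layout are placeholders.

```go
package streamjson

import (
	"encoding/json"
	"net/http"
)

// seriesFrame is an illustrative per-series frame; the real format
// would need community agreement.
type seriesFrame struct {
	Metric map[string]string `json:"metric"`
	Values [][2]interface{}  `json:"values"` // [unixSeconds, "value"] pairs
}

// streamQueryRange writes one newline-delimited JSON frame per series
// and flushes after each, so a client can start decoding before the
// last series is even serialized.
func streamQueryRange(w http.ResponseWriter, r *http.Request, result []seriesFrame) {
	// Hypothetical opt-in negotiation, mirroring how remote read
	// negotiates its chunked protobuf content type.
	if r.Header.Get("Accept") != "application/x-streamed-json" {
		// Fall back to today's buffered JSON response.
		json.NewEncoder(w).Encode(result)
		return
	}
	w.Header().Set("Content-Type", "application/x-streamed-json")
	flusher, canFlush := w.(http.Flusher)
	enc := json.NewEncoder(w)
	for _, s := range result {
		if err := enc.Encode(s); err != nil { // one frame per series
			return
		}
		if canFlush {
			flusher.Flush()
		}
	}
}
```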

I would vote for A. I believe a streaming query response would help with many other use cases too (e.g. the Prometheus -> Grafana interaction, and Querier -> Query Frontend in Thanos).

If we are OK to pursue A, I could try to work with the community on proposing an exact format for the streamed JSON API.

cc @tomwilkie @roidelapluie @juliusv @fpetkovski @brian-brazil @moadz @RichiH

@brian-brazil (Contributor)

AFAIK in most cases PromQL performs calculations series by series,

That's not the case. It's true for range vector functions internally; however, basically everything else is evaluated step by step, including the functions you list.
So there's not much to be streamed, and you'd have to rewrite a large chunk of PromQL to make it happen, plus ensure that failure semantics are handled correctly, and somehow ensure that resource usage isn't adversely affected, given you'd now be doing more work and keeping more intermediate results in memory.
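A schematic of the evaluation order described here, assuming nothing about the real engine's internals beyond the step-by-step loop structure:

```go
package main

import "fmt"

// Schematic only, not the real PromQL engine: a range query advances
// step by step across *all* series, so the first series is complete
// only after the final step. Per-series streaming would need the
// loops swapped, with all the consequences described above.
func main() {
	seriesNames := []string{"a", "b", "c"}
	steps := []int64{0, 15, 30} // evaluation timestamps

	out := map[string][]int64{}
	for _, ts := range steps { // outer loop: time steps
		for _, s := range seriesNames { // inner loop: series
			out[s] = append(out[s], ts) // stand-in for evalAt(s, ts)
		}
	}
	// out["a"] only becomes complete in the last outer iteration, so it
	// could not have been flushed to a client any earlier.
	fmt.Println(out)
}
```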

@juliusv (Member)

juliusv commented Dec 17, 2021

Yeah, making the PromQL computation itself streamable on a per-series basis sounds hard, but the final JSON result could of course be streamed quite easily. Maybe that's still worth it for large payloads in some use cases?
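A minimal sketch of that idea: keep today's query_range JSON envelope, but encode the already-computed matrix one series at a time instead of marshaling one giant buffer. The types below are simplified stand-ins, not the real API structs.

```go
package streamjson

import (
	"encoding/json"
	"io"
)

type series struct {
	Metric map[string]string `json:"metric"`
	Values [][2]interface{}  `json:"values"`
}

// writeMatrixResult hand-writes the response envelope so each series
// can be encoded (and flushed by the caller) on its own, keeping the
// serialized-payload memory bounded to roughly one series at a time.
func writeMatrixResult(w io.Writer, matrix []series) error {
	if _, err := io.WriteString(w, `{"status":"success","data":{"resultType":"matrix","result":[`); err != nil {
		return err
	}
	enc := json.NewEncoder(w)
	for i, s := range matrix {
		if i > 0 {
			if _, err := io.WriteString(w, ","); err != nil {
				return err
			}
		}
		if err := enc.Encode(s); err != nil { // one series per Encode call
			return err
		}
	}
	_, err := io.WriteString(w, `]}}`)
	return err
}
```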

@brian-brazil (Contributor)

Once we have all the result samples, streaming from there makes sense. I think I ended up not doing it when making the json iterator changes, as it turned out worse performance-wise, but that could have changed.

@bwplotka (Member, Author)

Thanks for the input. Looks like we could descope this to just the API change and check whether it helps (it helps the Thanos sidecar case for sure).

On a separate note then...

you'd have to rewrite a large chunk of PromQL to make it happen, plus ensure that failure semantics are handled correctly, and somehow ensure that resource usage isn't adversely affected, given you'd now be doing more work and keeping more intermediate results in memory.

@brian-brazil you made me curious now! Do you think there is room to make PromQL compute as much as possible series by series internally, even if it doesn't currently? 🤔

@brian-brazil (Contributor)

brian-brazil commented Dec 17, 2021 via email

@beorn7 (Member)

beorn7 commented Aug 13, 2024

Hello from the bug scrub!

In the meantime, a lot has happened in the area of "yet another PromQL engine". Thanos has one, and Mimir is working on one, but they are all built for their specific use cases and not necessarily meant to be plugged into vanilla Prometheus as-is.

Having said that, the topic of creating a new PromQL engine for vanilla Prometheus still comes up quite often, and with a new engine, streaming as proposed here might be feasible. We'll keep this issue open as a reminder.
