Streaming Result for Query/QueryRange HTTP API #10040
That's not the case. This is true for range vector functions internally; however, basically everything else is evaluated step by step, including the functions you list.
Yeah, making the PromQL computation itself streamable on a per-series basis sounds hard, but the final JSON result could of course be streamed quite easily. Maybe that's still worth it for large payloads in some use cases?
Once we have all the result samples, streaming from there makes sense. I think I ended up not doing it when making the JSON iterator changes, as it ended up worse performance-wise, but that could have changed.
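For illustration, here is a minimal sketch of what streaming an already-computed result series by series could look like in Go. The `Series` type and the NDJSON framing are assumptions made for the example, not the actual Prometheus API:

```go
package api

import (
	"encoding/json"
	"net/http"
)

// Series is a simplified stand-in for one entry of a matrix result;
// the real Prometheus response types differ.
type Series struct {
	Metric map[string]string `json:"metric"`
	Values [][2]interface{}  `json:"values"` // [timestamp, value] pairs
}

// streamResult encodes one series per line (NDJSON) and flushes after
// each one, so a client can start decoding before the whole result is
// marshalled. Note the samples themselves are still fully computed up
// front; only the encoding is streamed.
func streamResult(w http.ResponseWriter, result []Series) error {
	w.Header().Set("Content-Type", "application/x-ndjson")
	flusher, canFlush := w.(http.Flusher)
	enc := json.NewEncoder(w)
	for _, s := range result {
		if err := enc.Encode(s); err != nil { // Encode appends '\n'
			return err
		}
		if canFlush {
			flusher.Flush()
		}
	}
	return nil
}
```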
Thanks for the input. Looks like we could descope it to just an API and check if it helps with anything (it helps for the Thanos sidecar case for sure). On a separate note then...
@brian-brazil you made me curious now! So do you think there is room to make PromQL compute series by series internally as much as possible, even if it doesn't currently? 🤔
It's already doing as much as it can as clearly makes sense. Beyond that would require a big redesign, and it's unclear that it'd result in any performance gains. Keeping maximum memory down is tricky, and you'd have to rewrite a large chunk of PromQL to make it happen - plus ensure that failure semantics are dealt with correctly, and somehow ensure that resource usage isn't adversely affected given you're now doing more work and keeping more intermediate results in memory.
Hello from the bug scrub! In the meantime, a lot has happened in the area of "yet another PromQL engine": Thanos has one, Mimir is working on one, but they are all meant for their specific use cases and not necessarily designed to be plugged into a vanilla Prometheus. Having said that, the topic of creating a new PromQL engine for vanilla Prometheus still comes up quite often, and with a new engine, streaming as proposed here might be feasible. We'll keep this issue open as a reminder.
In Thanos we are moving forward with query sharding, parallelization, and pushdown ideas 💪🏽 Recently, we pushed a change that can safely push down certain aggregations to Prometheus (thanos-io/thanos#4917).
What we do in the sidecar now, for certain "safe" functions like `min`, `max`, `group`, `min_over_time`, and `max_over_time`, is switch from fetching all series through remote read to the HTTP Query API. We build PromQL from the select hints we get from the root Querier and perform the query, then transform the query result to the Thanos gRPC StoreAPI.
Now we have an efficiency problem with the Query API, because it's not streamed (something I have been moaning about for a long time 🙈). This means we unnecessarily wait for the full response payload instead of streaming the result series by series, as we do with remote read. AFAIK, in most cases PromQL performs its calculations series by series, so the first series is fully calculated before the next one; that fits the server side of this API too.
Now, to solve our use case, there are two options that would improve Prometheus here and actually be a win in other areas too.
A) Introduce streaming (per series) for the query and query range Prometheus HTTP APIs. We could do this exactly as we did with remote read: don't stream by default, and opt in via header negotiation to a streamed JSON format, using the same chunked logic we use with protobuf in the new remote read. It's not straightforward, as our current JSON result with this `data` format was never designed to be streamed. A sketch of the negotiation follows option B below.
B) Allow remote read to perform PromQL using hints. It's not hard to implement, but quite weird, as there would now be a strong overlap between Query + Query range and a remote read that can do PromQL. Plus, it's easy to make a mistake and pass wrong hints (aggregations/functions that are NOT safe to push down).
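To make option A concrete, a minimal sketch of the header negotiation, assuming a hypothetical media type; this mirrors how remote read opted into chunked protobuf, but none of these names are real:

```go
package api

import "net/http"

// Hypothetical media type for the opt-in streamed variant of option A;
// the real name would come out of the format discussion.
const contentTypeStreamedJSON = "application/x-streamed-json"

// negotiate decides between today's buffered JSON document and the
// proposed per-series streaming, defaulting to the former so existing
// clients are unaffected (the same opt-in pattern remote read used
// when introducing chunked protobuf).
func negotiate(w http.ResponseWriter, r *http.Request,
	streamed, buffered http.HandlerFunc) {
	if r.Header.Get("Accept") == contentTypeStreamedJSON {
		w.Header().Set("Content-Type", contentTypeStreamedJSON)
		streamed(w, r) // e.g. one flushed frame per series
		return
	}
	w.Header().Set("Content-Type", "application/json")
	buffered(w, r) // today's single JSON document
}
```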
I would vote for A. I believe a streamed query response would help with many other use cases (e.g. Prometheus -> Grafana interaction, and Querier -> Query frontend in Thanos).
If we are OK to pursue A, I could try to work with the community on proposing the exact format of the streamed JSON API.
cc @tomwilkie @roidelapluie @juliusv @fpetkovski @brian-brazil @moadz @RichiH