Add minimum_interval setting to auto_date_histogram aggregation #41757
Comments
Pinging @elastic/es-analytics-geo |
Thanks for opening this, I'll work on it this week. It's definitely something that would be beneficial for a lot of use cases. |
@polyfractal @colings86 My assumption here is that we want to support the same classes of intervals as the Date Histogram aggregation: fixed intervals as well as calendar intervals. Do you have a different opinion? |
Hmm that's an interesting question. What does the auto date histo do right now for intervals since the user only specifies number of buckets? Calendar? I think at a minimum the |
I believe we use a fixed calculation for roundings. |
Ah, interesting. It probably makes sense to start with fixed intervals then, and save adding calendar support for a different enhancement. It does make the API a bit tricky. Do we say Another interesting bit is that the fixed time parsing used by date_histo and others only parses up to |
hmmm. it feels a bit weird to have So my vote is that we make it clear in the docs that WDYT? |
The auto date histogram builder uses only calendar intervals (see https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregationBuilder.java#L72 where we create the roundings). We can only use calendar intervals because merging lower roundings into higher ones would not work well with fixed intervals. This is because the length of a day and moreover the length of a month cannot be assumed regular so if we tried to merge hours into days or days into months we would end up with accumulated error which would make the error in the final result too high. I agree that an |
I also think we should make the
|
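To make the point about irregular unit lengths concrete, here is a minimal Python sketch. It is not Elasticsearch code: the 30-day "fixed month" and both function names are assumptions chosen for illustration, and it ignores daylight saving shifts, which add further irregularity at the day level.

```python
import calendar
from typing import List

# Hypothetical fixed-length "month" of 30 days, used only for this comparison.
FIXED_MONTH_HOURS = 30 * 24


def calendar_month_hours(year: int) -> List[int]:
    """Return the real length of each calendar month of `year`, in hours."""
    return [calendar.monthrange(year, month)[1] * 24 for month in range(1, 13)]


def accumulated_drift_hours(year: int) -> int:
    """Hours by which twelve fixed 30-day months fall short of the calendar year."""
    return sum(calendar_month_hours(year)) - 12 * FIXED_MONTH_HOURS


if __name__ == "__main__":
    print(calendar_month_hours(2019)[:3])
    print(accumulated_drift_hours(2019))
```

Under these assumptions, bucket boundaries built from fixed 30-day months drift away from calendar month boundaries by several days within a single year, which is the kind of accumulated error described above.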
Makes sense @colings86, thanks for clarifying. I was wrong about the fixed calcs; it makes sense now, given that it takes a timezone. |
In TSVB, I introduced the concept of a "minimum auto interval" because metrics usually have a minimum resolution that works. For example, if a user installs Metricbeat and modifies the collection interval to be 1 minute, any time they request data that returns buckets smaller than 1m they will have gaps in their data. Once that happens, most pipeline aggregations stop working as well, especially the derivative pipeline aggregation, which is very common with counters like network traffic. This scenario is what usually drives users to start requesting interpolation between the points.
The concept of >=1m interval data math was born in TSVB to address the issue above. Adding support for this was trivial because you just calculate the interval and, if it's less than the minimum, the minimum is returned instead.
I would rather use auto_date_histogram for most of the metrics UIs I develop, but without support for a minimum interval, it's a non-starter.
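The clamping rule described above can be sketched in a few lines of Python. This is an illustration only, not the TSVB implementation: the function name `auto_interval`, its parameters, and the simple "range divided by target bucket count" calculation are all assumptions.

```python
from datetime import timedelta


def auto_interval(time_range: timedelta, target_buckets: int, minimum: timedelta) -> timedelta:
    """Compute a bucket interval for `time_range`, never going below `minimum`.

    Hypothetical helper: the interval is the range divided by the desired
    number of buckets, and the minimum is returned when that is smaller.
    """
    if target_buckets <= 0:
        raise ValueError("target_buckets must be positive")
    computed = time_range / target_buckets
    return max(computed, minimum)


if __name__ == "__main__":
    one_minute = timedelta(minutes=1)
    # A 15-minute window with 100 buckets would give 9-second buckets,
    # so the 1-minute minimum is used instead.
    print(auto_interval(timedelta(minutes=15), 100, one_minute))
    # A 1-day window with 100 buckets is already coarser than the minimum.
    print(auto_interval(timedelta(days=1), 100, one_minute))
```

With data collected once per minute, the first call avoids the empty buckets that would otherwise break a derivative over the series.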