Post-scripts for Continuous Dataframes #43061

andrewvc · 2019-06-10T17:58:13Z

TL;DR: It would be useful to be able to specify a Painless script that transforms docs before indexing when using data frames.

Continuous data frames are really powerful for a number of non-ML use cases. They're great for turning time series data into entity-centric data. We plan on using this feature to build a snapshot of what things look like 'now' for uptime.

One limitation of data frames is that the documents that are created reflect the JSON structure of the aggregations used. This is pretty ugly in our use case. What we've wound up doing is writing everything in a big scripted metric aggregation so we can control the exact naming of fields etc. We want these field names to be friendly for all users of the index, and to be ECS compatible as well.

If data frames supported a post-script option, we could potentially use more traditional aggregations and then simply clean up the output with a simpler Painless script.

Additionally, even with a scripted_metric aggregation, one may only manipulate results appearing under the scripted metric's namespace. This post-script option would let us put fields exactly where we want them.
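To make the request concrete, here is a minimal sketch in plain Python (not Painless) of the kind of reshaping a post-script would perform. The document shape and every field name (`monitor_id`, `latest.status`, `monitor.duration.us`) are hypothetical, chosen only to illustrate moving values out of an aggregation's namespace into friendlier paths.

```python
def reshape(doc, mapping):
    """Build a new document by copying values from dotted source paths
    to dotted destination paths. Fields not listed in `mapping` are dropped."""
    out = {}
    for src, dst in mapping.items():
        # Walk down the source path to find the value.
        value = doc
        for key in src.split("."):
            value = value[key]
        # Create intermediate objects for the destination path as needed.
        node = out
        *parents, leaf = dst.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return out


# Hypothetical output of a pivot: the group-by key at the top level and
# everything else nested under the aggregation's name ("latest").
raw = {"monitor_id": "m1", "latest": {"status": "up", "duration_us": 1200}}

# Hypothetical target layout with friendlier nested field names.
mapping = {
    "monitor_id": "monitor.id",
    "latest.status": "monitor.status",
    "latest.duration_us": "monitor.duration.us",
}

clean = reshape(raw, mapping)
print(clean)
```

The point of the sketch is only that the output location of each field is chosen freely, which is what the scripted metric namespace currently prevents.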

elasticmachine · 2019-06-10T17:58:15Z

Pinging @elastic/ml-core

benwtrent · 2019-06-11T17:39:03Z

@andrewvc this seems like something a pipeline would do, right? It seems to me that since we are bulk indexing documents, we should be able to push them through a user-defined pipeline at index time.

What do you think?

andrewvc · 2019-06-11T17:47:49Z

@benwtrent yes! That's a good idea. Pipelines include Painless as an option as well, but I wonder if we could even get away with zero Painless with this.
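As a rough illustration of the pipeline approach, the sketch below builds two request bodies as Python dicts: an ingest pipeline that uses only `rename` processors (no Painless), and a data frame transform whose `dest` names that pipeline. The pipeline ID, index names, and field names are all hypothetical, and the endpoint paths in the comments are assumptions that may differ by version. Continuous-mode settings are omitted.

```python
import json

# Hypothetical pipeline ID, used only for this illustration.
PIPELINE_ID = "uptime-cleanup"

# Body for the ingest pipeline API (e.g. PUT _ingest/pipeline/uptime-cleanup).
# Each rename processor moves one field to a friendlier path.
pipeline_body = {
    "description": "Rename pivot output fields (illustrative field names)",
    "processors": [
        {"rename": {"field": "monitor_id", "target_field": "monitor.id"}},
        {"rename": {"field": "duration_avg", "target_field": "monitor.duration.avg_us"}},
    ],
}

# Body for creating a data frame transform. The endpoint path is an
# assumption; check the documentation for the version in use.
transform_body = {
    "source": {"index": "heartbeat-*"},  # hypothetical source index pattern
    "dest": {
        "index": "uptime-latest",        # hypothetical destination index
        "pipeline": PIPELINE_ID,         # documents pass through the pipeline at index time
    },
    "pivot": {
        "group_by": {"monitor_id": {"terms": {"field": "monitor.id"}}},
        "aggregations": {"duration_avg": {"avg": {"field": "monitor.duration.us"}}},
    },
}

print(json.dumps(pipeline_body, indent=2))
print(json.dumps(transform_body, indent=2))
```

With this layout the aggregations stay conventional and the renaming lives entirely in the pipeline, so the transform itself needs no scripting.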

andrewvc added >enhancement :ml/Transform Transform labels Jun 10, 2019

benwtrent mentioned this issue Jun 11, 2019

[ML][Data Frame] adds new pipeline field to dest config #43124

Merged

benwtrent closed this as completed in #43124 Jun 19, 2019

Mpdreamz mentioned this issue Aug 7, 2019

[meta] 7.3 Release elastic/elasticsearch-net#4001

Closed

16 tasks
