Post-scripts for Continuous Dataframes #43061

Closed
andrewvc opened this issue Jun 10, 2019 · 3 comments · Fixed by #43124

@andrewvc
Contributor

andrewvc commented Jun 10, 2019

TL;DR: It would be useful to be able to specify a Painless script that transforms documents before they are indexed when using data frames.

Continuous data frames are really powerful for a number of non-ML use cases. They're great for turning time series data into entity-centric data. We plan on using this feature to build a snapshot of what things look like 'now' for uptime.

One limitation of data frames is that the documents that are created reflect the JSON structure of the aggregations used. This is pretty ugly in our use case. What we've wound up doing is writing everything in a big scripted metric aggregation so we can control the exact naming of fields etc. We want these field names to be friendly for all users of the index, and to be ECS compatible as well.

If data frames supported a post-script option, we could use more traditional aggregations and then clean up the output with a simpler Painless script.
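To make the idea concrete, here is a minimal Python sketch of what such a post-script would do: take a document shaped like aggregation output and move the values to friendly, ECS-style field names. This is not a data frame API. The input document, its field names (`monitor_id`, `latest_status.value`, `max_timestamp.value_as_string`), and the rename mapping are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict

Doc = Dict[str, Any]


def get_path(doc: Doc, path: str) -> Any:
    """Return the value at a dotted path, or None if any segment is missing."""
    node: Any = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def set_path(doc: Doc, path: str, value: Any) -> None:
    """Set a value at a dotted path, creating intermediate objects as needed."""
    keys = path.split(".")
    node = doc
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def post_script(doc: Doc, renames: Dict[str, str]) -> Doc:
    """Hypothetical post-script: build a new doc holding only the renamed fields."""
    out: Doc = {}
    for source, target in renames.items():
        value = get_path(doc, source)
        if value is not None:
            set_path(out, target, value)
    return out


# Hypothetical aggregation-shaped output for one entity.
raw = {
    "monitor_id": "http-1",
    "latest_status": {"value": "up"},
    "max_timestamp": {"value_as_string": "2019-06-10T12:00:00Z"},
}

# Hypothetical mapping from aggregation paths to ECS-style field names.
renames = {
    "monitor_id": "monitor.id",
    "latest_status.value": "monitor.status",
    "max_timestamp.value_as_string": "@timestamp",
}

cleaned = post_script(raw, renames)
print(cleaned)
# {'monitor': {'id': 'http-1', 'status': 'up'}, '@timestamp': '2019-06-10T12:00:00Z'}
```

Because the post-script builds a fresh document, fields can land anywhere (for example at the top-level `@timestamp`), not just under an aggregation's own namespace.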

Additionally, even with a scripted_metric aggregation, one may only manipulate results appearing under the scripted metric's namespace. This post-script option would let us put fields exactly where we want them.

@elasticmachine
Collaborator

Pinging @elastic/ml-core

@benwtrent
Member

@andrewvc, this seems like something an ingest pipeline would do, right? Since we are bulk indexing the documents, we should be able to push them through a user-defined pipeline at index time.

What do you think?
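As a sketch of this suggestion, the same kind of field cleanup could be expressed as an ingest pipeline built from `rename` processors. The pipeline is shown as a Python dict that would be sent as the body of `PUT _ingest/pipeline/<id>`. The field names and description are hypothetical, and how a data frame would be pointed at such a pipeline is not covered here.

```python
import json
from typing import Dict, List


def rename_processors(renames: Dict[str, str]) -> List[dict]:
    """Build one ingest `rename` processor per source -> target field pair."""
    return [
        # ignore_missing lets documents without the source field pass through.
        {"rename": {"field": src, "target_field": dst, "ignore_missing": True}}
        for src, dst in renames.items()
    ]


# Hypothetical mapping from aggregation output fields to friendly names.
renames = {
    "monitor_id": "monitor.id",
    "latest_status.value": "monitor.status",
}

# Body for PUT _ingest/pipeline/<id>; no Painless needed for simple renames.
pipeline_body = {
    "description": "Rename data frame output fields (hypothetical example)",
    "processors": rename_processors(renames),
}

print(json.dumps(pipeline_body, indent=2))
```

Renaming `latest_status.value` leaves an empty `latest_status` object behind; a `remove` processor could drop it if needed.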

@andrewvc
Contributor Author

@benwtrent yes! That's a good idea. Pipelines can run Painless as well, but I wonder if we could get away with no Painless at all this way.
