Is your feature request related to a problem? Please describe.
ML Commons is an OpenSearch plugin that manages machine learning (ML) models to enhance search relevance through semantic understanding. You can deploy models directly within your OpenSearch cluster or connect to externally hosted models.
For neural search, a language model converts text into vector embeddings. During ingestion, OpenSearch generates vector embeddings for text fields in incoming requests. At search time, the same model transforms query text into vector embeddings, enabling vector similarity search. It is crucial to use the same ML model for both ingestion and search to ensure consistency.
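As a concrete illustration, here is a minimal sketch of generating such embeddings through the ml-commons Predict API against a deployed text embedding model. The host, credentials, and model ID are placeholders, and the exact request and response shapes depend on the model type:

```python
import requests

# Minimal sketch: call the ml-commons Predict API for a deployed text
# embedding model. Host, credentials, and model ID are placeholders; the
# exact request body depends on the model type.
OPENSEARCH = "https://localhost:9200"
MODEL_ID = "my-embedding-model-id"  # placeholder ml-commons model ID

def embed(texts: list[str]) -> list[list[float]]:
    """Turn texts into vector embeddings via the Predict API."""
    resp = requests.post(
        f"{OPENSEARCH}/_plugins/_ml/models/{MODEL_ID}/_predict",
        json={
            "text_docs": texts,
            "return_number": True,
            "target_response": ["sentence_embedding"],
        },
        auth=("admin", "admin"),  # placeholder credentials
        verify=False,             # demo only: skip TLS verification
    )
    resp.raise_for_status()
    # Text embedding models return vectors under
    # inference_results[*].output[*].data.
    return [
        output["data"]
        for result in resp.json()["inference_results"]
        for output in result["output"]
    ]

if __name__ == "__main__":
    vector = embed(["what day is it today?"])[0]
    print(len(vector), vector[:5])  # embedding dimension and first few values
```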
To support offline batch ingestion, Data Prepper is proposed as the ingestion engine for transforming text into vector embeddings. The proposed processor will also support data transformation in streaming mode.
Describe the solution you'd like
Build a new processor that integrates the ml-commons predict/batch_predict model APIs into Data Prepper pipelines.
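A rough sketch of what a pipeline using the proposed processor could look like. The processor name `ml_inference` and every option shown are assumptions for this proposal, not an existing Data Prepper processor:

```yaml
# Hypothetical pipeline: the ml_inference processor name and its options
# are illustrative assumptions for this proposal.
embedding-pipeline:
  source:
    s3:
      # source settings (bucket, notifications, codec) elided
  processor:
    - ml_inference:
        hosts: ["https://opensearch:9200"]
        model_id: "my-embedding-model-id"   # ml-commons model ID (placeholder)
        action: "batch_predict"             # or "predict" for streaming mode
        input_key: "text"                   # document field to embed
        output_key: "text_embedding"        # field to store the vector
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        index: "my-vector-index"
```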
Describe alternatives you've considered (Optional)
The model management and predict/batch_predict APIs have already been launched in ml-commons. This feature only integrates them into Data Prepper.
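For offline jobs, the processor would wrap the ml-commons batch predict API in the same way. A hedged sketch: the endpoint is the launched batch predict API for remote models, but the parameter names shown are purely illustrative, since they are defined by the remote connector's batch_predict action (e.g., a SageMaker batch transform connector):

```python
import requests

# Sketch of the ml-commons batch predict API the processor would wrap for
# offline jobs. The parameter names below are illustrative placeholders --
# the real names are defined by the remote connector's batch_predict action.
OPENSEARCH = "https://localhost:9200"
REMOTE_MODEL_ID = "my-remote-model-id"  # placeholder remote model ID

resp = requests.post(
    f"{OPENSEARCH}/_plugins/_ml/models/{REMOTE_MODEL_ID}/_batch_predict",
    json={
        "parameters": {
            "TransformJobName": "offline-embedding-job",   # illustrative
            "input_location": "s3://my-bucket/raw/",       # illustrative
            "output_location": "s3://my-bucket/vectors/",  # illustrative
        }
    },
    auth=("admin", "admin"),  # placeholder credentials
    verify=False,             # demo only: skip TLS verification
)
resp.raise_for_status()
# Batch predict runs asynchronously; the response can be used to track
# the offline job's status.
print(resp.json())
```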
Additional context
#5433