Is your feature request related to a problem? Please describe.
ML Commons is an OpenSearch plugin that manages machine learning (ML) models to enhance search relevance through semantic understanding. You can deploy models directly within your OpenSearch cluster or connect to externally hosted models.
For neural search, a language model converts text into vector embeddings. During ingestion, OpenSearch generates vector embeddings for text fields in incoming requests. At search time, the same model transforms query text into vector embeddings, enabling vector similarity search. It is crucial to use the same ML model for both ingestion and search to ensure consistency.
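As a concrete illustration, here is a minimal sketch of generating such embeddings through the ml-commons Predict API against a deployed text embedding model. The host, credentials, and model ID are placeholders, and the exact request and response shapes depend on the model type:

```python
import requests

# Minimal sketch: call the ml-commons Predict API for a deployed text
# embedding model. Host, credentials, and model ID are placeholders; the
# exact request body depends on the model type.
OPENSEARCH = "https://localhost:9200"
MODEL_ID = "my-embedding-model-id"  # placeholder ml-commons model ID

def embed(texts: list[str]) -> list[list[float]]:
    """Turn texts into vector embeddings via the Predict API."""
    resp = requests.post(
        f"{OPENSEARCH}/_plugins/_ml/models/{MODEL_ID}/_predict",
        json={
            "text_docs": texts,
            "return_number": True,
            "target_response": ["sentence_embedding"],
        },
        auth=("admin", "admin"),  # placeholder credentials
        verify=False,             # demo only: skip TLS verification
    )
    resp.raise_for_status()
    # Text embedding models return vectors under
    # inference_results[*].output[*].data.
    return [
        output["data"]
        for result in resp.json()["inference_results"]
        for output in result["output"]
    ]

if __name__ == "__main__":
    vector = embed(["what day is it today?"])[0]
    print(len(vector), vector[:5])  # embedding dimension and first few values
```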
To support offline batch ingestion, Data Prepper is proposed as the ingestion engine for transforming text into vector embeddings. The proposed processor will also support data transformation in streaming mode.
Describe the solution you'd like
Build a new processor that integrates the ml-commons predict/batch_predict model APIs into Data Prepper pipelines.
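A rough sketch of what a pipeline using the proposed processor could look like. The processor name `ml_inference` and every option shown are assumptions for this proposal, not an existing Data Prepper processor:

```yaml
# Hypothetical pipeline: the ml_inference processor name and its options
# are illustrative assumptions for this proposal.
embedding-pipeline:
  source:
    s3:
      # source settings (bucket, notifications, codec) elided
  processor:
    - ml_inference:
        hosts: ["https://opensearch:9200"]
        model_id: "my-embedding-model-id"   # ml-commons model ID (placeholder)
        action: "batch_predict"             # or "predict" for streaming mode
        input_key: "text"                   # document field to embed
        output_key: "text_embedding"        # field to store the vector
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        index: "my-vector-index"
```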
Describe alternatives you've considered (Optional)
The model management and predict/batch_predict APIs have already been launched in ml-commons. This feature only integrates them into Data Prepper.
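For offline jobs, the processor would wrap the ml-commons batch predict API in the same way. A hedged sketch: the endpoint is the launched batch predict API for remote models, but the parameter names shown are purely illustrative, since they are defined by the remote connector's batch_predict action (e.g., a SageMaker batch transform connector):

```python
import requests

# Sketch of the ml-commons batch predict API the processor would wrap for
# offline jobs. The parameter names below are illustrative placeholders --
# the real names are defined by the remote connector's batch_predict action.
OPENSEARCH = "https://localhost:9200"
REMOTE_MODEL_ID = "my-remote-model-id"  # placeholder remote model ID

resp = requests.post(
    f"{OPENSEARCH}/_plugins/_ml/models/{REMOTE_MODEL_ID}/_batch_predict",
    json={
        "parameters": {
            "TransformJobName": "offline-embedding-job",   # illustrative
            "input_location": "s3://my-bucket/raw/",       # illustrative
            "output_location": "s3://my-bucket/vectors/",  # illustrative
        }
    },
    auth=("admin", "admin"),  # placeholder credentials
    verify=False,             # demo only: skip TLS verification
)
resp.raise_for_status()
# Batch predict runs asynchronously; the response can be used to track
# the offline job's status.
print(resp.json())
```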
Additional context
#5433