[RFC] Asynchronous Offline Batch Inference and Ingestion to OpenSearch #2891
Comments
@Zhangxunmt, I think it's important for us to continue supporting Ingest API compatibility. It's been in use since 2016, and we have major feature sets like neural search that depend on it. Today, users can use the Ingest API to define an ingest pipeline (PUT /_ingest/pipeline/my_pipeline) and then run a bulk streaming ingestion job like my-index/_bulk?pipeline=my-pipeline. The user should be able to execute this pipeline in batch mode with a command like my-index/_batch?pipeline=my-pipeline source="s3://..." temp_stage="s3://..." (optionally overriding defaults). Batch processing support for each ingest processor can be added incrementally.
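For context, here is a minimal sketch of the existing streaming flow described above, assuming a hypothetical pipeline named my-pipeline backed by the neural-search text_embedding processor; the model ID and document fields are placeholders, and the proposed _batch endpoint itself is hypothetical and not shown:

```json
PUT /_ingest/pipeline/my-pipeline
{
  "description": "Generate embeddings at ingest time (streaming mode)",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<your-model-id>",
        "field_map": { "text": "text_embedding" }
      }
    }
  ]
}

POST /my-index/_bulk?pipeline=my-pipeline
{ "index": { "_id": "1" } }
{ "text": "hello world" }
```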
As @dylan-tong-aws mentioned, should the API support the final ingestion of embeddings as well? Once embeddings are generated, OpenSearch should read them and ingest them into an index rather than asking the client to do it.
The current API supports that final ingestion, but it has to be triggered by the customer. The integration with Data Prepper will automate these async steps, including the final ingestion.
It would be nice to have a feature that ingests the final documents without requiring the user to trigger the Ingest API again. Also, there may be many users who do not use Data Prepper.
Agreed. Do you think using the Flow Framework/Fractal is a good approach to automate all of this? That was actually the initial plan.
With the polling mechanism to be introduced in #3218, once it detects that the inference task has completed, could it also read the output file and ingest the documents?
That's theoretically right, but the ingestion needs inputs such as the field map and the target index name, and likely some data transformation before ingestion. Data Prepper can do that with its existing pipelines, but we don't natively support all of that. We could do the most basic ingestion through the polling job, but that wouldn't be a complete solution.
I noticed there are two separate APIs. Why not combine them, if users provide all the parameters in a single request? Additionally, one feature that could be useful is allowing users to specify a source index/field instead of a file path. This way, the system could collect the data, create the file, perform batch inference, and ingest the results into another index. What are your thoughts?
The initial plan was to use the Fractal Flow Framework to stitch these two APIs together. With the current polling jobs in ML Commons, it might be worth exploring the option to merge them. However, there are two key challenges:
The reindexing use case is valid. Since the Reindex API already exists, integrating it with ML Commons' batch_predict functionality would be a strong proposal.
The reason for requesting a combined API is that I can't envision a scenario where a user would perform offline batch processing without ultimately ingesting the data into OpenSearch. In other words, the offline batch API seems to lack standalone value without the ingestion step.
Isn't POST /_plugins/_ml/_batch_ingestion a newly introduced API in ML Commons for ingestion? Additionally, the geospatial plugin also has an API for ingesting GeoJSON data: opensearch-project/geospatial#47
I believe this lack of flexibility exists whether we combine them or not, right? Apologies for the late feedback, especially since most of the feature has already been implemented. I just wanted to share some thoughts on potential future improvements for this feature.
Problem Statement
Remote model servers such as AWS SageMaker, Amazon Bedrock, OpenAI, and Cohere all support batch predict APIs, which allow users to submit a large number of requests in a file (for example, in S3) and receive the results asynchronously in an output file. Different platforms use different terminology: SageMaker calls it "batch transform" and Bedrock calls it "batch inference", but they are all the same in that they accept requests in batch mode and process them asynchronously. While some use cases require synchronous requests, there are many cases where requests do not need an immediate response, or where rate limits prevent executing a large number of queries quickly. For example, in this case a customer runs into throttling from the remote model and the data ingestion cannot finish.
In this RFC, "batch inference" is used as the terminology to represent the "batch" operations in all remote servers. The benefits of utilizing batch inference can be summaries but not limited to 1) Better cost efficiency: 50% cost discount compared to synchronous APIs (number may vary for different servers) and 2) Higher rate limits compared to the synchronous APIs.
For OpenSearch users, the most common use case of batch inference is ingesting embedding data into a k-NN index for vector search. However, the Ingest API in OpenSearch was designed for streaming data processing, so the data processing capabilities supported by our ingest processors are not optimized for batch ingestion and do not accept files as inputs. The typical data ingestion flow for ML use cases is presented in the following diagram.

This RFC focuses on two improvements: 1) supporting asynchronous offline batch inference through remote model servers, and 2) supporting batch ingestion of the inference results into OpenSearch.
Proposed Solution
Speed up a batch transform job through SageMaker
Phase 1: Add a new “batch_predict” action type in the AI Connector framework (released in OpenSearch 2.16)
Add a new action type, “batch_predict”, in the connector blueprint. This action type runs batch prediction using the connector in ML Commons. An example connector for SageMaker is given below. Different model platforms have different connector configurations, and we will define multiple ConnectorExecutors to easily extend batch prediction to other AI models.
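The original connector example is not reproduced here; the following is a rough sketch of what a SageMaker connector with a batch_predict action could look like. The credentials, region, URL, and request-body fields are placeholders based on SageMaker's CreateTransformJob parameters, not an authoritative blueprint.

```json
POST /_plugins/_ml/connectors/_create
{
  "name": "SageMaker batch transform connector (sketch)",
  "description": "Illustrative connector exposing a batch_predict action",
  "version": 1,
  "protocol": "aws_sigv4",
  "credential": {
    "access_key": "<aws-access-key>",
    "secret_key": "<aws-secret-key>"
  },
  "parameters": {
    "region": "us-east-1",
    "service_name": "sagemaker"
  },
  "actions": [
    {
      "action_type": "batch_predict",
      "method": "POST",
      "url": "https://api.sagemaker.us-east-1.amazonaws.com/CreateTransformJob",
      "headers": { "content-type": "application/json" },
      "request_body": "{ \"TransformJobName\": \"${parameters.TransformJobName}\", \"ModelName\": \"${parameters.ModelName}\", \"TransformInput\": ${parameters.TransformInput}, \"TransformOutput\": ${parameters.TransformOutput}, \"TransformResources\": ${parameters.TransformResources} }"
    }
  ]
}
```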
After the new connector action type is added, add a new API for batch inference jobs. To be consistent with the current “Predict” API in ML Commons, the new API is named batch-prediction-job. This API maps to the “create_transform_job” API in SageMaker and the “model-invocation-job” API in Bedrock. An example request for the new batch inference API is given below.
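This is a sketch of what the request could look like once a remote model is registered with the connector above. The endpoint path and the nesting under "parameters" are assumptions; the inner fields follow SageMaker's CreateTransformJob parameters.

```json
POST /_plugins/_ml/models/<model_id>/_batch_predict
{
  "parameters": {
    "TransformJobName": "my-batch-transform-job",
    "ModelName": "my-sagemaker-model",
    "TransformInput": {
      "ContentType": "application/json",
      "DataSource": {
        "S3DataSource": {
          "S3DataType": "S3Prefix",
          "S3Uri": "s3://my-bucket/batch-input/"
        }
      }
    },
    "TransformOutput": {
      "S3OutputPath": "s3://my-bucket/batch-output/"
    },
    "TransformResources": {
      "InstanceType": "ml.m5.xlarge",
      "InstanceCount": 1
    }
  }
}
```

Because the remote job runs asynchronously, the call returns a task ID rather than inline inference results.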
Phase 2: Add offline batch inference task management and a new batch ingestion API.
Add task management for the batch inference jobs, which maps to the “list-prediction-jobs”, “describe-prediction-job”, and “cancel-prediction-job” APIs from remote model servers.
Add a new API for offline batch ingestion jobs. This API reads the batch inference results from Phase 1 along with other input files, and bulk ingests the vectors into a k-NN index. It reads the vector data from the S3 file and organizes the bulk ingestion requests based on the field mapping. The bulk ingestion into the k-NN index is called asynchronously, so a task ID is returned.
Phase 3: Integrate the offline batch inference and ingestion engine with Data Prepper
Data Prepper is a server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analytics and visualization. Data Prepper lets users build custom pipelines to improve the operational view of applications. We can integrate the AI connector into Data Prepper so that customers can set up custom pipelines to run offline batch ingestion.
A sample Data Prepper config is presented below. Integration with Data Prepper can be implemented after Phases 1 and 2. This means users have the option to use Data Prepper to set up offline batch ingestion without directly calling any plugin APIs.
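The original sample config is not preserved in this copy. The sketch below only illustrates the intended shape of such a pipeline: the s3 source and opensearch sink are existing Data Prepper plugins (options abbreviated), while the ml_batch_inference processor is hypothetical.

```yaml
# Rough, simplified sketch; not a working configuration.
offline-batch-ingestion-pipeline:
  source:
    s3:
      # Read raw documents from S3 (plugin options abbreviated).
      aws:
        region: "us-east-1"
  processor:
    - ml_batch_inference:        # hypothetical processor wrapping the ML Commons batch_predict API
        model_id: "<remote-model-id>"
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        index: "my-knn-index"
```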
In Scope
Out of Scope
Real-World Example Using a Public ML Model Service
Given an input file for batch inference using the OpenAI embedding model "text-embedding-ada-002":
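The input file content is not included above; a representative JSONL input for the OpenAI Batch API (with made-up text values) could look like this, where each line is one embedding request:

```json
{"custom_id": "request-1", "method": "POST", "url": "/v1/embeddings", "body": {"model": "text-embedding-ada-002", "input": ["What is the weather like today?", "Sunny with a light breeze"]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/embeddings", "body": {"model": "text-embedding-ada-002", "input": ["How do I reset my password?", "Use the account settings page"]}}
```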
Use the connector below to kick off batch inference through the OpenAI Batch API.
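The connector blueprint itself is not reproduced here; the following is an illustrative sketch. The https://api.openai.com/v1/batches endpoint and its input_file_id, endpoint, and completion_window fields come from OpenAI's public Batch API, while the connector layout mirrors other ML Commons connector blueprints and should be treated as an assumption.

```json
POST /_plugins/_ml/connectors/_create
{
  "name": "OpenAI embedding batch connector (sketch)",
  "description": "Illustrative connector exposing a batch_predict action for the OpenAI Batch API",
  "version": 1,
  "protocol": "http",
  "credential": {
    "openAI_key": "<your-openai-api-key>"
  },
  "parameters": {
    "model": "text-embedding-ada-002"
  },
  "actions": [
    {
      "action_type": "batch_predict",
      "method": "POST",
      "url": "https://api.openai.com/v1/batches",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}"
      },
      "request_body": "{ \"input_file_id\": \"${parameters.input_file_id}\", \"endpoint\": \"/v1/embeddings\", \"completion_window\": \"24h\" }"
    }
  ]
}
```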
Once you receive the task_id, you can use the Task API to check the remote job status in OpenAI.
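For example, using the existing ML Commons Get Task API (the response below is abbreviated and hypothetical; only the output file ID is taken from this example):

```json
GET /_plugins/_ml/tasks/<task_id>

{
  "task_type": "BATCH_PREDICTION",
  "state": "COMPLETED",
  "remote_job": {
    "id": "batch_<remote-job-id>",
    "status": "completed",
    "output_file_id": "file-Wux0Pk80dhkxi98Z5iKNjB4n"
  }
}
```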
After the job status shows "completed", you can check your output in the file ID "file-Wux0Pk80dhkxi98Z5iKNjB4n" (in this example). The file is in JSONL format, where each line represents a single inference result for a request in the input file.
The output file content is:
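The output content was not preserved in this copy of the RFC; a representative (heavily abbreviated) line, following OpenAI's Batch API output format with the embedding vectors truncated, would look roughly like:

```json
{"id": "batch_req_<id>", "custom_id": "request-1", "response": {"status_code": 200, "body": {"object": "list", "data": [{"object": "embedding", "index": 0, "embedding": [0.0023, -0.0094, 0.0172]}, {"object": "embedding", "index": 1, "embedding": [0.0151, -0.0067, 0.0203]}], "model": "text-embedding-ada-002", "usage": {"prompt_tokens": 12, "total_tokens": 12}}}, "error": null}
```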
To ingest your embedding data and other fields into OpenSearch, you can use the batch ingestion API shown below. The field map defines the fields you want to ingest into your k-NN index. In the field map, the key is the field name, and the value is the JsonPath used to find your data in the source files. For example, source[1].$.body.input[1] means using the JsonPath $.body.input[1] to fetch the element body.input[1] from the second file in the "source" array of the batch request.
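A sketch of such a request is shown below. The endpoint matches the POST /_plugins/_ml/_batch_ingestion API mentioned earlier in this thread, but the exact parameter names (index_name, field_map, credential, data_source) are illustrative. Here source[0] is assumed to be the OpenAI output file and source[1] the original input file (its ID is a placeholder).

```json
POST /_plugins/_ml/_batch_ingestion
{
  "index_name": "my-knn-index",
  "field_map": {
    "question": "source[1].$.body.input[0]",
    "answer": "source[1].$.body.input[1]",
    "question_embedding": "source[0].$.response.body.data[0].embedding",
    "answer_embedding": "source[0].$.response.body.data[1].embedding"
  },
  "credential": {
    "openAI_key": "<your-openai-api-key>"
  },
  "data_source": {
    "type": "openAI",
    "source": ["file-Wux0Pk80dhkxi98Z5iKNjB4n", "file-<input-file-id>"]
  }
}
```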
Request for Comments
This offline batch engine is implemented in the machine learning space. It is designed to work with remote model batch APIs to facilitate batch inference for LLMs and to ingest model output data into an OpenSearch k-NN index for vector search. However, the batch ingestion API can also be used for general ingestion purposes when the inputs are stored in a file system. So please do not limit your thoughts to machine learning when you review this feature.
Please leave your suggestions and concerns on this RFC; your valuable input is appreciated.