Search before asking
I searched the issues and found no similar issues.
KubeRay Component
ray-operator
What happened + What you expected to happen
Here's the config
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: demo-recsys-voyager
spec:
  serveService:
    metadata:
      name: demo-recsys-voyager
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-name: demo-recsys-voyager
        service.beta.kubernetes.io/aws-load-balancer-scheme: internal
    spec:
      type: LoadBalancer
      ports:
        - port: 6379
          targetPort: 6379
          protocol: TCP
          name: gcs
        - port: 8265
          targetPort: 8265
          protocol: TCP
          name: dashboard
        - port: 10001
          targetPort: 10001
          protocol: TCP
          name: client
        - port: 8000
          targetPort: 8000
          protocol: TCP
          name: serve
  serveConfigV2: |
    applications:
      - name: demo_recsys_retrieval_voyager_index_service
        route_prefix: /v1/vectors
        import_path: demo_recsys.indexing.server:demo_recsys_indexing_service_entrypoint
        runtime_env:
          working_dir: "s3://demo-recsys-use1/serving/releases/v1.2.0.zip"
          py_modules: ["s3://demo-recsys-use1/serving/releases/demo_recsys-1.2.0-py3-none-any.whl"]
          pip: [
            "click", "datasets==3.2.0", "torch==2.5.0", "torchrec==1.0.0", "fbgemm-gpu==1.0.0", "torchmetrics",
            "mmh3==5.0.1", "pandas", "pydantic==2.10.4", "ray[serve]==2.43.0", "rbloom==1.5.2", "rich",
            "safetensors==0.4.5", "psycopg[binary,pool]==3.2.3", "pgvector==0.3.6", "redis>=5.2.1",
            "boto3>=1.35.97", "pydantic-settings>=2.7.1", "ujson>=5.10.0", "aiohttp[speedups]>=3.11.11",
            "pinecone==5.4.2", "voyager>=2.1.0", "duckdb>=1.2.0",
          ]
        deployments:
          - name: demoRetrievalVoyagerIndexServiceHandler
            num_replicas: 1
            ray_actor_options:
              num_cpus: 2
          - name: demoRetrievalVoyagerIndexServiceIngress
            num_replicas: 1
            ray_actor_options:
              num_cpus: 2
  rayClusterConfig:
    rayVersion: '2.43.0' # Should match the Ray version in the image of the containers
    enableInTreeAutoscaling: true
    autoscalerOptions:
      idleTimeoutSeconds: 1
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams: {}
      # Pod template
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.40.0-py312-cpu
              ports:
                - containerPort: 6379
                  name: gcs
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
              volumeMounts:
                - mountPath: /tmp/ray
                  name: ray-logs
              resources:
                limits:
                  cpu: "2"
                  memory: "8G"
                requests:
                  cpu: "1"
                  memory: "2G"
          volumes:
            - name: ray-logs
              emptyDir: {}
    workerGroupSpecs:
      # The pod replicas in this group typed worker
      - replicas: 1
        minReplicas: 1
        maxReplicas: 4
        groupName: ray-serve-worker-group
        rayStartParams: {}
        # Pod template
        template:
          spec:
            initContainers:
              - name: test
                image: amazon/aws-cli
                command:
                  - sh
                  - -c
                  - |
                    echo "Hello, World!"
            containers:
              - name: ray-worker
                image: rayproject/ray:2.40.0-py312-cpu
                resources:
                  limits:
                    cpu: 12
                    memory: "32G"
                    # nvidia.com/gpu: 1
                  requests:
                    cpu: 6
                    memory: "24G"
                volumeMounts:
                  - mountPath: /tmp
                    name: tmp-ray
                    # nvidia.com/gpu: 1
            # Please add the following taints to the GPU node.
            tolerations:
              - key: "ray.io/node-type"
                operator: "Equal"
                value: "worker"
                effect: "NoSchedule"
            volumes:
              - name: tmp-ray
                persistentVolumeClaim:
                  claimName: efs-pvc
Reproduction script
Here's the exception:
WARNING 2025-03-04 01:04:33,560 controller 306 -- Deployment 'demoRetrievalVoyagerIndexServiceHandler' in application 'demo_recsys_retrieval_voyager_index_service' has 1 replicas that have taken more than 30s to be scheduled. This may be due to waiting for the cluster to auto-scale or for a runtime environment to be installed. Resources required for each replica: {"CPU": 2.0}, total resources available: {"CPU": 10.0}. Use `ray status` for more details.
WARNING 2025-03-04 01:05:03,580 controller 306 -- Deployment 'demoRetrievalVoyagerIndexServiceHandler' in application 'demo_recsys_retrieval_voyager_index_service' has 1 replicas that have taken more than 30s to be scheduled. This may be due to waiting for the cluster to auto-scale or for a runtime environment to be installed. Resources required for each replica: {"CPU": 2.0}, total resources available: {"CPU": 10.0}. Use `ray status` for more details.
WARNING 2025-03-04 01:05:33,665 controller 306 -- Deployment 'demoRetrievalVoyagerIndexServiceHandler' in application 'demo_recsys_retrieval_voyager_index_service' has 1 replicas that have taken more than 30s to be scheduled. This may be due to waiting for the cluster to auto-scale or for a runtime environment to be installed. Resources required for each replica: {"CPU": 2.0}, total resources available: {"CPU": 10.0}. Use `ray status` for more details.
WARNING 2025-03-04 01:06:03,678 controller 306 -- Deployment 'demoRetrievalVoyagerIndexServiceHandler' in application 'demo_recsys_retrieval_voyager_index_service' has 1 replicas that have taken more than 30s to be scheduled. This may be due to waiting for the cluster to auto-scale or for a runtime environment to be installed. Resources required for each replica: {"CPU": 2.0}, total resources available: {"CPU": 10.0}. Use `ray status` for more details.
WARNING 2025-03-04 01:06:33,747 controller 306 -- Deployment 'demoRetrievalVoyagerIndexServiceHandler' in application 'demo_recsys_retrieval_voyager_index_service' has 1 replicas that have taken more than 30s to be scheduled. This may be due to waiting for the cluster to auto-scale or for a runtime environment to be installed. Resources required for each replica: {"CPU": 2.0}, total resources available: {"CPU": 10.0}. Use `ray status` for more details.
WARNING 2025-03-04 01:07:03,847 controller 306 -- Deployment 'demoRetrievalVoyagerIndexServiceHandler' in application 'demo_recsys_retrieval_voyager_index_service' has 1 replicas that have taken more than 30s to be scheduled. This may be due to waiting for the cluster to auto-scale or for a runtime environment to be installed. Resources required for each replica: {"CPU": 2.0}, total resources available: {"CPU": 10.0}. Use `ray status` for more details.
WARNING 2025-03-04 01:07:33,949 controller 306 -- Deployment 'demoRetrievalVoyagerIndexServiceHandler' in application 'demo_recsys_retrieval_voyager_index_service' has 1 replicas that have taken more than 30s to be scheduled. This may be due to waiting for the cluster to auto-scale or for a runtime environment to be installed. Resources required for each replica: {"CPU": 2.0}, total resources available: {"CPU": 10.0}. Use `ray status` for more details.
WARNING 2025-03-04 01:08:04,031 controller 306 -- Deployment 'demoRetrievalVoyagerIndexServiceHandler' in application 'demo_recsys_retrieval_voyager_index_service' has 1 replicas that have taken more than 30s to be scheduled. This may be due to waiting for the cluster to auto-scale or for a runtime environment to be installed. Resources required for each replica: {"CPU": 2.0}, total resources available: {"CPU": 10.0}. Use `ray status` for more details
Anything else
No response
Are you willing to submit a PR?
Yes I am willing to submit a PR!
Is it possible that the Serve applications are spending that time on the pip install? Can you try baking all the dependencies into the image and trying again? If the issue still exists, we can take a look.
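For reference, this is roughly what the relevant parts of the RayService could look like once the Python dependencies are pre-installed in a custom image rather than resolved through runtime_env.pip at deploy time. This is only a sketch: the image name my-registry/demo-recsys-ray:2.43.0 is a placeholder for an image you would build yourself (for example, starting FROM rayproject/ray:2.43.0-py312-cpu and running pip install with the same pinned packages), and the untouched parts of the spec are elided with comments.
serveConfigV2: |
  applications:
    - name: demo_recsys_retrieval_voyager_index_service
      route_prefix: /v1/vectors
      import_path: demo_recsys.indexing.server:demo_recsys_indexing_service_entrypoint
      runtime_env:
        # Application code still comes from S3; the heavy packages
        # (torch, torchrec, fbgemm-gpu, ...) are already in the image,
        # so there is no pip section to install at replica startup.
        working_dir: "s3://demo-recsys-use1/serving/releases/v1.2.0.zip"
        py_modules: ["s3://demo-recsys-use1/serving/releases/demo_recsys-1.2.0-py3-none-any.whl"]
      # deployments section unchanged
rayClusterConfig:
  rayVersion: '2.43.0'
  headGroupSpec:
    template:
      spec:
        containers:
          - name: ray-head
            # Placeholder: custom image with dependencies baked in.
            image: my-registry/demo-recsys-ray:2.43.0
  workerGroupSpecs:
    - groupName: ray-serve-worker-group
      template:
        spec:
          containers:
            - name: ray-worker
              image: my-registry/demo-recsys-ray:2.43.0
  # remaining fields unchanged
Building the image on top of the 2.43.0 base would also bring the container image (currently rayproject/ray:2.40.0-py312-cpu) in line with rayVersion: '2.43.0' and the ray[serve]==2.43.0 pin in the pip list, which currently disagree.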