add oauth-proxy to rawdeployments if odh auth label is present #419

Merged on Dec 9, 2024 (20 commits)

Conversation

@VedantMahabaleshwarkar VedantMahabaleshwarkar commented Oct 9, 2024

What this PR does / why we need it:

This PR adds the following :

  • If the isvc has label "security.opendatahub.io/enable-auth" = "true"
    -- an oauth proxy container is added to the deployment
    -- http port is replaced by https port in the service
    -- "service.beta.openshift.io/serving-cert-secret-name" is added to the service to allow creation of the tls secret
  • The service account, CRB and Route are created in Add reconciliation for Kserve Raw odh-model-controller#274
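
The behavior listed above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the `Workload`, `Container` and `ServicePort` types are hypothetical stand-ins for the real Deployment and Service objects, and the container name `oauth-proxy` and port 8443 are assumptions.

```go
package main

import "fmt"

// Container, ServicePort and Workload are hypothetical, simplified stand-ins
// for the Kubernetes Deployment/Service objects the controller manipulates.
type Container struct {
	Name string
}

type ServicePort struct {
	Name string
	Port int
}

type Workload struct {
	Labels             map[string]string
	Containers         []Container
	ServicePorts       []ServicePort
	ServiceAnnotations map[string]string
}

const (
	authLabel      = "security.opendatahub.io/enable-auth"
	servingCertKey = "service.beta.openshift.io/serving-cert-secret-name"
	proxyName      = "oauth-proxy" // assumed container name
	httpsPort      = 8443          // assumed HTTPS port
)

// applyAuth mutates w only when the auth label is exactly "true".
// It reports whether auth is enabled for w.
func applyAuth(w *Workload, secretName string) bool {
	if w.Labels[authLabel] != "true" {
		return false
	}
	// Add the proxy container once, so repeated reconciles stay idempotent.
	hasProxy := false
	for _, c := range w.Containers {
		if c.Name == proxyName {
			hasProxy = true
			break
		}
	}
	if !hasProxy {
		w.Containers = append(w.Containers, Container{Name: proxyName})
	}
	// Replace the plain HTTP port with the HTTPS port.
	w.ServicePorts = []ServicePort{{Name: "https", Port: httpsPort}}
	// Ask OpenShift to generate the TLS secret for the service.
	if w.ServiceAnnotations == nil {
		w.ServiceAnnotations = map[string]string{}
	}
	w.ServiceAnnotations[servingCertKey] = secretName
	return true
}

func main() {
	w := &Workload{
		Labels:       map[string]string{authLabel: "true"},
		Containers:   []Container{{Name: "kserve-container"}},
		ServicePorts: []ServicePort{{Name: "http", Port: 80}},
	}
	enabled := applyAuth(w, "example-serving-cert")
	fmt.Println(enabled, len(w.Containers), w.ServicePorts[0].Name)
}
```

Without the label (or with any value other than "true") the workload is left untouched, which matches the "inference without auth" test case further down.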

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes # https://issues.redhat.com/browse/RHOAIENG-10291, https://issues.redhat.com/browse/RHOAIENG-13444
TEST WITH: opendatahub-io/odh-model-controller#274

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:

  • Deploy ODH/RHOAI:
  • Install the DSC (kserve spec below) :
    kserve:
      defaultDeploymentMode: RawDeployment
      devFlags:
        manifests:
          - contextDir: config
            sourcePath: overlays/odh
            uri: 'https://github.com/VedantMahabaleshwarkar/kserve/tarball/devflags'
          - contextDir: config
            sourcePath: ''
            uri: 'https://github.com/VedantMahabaleshwarkar/odh-model-controller/tarball/devflags'
      managementState: Managed
      serving:
        ingressGateway:
          certificate:
            type: OpenshiftDefaultIngress
        managementState: Managed
        name: knative-serving
  • To test enabling route for a particular ISVC
    -- Create any kserve isvc+SR
    -- isvc should have annotation : "serving.kserve.io/deploymentMode" : RawDeployment
    -- isvc should have label : "networking.kserve.io/visibility": "enable-route"

  • To test route with auth:
    -- Create any kserve isvc+SR
    -- isvc should have annotation : "serving.kserve.io/deploymentMode" : RawDeployment
    -- isvc should have label: "security.opendatahub.io/enable-auth" = "true"
    -- isvc should have label: "networking.kserve.io/visibility": "enable-route"

  • To test Inference without auth:
    -- remove isvc label "security.opendatahub.io/enable-auth" = "true"

Inference should now work without a token.

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:

Added oauth-proxy authentication for Kserve RawDeployment Routes

Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

openshift-ci bot commented Oct 9, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved label Oct 9, 2024
@VedantMahabaleshwarkar
Author

tests passing locally, the failures look like test infra failures

/retest-required

@VedantMahabaleshwarkar VedantMahabaleshwarkar force-pushed the j-10306 branch 2 times, most recently from 44ba660 to da8dc3a Compare October 10, 2024 18:43
@VedantMahabaleshwarkar
Author

/retest-required

@VedantMahabaleshwarkar
Author

not sure what's happening with the tests, they are passing locally :(

@VedantMahabaleshwarkar
Author

VedantMahabaleshwarkar commented Oct 17, 2024

The e2e raw test failure is because https://github.com/opendatahub-io/kserve/blob/master/test/scripts/openshift-ci/run-e2e-tests.sh needs to be updated for the new behavior. IMO it can be ignored for now and will be fixed later in https://issues.redhat.com/browse/RHOAIENG-14604

// Check if the route is admitted
for _, ingress := range route.Status.Ingress {
	for _, condition := range ingress.Conditions {
		if condition.Type == "Admitted" && condition.Status == "True" {
Member

There might be a constant from netv1 for "Admitted"

Member

is there a constant as Yuan suggested?

Member

updates on this?
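
For reference, a sketch of what using constants could look like. I believe `github.com/openshift/api/route/v1` exposes `RouteAdmitted` and `k8s.io/api/core/v1` exposes `ConditionTrue`, but please verify against the vendored versions. The types below are local mirrors (assumptions, not the real API types) so that the snippet compiles on its own.

```go
package main

import "fmt"

// Local mirrors of the route API types, for illustration only.
type RouteIngressConditionType string
type ConditionStatus string

const (
	// Mirrors what should be routev1.RouteAdmitted and corev1.ConditionTrue.
	RouteAdmitted RouteIngressConditionType = "Admitted"
	ConditionTrue ConditionStatus           = "True"
)

type RouteIngressCondition struct {
	Type   RouteIngressConditionType
	Status ConditionStatus
}

type RouteIngress struct {
	Host       string
	Conditions []RouteIngressCondition
}

// isRouteAdmitted reports whether any ingress has an Admitted=True condition.
func isRouteAdmitted(ingresses []RouteIngress) bool {
	for _, ingress := range ingresses {
		for _, condition := range ingress.Conditions {
			if condition.Type == RouteAdmitted && condition.Status == ConditionTrue {
				return true
			}
		}
	}
	return false
}

func main() {
	ingresses := []RouteIngress{{
		Host:       "example.apps.test",
		Conditions: []RouteIngressCondition{{Type: RouteAdmitted, Status: ConditionTrue}},
	}}
	fmt.Println(isRouteAdmitted(ingresses))
}
```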

@@ -1264,6 +1270,109 @@ var _ = Describe("v1beta1 inference service controller", func() {
Eventually(func() error { return k8sClient.Get(context.TODO(), predictorHPAKey, actualHPA) }, timeout).
Should(HaveOccurred())
})
It("Should have no ingress created if labeled as cluster-local", func() {

Perhaps, contribute this one to upstream?

}
if val, ok := componentMeta.Labels[constants.ODHKserveRawAuth]; ok && val == "true" {
	switch {
	case componentExt != nil && componentExt.Batcher != nil:
Member

something like:

        - --openshift-service-account=oauth-proxy
        - --client-id=system:serviceaccount:my-namespace:oauth-proxy
        - --client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
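
A small sketch of how such arguments could be built per namespace. The helper name `oauthProxyArgs` is hypothetical; only the three flags quoted in the suggestion above are used, and a real oauth-proxy container would likely need more.

```go
package main

import "fmt"

// oauthProxyArgs is a hypothetical helper that renders the suggested flags
// for a given namespace and service account.
func oauthProxyArgs(namespace, serviceAccount string) []string {
	return []string{
		"--openshift-service-account=" + serviceAccount,
		fmt.Sprintf("--client-id=system:serviceaccount:%s:%s", namespace, serviceAccount),
		// Path of the projected service account token inside the pod.
		"--client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token",
	}
}

func main() {
	for _, arg := range oauthProxyArgs("my-namespace", "oauth-proxy") {
		fmt.Println(arg)
	}
}
```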

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
@israel-hdez israel-hdez left a comment

Looks quite fine. I'll do some testing on my Tuesday.

@@ -457,7 +461,7 @@ func TestCreateDefaultDeployment(t *testing.T) {
ttExpected := getDefaultExpectedDeployment()

// update objectMeta using modify func
-got := createRawDeployment(ttArgs.objectMeta, ttArgs.workerObjectMeta, ttArgs.componentExt, tt.modifyArgs(ttArgs).podSpec, tt.modifyArgs(ttArgs).workerPodSpec)
+got, _ := createRawDeployment(clientset, ttArgs.objectMeta, ttArgs.workerObjectMeta, ttArgs.componentExt, tt.modifyArgs(ttArgs).podSpec, tt.modifyArgs(ttArgs).workerPodSpec)
this one is the same.

@@ -760,7 +764,7 @@ func TestCreateDefaultDeployment(t *testing.T) {
ttExpected := getDefaultExpectedDeployment()

// update objectMeta using modify func
-got := createRawDeployment(tt.modifyObjectMetaArgs(ttArgs).objectMeta, tt.modifyWorkerObjectMetaArgs(ttArgs).workerObjectMeta, ttArgs.componentExt, tt.modifyPodSpecArgs(ttArgs).podSpec, tt.modifyWorkerPodSpecArgs(ttArgs).workerPodSpec)
+got, _ := createRawDeployment(clientset, tt.modifyObjectMetaArgs(ttArgs).objectMeta, tt.modifyWorkerObjectMetaArgs(ttArgs).workerObjectMeta, ttArgs.componentExt, tt.modifyPodSpecArgs(ttArgs).podSpec, tt.modifyWorkerPodSpecArgs(ttArgs).workerPodSpec)
same here

@israel-hdez

Manual testing

Since controller functionality is the concern, all tests use this gist as a reference, which uses a SKLearn runtime with a basic model. Although the SKLearn runtime is not ODH-supported, we should still be able to verify the correctness of the controller functionality.

Testing is done by deploying kserve-controller from this PR, together with odh-model-controller from PR opendatahub-io/odh-model-controller#274. ODH is set up with a custom build of odh-operator, as a standard Serverless setup. For these tests, the deploymentMode annotation is used to switch the mode to Raw.

Quick regression testing for Serverless mode

🟢 OK

Follow the mentioned gist as is: https://gist.github.com/israel-hdez/af374562ef9e5b9d80890aa6f0bce20d

Deploy a slim InferenceService in Raw deployment mode

🔴 Fails

Follow the gist, but use these annotations in the InferenceService:

  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"

🟢 The model seems to deploy fine
🟢 SVC looks OK
🟢 No oauth-proxy container
🟢 No RoleBinding
🟢 No Route
🔴 Bad endpoint is reported

oc get isvc sklearn-v2-iris-test1
NAME                    URL                                                          READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION   AGE
sklearn-v2-iris-test1   https://sklearn-v2-iris-test1-kserve-raw-tests.example.com   True                                                                  10m
# ---
oc get isvc sklearn-v2-iris-test1 -o yaml | yq '{"address": .status.address, "components": .status.components, "url": .status.url}' 
address:
  url: https://sklearn-v2-iris-test1-predictor.kserve-raw-tests.svc.cluster.local # OK
components:
  predictor:
    url: http://sklearn-v2-iris-test1-predictor-kserve-raw-tests.example.com # Wrong
url: https://sklearn-v2-iris-test1-kserve-raw-tests.example.com # Wrong

Deploy a Raw InferenceService with explicit private label

🔴 Fails

Follow the gist, but use these metadata in the InferenceService:

  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
  labels:
    networking.kserve.io/visibility: "cluster-local"

🟢 The model seems to deploy fine
🟢 SVC looks OK
🟢 No oauth-proxy container
🟢 No RoleBinding
🟢 No Route
🔴 Bad endpoint is reported

oc get isvc sklearn-v2-iris-test3                                                                                                  
NAME                    URL                                                          READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION   AGE
sklearn-v2-iris-test3   https://sklearn-v2-iris-test3-kserve-raw-tests.example.com   True                                                                  73s
# ---
oc get isvc sklearn-v2-iris-test3 -o yaml | yq '{"address": .status.address, "components": .status.components, "url": .status.url}'
address:
  url: https://sklearn-v2-iris-test3-predictor.kserve-raw-tests.svc.cluster.local # OK
components:
  predictor:
    url: http://sklearn-v2-iris-test3-predictor-kserve-raw-tests.example.com # Wrong
url: https://sklearn-v2-iris-test3-kserve-raw-tests.example.com # Wrong

Deploy a Raw InferenceService with explicit exposed label

🔴 Fails

Follow the gist, but use these metadata in the InferenceService:

  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
  labels:
    networking.kserve.io/visibility: "exposed"

🟢 The model seems to deploy fine

🔴 Reconcile error in odh-model-controller:

2024-12-04T21:43:14Z	ERROR	Reconciler error	{"controller": "inferenceservice", "controllerGroup": "serving.kserve.io", "controllerKind": "InferenceService", "InferenceService": {"name":"sklearn-v2-iris-test4","namespace":"kserve-raw-tests"}, "namespace": "kserve-raw-tests", "name": "sklearn-v2-iris-test4", "reconcileID": "cb88aa95-d15f-4c17-bca8-8fc409cfe671", "error": "2 errors occurred:\n\t* clusterrolebindings.rbac.authorization.k8s.io \"kserve-raw-tests-default-auth-delegator\" already exists\n\t* Route.route.openshift.io \"sklearn-v2-iris-test4\" is invalid: spec.port.targetPort: Required value\n\n"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227

🔴 Reconcile error in kserve-controller (surely a consequence of the previous):

{
  "level": "error",
  "ts": "2024-12-04T21:37:26Z",
  "msg": "Reconciler error",
  "controller": "inferenceservice",
  "controllerGroup": "serving.kserve.io",
  "controllerKind": "InferenceService",
  "InferenceService": {
    "name": "sklearn-v2-iris-test4",
    "namespace": "kserve-raw-tests"
  },
  "namespace": "kserve-raw-tests",
  "name": "sklearn-v2-iris-test4",
  "reconcileID": "75d4b916-3760-4e1c-879d-10a54ae7a20a",
  "error": "fails to reconcile ingress: Route.route.openshift.io \"sklearn-v2-iris-test4\" not found",
  "errorVerbose": "Route.route.openshift.io \"sklearn-v2-iris-test4\" not found\nfails to reconcile ingress\ngithub.ghproxy.top/kserve/kserve/pkg/controller/v1beta1/inferenceservice.(*InferenceServiceReconciler).Reconcile\n\t/go/src/github.com/kserve/kserve/pkg/controller/v1beta1/inferenceservice/controller.go:247\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1695",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224"
}

Deploy a slim Raw InferenceService with auth enabled

🔴 Fails

Follow the gist, but use these metadata in the InferenceService:

  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
  labels:
    security.opendatahub.io/enable-auth: "true"

🔴 The model does not deploy; kserve-controller reconcile error:

{
  "level": "error",
  "ts": "2024-12-04T21:50:14Z",
  "msg": "Reconciler error",
  "controller": "inferenceservice",
  "controllerGroup": "serving.kserve.io",
  "controllerKind": "InferenceService",
  "InferenceService": {
    "name": "sklearn-v2-iris-test5",
    "namespace": "kserve-raw-tests"
  },
  "namespace": "kserve-raw-tests",
  "name": "sklearn-v2-iris-test5",
  "reconcileID": "cdb5c0a4-1c92-47ca-b6cd-a598521c7894",
  "error": "fails to reconcile component: fails to create NewRawKubeReconciler for predictor: invalid character '}' looking for beginning of object key string",
  "errorVerbose": "invalid character '}' looking for beginning of object key string\nfails to create NewRawKubeReconciler for predictor\ngithub.ghproxy.top/kserve/kserve/pkg/controller/v1beta1/inferenceservice/components.(*Predictor).Reconcile\n\t/go/src/github.com/kserve/kserve/pkg/controller/v1beta1/inferenceservice/components/predictor.go:342\ngithub.ghproxy.top/kserve/kserve/pkg/controller/v1beta1/inferenceservice.(*InferenceServiceReconciler).Reconcile\n\t/go/src/github.com/kserve/kserve/pkg/controller/v1beta1/inferenceservice/controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1695\nfails to reconcile component\ngithub.ghproxy.top/kserve/kserve/pkg/controller/v1beta1/inferenceservice.(*InferenceServiceReconciler).Reconcile\n\t/go/src/github.com/kserve/kserve/pkg/controller/v1beta1/inferenceservice/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email 
protected]/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1695",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224"
}

Deploy a Raw InferenceService with auth enabled and explicit private label

🔴 Fails

Follow the gist, but use these metadata in the InferenceService:

  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
  labels:
    networking.kserve.io/visibility: "cluster-local"
    security.opendatahub.io/enable-auth: "true"

🔴 The model does not deploy; kserve-controller reconcile error:

{
  "level": "error",
  "ts": "2024-12-04T22:00:43Z",
  "msg": "Reconciler error",
  "controller": "inferenceservice",
  "controllerGroup": "serving.kserve.io",
  "controllerKind": "InferenceService",
  "InferenceService": {
    "name": "sklearn-v2-iris-test6",
    "namespace": "kserve-raw-tests"
  },
  "namespace": "kserve-raw-tests",
  "name": "sklearn-v2-iris-test6",
  "reconcileID": "6f55c5df-0843-4cea-9e5a-23fbb715445a",
  "error": "fails to reconcile component: fails to create NewRawKubeReconciler for predictor: invalid character '}' looking for beginning of object key string",
  "errorVerbose": "invalid character '}' looking for beginning of object key string\nfails to create NewRawKubeReconciler for predictor\ngithub.ghproxy.top/kserve/kserve/pkg/controller/v1beta1/inferenceservice/components.(*Predictor).Reconcile\n\t/go/src/github.com/kserve/kserve/pkg/controller/v1beta1/inferenceservice/components/predictor.go:342\ngithub.ghproxy.top/kserve/kserve/pkg/controller/v1beta1/inferenceservice.(*InferenceServiceReconciler).Reconcile\n\t/go/src/github.com/kserve/kserve/pkg/controller/v1beta1/inferenceservice/controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1695\nfails to reconcile component\ngithub.ghproxy.top/kserve/kserve/pkg/controller/v1beta1/inferenceservice.(*InferenceServiceReconciler).Reconcile\n\t/go/src/github.com/kserve/kserve/pkg/controller/v1beta1/inferenceservice/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email 
protected]/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1695",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224"
}

Deploy a Raw InferenceService with auth enabled and route enabled

🔴 Fails

Follow the gist, but use these metadata in the InferenceService:

  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
  labels:
    networking.kserve.io/visibility: "exposed"
    security.opendatahub.io/enable-auth: "true"

🔴 The model does not deploy; kserve-controller reconcile error:

{
  "level": "error",
  "ts": "2024-12-04T22:03:05Z",
  "msg": "Reconciler error",
  "controller": "inferenceservice",
  "controllerGroup": "serving.kserve.io",
  "controllerKind": "InferenceService",
  "InferenceService": {
    "name": "sklearn-v2-iris-test7",
    "namespace": "kserve-raw-tests"
  },
  "namespace": "kserve-raw-tests",
  "name": "sklearn-v2-iris-test7",
  "reconcileID": "cf46d9e9-ee52-48dc-a93c-47a17e023244",
  "error": "fails to reconcile component: fails to create NewRawKubeReconciler for predictor: invalid character '}' looking for beginning of object key string",
  "errorVerbose": "invalid character '}' looking for beginning of object key string\nfails to create NewRawKubeReconciler for predictor\ngithub.ghproxy.top/kserve/kserve/pkg/controller/v1beta1/inferenceservice/components.(*Predictor).Reconcile\n\t/go/src/github.com/kserve/kserve/pkg/controller/v1beta1/inferenceservice/components/predictor.go:342\ngithub.ghproxy.top/kserve/kserve/pkg/controller/v1beta1/inferenceservice.(*InferenceServiceReconciler).Reconcile\n\t/go/src/github.com/kserve/kserve/pkg/controller/v1beta1/inferenceservice/controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1695\nfails to reconcile component\ngithub.ghproxy.top/kserve/kserve/pkg/controller/v1beta1/inferenceservice.(*InferenceServiceReconciler).Reconcile\n\t/go/src/github.com/kserve/kserve/pkg/controller/v1beta1/inferenceservice/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email 
protected]/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1695",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224"
}

Other notes

  • I just realized we are using a ClusterRoleBinding in odh-model-controller. I strongly suggest using a namespaced RoleBinding instead.

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
@israel-hdez

/test images

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

openshift-ci bot commented Dec 6, 2024

@VedantMahabaleshwarkar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name         Commit   Details  Required  Rerun command
ci/prow/e2e-fast  96f4b7d  link     true      /test e2e-fast
ci/prow/e2e-slow  96f4b7d  link     true      /test e2e-slow
ci/prow/e2e-raw   96f4b7d  link     true      /test e2e-raw

@israel-hdez

Manual testing - round 2

(See previous test round for background)

Quick regression testing for Serverless mode

🟢 OK

Follow the mentioned gist as is: https://gist.github.com/israel-hdez/af374562ef9e5b9d80890aa6f0bce20d

Deploy a slim InferenceService in Raw deployment mode

🟡 OK, needs follow-up PR for fixes

Follow the gist, but use these annotations in the InferenceService:

  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"

🟢 The model seems to deploy fine
🟡 Inference works (see the reason for the yellow at the end)
🟢 SVC looks OK
🟢 No oauth-proxy container
🟡 ClusterRoleBinding created... I thought this would be created only when needed.
🟢 No Route
🟡 The Status is not right. But OK for now.

oc get isvc sklearn-v2-iris-test1 -o yaml | yq '{"address": .status.address, "components": .status.components, "url": .status.url}'
address:
  url: https://sklearn-v2-iris-test1-predictor.kserve-raw-tests.svc.cluster.local # Scheme should be plain-text HTTP
components:
  predictor:
    url: http://sklearn-v2-iris-test1-predictor-kserve-raw-tests.example.com # Wrong
url: http://sklearn-v2-iris-test1-predictor.kserve-raw-tests.svc.cluster.local # OK

Deploy a Raw InferenceService with explicit private label

🟡 OK, needs follow-up PR for fixes

Follow the gist, but use these metadata in the InferenceService:

  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
  labels:
    networking.kserve.io/visibility: "cluster-local"

🟢 The model seems to deploy fine
🟡 Inference works (see the reason for the yellow at the end)
🟢 SVC looks OK
🟢 No oauth-proxy container
🟡 ClusterRoleBinding created... I thought this would be created only when needed.
🟢 No Route
🟡 The Status is not right. But OK for now.

oc get isvc sklearn-v2-iris-test2 -o yaml | yq '{"address": .status.address, "components": .status.components, "url": .status.url}'
address:
  url: https://sklearn-v2-iris-test2-predictor.kserve-raw-tests.svc.cluster.local # Scheme should be HTTP
components:
  predictor:
    url: http://sklearn-v2-iris-test2-predictor-kserve-raw-tests.example.com # Wrong
url: http://sklearn-v2-iris-test2-predictor.kserve-raw-tests.svc.cluster.local # OK

Deploy a Raw InferenceService with explicit exposed label

🔴 Fails

Follow the gist, but use these metadata in the InferenceService:

  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
  labels:
    networking.kserve.io/visibility: "exposed"

🟢 The model seems to deploy fine
🔴 Inference doesn't work. The Route returns the "application not available" error page.
🟢 SVC looks OK
🟢 No oauth-proxy container
🟡 ClusterRoleBinding created...
🔴 Route created; it doesn't work
🟡 The Status is not right. But OK for now.

oc get isvc sklearn-v2-iris-test3 -o yaml | yq '{"address": .status.address, "components": .status.components, "url": .status.url}'
address:
  url: https://sklearn-v2-iris-test3-predictor.kserve-raw-tests.svc.cluster.local # Scheme should be HTTP
components:
  predictor:
    url: http://sklearn-v2-iris-test3-predictor-kserve-raw-tests.example.com # Wrong
url: https://sklearn-v2-iris-test3-kserve-raw-tests.apps-crc.testing # OK

Deploy a slim Raw InferenceService with auth enabled

🟡 OK, needs follow-up PR for fixes

Follow the gist, but use these metadata in the InferenceService:

  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
  labels:
    security.opendatahub.io/enable-auth: "true"

🟢 The model deploys fine
🟢 Auth works
🟢 Inference works
🟢 SVC looks OK
🟢 oauth-proxy container is present
🟢 ClusterRoleBinding created...
🟢 Route is absent
🟡 The Status is not right. But OK for now.

oc get isvc sklearn-v2-iris-test4 -o yaml | yq '{"address": .status.address, "components": .status.components, "url": .status.url}'
address:
  url: https://sklearn-v2-iris-test4-predictor.kserve-raw-tests.svc.cluster.local:8443 # OK
components:
  predictor:
    url: http://sklearn-v2-iris-test4-predictor-kserve-raw-tests.example.com # Wrong
url: http://sklearn-v2-iris-test4-predictor.kserve-raw-tests.svc.cluster.local:8443 # Scheme should be HTTPS.
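
The scheme remarks in these status dumps follow one rule, as far as the results so far suggest: cluster-local URLs should be http when auth is off and https when the enable-auth label is set. A minimal Python sketch of that check, assuming this rule is what the status should eventually satisfy. The function names and the dictionaries are hypothetical and only mirror the yq output above; this is not the controller's code. `components.predictor.url` is skipped because the review marks it wrong without stating the expected value.

```python
from urllib.parse import urlsplit


def expected_scheme(auth_enabled: bool) -> str:
    # Assumption taken from the review notes: https only when oauth-proxy fronts the model.
    return "https" if auth_enabled else "http"


def scheme_mismatches(status: dict, auth_enabled: bool) -> list:
    """Return the status fields whose cluster-local URL has an unexpected scheme."""
    candidates = {
        "address.url": status.get("address", {}).get("url"),
        "url": status.get("url"),
    }
    want = expected_scheme(auth_enabled)
    bad = []
    for field, value in candidates.items():
        if not value:
            continue
        parts = urlsplit(value)
        # Only in-cluster hostnames are checked; Route hostnames are left alone.
        if not (parts.hostname or "").endswith(".svc.cluster.local"):
            continue
        if parts.scheme != want:
            bad.append(field)
    return bad


# Values copied from the status dumps in this comment.
TEST2_STATUS = {
    "address": {"url": "https://sklearn-v2-iris-test2-predictor.kserve-raw-tests.svc.cluster.local"},
    "url": "http://sklearn-v2-iris-test2-predictor.kserve-raw-tests.svc.cluster.local",
}
TEST4_STATUS = {
    "address": {"url": "https://sklearn-v2-iris-test4-predictor.kserve-raw-tests.svc.cluster.local:8443"},
    "url": "http://sklearn-v2-iris-test4-predictor.kserve-raw-tests.svc.cluster.local:8443",
}

print(scheme_mismatches(TEST2_STATUS, auth_enabled=False))
print(scheme_mismatches(TEST4_STATUS, auth_enabled=True))
```

For the no-auth case it flags `address.url`, and for the auth case it flags `url`, which matches the inline comments in the dumps.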

Deploy a Raw InferenceService with auth enabled and explicit private label

🟡 OK, needs follow-up PR for fixes

Follow the gist, but use these metadata in the InferenceService:

  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
  labels:
    networking.kserve.io/visibility: "cluster-local"
    security.opendatahub.io/enable-auth: "true"

🟡 Same results as previous.

Deploy a Raw InferenceService with auth enabled and route enabled

🟡 OK, needs follow-up PR for fixes

Follow the gist, but use these metadata in the InferenceService:

  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
  labels:
    networking.kserve.io/visibility: "exposed"
    security.opendatahub.io/enable-auth: "true"

🟢 The model deploys fine
🟢 Auth works
🟢 Inference works
🟢 SVC looks OK
🟢 oauth-proxy container is present
🟢 ClusterRoleBinding created...
🟢 Route is present
🟡 The Status is not right. But OK for now.

oc get isvc sklearn-v2-iris-test6 -o yaml | yq '{"address": .status.address, "components": .status.components, "url": .status.url}'
address:
  url: https://sklearn-v2-iris-test6-predictor.kserve-raw-tests.svc.cluster.local:8443 # OK
components:
  predictor:
    url: http://sklearn-v2-iris-test6-predictor-kserve-raw-tests.example.com # Wrong
url: https://sklearn-v2-iris-test6-kserve-raw-tests.apps-crc.testing # OK
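
To summarize the scenarios above, here is a small Python sketch of which resources each label combination is expected to produce. It only encodes what was observed in these tests (oauth-proxy follows the enable-auth label, the Route follows the visibility label); `expected_resources` is a hypothetical helper, not something in the controller. The ClusterRoleBinding is left out because it was created in every scenario and it is not clear yet whether that is intended.

```python
AUTH_LABEL = "security.opendatahub.io/enable-auth"
VISIBILITY_LABEL = "networking.kserve.io/visibility"


def expected_resources(labels: dict) -> dict:
    """Map InferenceService labels to the resources observed in the tests above."""
    return {
        # oauth-proxy container was present only when the auth label was "true".
        "oauth_proxy": labels.get(AUTH_LABEL) == "true",
        # A Route was present only when visibility was explicitly "exposed".
        "route": labels.get(VISIBILITY_LABEL) == "exposed",
    }


scenarios = {
    "exposed only": {VISIBILITY_LABEL: "exposed"},
    "auth only": {AUTH_LABEL: "true"},
    "auth + cluster-local": {VISIBILITY_LABEL: "cluster-local", AUTH_LABEL: "true"},
    "auth + exposed": {VISIBILITY_LABEL: "exposed", AUTH_LABEL: "true"},
}

for name, labels in scenarios.items():
    print(name, expected_resources(labels))
```

Note this describes presence only. In the exposed-without-auth scenario the Route exists but does not serve traffic, which is the failing case reported above.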

Other notes

@israel-hdez left a comment

/lgtm

Approving, but needs follow-up work.

openshift-ci bot commented Dec 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: israel-hdez, VedantMahabaleshwarkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [VedantMahabaleshwarkar,israel-hdez]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@israel-hdez israel-hdez merged commit d987799 into opendatahub-io:master Dec 9, 2024
24 of 28 checks passed
VedantMahabaleshwarkar added a commit to VedantMahabaleshwarkar/kserve that referenced this pull request Jan 16, 2025
…atahub-io#419)

* add oauth-proxy to rawdeployments if odh auth label is present
* remove ingress modifications
* bug fix
* consume oauth proxy params from configmap
* fix oauth proxy sar and minor bugs
* revert some unneeded changes
* add oauth proxy flag to prevent login page redirect on invalid request
* address feedback
* update to newer oauth proxy image
* minor fix
* fix unit test
* more feedback
* cookie secret
* test and other fixes
* fix lint issues
* address latest feedback
* missed import sort
* address more feedback
* bug fix
* fix lint error

(cherry picked from commit d987799)
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
openshift-merge-bot bot pushed a commit that referenced this pull request Jan 16, 2025
* add oauth-proxy to rawdeployments if odh auth label is present (#419)

* add oauth-proxy to rawdeployments if odh auth label is present
* remove ingress modifications
* bug fix
* consume oauth proxy params from configmap
* fix oauth proxy sar and minor bugs
* revert some unneeded changes
* add oauth proxy flag to prevent login page redirect on invalid request
* address feedback
* update to newer oauth proxy image
* minor fix
* fix unit test
* more feedback
* cookie secret
* test and other fixes
* fix lint issues
* address latest feedback
* missed import sort
* address more feedback
* bug fix
* fix lint error

(cherry picked from commit d987799)
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* introduce service configuration at configmap level (kserve#3672)

(cherry picked from commit 23c0396)
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* [RHOAIENG-17229] - Routing and Headless Service Support in KServe Raw Mode Deployment

chore: Follow up: remove the hardcoded clusterIP setting and add the service configuration.

Signed-off-by: Spolti <[email protected]>
(cherry picked from commit 33b1600)

* [RHOAIENG-16851] - Rawdeployment bug fixes (#462)

* [RHOAIENG-16851] fix scheme bugs in status.url and status.address.url for rawdeployment

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* [RHOAIENG-16851] Remove component url temporarily

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* [RHOAIENG-16851] Use transformer spec to set upstream port in oauth-proxy if a transformer-container is present

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* [RHOAIENG-16851] address feedback

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

---------

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
(cherry picked from commit 13b5166)

* go.mod fixes

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

---------

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
Co-authored-by: Filippe Spolti <[email protected]>