Implement authorization for Raw InferenceGraphs #499

@israel-hdez israel-hdez commented Feb 18, 2025

What this PR does / why we need it:

Authorization is implemented using the Kubernetes TokenReview and SubjectAccessReview APIs. A middleware function is set up when the arguments that enable it are specified.

Some additional reconciliation is added to the InferenceGraph controller to:

  • Switch to a different ServiceAccount so that privileges for using the cluster APIs are granted.
  • Create the ServiceAccount needed for the auth-protected InferenceGraph to run.
  • Manage a ClusterRoleBinding that grants the privileges required for auth verification.
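
For readers who want a picture of the middleware idea, here is a minimal sketch, not the code in this PR. It assumes the authentication and authorization steps can be modeled as two plain functions: `tokenAuthenticator` stands in for a TokenReview call and `accessChecker` for a SubjectAccessReview call. Those names, `withAuth`, `statusFor`, `demoHandler`, the demo tokens, and the 401/403 status codes are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// tokenAuthenticator stands in for a TokenReview call: it resolves a bearer
// token to a user name, or reports that the token is not valid.
type tokenAuthenticator func(token string) (user string, ok bool)

// accessChecker stands in for a SubjectAccessReview call: it reports whether
// the user is allowed to reach the protected InferenceGraph.
type accessChecker func(user string) bool

// withAuth wraps next so it only runs for authenticated and authorized callers.
// The 401/403 status codes are an assumption of this sketch.
func withAuth(authn tokenAuthenticator, authz accessChecker, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		header := r.Header.Get("Authorization")
		if !strings.HasPrefix(header, "Bearer ") {
			http.Error(w, "missing bearer token", http.StatusUnauthorized)
			return
		}
		user, ok := authn(strings.TrimPrefix(header, "Bearer "))
		if !ok {
			http.Error(w, "invalid token", http.StatusUnauthorized)
			return
		}
		if !authz(user) {
			http.Error(w, "not allowed", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// demoHandler builds a protected handler with in-memory stand-ins for the
// cluster APIs (hypothetical tokens and users).
func demoHandler() http.Handler {
	users := map[string]string{"viewer-token": "viewer", "admin-token": "admin"}
	authn := func(token string) (string, bool) {
		user, ok := users[token]
		return user, ok
	}
	authz := func(user string) bool { return user == "admin" }
	graph := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return withAuth(authn, authz, graph)
}

// statusFor sends one request through h and returns the response status.
// An empty token means no Authorization header is sent.
func statusFor(h http.Handler, token string) int {
	req := httptest.NewRequest(http.MethodPost, "/", nil)
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := demoHandler()
	for _, token := range []string{"", "bogus", "viewer-token", "admin-token"} {
		fmt.Printf("token=%q -> %d\n", token, statusFor(h, token))
	}
}
```

Keeping the two checks behind function types means the handler can be exercised without a cluster, which is the only reason this sketch is shaped that way.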

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes https://issues.redhat.com/browse/RHOAIENG-17832

Type of changes
Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing:

  • Create an auth-enabled InferenceGraph. Notice that the logs of the IG pod mention that auth is enabled.
  • Try to send a non-authenticated request to the IG. Notice the request is rejected.
  • Try to send an authenticated request but without enough privileges. Notice the request is rejected.
  • Try to send an authenticated request with enough privileges. Notice the request is accepted.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

openshift-ci bot commented Feb 18, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@israel-hdez israel-hdez force-pushed the j17832-raw-ig-auth branch 6 times, most recently from cb3c634 to 9068707 Compare February 20, 2025 00:35
@israel-hdez israel-hdez marked this pull request as ready for review February 20, 2025 00:39
@VedantMahabaleshwarkar VedantMahabaleshwarkar left a comment

added some comments

}

// Bind the required privileges to the Service Account
err = addAuthPrivilegesToGraphServiceAccount(ctx, clientset, graph)

nit:

Suggested change
- err = addAuthPrivilegesToGraphServiceAccount(ctx, clientset, graph)
+ err = addAuthPrivilegesToGraphServiceAccount(ctx, clientset, graph, saName)

Author

So, do you prefer sending it as an argument rather than re-getting it inside the func?

yes, would avoid having to re-get. but it's only a nit :)

return err
}
} else {
err := removeAuthPrivilegesFromGraphServiceAccount(ctx, clientset, graph)

nit:

Suggested change
- err := removeAuthPrivilegesFromGraphServiceAccount(ctx, clientset, graph)
+ err := removeAuthPrivilegesFromGraphServiceAccount(ctx, clientset, graph, saName)

return err
}

err = deleteGraphServiceAccount(ctx, clientset, graph)

nit:

Suggested change
- err = deleteGraphServiceAccount(ctx, clientset, graph)
+ err = deleteGraphServiceAccount(ctx, clientset, graph, saName)

if err != nil {
return ctrl.Result{}, errors.Wrapf(err, "fails to reconcile resources for auth verification")
}

// Create inference graph resources such as deployment, service, hpa in raw deployment mode
deployment, url, err := handleInferenceGraphRawDeployment(r.Client, r.Clientset, r.Scheme, graph, routerConfig)

Looking at the controller flow from here, handleInferenceGraphRawDeployment -> NewRawKubeReconciler -> NewDeploymentReconciler -> createRawDeploymentODH where we have

if val, ok := componentMeta.Labels[constants.ODHKserveRawAuth]; ok && val == "true" {
			err := addOauthContainerToDeployment(clientset, deployment, componentMeta, componentExt, podSpec)
			if err != nil {
				return nil, err
			}
		}

Does this mean that if the graph has the ODHKserveRawAuth label, the graph deployment will get the oauth-proxy container? Not sure if I'm missing something 🤔

Author

In my testing, it didn't get oauth-proxy. It could be because of RHOAIENG-20326, which we discussed today.

Most likely, when RHOAIENG-20326 is fixed, the IG will get the proxy. If that's the case, I'd appreciate it if you created a follow-up ticket for fixing it (although it would be kind of you to fix it as part of RHOAIENG-20326 😄).

To me this feels more in scope for this PR, whereas https://issues.redhat.com/browse/RHOAIENG-20326 should IMO be limited to making the switch from label to annotation.

Author

@israel-hdez israel-hdez Feb 24, 2025

It does not fit here, nor in RHOAIENG-20326. With the current code, there is nothing to fix in this PR. That is my motivation for making it a follow-up Jira.

To make any fix here, I'd need to ensure there is no regression, and I'd also need to implement RHOAIENG-20326 just to ensure that the fix actually works.

Author

// checkRequestIsAuthorized verifies that the user in the provided tokenReviewResult has privileges to query the
// Kubernetes API and get the InferenceGraph resource that belongs to this pod. If so, the request is considered
// as allowed and `true` is returned. Otherwise, the HTTP response is sent rejecting the request and setting
// a meaningful status code along with a reason (if available).
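
To make the check described in that comment concrete, here is a rough sketch of the data such a check would submit. It is not the PR's code. The structs are local stand-ins that mirror a subset of the `TokenReviewStatus`, `SubjectAccessReviewSpec`, and `ResourceAttributes` fields from the Kubernetes API types, so the example runs without client-go. The function name `graphAccessReview` and the sample names are hypothetical. The API group `serving.kserve.io` and resource `inferencegraphs` are my reading of how KServe registers InferenceGraph and are not taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// reviewedUser mirrors the parts of a TokenReview status used here.
type reviewedUser struct {
	Authenticated bool
	Username      string
	Groups        []string
}

// resourceAttributes mirrors a subset of the SubjectAccessReview
// ResourceAttributes fields.
type resourceAttributes struct {
	Namespace string
	Verb      string
	Group     string
	Resource  string
	Name      string
}

// accessReviewSpec mirrors a subset of the SubjectAccessReview spec.
type accessReviewSpec struct {
	User               string
	Groups             []string
	ResourceAttributes resourceAttributes
}

// graphAccessReview builds the question "may this user get this
// InferenceGraph?" for an already authenticated user.
func graphAccessReview(u reviewedUser, namespace, graphName string) (accessReviewSpec, error) {
	if !u.Authenticated {
		return accessReviewSpec{}, errors.New("token was not authenticated")
	}
	return accessReviewSpec{
		User:   u.Username,
		Groups: u.Groups,
		ResourceAttributes: resourceAttributes{
			Namespace: namespace,
			Verb:      "get",
			Group:     "serving.kserve.io", // assumed API group
			Resource:  "inferencegraphs",   // assumed resource name
			Name:      graphName,
		},
	}, nil
}

func main() {
	user := reviewedUser{Authenticated: true, Username: "alice", Groups: []string{"dev"}}
	spec, err := graphAccessReview(user, "demo-ns", "my-graph")
	fmt.Printf("%+v err=%v\n", spec, err)
}
```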

Question:
What happens if the token has get privileges for the IG but not for one or more of the ISVCs in the IG? Should we be verifying that the token has the correct privileges for the IG + all the ISVCs?

Author

@israel-hdez israel-hdez Feb 20, 2025

The token needs to have privileges for both the IG and the ISVC. The IG performs only its own check.

Later, the token is forwarded to the ISVC so that it can also do its own check. This effectively delegates that check to the ISVC.

This may not be optimal if the auth-protected ISVC is the last one in the IG and the previous ones are not protected: the request would fail after wasting resources. But I think the current implementation is good enough, given that we are not sure how users will use InferenceGraph. So I'd say we should optimize once we are sure.
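
The delegation described above could look roughly like this. It is a sketch under assumptions, not the router code from this PR: `newStepRequest` and `forwardedAuth` are hypothetical helpers, and the step URL is a placeholder. The only point is that the caller's `Authorization` header is copied unchanged onto the request sent to the next step, so a protected ISVC can run its own check.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newStepRequest builds the outgoing request for one graph step and copies the
// caller's Authorization header onto it, if there is one.
func newStepRequest(incoming *http.Request, stepURL string) (*http.Request, error) {
	out, err := http.NewRequestWithContext(incoming.Context(), http.MethodPost, stepURL, nil)
	if err != nil {
		return nil, err
	}
	if auth := incoming.Header.Get("Authorization"); auth != "" {
		out.Header.Set("Authorization", auth)
	}
	return out, nil
}

// forwardedAuth is a small helper for the demo: it simulates an incoming
// request carrying the given token and returns the Authorization header that
// the step request would carry.
func forwardedAuth(token, stepURL string) string {
	in := httptest.NewRequest(http.MethodPost, "/", nil)
	if token != "" {
		in.Header.Set("Authorization", "Bearer "+token)
	}
	out, err := newStepRequest(in, stepURL)
	if err != nil {
		return ""
	}
	return out.Header.Get("Authorization")
}

func main() {
	// Placeholder URL for a downstream step.
	const stepURL = "http://step.example.invalid/predict"
	fmt.Printf("with token: %q\n", forwardedAuth("abc123", stepURL))
	fmt.Printf("without token: %q\n", forwardedAuth("", stepURL))
}
```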

@@ -173,9 +178,53 @@ func (r *InferenceGraphReconciler) Reconcile(ctx context.Context, req ctrl.Reque
return reconcile.Result{}, errors.Wrapf(err, "fails to create DeployConfig")
}

// name of our custom finalizer
finalizerName := "inferencegraph.finalizers"
Member

maybe move it to the constants section.

Author

It is moved to a constant.
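
As a small illustration of the suggestion, a package-level constant might be used as below. This is a sketch only: the value comes from the diff above, while the constant name `graphFinalizerName` and the helper functions are hypothetical and operate on a plain string slice rather than on a Kubernetes object.

```go
package main

import "fmt"

// graphFinalizerName holds the finalizer string shown in the diff.
// The constant name is hypothetical.
const graphFinalizerName = "inferencegraph.finalizers"

// hasFinalizer reports whether name is present in finalizers.
func hasFinalizer(finalizers []string, name string) bool {
	for _, f := range finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// addFinalizer returns finalizers with name appended, unless already present.
func addFinalizer(finalizers []string, name string) []string {
	if hasFinalizer(finalizers, name) {
		return finalizers
	}
	// Copy before appending so the caller's slice is not modified.
	return append(append([]string{}, finalizers...), name)
}

// removeFinalizer returns a copy of finalizers without name.
func removeFinalizer(finalizers []string, name string) []string {
	out := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	var finalizers []string
	finalizers = addFinalizer(finalizers, graphFinalizerName)
	finalizers = addFinalizer(finalizers, graphFinalizerName) // no duplicate added
	fmt.Println(finalizers, hasFinalizer(finalizers, graphFinalizerName))
	finalizers = removeFinalizer(finalizers, graphFinalizerName)
	fmt.Println(finalizers, hasFinalizer(finalizers, graphFinalizerName))
}
```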

@spolti
Member

spolti commented Mar 6, 2025

@Jooho do you want to take a look before merging?

@Jooho Jooho left a comment

Looks good to me.

I just left a comment on something not directly related to this PR :) so I can defer to Vedant, because his comments are not resolved yet.

@israel-hdez
Author

/retest

@VedantMahabaleshwarkar VedantMahabaleshwarkar left a comment

/approve

openshift-ci bot commented Mar 13, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: israel-hdez, VedantMahabaleshwarkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [VedantMahabaleshwarkar,israel-hdez]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Jooho
Jooho commented Mar 13, 2025

/lgtm

openshift-ci bot commented Mar 13, 2025

@israel-hdez: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-raw | 9068707 | link | true | /test e2e-raw |
| ci/prow/e2e-graph | 9068707 | link | true | /test e2e-graph |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@Jooho
Jooho commented Mar 14, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Mar 14, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit b2b599f into opendatahub-io:release-v0.14 Mar 14, 2025
27 checks passed
israel-hdez added a commit to israel-hdez/kserve that referenced this pull request Mar 14, 2025
* Implement authorization for Raw InferenceGraphs

Authorization is implemented using the Kubernetes TokenReview and SubjectAccessReview APIs. A middleware function is set up when the arguments that enable it are specified.

Some additional reconciliation is added to the InferenceGraph controller to:
* Switch to a different ServiceAccount so that privileges for using the cluster APIs are granted.
* Create the ServiceAccount needed for the auth-protected InferenceGraph to run.
* Manage a ClusterRoleBinding that grants the privileges required for auth verification.

Signed-off-by: Edgar Hernández <[email protected]>

* Feedback: Jooho

Fix comment

* Fix unit test

Signed-off-by: Edgar Hernández <[email protected]>

---------

Signed-off-by: Edgar Hernández <[email protected]>
israel-hdez added a commit to israel-hdez/kserve that referenced this pull request Mar 14, 2025
openshift-merge-bot bot pushed a commit that referenced this pull request Mar 14, 2025