Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KServe new incubating branch release-v0.12.0 #286

Closed
wants to merge 74 commits into from

Conversation

Jooho
Copy link

@Jooho Jooho commented Apr 4, 2024

The previous release-0.12.0 branch was created based on kserve/release-0.12 and this pr will add commits only for ODH.

taneem-ibrahim and others added 30 commits July 10, 2023 18:15
**What this does / why we need it**:

    This PR adds custom code to make KServe run on OpenShift without the need for anyuid SCC.

**More context**:

OpenShift uses istio-cni which causes an issue with init-containers when calling external services
like S3 or similar. Setting the uid for the storage-initializer to the same uid as the
uid of the istio-proxy resolves the issue. In OpenShift the istio-proxy always gets assigned
the first uid from the namespaces uid range + 1 (The range is defined in an annotation on the namespace).

**Release note**:

```release-note
The `storage-initializer` container will now run with the same `uid` as the `istio-proxy` which resolves an issue when istio-cni is used.
```

---
Squashed commit titles:
* add storage-initializer uid handling for OpenShift with istio-cni
* update storage_initializer_injector tests
* Also use annotation on pod to override uid
* Remove manager's rbac-proxy and add ODH requried network policies
* Workaround for Kustomize bug about creationTimestamp
  See: kubernetes-sigs/kustomize#5031

  Without this workaround, some Kustomize versions are generating
  `creationTimestamp: "null"` (null as a string).

Signed-off-by: Edgar Hernández <[email protected]>
* Workflows are changed to push to Quay.io.
* The go.yml workflow is changed to omit updating the coverage badge (we don't have one, for now).
* The README.md file is updated to right ODH urls.

Signed-off-by: Edgar Hernández <[email protected]>
Adapt GH-workflows to correctly push to ODH container repositories
Kustomize patches for OpenShift were cherry-picked from the release-v0.10 branch. The cherry-pick succeeded, but the resulting manifests were not working, because of the differences. This fixes the manifests and bring them back to a working state.

Signed-off-by: Edgar Hernández <[email protected]>
These are the needed changes to have openshift-ci running the E2E tests successfully.

There are several groups of E2E tests that can be deduced from the .github/workflows/e2e-test.yaml file: fast, slow, explainer, transformer-mms, qpext, grpc, helm, raw and kourier. For ODH, the `fast`, `slow` and `grpc` groups are the ones that cover the features that are going to be supported in the initial adoption of ODH.

This commit contains the needed adaptations to the E2E tests of the `fast` and `slow` groups to successfully run them in an openshift cluster. It also adds a few scripts on test/scripts/openshift-ci to run these E2Es in the openshift-ci operator.

Some of these changes should be seen as provisional and should be rolled back:
* test/e2e/common/utils.py: because of the networking/DNS expectations, that are currently not covered in ODH's installation.
* test/e2e/predictor/*:
  * In general all changes under this path should be seen as provisional. However, since ODH won't support all ServingRuntimes, it is possible that some of the tests will stay out.
  * There are some GRPC-related tests marked as skipped. Since this work is not enabling the `grpc` group, a subsequent commit/PR for enabling GRPC E2Es should remove/revert those skip marks.
  * Also, there are some tests skipped with the `Not testable in ODH at the moment` reason. The root cause of the failure should be investigated to re-enable these tests.
* python/kserve/kserve/models/v1beta1_inference_service.py: This is injecting an annotation that is required given the specifics of OSSM/Maistra and OpenShift-Serverless that are used in ODH. This annotation is, currently, user responsibility and this was the cleanest way to add it in the E2Es. Being platform-specific, it's been discussed that this (and some other) annotation should be injected by some controller to relief the user from this responsibility. If this happens, this change should be reverted.

Also, ideally, changes to the following files should be contributed back to upstream. Those changes are not required in upstream and should have no effect, but in openshift-ci become required because a different builder image is being used:
* Dockerfile
* agent.Dockerfile

Signed-off-by: Edgar Hernández <[email protected]>
Openshift-ci onboarding

These are the needed changes to have openshift-ci running the E2E tests successfully.

There are several groups of E2E tests that can be deduced from the .github/workflows/e2e-test.yaml file: fast, slow, explainer, transformer-mms, qpext, grpc, helm, raw and kourier. For ODH, the `fast`, `slow` and `grpc` groups are the ones that cover the features that are going to be supported in the initial adoption of ODH.

This commit contains the needed adaptations to the E2E tests of the `fast` and `slow` groups to successfully run them in an openshift cluster. It also adds a few scripts on test/scripts/openshift-ci to run these E2Es in the openshift-ci operator.

Some of these changes should be seen as provisional and should be rolled back:
* test/e2e/common/utils.py: because of the networking/DNS expectations, that are currently not covered in ODH's installation. These changes should be rolled back once the following ticked is fixed: opendatahub-io/odh-model-controller#59
* test/e2e/predictor/*:
  * In general all changes under this path should be seen as provisional. However, since ODH won't support all ServingRuntimes, it is possible that some of the tests will stay out.
  * There are some GRPC-related tests marked as skipped. Since this work is not enabling the `grpc` group, a subsequent commit/PR for enabling GRPC E2Es should remove/revert those skip marks.
  * Also, there are some tests skipped with the `Not testable in ODH at the moment` reason. The root cause of the failure should be investigated to re-enable these tests.
* python/kserve/kserve/models/v1beta1_inference_service.py: This is injecting an annotation that is required given the specifics of OSSM/Maistra and OpenShift-Serverless that are used in ODH. This annotation is, currently, user responsibility and this was the cleanest way to add it in the E2Es. Being platform-specific, it's been discussed that this (and some other) annotation should be injected by some controller to relief the user from this responsibility. If this happens, this change should be reverted.

Also, ideally, changes to the following files should be contributed back to upstream. Those changes are not required in upstream and should have no effect, but in openshift-ci become required because a different builder image is being used:
* Dockerfile
* agent.Dockerfile
Augments the `default` profile with some changes expected by an ODH installation:
* Removes the `Namespace` CR, because the ODH operator does not expect such resource. The Namespace is expected to be created in advance to later create a KfDef on it, where resources are going to be installed.
* Adds cluster roles, to extend the cluster's default user-facing roles with KServe privileges.

Signed-off-by: Edgar Hernández <[email protected]>
[Sync] kserve/kserve-master to master branch
Signed-off-by: Edgar Hernández <[email protected]>
Signed-off-by: Edgar Hernández <[email protected]>
automate addition of new isues into ODH board
Code sync up to upstream commit for v0.11.1
Open Data Hub operator v2 is going to be consuming Kustomize manifests
from component repos, and `odh-manifests` repo is going to be archived.

This is moving/copying artifacts from `odh-manifests` into an already
existent odh overlay. With these changes, the overlay can be directly
consumed by ODH-operator v2.

Signed-off-by: Edgar Hernández <[email protected]>
…master

[master] Preparation for odh-opeartor v2
Partial revert of
opendatahub-io/odh-manifests#916, because
opendatahub-io/odh-model-controller#84 has been
completed.

Signed-off-by: Edgar Hernández <[email protected]>
openshift-merge-bot bot and others added 9 commits February 27, 2024 23:20
…ableingress

set disableIngressCreation to true in odh overlay
add .tekton to gitignore to avoid conflict due to Konflux
Fixes CVE-2024-24762 - Regular Expression Denial of Service (ReDoS)
Remove the fastapi when this is addressed:  https://issues.redhat.com/browse/RHOAIENG-3894
or ray releses a new version that removes the fastapi version pinning and it gets updated on KServe

Signed-off-by: Spolti <[email protected]>
…ableingress

fix syntax error in inferenceservice-config patch
[RHOAIENG-3551] - fastapi - Regular Expression Denial of Service (ReDoS)
The increased memory limit is for the controller pod to work normally in clusters having 9k+ secrets.

Related https://issues.redhat.com/browse/RHOAIENG-3996

Signed-off-by: Edgar Hernández <[email protected]>
…d-master

Increase memory limit of kserve-controller pod
@Jooho
Copy link
Author

Jooho commented Apr 4, 2024

/retest

ajstewart and others added 5 commits April 4, 2024 12:17
)

* Add model_version to InferResponse class

Signed-off-by: Adam Stewart <[email protected]>

* Update infer_type tests

Signed-off-by: Adam Stewart <[email protected]>

* Updated from_grpc class method argument ordering

Signed-off-by: Adam Stewart <[email protected]>

---------

Signed-off-by: Adam Stewart <[email protected]>
For that a new injector that adds a so-called "modelcar"
container to "kserve-container" as a sidecar has been added.
This setups a pod for sharing the process namespace
(shareProcessNamespace = true).

The following configuration options have been added:

* `enableModelcar` to switch on this feature (default: false)
* `cpuModelcar` and `memoryModelcar` to set the resources for the
   modelcar container
* `uidModelcar` for the UID to use for the user-container *and*
   the modelcar contained

See kserve#3110 for more information and
architecture of this feature.

Signed-off-by: Roland Huß <[email protected]>
Signed-off-by: Dan Sun <[email protected]>
Since ODH would support KServe's RawDeployment mode, this modifies the scripts around OpenShift-ci setup to be possible to run RawDeployment-related E2Es.

The run-e2e-tests.sh script is modified to exclude installation of Service Mesh and Serverless, when RawDeployments E2Es are requested to run. A supporting file inferenceservice-openshift-ci-raw.yaml was added to patch KServe's configuration to use RawDeployment mode by default and to use OpenShift Ingress when exposing Inference Services.

Since the E2Es use some annotations in the InferenceService, changes done to the v1beta1_inference_service.py file in commit ecff079 were reverted. As an alternative, the `enablePassthrough` annotation was moved to the ServingRuntime resources. This is not only cleaner, but also reduces the diverging code with the upstream repository. Furthermore, this seems to be an auto-generated file that should not be touched.

Signed-off-by: Edgar Hernández <[email protected]>
In Serverless version 1.32.0, the `domain-mapping` deployment no longer exists. This updates the deploy.serverless.sh script used in openshift-ci to no longer wait for this deployment.

Signed-off-by: Edgar Hernández <[email protected]>
@Jooho Jooho changed the title KServe new incubating branch release-0.12.0 KServe new incubating branch release-v0.12.0 Apr 5, 2024
@Jooho
Copy link
Author

Jooho commented Apr 5, 2024

ci : openshift/release#50652

Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link

openshift-ci bot commented Apr 9, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Jooho, terrytangyuan

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Jooho
Copy link
Author

Jooho commented Apr 9, 2024

/verify-owners

Copy link

openshift-ci bot commented Apr 9, 2024

The following users are mentioned in OWNERS file(s) but are untrusted for the following reasons. One way to make the user trusted is to add them as members of the opendatahub-io org. You can then trigger verification by writing /verify-owners in a comment.

  • rpancham
    • User is not a member of the org. User is not a collaborator. Satisfy at least one of these conditions to make the user trusted.

@israel-hdez
Copy link

Instead of this, I suggest re-creating the release-v0.12.0 branch using commit 926a43c, and then applying this small PR on it: #262

@Jooho Jooho deleted the branch opendatahub-io:release-v0.12.0 April 10, 2024 12:48
@Jooho Jooho closed this Apr 10, 2024
@Jooho
Copy link
Author

Jooho commented Apr 10, 2024

@israel-hdez As you requested, I created a new branch based on the commit 926a43c

xieshenzh pushed a commit to xieshenzh/kserve that referenced this pull request Jul 11, 2024
…tudio-kserve-storage-initializer-210

Red Hat Konflux update kserve-storage-initializer-210
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.