-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add image trigger annotation for filling in image-field-value of container. Use imagestream name and tag for JUPYTER_IMAGE #800
Conversation
Hi @shalberd. Thanks for your PR. I'm waiting for a opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good. We typically don't use underscores for property names, but if we don't have a linter for it, it's not worth changing until we do.
Sven and I are in talks about issues with backoff errors when starting a Notebook with his PR.
/hold Looks like there may be an issue with the OpenShift side of things that prevents us from merging this atm. The change in image references seems to cause it not to find the image itself and pulling falls and the pod does not start (backoff error). Sven is investigating - openshift/openshift-apiserver#339 |
Larger issue with image-resolve from imagestreams in kubeflow notebook controller. Explained the situation to Kubeflow at kubeflow/notebooks#98 Basically, the issue lies less with the Notebook CRD, but with the resource creating the StatefulSet from a Notebook CR; that is the notebook-controller. I have tested this within-namespace (Notebook / StatefulSet and imagestream in same namespace) and across-namespaces and it is working beautifully for StatefulSets: When one of the core Kubernetes resources contains both a pod template and this annotation, OpenShift Container Platform attempts to update the object by using the image currently associated with the image stream tag that is referenced by trigger. The update is performed against the fieldPath specified. Example, having a StatefulSet in odhsven2, referring to an imagestream in odhsven namespace, just adding an annotation and leaving the image-field empty automatically fills in the correct image. The StatefulSet annotation is like this, in metadata:
So, notebook controller needs to exclude this annotation as well as the image-field from reconciliation. Most importantly, notebook controller needs to take it from the Notebook CR, if present, on StatefulSet creation. With the lookup-fieldPath to the image-value being the same value as Notebook metadata.name, as that is also the notebook containers[0].name. |
/retest |
@shalberd can you rebase latest changes? there has been some upgrades in the environment and depencies, once that's done we can re-review it. Thanks in advance! |
Also if you could run our linter and fix the issues the PR is flagging, that'll also help. |
Also @lucferbux has a issue to fix this -- but @shalberd we are using Node 18 now. You'll have to upgrade that. We have to fix the documentation (which is for the next release -- end of next week) Ref: #1034 |
a75bad2
to
3061877
Compare
@andrewballantyne rebased, will run linter tomorrow, but most issues should be fixed now. The general idea of the new annotation is that there can be any value in the container image-field really, so I used an empty space. It is later overwritten with the correct value by the image policy admission plug-in. Can you run th linting remotely? See what happens to the image-field cause of the image policy admission plug-in doing its work:
if spec.tags[tagname].referencePolicy.type of the imagestream is set to Source then
if spec.tags[tagname].referencePolicy.type of the imagestream is set to Local then
|
/retest |
@lucferbux ok, squashing achieved. One single, meaningful commit. Because there was a mix of local changes and merge commits in-between, I had to use a tool called git-squash being in the current feature branch and having main branch up-to-date, doing That helped me with the process, in the end having that one new commit and porce-pushing to the feature branch afterwards. |
@lucferbux @atheo89 In the manifests deployment recipes, we currently still have image pull policy Always, so this fast-changing image behind the pr-800 tag is not an issue when using tag-format. If you want to test with modified odh-manifests, you can replace the values, either leave it at newTag: or use digest: sha256:xxx notation. Advantage of using digests is also that is it 100% comparable in terms of image used between several people trying it out, as the digest is truly unique. put
or
then, for kubeflow notebook controller put
or
An advantage I see with image behind a certain tag changing, even when using imagePullPolicy: IfNotPresent: when using digest notation, collaborators can be absolutely sure they are using the same images :-) https://quay.io/repository/opendatahub/kubeflow-notebook-controller?tab=tags https://quay.io/repository/opendatahub/odh-dashboard?tab=tags |
I'll check the code once more once we have kubeflow notebook controller PR-133 done for references to https://github.com/search?q=repo%3Aopendatahub-io%2Fodh-dashboard%20%22image.split%22&type=code and resolve conflicts. update Oct 19: rebase done to latest HEAD of main. Will change the base branch of this PR from main to f/imagestreams once that one is fast-forwarded. |
d3f0d96
to
f4a6755
Compare
641f214
to
8af82c3
Compare
8af82c3
to
dcb33b3
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
8105b82
to
56ef006
Compare
…me as value of env JUPYTER_IMAGE while using the combination of imagestream/image name and tag as the image source for notebook; achieving independence from the internal openshift docker registry
cb61d43
to
1f6c127
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes implemented
frontend/src/pages/projects/screens/detail/notebooks/useNotebookImageData.ts
Outdated
Show resolved
Hide resolved
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for you continued patience with this feature.
Provided my review on this, i believe , we should this whole part towards the notebook-controller itself, instead of trying to do this via dashboard, as it would require more changes based on restart and creation of notebook.
If you agree, i would share that lets address the changes in Notebook-controller and close this PR if all works as we require.
@@ -268,6 +266,7 @@ export const assembleNotebook = async ( | |||
'opendatahub.io/username': username, | |||
'kubeflow-resource-stopped': null, | |||
'opendatahub.io/accelerator-name': accelerator.accelerator?.metadata.name || '', | |||
'image.openshift.io/triggers': `[{"from":{"kind":"ImageStreamTag","name":"${imageSelection}", "namespace":"${namespace}"},"fieldPath":"spec.template.spec.containers[?(@.name==\\"${name}\\")].image"}]`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the preview test of this,
In case of the change of referencePolicy
from Local
to Source
or vice-versa; or change to the image tag source. The annotation quickly updates the resources, which causes the pods to restart.
As it restarts, in case of a change of image tag, this becomes an issue for long-running notebooks.
We have a requirement for not hindering long-running notebook, and also that we update tag for the same imagestream tag based on cve fix.
In this case, we would need to use the image-trigger annotation field called paused, based on notebook scale down or up. If the user scales down, then only the image should be changed or else it should stay the same.
Perhaps, we should do this whole action directly on Notebook-Controller, which notebook-controller takes care of the Notebook Image, based on its type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Harshad, sorry for the delay, thank you for your thoughts. Generally speaking, you are right when it comes to long-running notebooks (of course, if a node is ever down, even the best notebook when not replicated is not truly long-running, there is no guarantee).
In case of the change of referencePolicy from Local to Source or vice-versa; or change to the image tag source. The annotation quickly updates the resources, which causes the pods to restart.
Yes, that is true, it kind of "ripples through" without much delay, leading to pod restarts. We have that currently in our private Harbor repo that imagestream tag.from.name is based on, when the sha hash changes, yup. Leads to restarts of workbench pods.
We have a requirement for not hindering long-running notebook ... In this case, we would need to use the image-trigger annotation field called paused, based on notebook scale down or up. If the user scales down, then only the image should be changed
paused: true or false could be based on a custom notebook annotation. Possibly together with a slider in dashboard GUI that allows a user to override if they want to update.
About stopped slider in Dashboard GUI, so a stopped workbench and notebook / statefulset: the change in the imagestream tag underlying digest applies there, too, which is good. What I mean:
I looked up a workbench statefulset with replicas: 0 and looked at the notebook container image field value.
It contained the latest digest behind the tag of the imgestream / docker image base. So when paused: false is set in the annotation (which is the default) https://docs.openshift.com/container-platform/4.12/openshift_images/triggering-updates-on-imagestream-changes.html then changes ripple though to even a replicas: 0 statefulset / podspec.
You wrote: "if the user scales down, then only should the image be changed". It does not matter whether scaled down or not, image is always changed. What matters only the the value of paused: in the image change trigger annotation.
I think I understand what you want to achieve with either odh notebook controller or kubeflow notebook controller with respect to paused: true vs paused: false, I just think setting the value on the image change trigger annotation as a whole should start on odh dashboard. That is of course up fro discussion :-)
Perhaps, we should do this whole action directly on Notebook-Controller, which notebook-controller takes care of the Notebook Image, based on its type.
You mean based on a new notebook opendatahub annotation, like long-running: true/false or pauseable: true/false?
And by whole action, you mean the injection of the image stream change trigger annotation into the pod (not container)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot just close this PR as dashboard notebook assembly won't work since it is based on imagestream status:
dockerImageRepository, which is empty when no internal openshift registry is present.
Also, on each notebook podspec update via dashboard GUI (requests and limits sizes, env vars ...), the image-field would again be set by dashboard. So we do need to touch the assemble and update logic here in some minimal way at least.
I agree on handling some aspects by odh (or kubeflow?) notebook controller, but we defininitely need some changes here in dashboard, too.
@harshad16 @jiridanek @jstourac @atheo89 @lucferbux closed in favor odh odh notebook controller doing the notebook image field lookup in all cases (no openshift internal registry, openshift internal registry): opendatahub-io/kubeflow#336 and opendatahub-io/kubeflow#329 as well as dashboard changes #2867 Thank you all very much, neat work minor remaining: changing notebook container imagePullPolicy: IfNotPresent from old imagePullPolicy: Always as a nice-to-have feature for external registry and large images with a unique hash. Needs to be validated if that works with internal openshift registry, too. Saves times on workbench startup. |
Thanks again Sven for all your hard work on this front! |
but not for Notebook container image ref
achieving independence from the internal openshift docker registry in handling images in conjunction with imagestreams.
See https://itnext.io/variations-on-imagestreams-in-openshift-4-f8ee5e8be633 for background.
Jupyter uses referencepolicy type: Source, making an internal openshift image repo unneccessary.
However, the internal openshift repo can be used, just via imagestream name and tag, not via direct reference of the image location in the internal openshift docker repo.
to be tested in conjunction with Kubeflow notebook controller PR-133
Description
fixes kubeflow/kubeflow#792 kubeflow/kubeflow#813 kubeflow/kubeflow#1340
no dependency on dockerImageRepository field of ImageStream in imageUrl variable, thereby no dependency on internal Docker registry when spawning images that are coming from an imageStream.
Also, now, the image: field of Notebook spec has the combination of image name derived from imagestream name plus imagestream tag value (imageSelection variable).
The Notebook annotation
notebooks.opendatahub.io/last-image-selection
https://github.com/opendatahub-io/odh-dashboard/blob/main/backend/src/utils/notebookUtils.ts#L266, keeps the imageSelection variable, that is referencing imagestream name plus tag value.With imagestreams, be they spec.tags.referencePolicy type Source (external) or Local (internal openshift registry), we can use the imagestream tag and name to reference the image, wherever it may be. So, we use the imageSelection variable in the container image-field.
That is so because spec.lookupPolicy.local is set to true, it is thus absolutely ok to reference the image stream name directly for any image: field of a container, as long as the ImageStream is in the same namespace as the dashboard and controller, which it is.
Even for the DSG feature in which you spawn Notebooks in different namespaces, you are already covering that correctly with system:image-puller, making it even more logical not to reference from the internal docker repo location in the image field of the notebook.
Image-field fill-in happens on StatefulSet creation via metadata annotation and image policy admission plugin of openshift. Image-field can either initially be set empty or even be left out completely, tested with StatefulSet outside of notebook controller context.
Also, for JUPYTER_IMAGE env variable, I am using the value of imagestreammetadata.name:imgestreamtag / variable imageUrl, that is, the original location in the source repo. That is the only location where I could think of a direct image reference being useful, but I have not gotten any response on how JUPYTER_IMAGE is used in RHOSDS.
Do not merge until changes in notebook controller reconciliation logic have been implemented.
How Has This Been Tested?
Would like to test this with a sample docker image someone would provide. In an Openshift environment that does NOT have an internal openshift docker registry. Need assistence from odh-dashboard developers.
@andrewballantyne Would like to test how the Notebook CR is created with the latest changes. Coordination via slack. It will not lead to a working pod deployment, but I want to check the status of Notebook CR instance in a notebook namespace that is != dashboard namespace.
Merge criteria: