Update Model status from LocalModelNode status #4056

Merged: 11 commits merged into kserve:model_cache on Nov 20, 2024

@greenmoon55 (Contributor) commented Nov 15, 2024

What this PR does / why we need it:
This is built on top of #4053

  1. Updates the Model status from the LocalModelNode status; the controller now watches LocalModelNode resources.
  2. Removes the code and permissions for managing jobs.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A

  • Test B
    updated the tests

  • Logs

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:


Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

@greenmoon55 greenmoon55 marked this pull request as ready for review November 15, 2024 22:28
@yuzisun (Member) commented Nov 19, 2024

@greenmoon55 Need to rebase the PR as #4053 is merged

@yuzisun (Member) left a comment

/lgtm
/approve

@yuzisun yuzisun merged commit 8f7f44e into kserve:model_cache Nov 20, 2024
46 checks passed
yuzisun added a commit that referenced this pull request Dec 3, 2024
* LocalModelNode Daemonset Controller Skeleton (#4026)

* hello world controller

Signed-off-by: Gavin Li <[email protected]>

* go fmt

Signed-off-by: Gavin Li <[email protected]>

* daemonset

Signed-off-by: Gavin Li <[email protected]>

* Update Makefile

Co-authored-by: Jin Dong <[email protected]>
Signed-off-by: Gavin Li <[email protected]>

* make generate

Signed-off-by: Gavin Li <[email protected]>

* install LocalModelNode CRD

Signed-off-by: Gavin Li <[email protected]>

* feedback

Signed-off-by: Gavin Li <[email protected]>

* make manifests

Signed-off-by: Gavin Li <[email protected]>

* agent

Signed-off-by: Gavin Li <[email protected]>

Co-authored-by: Jin Dong <[email protected]>

* LocalModelController creates LocalModelNode resource for ready nodes (#4036)

* Manage localmodelNode

Signed-off-by: Jin Dong <[email protected]>

* Update patch

Signed-off-by: Jin Dong <[email protected]>

* Fix rbac

Signed-off-by: Jin Dong <[email protected]>

* Add a test to controller_test.go

Signed-off-by: Jin Dong <[email protected]>

* Update pkg/controller/v1alpha1/localmodel/controller.go

Co-authored-by: Dan Sun <[email protected]>
Signed-off-by: Jin Dong <[email protected]>

---------

Signed-off-by: Jin Dong <[email protected]>
Co-authored-by: Dan Sun <[email protected]>

* Delete from LocalModelNode when the localmodel is deleted (#4053)

* Delete model from LocalModelNode

Signed-off-by: Jin Dong <[email protected]>

* Cleanup code

Signed-off-by: Jin Dong <[email protected]>

* Cleanup code

Signed-off-by: Jin Dong <[email protected]>

* Fix lint

Signed-off-by: Jin Dong <[email protected]>

* Initializer node status map

Signed-off-by: Jin Dong <[email protected]>

* Address comments

Signed-off-by: Jin Dong <[email protected]>

---------

Signed-off-by: Jin Dong <[email protected]>

* Update Model status from LocalModelNode status (#4056)

* Delete model from LocalModelNode

Signed-off-by: Jin Dong <[email protected]>

* Cleanup code

Signed-off-by: Jin Dong <[email protected]>

* Cleanup code

Signed-off-by: Jin Dong <[email protected]>

* Fix lint

Signed-off-by: Jin Dong <[email protected]>

* Initializer node status map

Signed-off-by: Jin Dong <[email protected]>

* Update status

Signed-off-by: Jin Dong <[email protected]>

* Update localmodel node status

Signed-off-by: Jin Dong <[email protected]>

* Remove job dependency from localmodel controller

Signed-off-by: Jin Dong <[email protected]>

* Remove some unused lines

Signed-off-by: Jin Dong <[email protected]>

* Add comments

Signed-off-by: Jin Dong <[email protected]>

---------

Signed-off-by: Jin Dong <[email protected]>

* LocalModelNode Agent that creates download jobs and update statuses from jobs (#4075)

* download working

Signed-off-by: Gavin Li <[email protected]>

* delete working

Signed-off-by: Gavin Li <[email protected]>

* cleanup

Signed-off-by: Gavin Li <[email protected]>

* gofmt

Signed-off-by: Gavin Li <[email protected]>

* Delete model from LocalModelNode

Signed-off-by: Jin Dong <[email protected]>

* Cleanup code

Signed-off-by: Jin Dong <[email protected]>

* Fix lint

Signed-off-by: Jin Dong <[email protected]>

* Initializer node status map

Signed-off-by: Jin Dong <[email protected]>

* Update status

Signed-off-by: Jin Dong <[email protected]>

* Update localmodel node status

Signed-off-by: Jin Dong <[email protected]>

* Remove job dependency from localmodel controller

Signed-off-by: Jin Dong <[email protected]>

* Remove some unused lines

Signed-off-by: Jin Dong <[email protected]>

* Add comments

Signed-off-by: Jin Dong <[email protected]>

* Update manager

Signed-off-by: Jin Dong <[email protected]>

* Update rbac

Signed-off-by: Jin Dong <[email protected]>

* Add tests and temporarily remove delete models code

Signed-off-by: Jin Dong <[email protected]>

* Do not create download jobs if model is already downloaded

Signed-off-by: Jin Dong <[email protected]>

* remove misleading log line

Signed-off-by: Jin Dong <[email protected]>

* Clean up code a little bit

Signed-off-by: Jin Dong <[email protected]>

* Update configurations

Signed-off-by: Jin Dong <[email protected]>

* update test

Signed-off-by: Jin Dong <[email protected]>

* Use a fixed name for the download container

Signed-off-by: Jin Dong <[email protected]>

---------

Signed-off-by: Gavin Li <[email protected]>
Signed-off-by: Jin Dong <[email protected]>
Co-authored-by: Gavin Li <[email protected]>

* Delete models from local disk when they are not in LocalModelNode spec (#4084)

* download working

Signed-off-by: Gavin Li <[email protected]>

* delete working

Signed-off-by: Gavin Li <[email protected]>

* Delete model from LocalModelNode

Signed-off-by: Jin Dong <[email protected]>

* Initializer node status map

Signed-off-by: Jin Dong <[email protected]>

* Update status

Signed-off-by: Jin Dong <[email protected]>

* Update localmodel node status

Signed-off-by: Jin Dong <[email protected]>

* Update manager

Signed-off-by: Jin Dong <[email protected]>

* Update rbac

Signed-off-by: Jin Dong <[email protected]>

* Add tests and temporarily remove delete models code

Signed-off-by: Jin Dong <[email protected]>

* Do not create download jobs if model is already downloaded

Signed-off-by: Jin Dong <[email protected]>

* Delete function

Signed-off-by: Jin Dong <[email protected]>

* Update configurations

Signed-off-by: Jin Dong <[email protected]>

* Add test and Fix deletion code

Signed-off-by: Jin Dong <[email protected]>

* Use a fixed name for the download container

Signed-off-by: Jin Dong <[email protected]>

* Remove deleted models from status and periodically trigger reconciliation

Signed-off-by: Jin Dong <[email protected]>

* Fix storagecontainer permissions and a minor change

Signed-off-by: Jin Dong <[email protected]>

---------

Signed-off-by: Gavin Li <[email protected]>
Signed-off-by: Jin Dong <[email protected]>
Co-authored-by: Gavin Li <[email protected]>

---------

Signed-off-by: Jin Dong <[email protected]>
Signed-off-by: Gavin Li <[email protected]>
Co-authored-by: Gavin Li <[email protected]>
Co-authored-by: Jin Dong <[email protected]>
yuzisun added a commit to yuzisun/kserve that referenced this pull request Dec 14, 2024
yuzisun added a commit to yuzisun/kserve that referenced this pull request Dec 20, 2024
yuzisun added a commit to yuzisun/kserve that referenced this pull request Dec 22, 2024
yuzisun added a commit that referenced this pull request Dec 22, 2024
* Local Model Node CR (#3978)

* init CR

Signed-off-by: Gavin Li <[email protected]>

* make generate

Signed-off-by: Gavin Li <[email protected]>

* make manifests

Signed-off-by: Gavin Li <[email protected]>

* black format

Signed-off-by: Gavin Li <[email protected]>

* fix generated python code

Signed-off-by: Gavin Li <[email protected]>

* feedback

Signed-off-by: Gavin Li <[email protected]>

* more feedback

Signed-off-by: Gavin Li <[email protected]>

* black format

Signed-off-by: Gavin Li <[email protected]>

* make manifests

Signed-off-by: Gavin Li <[email protected]>

---------

Signed-off-by: Gavin Li <[email protected]>
Signed-off-by: Dan Sun <[email protected]>

* Model cache controller and node agent  (#4089)

* LocalModelNode Daemonset Controller Skeleton (#4026)

* hello world controller

Signed-off-by: Gavin Li <[email protected]>

* go fmt

Signed-off-by: Gavin Li <[email protected]>

* daemonset

Signed-off-by: Gavin Li <[email protected]>

* Update Makefile

Co-authored-by: Jin Dong <[email protected]>
Signed-off-by: Gavin Li <[email protected]>

* make generate

Signed-off-by: Gavin Li <[email protected]>

* install LocalModelNode CRD

Signed-off-by: Gavin Li <[email protected]>

* feedback

Signed-off-by: Gavin Li <[email protected]>

* make manifests

Signed-off-by: Gavin Li <[email protected]>

* agent

Signed-off-by: Gavin Li <[email protected]>

Co-authored-by: Jin Dong <[email protected]>

* LocalModelController creates LocalModelNode resource for ready nodes (#4036)

* Manage localmodelNode

Signed-off-by: Jin Dong <[email protected]>

* Update patch

Signed-off-by: Jin Dong <[email protected]>

* Fix rbac

Signed-off-by: Jin Dong <[email protected]>

* Add a test to controller_test.go

Signed-off-by: Jin Dong <[email protected]>

* Update pkg/controller/v1alpha1/localmodel/controller.go

Co-authored-by: Dan Sun <[email protected]>
Signed-off-by: Jin Dong <[email protected]>

---------

Signed-off-by: Jin Dong <[email protected]>
Co-authored-by: Dan Sun <[email protected]>

* Delete from LocalModelNode when the localmodel is deleted (#4053)

* Delete model from LocalModelNode

Signed-off-by: Jin Dong <[email protected]>

* Cleanup code

Signed-off-by: Jin Dong <[email protected]>

* Cleanup code

Signed-off-by: Jin Dong <[email protected]>

* Fix lint

Signed-off-by: Jin Dong <[email protected]>

* Initializer node status map

Signed-off-by: Jin Dong <[email protected]>

* Address comments

Signed-off-by: Jin Dong <[email protected]>

---------

Signed-off-by: Jin Dong <[email protected]>

* Update Model status from LocalModelNode status (#4056)

* Delete model from LocalModelNode

Signed-off-by: Jin Dong <[email protected]>

* Cleanup code

Signed-off-by: Jin Dong <[email protected]>

* Cleanup code

Signed-off-by: Jin Dong <[email protected]>

* Fix lint

Signed-off-by: Jin Dong <[email protected]>

* Initializer node status map

Signed-off-by: Jin Dong <[email protected]>

* Update status

Signed-off-by: Jin Dong <[email protected]>

* Update localmodel node status

Signed-off-by: Jin Dong <[email protected]>

* Remove job dependency from localmodel controller

Signed-off-by: Jin Dong <[email protected]>

* Remove some unused lines

Signed-off-by: Jin Dong <[email protected]>

* Add comments

Signed-off-by: Jin Dong <[email protected]>

---------

Signed-off-by: Jin Dong <[email protected]>

* LocalModelNode Agent that creates download jobs and update statuses from jobs (#4075)

* download working

Signed-off-by: Gavin Li <[email protected]>

* delete working

Signed-off-by: Gavin Li <[email protected]>

* cleanup

Signed-off-by: Gavin Li <[email protected]>

* gofmt

Signed-off-by: Gavin Li <[email protected]>

* Delete model from LocalModelNode

Signed-off-by: Jin Dong <[email protected]>

* Cleanup code

Signed-off-by: Jin Dong <[email protected]>

* Fix lint

Signed-off-by: Jin Dong <[email protected]>

* Initializer node status map

Signed-off-by: Jin Dong <[email protected]>

* Update status

Signed-off-by: Jin Dong <[email protected]>

* Update localmodel node status

Signed-off-by: Jin Dong <[email protected]>

* Remove job dependency from localmodel controller

Signed-off-by: Jin Dong <[email protected]>

* Remove some unused lines

Signed-off-by: Jin Dong <[email protected]>

* Add comments

Signed-off-by: Jin Dong <[email protected]>

* Update manager

Signed-off-by: Jin Dong <[email protected]>

* Update rbac

Signed-off-by: Jin Dong <[email protected]>

* Add tests and temporarily remove delete models code

Signed-off-by: Jin Dong <[email protected]>

* Do not create download jobs if model is already downloaded

Signed-off-by: Jin Dong <[email protected]>

* remove misleading log line

Signed-off-by: Jin Dong <[email protected]>

* Clean up code a little bit

Signed-off-by: Jin Dong <[email protected]>

* Update configurations

Signed-off-by: Jin Dong <[email protected]>

* update test

Signed-off-by: Jin Dong <[email protected]>

* Use a fixed name for the download container

Signed-off-by: Jin Dong <[email protected]>

---------

Signed-off-by: Gavin Li <[email protected]>
Signed-off-by: Jin Dong <[email protected]>
Co-authored-by: Gavin Li <[email protected]>

* Delete models from local disk when they are not in LocalModelNode spec (#4084)

* download working

Signed-off-by: Gavin Li <[email protected]>

* delete working

Signed-off-by: Gavin Li <[email protected]>

* Delete model from LocalModelNode

Signed-off-by: Jin Dong <[email protected]>

* Initialize node status map

Signed-off-by: Jin Dong <[email protected]>

* Update status

Signed-off-by: Jin Dong <[email protected]>

* Update localmodel node status

Signed-off-by: Jin Dong <[email protected]>

* Update manager

Signed-off-by: Jin Dong <[email protected]>

* Update rbac

Signed-off-by: Jin Dong <[email protected]>

* Add tests and temporarily remove delete models code

Signed-off-by: Jin Dong <[email protected]>

* Do not create download jobs if model is already downloaded

Signed-off-by: Jin Dong <[email protected]>

* Delete function

Signed-off-by: Jin Dong <[email protected]>

* Update configurations

Signed-off-by: Jin Dong <[email protected]>

* Add test and Fix deletion code

Signed-off-by: Jin Dong <[email protected]>

* Use a fixed name for the download container

Signed-off-by: Jin Dong <[email protected]>

* Remove deleted models from status and periodically trigger reconciliation

Signed-off-by: Jin Dong <[email protected]>

* Fix storagecontainer permissions and a minor change

Signed-off-by: Jin Dong <[email protected]>

---------

Signed-off-by: Gavin Li <[email protected]>
Signed-off-by: Jin Dong <[email protected]>
Co-authored-by: Gavin Li <[email protected]>

---------

Signed-off-by: Jin Dong <[email protected]>
Signed-off-by: Gavin Li <[email protected]>
Co-authored-by: Gavin Li <[email protected]>
Co-authored-by: Jin Dong <[email protected]>
Signed-off-by: Dan Sun <[email protected]>

* Update ClusterLocalModel to LocalModelCache (#4105)

* Update ClusterLocalModel to LocalModelCache

Signed-off-by: Dan Sun <[email protected]>

* Fix generation fmt

Signed-off-by: Dan Sun <[email protected]>

* black fmt

Signed-off-by: Dan Sun <[email protected]>

* Fix generated code

Signed-off-by: Dan Sun <[email protected]>

* Run go mod tidy

Signed-off-by: Dan Sun <[email protected]>

* Fix model status

Signed-off-by: Dan Sun <[email protected]>

---------

Signed-off-by: Dan Sun <[email protected]>

* Fix LocalModelCache controller reconciles deleted resource (#4106)

* Fix LocalModel controller reconciles deleted resource

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Rebase

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Fix path based routing e2e workflow

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
Signed-off-by: Dan Sun <[email protected]>

* Add namespace to localmodel and localmodelnode ServiceAccount helm chart (#4111)

add localmodelnode agent image

Signed-off-by: Rituraj Singh <[email protected]>
Co-authored-by: Rituraj Singh <[email protected]>
Signed-off-by: Dan Sun <[email protected]>

* Detect missing models and redownload models (#4095)

* another squash

Signed-off-by: Jin Dong <[email protected]>

* Add JobTTLSecondsAfterFinished option

Signed-off-by: Jin Dong <[email protected]>

* Update config

Signed-off-by: Jin Dong <[email protected]>

* Use labels to filter jobs instead of deleting old jobs

Signed-off-by: Jin Dong <[email protected]>

* Add log in test

Signed-off-by: Jin Dong <[email protected]>

* Fix test and helm chart

Signed-off-by: Jin Dong <[email protected]>

* Create a separate file system utils file

Signed-off-by: Jin Dong <[email protected]>

* Add comments

Signed-off-by: Jin Dong <[email protected]>

* Fix status update bug

Signed-off-by: Jin Dong <[email protected]>

---------

Signed-off-by: Jin Dong <[email protected]>
Signed-off-by: Dan Sun <[email protected]>

* Allow multiple node groups in the model cache CR (#4134)

* Allow multiple node groups in the model cache CR

Signed-off-by: Jin Dong <[email protected]>

* Fix test

---------

Signed-off-by: Jin Dong <[email protected]>
Signed-off-by: Dan Sun <[email protected]>

* Annotation to disable model cache (#4118)

Signed-off-by: Jin Dong <[email protected]>
Signed-off-by: Dan Sun <[email protected]>

* Clean up jobs in model cache agent (#4140)

* Clean up jobs

Signed-off-by: Jin Dong <[email protected]>

* fix lint

Signed-off-by: Jin Dong <[email protected]>

* Fix lint

Signed-off-by: Jin Dong <[email protected]>

* Fix deletion propagation policy

Signed-off-by: Jin Dong <[email protected]>

* Update test

Signed-off-by: Jin Dong <[email protected]>

---------

Signed-off-by: Jin Dong <[email protected]>
Signed-off-by: Dan Sun <[email protected]>

* Ensure Model root folder exists (#4142)

Signed-off-by: Jin Dong <[email protected]>
Signed-off-by: Dan Sun <[email protected]>

* Add NodeGroup Name Into PVC Name (#4141)

* Add NodeGroup Name Into PVC Name

Signed-off-by: Gavin Li <[email protected]>

* Add comment to fix multiple node group

Signed-off-by: Dan Sun <[email protected]>

* fix openvino dependency

Signed-off-by: Dan Sun <[email protected]>

---------

Signed-off-by: Gavin Li <[email protected]>
Signed-off-by: Dan Sun <[email protected]>
Co-authored-by: Dan Sun <[email protected]>
Signed-off-by: Dan Sun <[email protected]>

* Make LocalModel Agent reconciliation frequency configurable (#4143)

* Make reconciliation configurable

Signed-off-by: Jin Dong <[email protected]>

* Fix codegen

Signed-off-by: Jin Dong <[email protected]>

* Remove a redundant space

Signed-off-by: Jin Dong <[email protected]>

* Rename config

Signed-off-by: Jin Dong <[email protected]>

* Fix lint

Signed-off-by: Jin Dong <[email protected]>

---------

Signed-off-by: Jin Dong <[email protected]>
Co-authored-by: Dan Sun <[email protected]>
Signed-off-by: Dan Sun <[email protected]>

* LocalModelCache Admission Webhook (#4102)

* init

Signed-off-by: Gavin Li <[email protected]>

* broken code

Signed-off-by: Gavin Li <[email protected]>

* register webhook

Signed-off-by: Gavin Li <[email protected]>

* rename + working

Signed-off-by: Gavin Li <[email protected]>

* pass in client

Signed-off-by: Gavin Li <[email protected]>

* check storageURI

Signed-off-by: Gavin Li <[email protected]>

---------

Signed-off-by: Gavin Li <[email protected]>
Signed-off-by: Dan Sun <[email protected]>

* Fix isvc role localmodelcache permission (#4131)

* Fix localmodelcache permission for isvc

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

* Patch localmodelcache webhook for kubeflow overlay

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>

---------

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
Signed-off-by: Dan Sun <[email protected]>

---------

Signed-off-by: Gavin Li <[email protected]>
Signed-off-by: Dan Sun <[email protected]>
Signed-off-by: Jin Dong <[email protected]>
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
Signed-off-by: Rituraj Singh <[email protected]>
Co-authored-by: Gavin Li <[email protected]>
Co-authored-by: Gavin Li <[email protected]>
Co-authored-by: Jin Dong <[email protected]>
Co-authored-by: Sivanantham <[email protected]>
Co-authored-by: Rituraj Singh <[email protected]>
Co-authored-by: Rituraj Singh <[email protected]>
@greenmoon55 greenmoon55 deleted the update-status branch December 23, 2024 20:46