
HA-Kubernetes mangles data when running mixed apiserver versions (upgrades) #46073

Closed
justinsb opened this issue May 19, 2017 · 13 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@justinsb
Member

We saw an interesting edge case during a user's 1.5 -> 1.6 upgrade. The installation tool (kops) ran kubectl apply with a 1.6 manifest, but the 1.6-specific fields were removed from the deployment at some stage, in particular the tolerations.

The kubectl.kubernetes.io/last-applied-configuration annotation reflected the 1.6 fields, but the tolerations were missing from the spec itself.

I hypothesize that a 1.5 kube-controller-manager was the leader and that it did, e.g., an UpdateStatus at some stage during the upgrade.

How can we prevent this from happening? The ideas I've had are that we could either report the lowest API version across the cluster, or use a protobuf-style unknown-fields approach so that we don't mangle extra fields. Neither of those is a great option.
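
To make the suspected failure mode concrete, here is a minimal Go sketch of the read-modify-write that can drop fields (types and field names are illustrative, not the real client code or API structs):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// A 1.5 client's view of the pod spec; it predates tolerations.
// (Illustrative type, not the real API struct.)
type podSpecV15 struct {
	Containers []string `json:"containers"`
}

func main() {
	// What a 1.6 apiserver returns on GET.
	fromServer := `{"containers":["kube-dns"],"tolerations":[{"key":"CriticalAddonsOnly"}]}`

	// The 1.5 component decodes the object into its own, older type...
	var spec podSpecV15
	if err := json.Unmarshal([]byte(fromServer), &spec); err != nil {
		panic(err)
	}

	// ...then a full update (e.g. UpdateStatus) re-encodes the whole object
	// and PUTs it back; the tolerations it never knew about are gone.
	body, _ := json.Marshal(spec)
	fmt.Println(string(body)) // {"containers":["kube-dns"]}
}
```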

@justinsb justinsb added the area/api Indicates an issue on api area. label May 19, 2017
@erictune erictune changed the title HA-Kubernetes mangles data when ruinning mixed apiserver versions (upgrades) HA-Kubernetes mangles data when running mixed apiserver versions (upgrades) May 19, 2017
@erictune
Member

Justin, can you please add a SIG label? We are trying to standardize on sig rather than area labels. Or if there is no identifiable SIG for it, say so.

@justinsb justinsb added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed area/api Indicates an issue on api area. labels May 19, 2017
@justinsb
Member Author

Sorry @erictune - fixed!

@kubernetes kubernetes deleted a comment from justinsb Jul 31, 2017
@bgrant0607 bgrant0607 added sig/cli Categorizes an issue or PR as relevant to SIG CLI. and removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Jul 31, 2017
@bgrant0607
Member

Apply is kubectl.

@kubernetes kubernetes deleted a comment from 0xmichalis Jul 31, 2017
@bgrant0607 bgrant0607 added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jul 31, 2017
@bgrant0607
Member

And I consider this a feature request rather than a bug.

cc @kubernetes/sig-api-machinery-feature-requests @kubernetes/sig-cli-feature-requests

@bgrant0607 bgrant0607 added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 31, 2017
@bgrant0607
Member

This is basically a manifestation of the problem discussed in #4855.

New fields shouldn't be exposed until all components are upgraded and there is no risk of rollback.

We can't store unknown fields for reasons discussed in #30819.

However, all components, but especially kubelet, really should be using patch to write status.
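
A rough sketch of why patch is safer than a full update here (hand-rolled merge for illustration; the real apiserver uses strategic merge patch, but the principle is the same): a patch names only the fields the writer is changing, so fields the writer's older client library doesn't know about survive.

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// The stored object as the apiserver sees it, including a field the
	// status writer's (older) client library doesn't know about.
	stored := []byte(`{"replicas":3,"tolerations":[{"key":"dedicated"}],"status":{"ready":2}}`)

	// A merge patch carries only the fields being changed.
	patch := map[string]any{"status": map[string]any{"ready": 3}}

	// Applied generically against the raw object, everything the patch
	// doesn't mention is left untouched.
	var obj map[string]any
	if err := json.Unmarshal(stored, &obj); err != nil {
		panic(err)
	}
	for k, v := range patch {
		obj[k] = v
	}
	merged, _ := json.Marshal(obj)
	fmt.Println(string(merged)) // "tolerations" survives the status write
}
```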

@justinsb
Member Author

justinsb commented Jul 31, 2017

Edit: this "crossed in the mail" with bgrant's previous reply, but I think the patch issue still holds.

So I think there are two things here:

  1. kubectl apply will apply an N+1 manifest and reflect that in last-applied-configuration, but if it was applied to a version N apiserver, some fields may be missing. Even after the upgrade, kubectl apply will believe all is well.
  2. Mixing component versions, even within our supported version skew, can lose newly added fields if a version N component updates a version N+1 object. I believe this even applies to patch requests if the apiserver is at version N (see the sketch after this list).
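
A sketch of why point 2 holds even for patch (hypothetical types; the point is that a version N apiserver ultimately round-trips the merged result through its own typed objects):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The version N apiserver's schema: no tolerations field yet.
type specN struct {
	Replicas int `json:"replicas"`
}

func main() {
	// Stored object containing a version N+1 field.
	raw := []byte(`{"replicas":3,"tolerations":[{"key":"dedicated"}]}`)

	// Even if the patch itself is applied generically...
	var obj map[string]any
	json.Unmarshal(raw, &obj)
	obj["replicas"] = 4
	merged, _ := json.Marshal(obj)

	// ...the old apiserver decodes the result into its typed object for
	// validation and defaulting, so the unknown field is dropped anyway.
	var typed specN
	json.Unmarshal(merged, &typed)
	persisted, _ := json.Marshal(typed)
	fmt.Println(string(persisted)) // {"replicas":4}: tolerations lost
}
```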

I am happy to split out the kubectl issue if desired. The API issue is more problematic though.

@mml
Contributor

mml commented Aug 17, 2017

@justinsb Looking forward to seeing a design doc for this when it's ready.
/sig architecture

@justinsb
Member Author

@mml I think sig-apimachinery should own this; it's too fundamental an issue, I believe.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 3, 2018
@liggitt liggitt removed the sig/cli Categorizes an issue or PR as relevant to SIG CLI. label Jan 6, 2018
@bgrant0607 bgrant0607 mentioned this issue Jan 22, 2018
@bgrant0607
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 23, 2018
@lavalamp lavalamp self-assigned this Jan 30, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 30, 2018
@bgrant0607
Member

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 1, 2018
@liggitt
Member

liggitt commented Feb 20, 2019

FYI, with the guidance in https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md#alpha-field-in-existing-api-version (and the sweeps done in #72169 and #72651), n-1 skew among API servers will no longer drop data from beta+ fields supported in version n, as long as those fields existed at alpha level in version n-1.
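
To illustrate the convention (hypothetical type and feature-gate name): the field is added to the struct at alpha one release early, so even an older apiserver that never serves the feature still round-trips the data instead of dropping it.

```go
package widgets

// Hypothetical API type in release n-1. Because the new field already
// exists in the struct (alpha, gated off by the made-up WidgetFancyMode
// feature gate), decode/encode preserves any value a newer client set.
type WidgetSpec struct {
	Replicas int32 `json:"replicas"`

	// FancyMode is alpha; honored only when WidgetFancyMode is enabled,
	// but never stripped on round-trip once it is set.
	// +optional
	FancyMode *string `json:"fancyMode,omitempty"`
}
```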

@liggitt liggitt closed this as completed Feb 21, 2020