"Failed to collect all selectors for PID" error="workload attestor \"k8s\" failed: rpc error: code = DeadlineExceeded desc = workloadattestor(k8s): no selectors found after max poll attempts" #3092
@tanjunchen It looks like your attested workload exits while the SPIRE agent is collecting its selectors.
Yes, but when I try to get the CA via SDS, it fails.
@tanjunchen I've seen a similar error, but it was related to the fact that I tried to fetch an identity from a process running in an initContainer, which led to failing attestation.
This error message indicates that the K8s workload attestor was unable to locate the pod containing the workload container within a configurable period of time. The process is basically as follows:
1. The agent inspects the cgroups of the calling process to determine the pod UID and container ID.
2. The agent queries the kubelet for the list of pods on the node.
3. The agent searches that pod listing for a pod/container matching the pod UID and container ID, retrying until the maximum number of poll attempts is reached.
To debug this, it may be necessary to perform these steps yourself to determine where the breakdown lies. From the agent container, you can query the kubelet for its pod listing (caveat: I haven't tested the command below) and then see if you can identify the workload pod.
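A hedged sketch of what that query might look like (the endpoints, ports, and token path below are assumptions about the kubelet setup, not from the original comment):

```sh
# Query the kubelet's pod listing from inside the agent container.
# The secure port (10250) needs a bearer token the agent may already mount:
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sk -H "Authorization: Bearer ${TOKEN}" https://127.0.0.1:10250/pods

# Or, if the kubelet's read-only port happens to be enabled:
curl -s http://127.0.0.1:10255/pods
```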
Hello, could you share the versions of k8s, Envoy, and so on? Have you resolved it?
OK, thanks very much for your explanation.
@loveyana @radoslav-tomov @azdagron
This looks like a bug in SPIRE.
I'm not sure I understand how that flag is impacting workload attestation. Envoy is the workload being attested in this situation, correct? Presumably the Envoy container has a good status even if the application hasn't started? Can you share the pod spec and status for the time period when workload attestation is being attempted?
Please see https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/ for `holdApplicationUntilProxyStarts`.
If you have the pod spec and status available, that would really help speed up debugging. If not, that is totally OK; it will however probably take some time for someone to find the time to reproduce. @maxlambrecht have you seen this before?
The logs of istio-proxy in httpbin-8cb97b8f4-cq9xh:
The httpbin pod info:
The spire-agent log:
The container ID that SPIRE is trying to attest is not present in the pod status. Is it possible that the Envoy proxy is crashing on startup? That would explain why SPIRE can't find the container and why the peertracker layer is no longer able to watch the calling process. What do the Envoy logs show?
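One generic way to check (my sketch, not part of the original comment) is to compare the container IDs recorded in the pod status against the ID SPIRE logged:

```sh
# List each container's name and container ID as recorded in the pod status;
# the ID SPIRE parsed from cgroups should appear here once the container is up.
kubectl get pod httpbin-8cb97b8f4-cq9xh -o \
  jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.containerID}{"\n"}{end}'
```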
@azdagron
What version of Istio is this? What environment is it deployed into (your own k8s cluster, minikube, k3s, kind, etc)? What container runtime is being used? This will help me reproduce. If SPIRE can't find the container in any of the returned pod status, then one possibility is that SPIRE has misidentified the container ID from the cgroups. Is it possible for you to:
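The requested steps were lost above, but a hedged sketch of gathering that information might look like this (the PID is a placeholder):

```sh
# From the workload (or agent) container, inspect the cgroups of the workload
# process; SPIRE derives the pod UID and container ID from these paths, which
# can then be compared against the kubelet's pod listing.
cat /proc/<workload-pid>/cgroup
```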
I was able to reproduce the error using Istio
I see the same error on the SPIRE Agent logs, related to the attestation of the process:
I've repro'd this and am digging in as I have time. From what I can tell, the Envoy containers started in the workload pods aren't showing up in the pod listing retrieved from the kubelet. I'm still trying to figure out why.
@azdagron I don't recall all the details, but I've noticed that the pod needs to be in a certain status for the info to be available. For example, I've seen that prior to condition
Hmm, I haven't been able to observe that behavior. If I stand up a vanilla Kubernetes cluster (via kind 0.12.0) and deploy a workload with an init container that never terminates and is thus perpetually in an Init state, I can still gather the init container's pod UID and container ID via cgroups from the agent container and see them in the pod listing returned by the kubelet.
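For reference, a minimal sketch of the kind of manifest used in that test (my reconstruction, not the exact one from the thread):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-blocked
spec:
  initContainers:
    - name: never-done
      image: busybox
      command: ["sleep", "1000000"]  # never terminates, so the pod stays in Init
  containers:
    - name: workload
      image: busybox
      command: ["sleep", "1000000"]
```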
@faisal-memon Could you take a look?
Sounds good.
Was able to reproduce this. The container ID is not getting populated in the pod container status while it is in the initializing state. Maybe the init container case is different?
@faisal-memon @rturner3 How is it going?
How is it different from an init container? I'm not sure; I'm just guessing that an init container populates the container ID sooner, which would explain the difference from what @azdagron observed. There are two ways to resolve this issue:
1. Keep polling until the container ID is populated in the pod status.
2. Remove the container ID check and match on the pod alone.
I'm exploring option 1 right now.
@faisal-memon Hello, I'm still seeing the above issue for the Kubernetes pod PID.
IIRC, removing the container ID check is not tenable. We use that check to pinpoint the correct container within the pod in order to generate accurate selectors for the workload. If we cannot identify the container in the pod, then attestation is weakened.
@tanjunchen May I ask why you want to use `holdApplicationUntilProxyStarts`?
`holdApplicationUntilProxyStarts` controls startup ordering: whether the application container or the istio-proxy sidecar starts first.
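For context, a sketch of how this is typically enabled mesh-wide (my snippet, assuming a standard Istio install; not from the thread):

```yaml
# Set in the `istio` ConfigMap (istio-system namespace) under the `mesh` key:
defaultConfig:
  holdApplicationUntilProxyStarts: true
```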
The issue described in spiffe#3092 is unlikely to be solved any time soon. Adding it to the docs so it's more easily discovered. Signed-off-by: Evan Gilman <[email protected]>
Thanks @tanjunchen - yes, I understand what the feature does; I am wondering why you need it. It doesn't look like this issue will be resolved on the k8s side any time soon. The next best thing could be to convince Istio to implement the feature differently... that also seems like a long shot, but it is worth it if we can find an alternate mechanism (one that does not lean on these lifecycle hooks). In the meantime, I raised #3443, which adds this as a known issue to the plugin docs.
In thinking further about this, one potential option is to provide a new configurable on the k8s workload attestor, e.g. `disable_container_selectors`.
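A sketch of what enabling that might look like in the agent configuration (hedged; the option name comes from the commit message below, and the surrounding structure assumes the standard agent plugin config layout):

```hcl
plugins {
    WorkloadAttestor "k8s" {
        plugin_data {
            # Emit only pod-related selectors so attestation can succeed even
            # when the container status is not yet populated (e.g. while a
            # postStart hook is still running).
            disable_container_selectors = true
        }
    }
}
```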
This change introduces a new configurable, `disable_container_selectors`, which configures the K8s Workload Attestor to produce only pod-related selectors. This allows workload attestation to succeed when the attestor can positively locate the workload pod but cannot yet locate the workload container at the time of attestation (e.g. while a postStart hook is still executing). See issue spiffe#3092 for more details. Fixes: spiffe#3092 Signed-off-by: Andrew Harding <[email protected]>
Version: 1.0.1
Platform: k8s + Istio
Subsystem:
spire-server:
spire-agent:
When I try to fetch a certificate through spire-agent, I get an error.
The logs of spire-agent:
The logs of spire-server:
The agent list in spire-server:
The entry in spire-server: