token times out under load? #188
Comments
Can we increase the frequency of the publishing beacon to every minute? Is there a concern of running out of bandwidth? |
Yep, there should be no harm (other than an obscene amount of CI traffic) in doing that! Although it's strange that only Java is seeing that error -- we've seen other flakes on the Python client, but not consistently (or with that error). |
It might be worth trying to pinpoint the reason here -- I have not seen this issue myself since the current token infra was set up...
|
Here's a successful but slow run: https://github.com/sigstore/sigstore-java/actions/runs/12915297054/job/36016838227 |
I think the problem is partly that Java runs this 4 times per PR (against [Java 17, Java 11] x [staging, prod]). And this usually happens on our dependabot update cycle, when 10 PRs are created. |
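To put a rough number on that: a minimal sketch of the job matrix described above, assuming every PR in a dependabot batch triggers the full matrix at about the same time. The variable names are illustrative only and are not taken from the actual workflow file.

```python
from itertools import product

# Matrix axes as described in the comment above (values are labels only).
java_versions = ["17", "11"]
environments = ["staging", "prod"]

# Assumption: a dependabot cycle opens about 10 PRs at once.
open_prs = 10

# Each PR runs the conformance suite once per (java version, environment) pair.
jobs_per_pr = list(product(java_versions, environments))
total_runs = len(jobs_per_pr) * open_prs

print(f"{len(jobs_per_pr)} runs per PR, {total_runs} runs per dependabot cycle")
```

So a single update cycle can mean dozens of runs needing a token at roughly the same time, half of them against prod.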
@woodruffw I don't think this is related but I did notice that this is building the (Also, I think we should probably be compiling our dependencies here as well...) |
Maybe I'm not looking in the right place but all production runs seem to be 15-17 mins each? This seems fairly consistent, in fact I've not seen any other results so far. |
Also maybe we can run the test suite with |
Expand out: https://github.com/sigstore/sigstore-java/actions/runs/12915296222 And you should see the individual runs where prod takes 15+ and staging takes 1-2m. But looking at it more now, it looks like maybe prod is getting rate limited or something? what do |
@loosebazooka Out of curiosity, how is the conformance CLI client invoked? Do you spin up a new process each time? (I'm wondering if JVM startup is playing a role -- when running the CPython bundle tests we end up spawning 1000+ processes, which I could see taking quite a bit of time.) |
I think there are just so many more tests for prod that it takes a lot longer: The selftest here takes 2 mins for prod and 0.2 mins for staging as well.
good question. |
Hmm, yeah -- it looks like
Yeah, fully agreed! We have them fully pinned at the moment, but not fully resolved or hashed. I can look into that last part today. |
I think the comma indicates a hung/slow test, and pytest outputs the comma to keep the output going? This happens with Python too: https://github.com/sigstore/sigstore-python/actions/runs/12894538382/job/35953394258 |
Looks like later pre-releases have wheels: https://pypi.org/project/grpclib/0.4.8rc2/#files |
Ah yep, I just started writing an issue and noticed vmagamedov/grpclib#188 🙂 |
Updating with some details:
So something clearly goes wrong in the test run on GHA for sigstore-java -- none of this explains the original issue of token not being available but it may explain why only sigstore-java sees the problem: Like maybe there is a bug in how the token is refreshed when it expires multiple times during the test run but the situation only comes up on java tests... |
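For anyone wanting to test the "token expires during the run" theory, here is a minimal sketch of an expiry check, assuming the identity token is a standard JWT with an `exp` claim. The helper names `jwt_expiry` and `needs_refresh` and the 30 second margin are hypothetical; this is not how the conformance suite or either client actually handles refresh.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> int:
    """Return the `exp` claim (Unix seconds) from a JWT payload, without verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(payload["exp"])


def needs_refresh(token: str, now: Optional[float] = None, margin: int = 30) -> bool:
    """True if the token expires within `margin` seconds of `now` (hypothetical policy)."""
    if now is None:
        now = time.time()
    return jwt_expiry(token) - now <= margin
```

Logging `jwt_expiry(token) - time.time()` before each test would show whether a long prod run outlives one or more tokens.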
Oh I thought cpython tests were skipped by default? Is that not true? Locally they are skipped because of the no github_workspace, but I guess the default on the skip is "false". |
Each instance of the test is starting up a new JVM though. So there's that. We could make conformance a long-running process, which could help, but it would complicate what the CLI is. |
I've had a closer look at the TUF conformance tests (since that test suite is even more blocked by the client startup time: I know because I've profiled the Python case) and this does seem like a good explanation: the Python client test run already spends > 90% of its time in Python module imports and the Java client test run takes ~4x as long. So startup cost for the sigstore-java client is roughly 4x that of the sigstore-python client. So
|
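A rough way to check the per-process startup cost discussed above is to time repeated spawns of a command. This is a sketch only: the two commands below are stand-ins (a bare interpreter and one with a few stdlib imports), not the real client invocations, and `mean_startup_seconds` is a hypothetical helper. Substitute the actual client command to compare.

```python
import subprocess
import sys
import time
from typing import Sequence


def mean_startup_seconds(cmd: Sequence[str], runs: int = 5) -> float:
    """Average wall-clock time to spawn `cmd` and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(
            list(cmd),
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return (time.perf_counter() - start) / runs


# Stand-in commands: a bare interpreter versus one that imports a few modules.
baseline = mean_startup_seconds([sys.executable, "-c", "pass"])
with_imports = mean_startup_seconds([sys.executable, "-c", "import json, email, http.client"])

print(f"bare start: {baseline:.3f}s, with imports: {with_imports:.3f}s")
```

Multiplying the per-process figure by the number of tests gives a lower bound on how much of the run is pure startup.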
I think we can do a release, and close this for now. I believe @jku's PR treats the symptoms sufficiently. There are still some longer-term issues with starting the JVM for each test. But it's much lower priority now. |
Sounds good to me 🙂 -- I can do a release in a moment. |
Just cut 0.0.17. |
Not sure what the root cause is here. But we frequently see
in sigstore-java. I imagine it happens when there's some slowdown on GHA. Wondering if we should explore an alternative here?