
Failing to launch VM with error "The connection to service named com.apple.fonts was invalidated" #612

Closed
ruimarinho opened this issue Sep 26, 2023 · 9 comments
Labels
bug Something isn't working

Comments

@ruimarinho
Contributor

Hi,

Recently one of our Macs started failing to run VMs with the following message:

{"level":"warn","ts":1695721794.078638,"msg":"'tart run orchard-01HB8D202QXGFN4EATPEHGA1FC-10ad125f-67e1-4a06-9b78-4c888dd20719-0' failed with exit code 1: 2023-09-26 10:49:53.929 tart[11835:635144] XType: failed to connect - Error Domain=NSCocoaErrorDomain Code=4099 \"The connection to service named com.apple.fonts was invalidated: failed at lookup with error 3 - No such process.\" UserInfo={NSDebugDescription=The connection to service named com.apple.fonts was invalidated: failed at lookup with error 3 - No such process.}","vm_uid":"10ad125f-67e1-4a06-9b78-4c888dd20719","vm_name":"01HB8D202QXGFN4EATPEHGA1FC","vm_restart_count":0}

Any idea what could cause such an issue? Running tart 1.12.0.

@ruimarinho
Contributor Author

A few observations:

  • fontd was running on this particular machine.
  • killing fontd and ensuring it did not make a difference
  • restarting the orchard worker fixed the issue, so I'm unsure whether this is a tart or an orchard issue, even though the exit code comes from tart.

@fkorotkov
Contributor

Interesting error! It seems to be coming from SwiftUI, since Tart only uses labels and similar UI elements in non-headless mode. I wonder if Orchard should use headless mode for VMs by default. What do you think @edigaryev?

fkorotkov added the bug (Something isn't working) label Sep 26, 2023
@ruimarinho
Contributor Author

Should I switch to headless for now while we search for the root cause?

@fkorotkov
Copy link
Contributor

Yeah, I think it's generally the right mode for the Orchard use case. Headless only affects whether there is a Tart window; the VM itself has a display either way.

@edigaryev
Collaborator

@fkorotkov it does use headless by default when creating the VM via the CLI.

@ruimarinho as for the issue, it might be due to a GUI pop-up showing up on the worker's machine, which doesn't get answered. If the issue is reproducible, can you verify that?

@ruimarinho
Contributor Author

@edigaryev I am unable to reproduce it right now because I had to restart the orchard worker in order to resume jobs, but there wasn't (and there isn't) anything on the worker's UI that could explain this (no pop-up, nothing shown by the OS, etc.).

I'm wondering if this could be in any way related to the 256 VMs/day limit being reached, but somehow surfacing in a different domain?

Regardless, I'll switch to headless and see if we hit anything similar again. Perhaps it would be worth running a small stress test with headless mode enabled.

@edigaryev
Collaborator

> I'm wondering if this could be in any way related to the 256 VMs/day limit being reached, but somehow surfacing in a different domain?

This is unlikely, because the error seems to have happened within a second of the tart run invocation, whereas it normally takes much longer for the VM to boot to the point where it starts interacting with the DHCP server.

> Regardless, I'll switch to headless and see if we hit anything similar again. Perhaps it would be worth running a small stress test with headless mode enabled.

I've just checked and Tart doesn't open any fonts when running with --no-graphics aka headless mode.

This can be easily verified by running lsof -p $(pgrep tart) | grep Fonts.
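A minimal shell sketch of the same check, assuming the process is named exactly tart and that font files live under paths containing "Fonts" (as /System/Library/Fonts and /Library/Fonts do). check_tart_fonts is a hypothetical helper, not part of Tart. One difference from the one-liner: on a worker running several VMs there can be more than one tart process, and $(pgrep tart) then expands to several space-separated PIDs, so lsof -p only treats the first one as a PID. pgrep -d, joins them into the comma-separated list that lsof -p accepts.

```shell
#!/bin/sh
# Sketch: report whether any running tart process has font files open.
# Assumptions: the process is named exactly "tart", and font paths contain
# "Fonts" (e.g. /System/Library/Fonts, /Library/Fonts).
# check_tart_fonts is a hypothetical helper, not part of Tart.

check_tart_fonts() {
  # -x: exact process-name match; -d,: join multiple PIDs with commas
  pids=$(pgrep -d, -x tart) || {
    echo "no tart process running"
    return 1
  }

  # lsof -p accepts a comma-separated PID list; grep -q only sets the exit status
  if lsof -p "$pids" 2>/dev/null | grep -q Fonts; then
    echo "tart has font files open"
  else
    echo "tart has no font files open"
  fi
}
```

To compare modes, one could start a VM with tart run --no-graphics <vm-name> in one terminal, call check_tart_fonts from another, and then repeat without --no-graphics.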

So hopefully this will help. By the way, do I understand correctly that you are creating the VMs from the REST API and not from the CLI?

@ruimarinho
Contributor Author

Unfortunately I had to revert that decision because of cirruslabs/orchard#138 (i.e. VMs becoming completely unresponsive).

> This can be easily verified by running lsof -p $(pgrep tart) | grep Fonts

Indeed. I even unloaded the service and made sure it stayed that way, restarted the worker, and everything came back to life again. Very weird.

> So hopefully this will help. By the way, do I understand correctly that you are creating the VMs from the REST API and not from the CLI?

Yes, all interaction is done via the REST API through the controller.

@fkorotkov
Contributor

Closing since we can't reproduce and there is a workaround in place.


3 participants