comma three with NVMe: loggerd crash #34742
Comments
It hit this assert. Do you ever SSH into your device? Is it possible you changed the perms on the log directory? Was it fine on subsequent drives?
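For anyone wanting to check the permissions question directly, here is a minimal sketch, not anything that ships in openpilot. The helper name `describe_dir_access` is hypothetical; the directory to pass in would be the log directory on the device (the traceback later in this thread shows `/data/media/0/realdata/`).

```python
import os
import stat


def describe_dir_access(path):
    """Summarize ownership and mode of a directory (hypothetical helper)."""
    st = os.stat(path)
    return {
        # Human-readable mode string, e.g. 'drwxr-xr-x'
        "mode": stat.filemode(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
        # Creating entries in a directory needs both write and search permission
        "writable": os.access(path, os.W_OK | os.X_OK),
    }
```

If `writable` comes back true for the user the logger runs as, a permissions change is an unlikely explanation for a failure to create the route folder.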
Hmm, looks like it's trying to create a folder for the route itself?
Once long ago, but not on this device in over a year at least and haven't had any issues since.
Previous drives also work (I can see routes I drove earlier that same morning with Also, let me see if Settings > Device > Reboot fixes the issue... stay tuned.
Yes. Rebooting the device also fixes it... so it couldn't be a permissions issue, right? No reason for the permissions to change, right?
Ah, if you haven't SSH'd in for a long time or used a fork without doing a factory reset, then this should definitely be handled.
But even 9735cf2 already shipped in https://github.com/commaai/openpilot/releases/tag/v0.9.7 so if this is, in fact, a 0.9.8 regression it would have to
For what it's worth, the last path to upload correctly before the issue was so presumably I attempted to engage openpilot 4 times in between (sounds about right) and all of those hit the assert (you should see exactly 4 reports of that assert triggering from my device, if so)
So that number is the route count, not engaged count (route = one "ignition" session). If this is a regression, I suspect it has to do with an NVMe race condition. If that's true, I'd expect the issue to start on the first route after a bootup and persist until reboot.
Yeah, it's NVMe related. You're also getting:
Traceback (most recent call last):
  File "/data/openpilot/system/loggerd/uploader.py", line 58, in listdir_by_creation
    paths = [f for f in os.listdir(d) if os.path.isdir(os.path.join(d, f))]
                        ^^^^^^^^^^^^^
OSError: [Errno 5] Input/output error: '/data/media/0/realdata/'
Unfortunately, I haven't been able to repro this on my desk so far, and we don't even have a bootlog from the bad boot. If it happens again, it'd be great if you can post the output of this:
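A note on the traceback above: errno 5 is `EIO`, which means the read failed at the storage layer rather than on permissions. The sketch below is an assumption-laden illustration, not the actual `uploader.py` code. It reuses the list comprehension from the traceback, and the wrapper name `listdir_dirs_or_none` is hypothetical. It shows one way a caller could tell "the drive went away" apart from other errors.

```python
import errno
import os


def listdir_dirs_or_none(d):
    """Return sub-directory names of d, or None if the storage raised EIO.

    Hypothetical wrapper for illustration only.
    """
    try:
        return [f for f in os.listdir(d) if os.path.isdir(os.path.join(d, f))]
    except OSError as e:
        if e.errno == errno.EIO:
            # Errno 5: the underlying block device failed or disappeared
            return None
        # Anything else (missing path, permissions, ...) is a different problem
        raise
```

Returning `None` instead of an empty list keeps "no routes yet" distinct from "storage unavailable", so a caller could retry or raise an alert rather than silently uploading nothing.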
FWIW it's happened on four separate occasions since installing Seemingly, as long as I can avoid having the device turn off (e.g. if the 12V battery in the car gets low, the Comma device will turn itself off) it does seem like it will always eventually trigger at some point. Here are the kernel & systemd logs: I've connected a trickle charger to my Prius just now so we can hopefully keep the device in this state for as long as possible, in case there's anything else you want to try and debug
Your NVMe is getting disconnected.
Although we made big userspace updates, the kernel is largely unchanged. It's possible this is a HW failure rather than a regression in the software. Since it's so reproducible, can you try to repro on the current release?
It seemed to happen right as you went onroad though... perhaps it's reproducible by cycling ignition rather than pure uptime.
Ah, I forgot that I recently made this stricter: 8272221. The NVMe behavior likely always happened on your device, but it didn't show the alert until now. I'm going to just revert that for this release and revisit for the next one.
release3-staging
Error message: Process not running (loggerd)
Yeah. I did see it only once in the middle of driving (it would have been the
Once on release, is there anything else I can log or capture other than
If we revert the change, I won't be able to tell when the issue triggers, so I won't know when to
Describe the bug
I got "Process not running" (loggerd) during
release3-staging
But it won't create logs to upload if loggerd isn't working... so I don't have a route to share? (At least, connect.comma.ai isn't showing this particular route)
Provide a route where the issue occurs
Dongle ID is 21015c36062c3cef (without loggerd working there are no logs, right?)
openpilot version
release3-staging (installed Feb 28, 2025)
Additional info
Original thread: https://discord.com/channels/469524606043160576/616456819027607567/1345445493869777119