Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pandad: rewrite from C++ to Python #34711

Draft
wants to merge 12 commits into
base: master
Choose a base branch
from
Draft

Conversation

deanlee
Copy link
Contributor

@deanlee deanlee commented Feb 26, 2025

No description provided.

@adeebshihadeh
Copy link
Contributor

Beautiful red diff 😍

@adeebshihadeh adeebshihadeh added this to the 0.9.9 milestone Feb 26, 2025
@deanlee deanlee force-pushed the python_pandad branch 2 times, most recently from 96296f8 to 3b8f96d Compare February 26, 2025 03:29
@adeebshihadeh
Copy link
Contributor

trigger-jenkins

@deanlee deanlee force-pushed the python_pandad branch 11 times, most recently from 8800f2c to 5b9876e Compare February 27, 2025 12:54
@deanlee deanlee marked this pull request as ready for review February 27, 2025 12:59
@adeebshihadeh
Copy link
Contributor

trigger-jenkins

@adeebshihadeh
Copy link
Contributor

The "loopback" and "test pandad spi" tests are failing in Jenkins.

Do you have a comma 3X to test the SPI? If not, I can send you a couple.

@adeebshihadeh adeebshihadeh marked this pull request as draft February 28, 2025 22:11
@deanlee deanlee force-pushed the python_pandad branch 2 times, most recently from b61a500 to d3dc85a Compare March 1, 2025 12:23
@deanlee
Copy link
Contributor Author

deanlee commented Mar 1, 2025

Thanks for pointing this out! I have a comma 3X and can test the SPI locally. There are two key issues causing the 'loopback' and 'test pandad spi' failures in Jenkins:

  1. msgq GIL Handling: The current SubSocket implementation holds the GIL during recv calls, blocking other python threads. I’ve submitted commaai/msgq#643 to fix this by adding with nogil to SubSocket methods (e.g., receive), allowing concurrent thread execution without GIL contention.
  2. spidev Performance: The spidev module suffers from performance bottlenecks, delaying can_recv() responses. This causes timing assertion failures in test_pandad_spi.py (e.g., assert edt * 0.9 < np.mean(dts) < edt * 1.1), as CAN messages aren’t returned quickly enough.

I noticed you’ve already set a bounty for 'switch to kernel driver for low-level SPI comms,' which should resolve the spidev performance issue. Once that’s implemented, the python pandad should operate smoothly.

@adeebshihadeh
Copy link
Contributor

Merged #1. I was hoping we can get good enough performance to not be blocked on the kernel driver, but if you don't think we can, let's move back to the UI rewrite.

Would be really great if we can ship the rewrite for 0.9.9!

Copy link
Contributor

@adeebshihadeh adeebshihadeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's the can and sendcan performance on the capnp side? Any bottlenecks there?

I really think we can get the performance similar to the C++ version without the kernel driver. I'd rather not block merging this on that.

return ignition

def _fill_state(self, ps, hw_type, health):
ps.voltage = health['voltage']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose we refactor the panda health reponse to output a format compatible for dict init on the capnp message so we can do:

msg = messaging.new_message('pandaStates', 0)
msg.pandaStates[0] = panda.health()

@@ -44,13 +41,6 @@ int main(int argc, char *argv[]) {
stream = new DeviceStream(&app);
} else if (cmd_parser.isSet("zmq")) {
stream = new DeviceStream(&app, cmd_parser.value("zmq"));
} else if (cmd_parser.isSet("panda") || cmd_parser.isSet("panda-serial")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's split out this cabana part into another PR

@deanlee
Copy link
Contributor Author

deanlee commented Mar 3, 2025

How's the can and sendcan performance on the capnp side? Any bottlenecks there?

@adeebshihadeh : I don’t think there are bottlenecks in capnp or msgq. I’ll run some tests and share a C++ vs. Python sendcan performance comparison to confirm.

@adeebshihadeh
Copy link
Contributor

trigger-jenkins

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants