-
Notifications
You must be signed in to change notification settings - Fork 5
feat: use vectoring to jump straight to interruptN #11
Conversation
Previously, the vector table was set up in a pseudo-direct mode where all interrupts were redirected to a single `_start_trap` handler regardless of number. This change introduces 31 new `_start_trapN` targets that encode which `interruptN` function to jump to, at a cost of 180 bytes per function (~5kb total). Not shown (because the linker scripts live in the HAL) are the big block of `PROVIDES`: ```ld PROVIDE(_start_trap1 = default_start_trap1) PROVIDE(_start_trap2 = default_start_trap2) ... PROVIDE(_start_trap31 = default_start_trap31) ``` As a side benefit, this does enable overriding on a per-irq basis and could even allow someone to switch back to the current psuedo-direct mode if they so chose (though, I think they wouldn't see anything get GC'd yet). Potential improvements include: * Shrinking the saved register set to just being the callee-save parts of the C ABI, since only `default_start_trap` really has a use for the full frame * Finer-grained control around which interrupts are vectored, and which are semi-direct: maybe we can get away with just three modes at the start (all direct, only 1-16 vectored, all vectored), but I'm curious if we could find a way to precisely express in a downstream crate "I want precisely 1, 2, 4..." to be vectored. Maybe a proc macro or other generator script?
Maybe a macro to generate all those
We need the full trap frame in e.g. esp-wifi for the task scheduler (in a timer triggered interrupt). But maybe having this a feature would be an option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I've been working on essentially the same thing but wasn't 100% happy with how it looks, cool to have another set of eyes on it, and some discussion
What is your take on how the hardware -> CPU interrupt mapping should work? Currently the HAL will just shove all HW interrupts of prio x onto CPU interrupt x, which i guess works, but feels inefficient (you have to check which HW interrupt is actually triggered). I think a nice feature would be letting the user map interrupts as they see fit and define these interruptx CPU interrupt handlers themselves. In my case i'm working on a runtime specific to RTIC so i let the framework map the HW interrupts so no sharing of the CPU interrupts happens if possible (less than 31 enabled interrupts), and generate interrupt handlers automatically.(this maybe isn't too relevant for this change specifically but general interrupt flow).
Thanks for taking a look @bjoernQ and @onsdagens!
I tried a few things, but I wasn't super thrilled with any of them. Do you have an idea of what you'd want it to look like? Lacking the ability to evaluate e.g.
Oh, interesting: is it just the timer triggered interrupt that needs the whole frame? This change would allow overriding a specific entry point to pass whatever, but there's probably a lot we could do to make it easier.
I think we're of similar mind that priority -> irq num is a good broad default for a bunch of different cases, but offers the opportunity for a few improvements. Truth be told, I haven't gelled an idea yet of what I want to do, so hopefully you don't mind a bit of a brain dump. <brain dump> In addition to the points you raised about using the full suite of available hardware I've got some concerns about the shape of defined handlers. In our project, pretty much immediately @dougli1sqrd built us a way to register a closure so we could stop messing with statics quite so much, which was nice. Sadly, I had to pretty severely limit it as part of this work to only take function pointers and not full closures. The dynamically generated functions were getting emitted as part of There's also 4-5 pieces that need to line up just so for the handler to get called at all ( That's to say nothing of It does seem common to have the magic mapping of e.g. </brain dump> So yeah. My plan is to try and push the vectoring a little further "down" closer to the user-defined handlers, which I expect will be informative to me about what the challenges are, since I don't think I really understand them yet. "Fence was torn down; other side contained tiger. 0 stars."
Oooh, that sounds neat: can you share a link? In messing around with the debug assist (which was really an exercise in interrupt handling too), I wrote a little baby assembler in an effort to get runtime-remappable vector slots so I could avoid having to clobber the whole interrupt table to change one target. But, alas, writes to flash memory don't exactly work the way writes to RAM do 🙃, so there was at least a partial hard dependency on moving the vector to RAM. Or, the changes in this PR would let me do what I needed much more readily by just overriding e.g. Thanks for getting the discussion rolling: I'm looking forward to collaborating more on this! |
Required if we want to use `#![feature(...)]`s, else clippy complains: ``` error[E0554]: `#![feature]` may not be used on the stable release channel --> src/lib.rs:18:1 | 18 | #![feature(naked_functions)] | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ For more information about this error, try `rustc --explain E0554`. error: could not compile `esp-riscv-rt` due to previous error Error: Process completed with exit code 101. ```
maybe weakly linked default_start_trapx() pointing to the "direct" handler could be something? letting the user hook the vector table directly by overriding them? not sure if it's possible, just brain dumping. would be really interested in something like this though since right now i've just kicked the can down the road and feature gated each of the CPU interrupts so i can have the linker not complain about missing symbols and work on RTIC directly instead in the meanwhile.
It sort of makes sense on a Cortex-M at least since HW interrupts are 1 to 1 with what actually happens on hardware level. In the case of ESP's though, you've got this arbitrary user defined mapping between HW interrupts and CPU interrupts, with the CPU interrupts being what is actually driving the interrupt flow on hardware level, meaning a GPIO interrupt doesn't always move you to the same vector table entry depending on implementation. So i think it would make sense to stray from convention and make everything more transparent in this case, at least as a feature.
These generate handlers for software/hardware interrupts |
after some experimenting i've ended up with this https://github.com/onsdagens/esp32c3-rtic-rt/blob/main/src/lib.rs . |
Oh, I like that: both paying far fewer bytes per function and keeping the whole layout thing out of linker scripts and in the esp-riscv-rt crate seem like wins to me. I do think it trades off the hook-ability/code size a little bit though: e.g. esp-backtrace wants to override the
I think the rust-level equivalent would be more like: #[linkage = "weak"]
#[no_mangle] // or #[export_name = "..."]
extern "C" fn default_handler() {
loop {}
}
#[linkage = "weak"]
#[no_mangle]
extern "C" fn handler1() {
default_handler()
} which would recover the ability to override the default once. It'd still cost us 4*31 bytes or so over the linker script version when no handlers are defined. That looks like a cost baked in to the relative in-expressiveness of the rust linker model (per the discussion around this pre-RFC, which someone might be able to pick back up with our non-GNU example). I wonder how weak linkage interacts with |
I don't think i follow 😅 |
If we told the linker At least to my aesthetic sense, well-scoped opt-in complexity like that which solves a concrete problem is very nice: it's a good way to learn about linker scripts and
I've been musing on these truths that you provided, and I think you're exactly right: in our case, at least, the by-name GPIO binding isn't really helping us much, and I think if we made the cpu interrupt mapping explicit we could probably end up with a pretty good outcome. Especially if/when my |
Now i gotcha, i agree.
Not sure which PR you're referring to. I think this PR warrants something be done about the interrupt enable on HAL level as well, to make the mapping explicit as you said. Right now it's sort of arbitrary which makes sense if you're not hooking the vector table anyway. How are you looking on time? I'd love to see this upstreamed somewhere in the near future if possible, since my RTIC port is pretty much merge ready, only thing missing is all-vanilla crates (only this one is patched at this stage). Otherwise, i'm of course willing to work on this as well. Feel free to DM me on matrix if you want to discuss anything in real-time, i'm @\onsdag in the esp-rs room |
The PR that'd open up some possibilities here is rust-lang/rust#111891 : then we might be able to delete everything related to interrupts from this crate except the vector itself (+/- some linker shenanigans). But, that'll probably take weeks if the rust-lang docs are to be believed. So I'm thinking for the purposes of this PR the goal ought to be enabling the downstream work you & I are trying to do: to me, the shortest path to that sounds like getting the Then, once your generated handlers can successfully be slotted in to the table and I can skip over the whole |
This reverts commit 9b77193.
Previously, the vector table was set up in a pseudo-direct mode where all interrupts were redirected to a single `_start_trap` handler regardless of number. This change introduces 31 new `_start_trapN` targets that, by default, retain the pseudo-direct behavior. However, they're weak symbols, so they allow for individually overriding entries in the _vector_table, as well as overriding the entire _vector_table itself. This is in contrast with the earlier attempt in 9b77193 that jumped straight to the corresponding `extern "C" interruptN`; instead, this defers that work to the HAL and other downstream crates.
Is there a reason why you were using edge interrupts with UART? AFAIK, edge mode is only required for peripherals that don't support level based interrupts (i.e they hold there signal until cleared).
I suppose this most likely stems from the fact that ARM chips have an interrupt handler per peripheral (up to 256 or something like that), all of which are dispatched by the CPU directly. The ARM part is relevant as most of the rust-embedded early days were shaped around ARM (RISCV support was early stages + not much silicon around). In general though, when you're not trying to shave microsecond interrupt latencies down, it is very convenient to have peripheral based interrupt handlers. Before I added it in esp-rs/esp-hal#118, the interrupt examples we not so nice to read; they used the CPU handlers directly and then did register reads to figure out which peripheral bits needed clearing. It also makes writing |
Firstly, this is awesome! I would love to see this land. I wasn't aware of the functionality in LLVM. This could be a huge perf win without costing much if at all on ergonomics/maintainability.
This sounds good to me, I see you've already made the changes so I will test them next week :). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small suggestion, but i really like the way this ended up looking for what it's worth.
@@ -534,7 +597,7 @@ abort: | |||
*/ | |||
|
|||
.section .trap, "ax" | |||
.global _vector_table | |||
.weak _vector_table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.weak _vector_table | |
.weak _vector_table | |
.type _vector_table, @function | |
.option push | |
.balign 0x100 | |
.option norelax | |
.option norvc | |
_vector_table: | |
j _start_trap | |
j _start_trap1 | |
j _start_trap2 | |
j _start_trap3 | |
j _start_trap4 | |
j _start_trap5 | |
j _start_trap6 | |
j _start_trap7 | |
j _start_trap8 | |
j _start_trap9 | |
j _start_trap10 | |
j _start_trap11 | |
j _start_trap12 | |
j _start_trap13 | |
j _start_trap14 | |
j _start_trap15 | |
j _start_trap16 | |
j _start_trap17 | |
j _start_trap18 | |
j _start_trap19 | |
j _start_trap20 | |
j _start_trap21 | |
j _start_trap22 | |
j _start_trap23 | |
j _start_trap24 | |
j _start_trap25 | |
j _start_trap26 | |
j _start_trap27 | |
j _start_trap28 | |
j _start_trap29 | |
j _start_trap30 | |
j _start_trap31 |
My suggestion is replacing the default vector table with one hooking the weakly linked handlers. This should come at no cost (since by default they just point to the generic handler anyway) and gives a much more powerful default implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to tame the review functionality here, so the suggestion is just a gist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, excellent point! I'll fix that right up.
@@ -40,7 +40,7 @@ jobs: | |||
- uses: actions/checkout@v3 | |||
- uses: dtolnay/rust-toolchain@v1 | |||
with: | |||
toolchain: stable | |||
toolchain: nightly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is nightly really still needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, huh: looks like it's not for clippy
(this part). I lean towards keeping it, to line it up with the rustfmt and check builds that both use nightly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't looked, in that case, i agree.
I'd also like to add a feature flag providing simple trap entry hooks so you don't have to provide a context store/restore for each of the overridden handlers, but that's probably irrelevant to what you've done here, so i'll do it in another PR i think. |
Not an on-purpose reason, we got it from the example: https://github.com/esp-rs/esp-hal/blob/171e66e87ad994cc9f8004a42d7ea6461214c20e/esp32c3-hal/examples/serial_interrupts.rs#L73 . That example actually has the same safety issue: if more than 30 bytes come in over the UART between the
Thanks! I was thrilled with how that turned out; I had stumbled across the interrupt clang attribute at some point, and it seems like pretty much a slam dunk for a lot of cases. I think it'll even allow some ergonomics wins for any handler that doesn't need the full trap frame (there's no provision for recovering that from LLVM that I can see). And I appreciate the additional context around peripheral-based interrupt handlers: that PR and the |
Oh, sorry yeah I meant to look into that: I think it does make sense to do it separately, since you've got more context on it at this point. I'm definitely happy to review, though! A feature flag sounds good, though if you wanted to avoid that I think you could try putting it in a |
Unless any of the `.weak` symbols are overridden, this produces the exact same assembly as before. But it permits overriding handlers on a slot-by-slot basis by overriding those `.weak` symbols individually. Co-Authored-By: onsdagens <[email protected]> Co-Authored-By: sethp <[email protected]>
232cdd0
to
d5e5cc9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a bit longer to get around to testing this, sorry about that! LGTM, thanks!
Previously, the vector table was set up in a pseudo-direct mode where all interrupts were redirected to a single
_start_trap
handler regardless of number.This change introduces 31 new
_start_trapN
targets that encode whichinterruptN
function to jump to, at a cost of 180 bytes per function (~5kb total).Not shown (because the linker scripts live in the HAL) are the big block of
PROVIDES
:As a side benefit, this does enable overriding on a per-irq basis and could even allow someone to switch back to the current psuedo-direct mode if they so chose (though, I think they wouldn't see anything get GC'd yet).
Potential improvements include:
default_start_trap
really has a use for the full frame