-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integration testing harness #2359
Conversation
helix-term/src/application.rs
Outdated
pub async fn event_loop_until_idle<S>(&mut self, input_stream: &mut S) -> bool | ||
where | ||
S: Stream<Item = crossterm::Result<crossterm::event::Event>> + Unpin, | ||
{ | ||
loop { | ||
if self.editor.should_close() { | ||
return false; | ||
} | ||
|
||
use futures_util::StreamExt; | ||
|
||
tokio::select! { | ||
biased; | ||
|
||
event = reader.next() => { | ||
self.handle_terminal_events(event) | ||
Some(event) = input_stream.next() => { | ||
self.handle_terminal_events(event); | ||
self.editor.reset_idle_timer(); | ||
} | ||
Some(signal) = self.signals.next() => { | ||
self.handle_signals(signal).await; | ||
self.editor.reset_idle_timer(); | ||
} | ||
Some((id, call)) = self.editor.language_servers.incoming.next() => { | ||
self.handle_language_server_message(call, id).await; | ||
// limit render calls for fast language server messages | ||
let last = self.editor.language_servers.incoming.is_empty(); | ||
if last || last_render.elapsed() > deadline { | ||
|
||
if last || self.last_render.elapsed() > LSP_DEADLINE { | ||
self.render(); | ||
last_render = Instant::now(); | ||
self.last_render = Instant::now(); | ||
} | ||
|
||
self.editor.reset_idle_timer(); | ||
} | ||
Some(payload) = self.editor.debugger_events.next() => { | ||
let needs_render = self.editor.handle_debugger_message(payload).await; | ||
if needs_render { | ||
self.render(); | ||
} | ||
self.editor.reset_idle_timer(); | ||
} | ||
Some(config_event) = self.editor.config_events.1.recv() => { | ||
self.handle_config_events(config_event); | ||
self.render(); | ||
self.editor.reset_idle_timer(); | ||
} | ||
Some(callback) = self.jobs.futures.next() => { | ||
self.jobs.handle_callback(&mut self.editor, &mut self.compositor, callback); | ||
self.render(); | ||
self.editor.reset_idle_timer(); | ||
} | ||
Some(callback) = self.jobs.wait_futures.next() => { | ||
self.jobs.handle_callback(&mut self.editor, &mut self.compositor, callback); | ||
self.render(); | ||
self.editor.reset_idle_timer(); | ||
} | ||
_ = &mut self.editor.idle_timer => { | ||
// idle timeout | ||
self.editor.clear_idle_timer(); | ||
self.handle_idle_timeout(); | ||
return true; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only had the time for a brief look, but I don't like the event loop changes.
- It adds an extra async function call to what was before a single loop (that's very frequently called)
- Events & callbacks should not interact with the idle timer because it's intended to only be affected by keypresses. Continually resetting the idle timer on LSP events for example will ensure the timer never triggers for the first 10-20s while rust-analyzer boots up and continually spews log messages.
Alternate frontends will re-implement the application loop themselves, so maybe the solution is to treat the integration tests as a separate frontend? i.e. define a separate test::Application
that has no rendering and declares it's own loop. I started working on this previously by moving the handle_debugger_message
onto helix-view
/Editor
. We'd need to do the same for handle_language_server_message
so we'd be able to share most of the code.
(Also, I move commands & parts of UI to helix-view on this branch: https://github.com/helix-editor/helix/tree/gui)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only had the time for a brief look, but I don't like the event loop changes.
- It adds an extra async function call to what was before a single loop (that's very frequently called)
Could you please elaborate on what your concern is about this? Is it performance? Aren't we just talking about an extra jump that happens once every few hundred milliseconds in the worst case?
- Events & callbacks should not interact with the idle timer because it's intended to only be affected by keypresses. Continually resetting the idle timer on LSP events for example will ensure the timer never triggers for the first 10-20s while rust-analyzer boots up and continually spews log messages.
I believe this is already the case. It's a biased select, so it will handle key presses, signals, LSP, etc, all before it polls the idle timer. I believe the only practical difference this makes is adding a few hundred ms once that initialization finishes.
Also, even if this were not the case, the LSP would not be able to serve completions until it's fully initialized anyway.
Perhaps we could make the reset calls conditional with the integration feature?
Alternate frontends will re-implement the application loop themselves, so maybe the solution is to treat the integration tests as a separate frontend? i.e. define a separate
test::Application
that has no rendering and declares it's own loop.
This was my first thought when I ran into this problem as well, but then I considered that we want to be able to test all features, and I realized we would just be copy/pasting the entire loop with a few small tweaks. And this kind of defeats the point of integration testing, which is to test the very same code that the user will be running.
We want to be able to test things like :wq
saves the file and exits the Application, in that order. Or doesn't exit when there's an error saving.
And we don't want to have to worry about updating the testing harness's event loop every time we touch the term Application's event loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@archseer I consolidated the reset calls into a single place and put it behind the integration feature flag. This way, non test builds will see no difference in behavior; and additionally, since key presses are the only thing that triggers idle timer resets, this reduces the frequency that the call stack exits and reenters the inner event loop to once every idle timer fire, which should make the performance impact negligible. How to you feel about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replying to @archseer's message from Matrix:
right, but that's the loop that the terminal app will be running (with the rendering disabled anyway). What about the UI app etc? Are we going to start running integration tests per frontend?
For me, significantly shrinking application.rs to a small event loop + platform specific stuff, then having a custom loop is a reasonable tradeoff. We're primarily interested in covering commands & how they interact with the editor state
This makes a lot of sense; however, I think if the goal is to provide a sensible abstraction for different frontends, this is already an orthogonal problem. The Application event loop is driving a whole bunch of stuff that should be frontend-agnostic, like config changes, LSP message handling, debug events, and file writes. As it is, a GUI frontend would also need to copy/paste a lot of the event loop.
What you're saying makes sense, but I think fixing that would require a larger refactor to better enable different frontends, and might be outside the scope of this PR.
I wish we could somehow force inline https://github.com/helix-editor/helix/pull/2359/files#diff-d9796de334de9f547d45fd8b7476be53c771200ab48f28266e5f62e3493b5df9R252= to avoid more async state generated by the compiler
If it's really that concerning, maybe the event loop logic could be made into a macro? But does the extra await really cause that much trouble?
8fc5075
to
22a3b33
Compare
3292fc2
to
074c4ed
Compare
Can you give this a quick rebase? It should fix the the tests |
Sure thing, done. Also, I mentioned this in the chat room, but if you'd like, I could take some extra time to refactor at least the main event loop to make all the events happen behind a single function, somewhat similar to how I had to put all the editor events behind one function in the write path PR. That was due to borrowing restrictions, but it could be extended to all events. That way, it would be more reasonable to make a separate test Application object. I'd be happy to do that if that would help decouple the integration tests a bit from the term Application. |
I'm okay with that happening in a follow-up, this PR is already quite large and it's useful as a standalone. |
Ok cool. Any thoughts on how to handle the feature flag? There are two things I could think of that I wrote at the bottom of the PR description. If merged as is, users would have to explicitly opt in to integration tests. |
I would add an |
|
||
#[cfg(not(feature = "integration"))] | ||
use tui::backend::CrosstermBackend; | ||
|
||
#[cfg(not(feature = "integration"))] | ||
use std::io::stdout; | ||
use tui::backend::{Backend, CrosstermBackend}; | ||
|
||
#[cfg(not(feature = "integration"))] | ||
type Terminal = tui::terminal::Terminal<CrosstermBackend<std::io::Stdout>>; | ||
|
||
#[cfg(feature = "integration")] | ||
type Terminal = tui::terminal::Terminal<TestBackend>; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting. Would that goal be to get rid of the feature flags? I think we'd still have to have them, since we're still building an Application.
Or are you just asking me to do it to get the code change into master?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's part of an upcoming refactor and I think it would simplify the amount of cfg
clauses you need
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I tried, but the code seems to be built on top of other bigger breaking API changes, like adding a new RenderContext
. It's difficult to take just those changes without the rest of the branch.
- name: Copy minimal languages config | ||
run: cp .github/workflows/languages.toml ./languages.toml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need the full languages config for integration tests
Do we? I'd list any languages we test with but avoid fetching uncommon grammars that won't ever be used in tests, that way we're not waiting for 50+ grammars to build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can do that. It just means that a contributor will have to remember to add a language if they want to add tests for it. And it will be a bit confusing because their tests will pass locally for them, but fail on GitHub, and for not obvious reasons. It was failing for me on the C indentation test, but the failure was the wrong number of spaces. It took a while to figure out it was because of the grammar missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, maybe we should include languages.toml
in the test directory, then conditionally compile using that languages.toml over the one in the root dir if running tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I just tried implementing this, but I just realized, the feature flag would help if we were only running one test command, but if we separate regular unit tests from integration tests into separate phases, then I'm not sure how to detect if we're running unit tests without the integration feature flag.
636d6a0
to
01eb375
Compare
* Use new macro syntax for encoding sequences of keys * Make convenience helpers for common test pattern * Use indoc for inline indented raw strings * Add feature flag for integration testing to disable rendering
When a new View of a Document is created, a default cursor of 0, 0 is created, and it does not get normalized to a single width cursor until at least one movement of the cursor happens. This appears to have no practical negative effect that I could find, but it makes tests difficult to work with, since the initial selection is not what you expect it to be. This changes the initial selection of a new View to be the width of the first grapheme in the text.
Use the Application's main event loop to allow LSP, file writes, etc
During write-quit, if the file fails to be written for any reason, helix will still quit without saving the changes. This fixes this behavior by introducing fallibility to the asynchronous job queues. This will also benefit all contexts which may depend on these job queues. Fixes helix-editor#1575
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get this merged! The other pending comments can always be resolved later. Great work! 🎉
I was already working on an integration testing branch in the background for a little while, and I thought since this is such a critical code path, it warrants introducing the integration testing harness at the same time as #2267. This is very much a "just getting the ball rolling" approach, and anything is up for discussion. I fully expect the integration test framework to change over time.
Some things to note about this implementation:
In order to facilitate easier testing, I moved the toml config parsing back into the main function. This undoes the move that was done in Add support for local language configuration #1249.
The integration tests require shutting off actual rendering, so to do this, a feature branch was introduced to disable rendering for the integration tests. Tests are run by doing
cargo test --features integration
We need some way to run the Application with some starting document and input keys, and then inspect the internal state while the Application is still running. We also need to run the actual main event loop in order to handle more things than just the key press event—we need to drive document writes, config changes, and eventually LSP messages, if and when any tests are added for these things. The problem is the main event loop doesn't exit until all documents are closed, and then it discards a lot of internal state.
So in order to inspect internal state, I first just did this with a simple timeout, but it quickly became clear that this is problematic, as the amount of time that a test finishes can differ dramatically, and it can lead to non-determinism: if the host system becomes busy for some reason, tests fail because they time out too soon, even when there isn't a problem. Additionally, we can't just use very long timeouts as this would artificially make it take a very long time for each integration test to finish, even if the actual work they did was done very quickly.
So I came up with the idea that we could piggy-back on the idle timer: since the event loop is using a biased event loop, and we submit all key strokes ahead of actually running the Application, the idle timer should only fire when every key has been handled. To implement this, the event loop was split into one which exits after a successful idle timer handle, and the main call site simply loops over this as well. Additionally, the idle timer is reset every time any event gets handled successfully. This lets the application signal when it is done processing the test key strokes, and leaves the internal state open for inspection.
Another bug was discovered along the way while writing these integration tests: when a document is first opened, the cursor has the initial state of (0, 0), and is not updated to reflect the document's contents until the first cursor movement. This led to some weird bugs where the selection did not end up in the right direction after an edit. As far as I have been able to find, this does not cause any bugs in the actual text transformation, so it's likely this did not cause any real harm to users, but nonetheless, it was making testing very confusing until it was fixed. Now, if there is text in the opened document, the cursor is placed over the whole first grapheme cluster.
This PR was written along with #2267, so there are tests that pertain to that PR, but I left them in and ignored them, for purposes of demonstrating different kinds of tests. I'll unignore them along with that PR.
Open questions
test
to dotest --features integration
? The latter would help by not requiring contributors to remember that you have to do this to run the integration tests, but has the issue that eventually, we may have tests that depend upon external state, such as an LSP on the build system. Perhaps there could be an xtask or something? I'm open to suggestions.