-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace DirectSound backend with WASAPI #124
Replace DirectSound backend with WASAPI #124
Conversation
8596c28
to
2e3aeb9
Compare
WASAPI stuttering fixed!Pushed a new commit 2e3aeb9. With this change, I can get nearly stutter-free audio on Windows and Wine, with latency set to 1ms, 10ms, 40ms, 100ms! Note that latencies below ~22ms actually act like ~22ms! Why is that? On my computer (with speakers at 48000 hz) WASAPI always allocates 22ms of buffer even if you specify a lower buffer size. This is because WASAPI shared mode actually has a minimum latency floor (since I'm not using IAudioClient3 right now). In fact, the API docs say you're supposed to set latency to 0ms and let WASAPI pick the minimum buffer size (link). In practice, WASAPI clamps all buffer sizes below the minimum up to the minimum. Fixing DirectSound stuttering?The old DirectSound backend is built around writing full pages of audio at a time. It may or may not be possible to change it to support partial buffer writes. DiscussionCommit 2e3aeb9 changes the code to write all remaining audio at the end of every engine frame, even if it's not enough audio to fill the remaining room in the WASAPI buffer. This goes against "audio principles" (mimicking ALSA/jackd's equally-sized periods/pages, and splitting up audio synthesis based off buffer period boundaries), but it works well in practice. Additionally, shared-mode WASAPI is built for "write what you have" rather than ALSA/jackd's equally-sized buffer pages, since it exposes randomly-sized portions of the buffer for the app to write. Wrapping it in an "equally sized periods" API (which is what RtAudio does) adds extra complexity. The simple approach which adds latency (taken by RtAudio) requires either buffering a full period of data from the callback, handing it out to WASAPI in chunks, and asking the callback for more when it runs out (which adds latency). The PipeWire-like approach (taken by PortAudio? PortAudio/portaudio#303) is waiting (somehow, with a high-resolution timer?) for half of the buffer to be writable before asking the app for data to fill the gap. Additionally FamiTracker is ill-suited for RtAudio and other callback-based APIs. Rewriting the code to divide work into callbacks is difficult and involves changing audio meter behavior (link). I'm deferring it to a later date (or possibly abandoning it entirely). If I do implement it, it may involve reverting the changes (partial buffer writes) in this commit. I haven't explored making FT wait for available room before generating audio (which would improve sound latency slightly, but it's good enough as-is). |
295df07
to
89a86fb
Compare
can i create MMDeviceEnumerator on one thread and access on another, with sync barriers?
>MMDeviceEnumerator oh, the same class I'm Sending, so it should be fine |
This fixes stuttering at low buffer sizes.
CSoundInterface previously called CoInitializeEx and CoUninitialize in its constructor and destructor. However this failed on Windows 7: On startup or after changing configuration, CSoundInterface::OpenChannel () failed to call IAudioClient::Initialize(), and it instead returned 0x800401f0 CO_E_NOTINITIALIZED. This is because CSoundInterface() called CoInitializeEx() on the GUI thread, while CSoundInterface::OpenChannel() is called on the audio thread. This is a program bug, and Windows 10 and Wine happened to mask this error for some reason. To avoid this issue, call CoInitializeEx and CoUninitialize on the GUI/audio threads' startup/teardown functions (CFamiTrackerApp and CSoundGen::InitInstance/ExitInstance()), rather than the CSoundInterface object's constructor and destructor. Hopefully there aren't problems with accessing COM objects on the wrong thread. It doesn't matter where IMMDeviceEnumerator/ IMMDeviceCollection/IMMDevice are constructed and accessed, since those types are only accessed in startup/teardown(which don't need to be real-time). All real-time logic talks to IAudioClient and IAudioRenderClient, which are created on the audio thread in CSoundInterface::OpenChannel() (and hopefully belong to the audio thread, and if not they use a free-threaded marshaller anyway) and only accessed on the audio thread. COM threading is hard. RustAudio/cpal#597, mozilla/cubeb#534
With this change, the audio thread no longer needs to allocate memory or acquire locks to send graph data to the visualizer thread. Unfortunately, the audio thread still needs to lock the document so the sequencer and driver can read it. This also eliminates visualizer discontinuities which occur during lag (the old code treated a double buffer as a queue).
Now if you set the wrong sampling rate, you get two error dialogs upon startup. I don't really care.
Fixes stuttering on Linux kwin_x11 Wine, except when Xorg is overloaded by Register View repainting. Mac Wine still stutters, but I think that's Wine's fault.
Results
The commits "Write trailing audio at end of each frame" and "Return immediately from CSoundStream::WaitForReady() if room to write audio" both reduce audio stuttering at lower latencies. (The former is no longer necessary, but harmless to keep.)
Audio results at 1ms latency (actually 22ms on Windows):
On Windows 7, setting a too-low latency results in an audio hang; just don't do that (this is unchanged from DirectSound FT).Upon further testing, it acts like Windows 10 now, not sure what was going on.The downside with WASAPI is that you lose "switch devices when default device changes", and picking a sampling rate different from the system rate fails to initialize FT's audio. And any attempts to implement device switching are thwarted by "different devices may have different sampling rates". Ideally audio playback would track the system sampling rate, but we'd still use the options rate for .wav export. This is not implemented.
Discussion
the changes to internal sound interfaces make it effectively impossible to slot the old directsound backend in as-is, into wasapi famitracker. It may be possible to keep both (and switch between them) if I change DirectSound to fit into the new interface. But I don't feel like doing so.
Port DirectSound to new interfacethere are probably other things that can be done to improve low latency performance, like generating partial frames of audio at a time as they're being output to distribute CPU usage, rather than generating the whole thing upfront and only writing a bit at a time, which is super bursty in CPU utilization, and every burst will lag
Generate partial frames of audioabandonedIssues
CSoundStream::ClearBuffer()
CSoundStream::ClearBuffer()
is only called on wav export.also now picking a different sampling rate from the OS won't work. apparently wasapi won't resample for you? and i didn't bundle a resampler of my own, and don't feel like doing so
this is not based off an existing wrapper around wasapi. maybe it would be better that way. but famitracker has idiosyncratic audio design, and would have to be slowly nudged towards playing nicely.
Rewrite using an existing multi-backend audio libraryunlikelyMost audio libraries expose fixed-size periods, which adds latency on top of shared-mode WASAPI's random buffer chunks(?).
might be worth making wasapi optional (also WASAPI doesn't come with auto device switching unless you detect device changes and reinitialize the output stream yourself, possibly even at a different sampling rate!)
also my current wasapi code doesn't report xruns. i couldn't figure out how to ask if the stream ran out of data to play.
I don't know of any way to detect playback buffer underruns in WASAPI. https://stackoverflow.com/q/32112562 has no answer, and mackron/miniaudio#81 couldn't figure it out either.
Wine results
on macos wine, wasapi is almost full speed at 40ms latency, whereas DS is slow-motion
wasapi at 50ms latency is full-speed and almost stable, but hitches every minute or so
wasapi lags qualitatively differently from ds, i feel
wine "boost audio thread priority" flat out does nothing (https://github.com/wine-mirror/wine/blob/1d178982ae5a73b18f367026c8689b56789c39fd/dlls/avrt/main.c#L58)
IAudioClient3: simplified buffering on Windows 10
the hard part is that there are actually two ways to create wasapi shared sessions: IAudioClient (allocate the whole buffer, get write access to random portions of it) and IAudioClient3 (Windows 10+ only, pass in 1/2 the buffer size and WASAPI allocates double the size I think, get write access to half at a time)
and writing code to handle both complicates the code and testing
and supporting the former only (the implementation now) has extra latency/jank, and latter only drops old OS support (and doesn't work on Wine: https://github.com/wine-mirror/wine/blob/master/dlls/wineoss.drv/mmdevdrv.c#L1405-L1411)
Implement IAudioClient3 or switch to a library that does soabandoned[fixed] Visualizer jank
ft's visualizer (
CVisualizerWnd::FlushSamples()
) is not based off a circular buffer. Instead it processes buffers of the same size as audio pushed to the output. and wasapi has variable output buffer size for god knows what reason.This result in inconsistent oscilloscope drift, and inconsistent spectrogram windowing artifacts.
tbh the visualizer should've been a circular buffer from the beginning, even without the wasapi rewrite. Either make
CVisualizerWnd::FlushSamples()
double-buffer its input, or (possibly easier) make m_iGraphBuffer a ring-buffer flushed when full, independently ofCSoundStream::WriteBuffer()
.[fixed] Visualizer corruption
on windows 7, win32 release, wasapi 295df07 is showing random blips on spectrum.
the existing code is wrong.
CVisualizerWnd::ThreadProc(): pDrawBuffer = (m_pFillBuffer=m_pBuffer1)
delete[] m_pBuffer1, edit m_iBufferSize
m_pStates[m_iCurrentState]->SetSampleData(pDrawBuffer, m_iBufferSize);
To reproduce easily, edit
CVisualizerWnd::ThreadProc()
and addSleep(50);
afterm_csBufferSelect.Unlock();
and beforem_csBuffer.Lock();
.if this happens on startup, it may index into NULL with a nonzero m_iBufferSize. (I've not been able to get errors to happen on startup though.)
if this happens later on, it may index into a freed array with an altered m_iBufferSize. (This occurs on both master and wasapi, if the sleep is added.)
it's not really much of a problem on DirectSound with equally-sized buffer pieces (only pops up when changing buffer size in settings), but on the WASAPI branch it's definitely a problem since wasapi keeps writing random-sized chunks of audio. (I don't know why it doesn't happen on my main machine, normally, or with 1 or 2 cores of affinity.)
[fixed] Debugging the crash
Loading WASAPI FT with a sampling rate that doesn't match the system rate fails to load audio. Then changing the sampling rate from the wrong to right rate crashes in CChannelHandler code. I suspect this crash is unrelated to WASAPI, and merely exposed by the "no stream running" state most easily caused by WASAPI.
Windows 7
works. but changing audio settings causes app to stop playing audio for ~3 seconds, then display IDS_SOUND_BUFFER_ERROR (
CSoundInterface::OpenChannel()
returns nullptr).Fixed: commit "Move COM initialization from object to GUI/audio threads"
Error messages
CSoundInterface::SetupDevice
failed (invalid device) =IDS_SOUND_ERROR
m_pSoundInterface->OpenChannel
failed (invalid sampling rate) =IDS_SOUND_BUFFER_ERROR
TODO
CSoundInterface::OpenChannel
->IDS_SOUND_BUFFER_ERROR
(string and HRESULT): IAudioClient::IsFormatSupported() failed (eg. wrong sampling rate)? IAudioClient::Initialize() failed (COM error)?