Replace DirectSound backend with WASAPI #124

nyanpasu64 · 2022-04-14T08:47:31Z

Results

The commits "Write trailing audio at end of each frame" and "Return immediately from CSoundStream::WaitForReady() if room to write audio" both reduce audio stuttering at lower latencies. (The former is no longer necessary, but harmless to keep.)

Audio results at 1ms latency (actually 22ms on Windows):

Quite smooth on Windows.
Mostly smooth on Wine Linux PipeWire nvidia. If you show register state in Dn-FT, this generates considerable Xorg CPU load, and Dn-FT stutters heavily when xorg gets overloaded by switching windows or pressing alt-tab.
- GUI-linked audio stuttering is tracked in Fix stuttering when maximizing other windows with compositing off on Windows 7 #134 and fixed by https://github.com/nyanpasu64/Dn-FamiTracker/tree/rewrite-audio-thread.
Stutters on M1 Mac Wine (probably a defect in Wine's WASAPI-Core Audio backend).

~~On Windows 7, setting a too-low latency results in an audio hang; just don't do that (this is unchanged from DirectSound FT).~~ Upon further testing, it acts like Windows 10 now, not sure what was going on.

The downside with WASAPI is that you lose "switch devices when default device changes", and picking a sampling rate different from the system rate fails to initialize FT's audio. And any attempts to implement device switching are thwarted by "different devices may have different sampling rates". Ideally audio playback would track the system sampling rate, but we'd still use the options rate for .wav export. This is not implemented.

Discussion

the changes to internal sound interfaces make it effectively impossible to slot the old directsound backend in as-is, into wasapi famitracker. It may be possible to keep both (and switch between them) if I change DirectSound to fit into the new interface. But I don't feel like doing so.

~~Port DirectSound to new interface~~
- doesn't help, https://github.com/nyanpasu64/Dn-FamiTracker/tree/dsound-low-latency still has awful stuttering on Wine, and failing at <=20ms is due to emulated DirectSound on WASAPI failing to trip interrupts, not FT's fault

there are probably other things that can be done to improve low latency performance, like generating partial frames of audio at a time as they're being output to distribute CPU usage, rather than generating the whole thing upfront and only writing a bit at a time, which is super bursty in CPU utilization, and every burst will lag

~~Generate partial frames of audio~~ abandoned

Issues

Implement CSoundStream::ClearBuffer()
On Mac Wine, I hear blips of sound when loading different module while playing. Is it related?
- no. CSoundStream::ClearBuffer() is only called on wav export.
- Does it happen with DirectSound? (yes)
Fix CMake build

also now picking a different sampling rate from the OS won't work. apparently wasapi won't resample for you? and i didn't bundle a resampler of my own, and don't feel like doing so

Add resampler(?)
- I dislike. Resamplers add latency, CPU usage, and quality degradation. Well they're occasionally useful for deliberately rendering at a different rate. But there should be a "use system rate" option.
Use system sampling rate instead of picking rate from settings (affects WAV export too?!)
- some people have said that tying WAV export to OS sampling rate is not acceptable.
- It's very bad design for famitracker to fail mysteriously out of the box for half of people with mismatched FT/system sampling rates. But it's hard to make FT only use system rate for system playback, and config rate for wav export.

this is not based off an existing wrapper around wasapi. maybe it would be better that way. but famitracker has idiosyncratic audio design, and would have to be slowly nudged towards playing nicely.

~~Rewrite using an existing multi-backend audio library~~ unlikely

Most audio libraries expose fixed-size periods, which adds latency on top of shared-mode WASAPI's random buffer chunks(?).

might be worth making wasapi optional (also WASAPI doesn't come with auto device switching unless you detect device changes and reinitialize the output stream yourself, possibly even at a different sampling rate!)

also my current wasapi code doesn't report xruns. i couldn't figure out how to ask if the stream ran out of data to play.

Report underruns(?)

I don't know of any way to detect playback buffer underruns in WASAPI. https://stackoverflow.com/q/32112562 has no answer, and mackron/miniaudio#81 couldn't figure it out either.

Wine results

on macos wine, wasapi is almost full speed at 40ms latency, whereas DS is slow-motion

wasapi at 50ms latency is full-speed and almost stable, but hitches every minute or so

wasapi lags qualitatively differently from ds, i feel

wine "boost audio thread priority" flat out does nothing (https://github.com/wine-mirror/wine/blob/1d178982ae5a73b18f367026c8689b56789c39fd/dlls/avrt/main.c#L58)

IAudioClient3: simplified buffering on Windows 10

the hard part is that there are actually two ways to create wasapi shared sessions: IAudioClient (allocate the whole buffer, get write access to random portions of it) and IAudioClient3 (Windows 10+ only, pass in 1/2 the buffer size and WASAPI allocates double the size I think, get write access to half at a time)

and writing code to handle both complicates the code and testing

and supporting the former only (the implementation now) has extra latency/jank, and latter only drops old OS support (and doesn't work on Wine: https://github.com/wine-mirror/wine/blob/master/dlls/wineoss.drv/mmdevdrv.c#L1405-L1411)

~~Implement IAudioClient3 or switch to a library that does so~~ abandoned

[fixed] Visualizer jank

ft's visualizer (CVisualizerWnd::FlushSamples()) is not based off a circular buffer. Instead it processes buffers of the same size as audio pushed to the output. and wasapi has variable output buffer size for god knows what reason.

This result in inconsistent oscilloscope drift, and inconsistent spectrogram windowing artifacts.

tbh the visualizer should've been a circular buffer from the beginning, even without the wasapi rewrite. Either make CVisualizerWnd::FlushSamples() double-buffer its input, or (possibly easier) make m_iGraphBuffer a ring-buffer flushed when full, independently of CSoundStream::WriteBuffer().

Buffer visualizer separately from audio output, and rebase this PR on it
fps is too low (fixed by Move graph data windowing from visualizer thread to audio thread #127)

[fixed] Visualizer corruption

on windows 7, win32 release, wasapi 295df07 is showing random blips on spectrum.

the existing code is wrong.

[visualizer: m_csBufferSelect] CVisualizerWnd::ThreadProc(): pDrawBuffer = (m_pFillBuffer=m_pBuffer1)
[audio: m_csBuffer] delete[] m_pBuffer1, edit m_iBufferSize
[visualizer: m_csBuffer] m_pStates[m_iCurrentState]->SetSampleData(pDrawBuffer, m_iBufferSize);

To reproduce easily, edit CVisualizerWnd::ThreadProc() and add Sleep(50); after m_csBufferSelect.Unlock(); and before m_csBuffer.Lock();.

if this happens on startup, it may index into NULL with a nonzero m_iBufferSize. (I've not been able to get errors to happen on startup though.)

if this happens later on, it may index into a freed array with an altered m_iBufferSize. (This occurs on both master and wasapi, if the sleep is added.)

it's not really much of a problem on DirectSound with equally-sized buffer pieces (only pops up when changing buffer size in settings), but on the WASAPI branch it's definitely a problem since wasapi keeps writing random-sized chunks of audio. (I don't know why it doesn't happen on my main machine, normally, or with 1 or 2 cores of affinity.)

[fixed] Debugging the crash

Loading WASAPI FT with a sampling rate that doesn't match the system rate fails to load audio. Then changing the sampling rate from the wrong to right rate crashes in CChannelHandler code. I suspect this crash is unrelated to WASAPI, and merely exposed by the "no stream running" state most easily caused by WASAPI.

Fix crash ("Fix crash when starting with wrong sampling rate then fixing afterwards")

Windows 7

works. but changing audio settings causes app to stop playing audio for ~3 seconds, then display IDS_SOUND_BUFFER_ERROR (CSoundInterface::OpenChannel() returns nullptr).

Fixed: commit "Move COM initialization from object to GUI/audio threads"

Error messages

CSoundInterface::SetupDevice failed (invalid device) = IDS_SOUND_ERROR
m_pSoundInterface->OpenChannel failed (invalid sampling rate) = IDS_SOUND_BUFFER_ERROR

TODO

Read over code
Change DirectSound error messages to WASAPI or generic
Add error messages for CSoundInterface::OpenChannel -> IDS_SOUND_BUFFER_ERROR (string and HRESULT): IAudioClient::IsFormatSupported() failed (eg. wrong sampling rate)? IAudioClient::Initialize() failed (COM error)?
Change error message to "Incorrect sampling rate?"
Rewrite commit history

nyanpasu64 · 2022-04-15T03:40:54Z

WASAPI stuttering fixed!

Pushed a new commit 2e3aeb9. With this change, I can get nearly stutter-free audio on Windows and Wine, with latency set to 1ms, 10ms, 40ms, 100ms!

Note that latencies below ~22ms actually act like ~22ms!

Why is that? On my computer (with speakers at 48000 hz) WASAPI always allocates 22ms of buffer even if you specify a lower buffer size. This is because WASAPI shared mode actually has a minimum latency floor (since I'm not using IAudioClient3 right now). In fact, the API docs say you're supposed to set latency to 0ms and let WASAPI pick the minimum buffer size (link). In practice, WASAPI clamps all buffer sizes below the minimum up to the minimum.

Fixing DirectSound stuttering?

The old DirectSound backend is built around writing full pages of audio at a time. It may or may not be possible to change it to support partial buffer writes.

Discussion

Commit 2e3aeb9 changes the code to write all remaining audio at the end of every engine frame, even if it's not enough audio to fill the remaining room in the WASAPI buffer. This goes against "audio principles" (mimicking ALSA/jackd's equally-sized periods/pages, and splitting up audio synthesis based off buffer period boundaries), but it works well in practice.

Additionally, shared-mode WASAPI is built for "write what you have" rather than ALSA/jackd's equally-sized buffer pages, since it exposes randomly-sized portions of the buffer for the app to write. Wrapping it in an "equally sized periods" API (which is what RtAudio does) adds extra complexity. The simple approach which adds latency (taken by RtAudio) requires either buffering a full period of data from the callback, handing it out to WASAPI in chunks, and asking the callback for more when it runs out (which adds latency). The PipeWire-like approach (taken by PortAudio? PortAudio/portaudio#303) is waiting (somehow, with a high-resolution timer?) for half of the buffer to be writable before asking the app for data to fill the gap.

Additionally FamiTracker is ill-suited for RtAudio and other callback-based APIs. Rewriting the code to divide work into callbacks is difficult and involves changing audio meter behavior (link). I'm deferring it to a later date (or possibly abandoning it entirely). If I do implement it, it may involve reverting the changes (partial buffer writes) in this commit.

I haven't explored making FT wait for available room before generating audio (which would improve sound latency slightly, but it's good enough as-is).

nyanpasu64 · 2022-05-14T07:00:02Z

can i create MMDeviceEnumerator on one thread and access on another, with sync barriers?
https://github.com/nyanpasu64/Dn-FamiTracker/blob/wasapi/Source/SoundInterface.cpp#L47-L61=

RustAudio/cpal#597

To dive into the detail, the entry point of WASAPI, MMDeviceEnumerator, is registered with "both" threading model, which means that COM objects are created with whatever the thread's concurrency model is set to. This raises the concern that when STA is used, marshaling might make audio buffer operations block on the main thread, breaking continuous audio processing. However, the implementation actually uses free-threaded marshaller for interfaces dealing with buffer operations, which effectively bypasses COM's compatibility marshaling behavior and perform any API calls on the caller's thread instead. Therefore, the interfaces would operate just fine on either concurrency model.

>MMDeviceEnumerator

oh, the same class I'm Sending, so it should be fine

This fixes stuttering at low buffer sizes.

CSoundInterface previously called CoInitializeEx and CoUninitialize in its constructor and destructor. However this failed on Windows 7: On startup or after changing configuration, CSoundInterface::OpenChannel () failed to call IAudioClient::Initialize(), and it instead returned 0x800401f0 CO_E_NOTINITIALIZED. This is because CSoundInterface() called CoInitializeEx() on the GUI thread, while CSoundInterface::OpenChannel() is called on the audio thread. This is a program bug, and Windows 10 and Wine happened to mask this error for some reason. To avoid this issue, call CoInitializeEx and CoUninitialize on the GUI/audio threads' startup/teardown functions (CFamiTrackerApp and CSoundGen::InitInstance/ExitInstance()), rather than the CSoundInterface object's constructor and destructor. Hopefully there aren't problems with accessing COM objects on the wrong thread. It doesn't matter where IMMDeviceEnumerator/ IMMDeviceCollection/IMMDevice are constructed and accessed, since those types are only accessed in startup/teardown(which don't need to be real-time). All real-time logic talks to IAudioClient and IAudioRenderClient, which are created on the audio thread in CSoundInterface::OpenChannel() (and hopefully belong to the audio thread, and if not they use a free-threaded marshaller anyway) and only accessed on the audio thread. COM threading is hard. RustAudio/cpal#597, mozilla/cubeb#534

With this change, the audio thread no longer needs to allocate memory or acquire locks to send graph data to the visualizer thread. Unfortunately, the audio thread still needs to lock the document so the sequencer and driver can read it. This also eliminates visualizer discontinuities which occur during lag (the old code treated a double buffer as a queue).

Now if you set the wrong sampling rate, you get two error dialogs upon startup. I don't really care.

Fixes stuttering on Linux kwin_x11 Wine, except when Xorg is overloaded by Register View repainting. Mac Wine still stutters, but I think that's Wine's fault.

nyanpasu64 force-pushed the wasapi branch 2 times, most recently from 8596c28 to 2e3aeb9 Compare April 15, 2022 03:02

nyanpasu64 force-pushed the wasapi branch 5 times, most recently from 295df07 to 89a86fb Compare April 18, 2022 07:02

nyanpasu64 added 2 commits May 17, 2022 19:02

Rewrite CFDS::m_BlipFDS.set_sample_rate() comment

ffa42be

Rename str_conv::_tstring to tstring

2e4eab1

nyanpasu64 force-pushed the wasapi branch from 115f3d2 to e2da858 Compare May 18, 2022 03:33

nyanpasu64 marked this pull request as ready for review May 18, 2022 03:37

nyanpasu64 changed the title ~~[wip] Replace DirectSound backend with WASAPI~~ Replace DirectSound backend with WASAPI May 18, 2022

nyanpasu64 mentioned this pull request May 18, 2022

Move graph data windowing from visualizer thread to audio thread #127

Closed

nyanpasu64 added 10 commits May 19, 2022 01:59

Rename CDSound to CSoundInterface, CDSoundChannel to CSoundStream

2b4aa14

Cleanup trailing whitespace

5886307

Rewrite CSoundInterface from DirectSound to WASAPI

b093e6f

Disable constant debug prints

492bd5f

Write trailing audio at end of each frame

d753dbc

This fixes stuttering at low buffer sizes.

Fix crash when starting with wrong sampling rate then fixing afterwards

482ba68

Remove extra allocation from TripleBuffer

4b6ebfe

Fix audio thread errors not appearing on top of main window

91de8c9

Now if you set the wrong sampling rate, you get two error dialogs upon startup. I don't really care.

nyanpasu64 force-pushed the wasapi branch from 1e52cd7 to 91de8c9 Compare May 19, 2022 09:14

Return immediately from CSoundStream::WaitForReady() if room to write

63d2e87

Fixes stuttering on Linux kwin_x11 Wine, except when Xorg is overloaded by Register View repainting. Mac Wine still stutters, but I think that's Wine's fault.

nyanpasu64 force-pushed the wasapi branch from 196877b to 63d2e87 Compare May 24, 2022 13:50

nyanpasu64 merged commit 6690fbb into Dn-Programming-Core-Management:master May 24, 2022

nyanpasu64 deleted the wasapi branch May 24, 2022 15:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace DirectSound backend with WASAPI #124

Replace DirectSound backend with WASAPI #124

nyanpasu64 commented Apr 14, 2022 •

edited

Loading

nyanpasu64 commented Apr 15, 2022 •

edited

Loading

nyanpasu64 commented May 14, 2022

Replace DirectSound backend with WASAPI #124

Replace DirectSound backend with WASAPI #124

Conversation

nyanpasu64 commented Apr 14, 2022 • edited Loading

Results

Discussion

Issues

Wine results

IAudioClient3: simplified buffering on Windows 10

[fixed] Visualizer jank

[fixed] Visualizer corruption

[fixed] Debugging the crash

Windows 7

Error messages

TODO

nyanpasu64 commented Apr 15, 2022 • edited Loading

WASAPI stuttering fixed!

Fixing DirectSound stuttering?

Discussion

nyanpasu64 commented May 14, 2022

nyanpasu64 commented Apr 14, 2022 •

edited

Loading

nyanpasu64 commented Apr 15, 2022 •

edited

Loading