Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace DirectSound backend with WASAPI #124

Merged
merged 13 commits into from
May 24, 2022

Conversation

nyanpasu64
Copy link
Collaborator

@nyanpasu64 nyanpasu64 commented Apr 14, 2022

Results

The commits "Write trailing audio at end of each frame" and "Return immediately from CSoundStream::WaitForReady() if room to write audio" both reduce audio stuttering at lower latencies. (The former is no longer necessary, but harmless to keep.)

Audio results at 1ms latency (actually 22ms on Windows):

On Windows 7, setting a too-low latency results in an audio hang; just don't do that (this is unchanged from DirectSound FT). Upon further testing, it acts like Windows 10 now, not sure what was going on.

The downside with WASAPI is that you lose "switch devices when default device changes", and picking a sampling rate different from the system rate fails to initialize FT's audio. And any attempts to implement device switching are thwarted by "different devices may have different sampling rates". Ideally audio playback would track the system sampling rate, but we'd still use the options rate for .wav export. This is not implemented.

Discussion

the changes to internal sound interfaces make it effectively impossible to slot the old directsound backend in as-is, into wasapi famitracker. It may be possible to keep both (and switch between them) if I change DirectSound to fit into the new interface. But I don't feel like doing so.

there are probably other things that can be done to improve low latency performance, like generating partial frames of audio at a time as they're being output to distribute CPU usage, rather than generating the whole thing upfront and only writing a bit at a time, which is super bursty in CPU utilization, and every burst will lag

  • Generate partial frames of audio abandoned

Issues

  • Implement CSoundStream::ClearBuffer()
  • On Mac Wine, I hear blips of sound when loading different module while playing. Is it related?
    • no. CSoundStream::ClearBuffer() is only called on wav export.
    • Does it happen with DirectSound? (yes)
  • Fix CMake build

also now picking a different sampling rate from the OS won't work. apparently wasapi won't resample for you? and i didn't bundle a resampler of my own, and don't feel like doing so

  • Add resampler(?)
    • I dislike. Resamplers add latency, CPU usage, and quality degradation. Well they're occasionally useful for deliberately rendering at a different rate. But there should be a "use system rate" option.
  • Use system sampling rate instead of picking rate from settings (affects WAV export too?!)
    • some people have said that tying WAV export to OS sampling rate is not acceptable.
    • It's very bad design for famitracker to fail mysteriously out of the box for half of people with mismatched FT/system sampling rates. But it's hard to make FT only use system rate for system playback, and config rate for wav export.

this is not based off an existing wrapper around wasapi. maybe it would be better that way. but famitracker has idiosyncratic audio design, and would have to be slowly nudged towards playing nicely.

  • Rewrite using an existing multi-backend audio library unlikely

Most audio libraries expose fixed-size periods, which adds latency on top of shared-mode WASAPI's random buffer chunks(?).

might be worth making wasapi optional (also WASAPI doesn't come with auto device switching unless you detect device changes and reinitialize the output stream yourself, possibly even at a different sampling rate!)

also my current wasapi code doesn't report xruns. i couldn't figure out how to ask if the stream ran out of data to play.

  • Report underruns(?)

I don't know of any way to detect playback buffer underruns in WASAPI. https://stackoverflow.com/q/32112562 has no answer, and mackron/miniaudio#81 couldn't figure it out either.

Wine results

on macos wine, wasapi is almost full speed at 40ms latency, whereas DS is slow-motion

wasapi at 50ms latency is full-speed and almost stable, but hitches every minute or so

wasapi lags qualitatively differently from ds, i feel

wine "boost audio thread priority" flat out does nothing (https://github.com/wine-mirror/wine/blob/1d178982ae5a73b18f367026c8689b56789c39fd/dlls/avrt/main.c#L58)

IAudioClient3: simplified buffering on Windows 10

the hard part is that there are actually two ways to create wasapi shared sessions: IAudioClient (allocate the whole buffer, get write access to random portions of it) and IAudioClient3 (Windows 10+ only, pass in 1/2 the buffer size and WASAPI allocates double the size I think, get write access to half at a time)

and writing code to handle both complicates the code and testing

and supporting the former only (the implementation now) has extra latency/jank, and latter only drops old OS support (and doesn't work on Wine: https://github.com/wine-mirror/wine/blob/master/dlls/wineoss.drv/mmdevdrv.c#L1405-L1411)

  • Implement IAudioClient3 or switch to a library that does so abandoned

[fixed] Visualizer jank

ft's visualizer (CVisualizerWnd::FlushSamples()) is not based off a circular buffer. Instead it processes buffers of the same size as audio pushed to the output. and wasapi has variable output buffer size for god knows what reason.

This result in inconsistent oscilloscope drift, and inconsistent spectrogram windowing artifacts.

tbh the visualizer should've been a circular buffer from the beginning, even without the wasapi rewrite. Either make CVisualizerWnd::FlushSamples() double-buffer its input, or (possibly easier) make m_iGraphBuffer a ring-buffer flushed when full, independently of CSoundStream::WriteBuffer().

[fixed] Visualizer corruption

on windows 7, win32 release, wasapi 295df07 is showing random blips on spectrum.

the existing code is wrong.

  • [visualizer: m_csBufferSelect] CVisualizerWnd::ThreadProc(): pDrawBuffer = (m_pFillBuffer=m_pBuffer1)
  • [audio: m_csBuffer] delete[] m_pBuffer1, edit m_iBufferSize
  • [visualizer: m_csBuffer] m_pStates[m_iCurrentState]->SetSampleData(pDrawBuffer, m_iBufferSize);

To reproduce easily, edit CVisualizerWnd::ThreadProc() and add Sleep(50); after m_csBufferSelect.Unlock(); and before m_csBuffer.Lock();.

if this happens on startup, it may index into NULL with a nonzero m_iBufferSize. (I've not been able to get errors to happen on startup though.)

if this happens later on, it may index into a freed array with an altered m_iBufferSize. (This occurs on both master and wasapi, if the sleep is added.)

it's not really much of a problem on DirectSound with equally-sized buffer pieces (only pops up when changing buffer size in settings), but on the WASAPI branch it's definitely a problem since wasapi keeps writing random-sized chunks of audio. (I don't know why it doesn't happen on my main machine, normally, or with 1 or 2 cores of affinity.)

[fixed] Debugging the crash

Loading WASAPI FT with a sampling rate that doesn't match the system rate fails to load audio. Then changing the sampling rate from the wrong to right rate crashes in CChannelHandler code. I suspect this crash is unrelated to WASAPI, and merely exposed by the "no stream running" state most easily caused by WASAPI.

crash stack trace

  • Fix crash ("Fix crash when starting with wrong sampling rate then fixing afterwards")

Windows 7

works. but changing audio settings causes app to stop playing audio for ~3 seconds, then display IDS_SOUND_BUFFER_ERROR (CSoundInterface::OpenChannel() returns nullptr).

Fixed: commit "Move COM initialization from object to GUI/audio threads"

Error messages

CSoundInterface::SetupDevice failed (invalid device) = IDS_SOUND_ERROR
m_pSoundInterface->OpenChannel failed (invalid sampling rate) = IDS_SOUND_BUFFER_ERROR

TODO

  • Read over code
  • Change DirectSound error messages to WASAPI or generic
  • Add error messages for CSoundInterface::OpenChannel -> IDS_SOUND_BUFFER_ERROR (string and HRESULT): IAudioClient::IsFormatSupported() failed (eg. wrong sampling rate)? IAudioClient::Initialize() failed (COM error)?
  • Change error message to "Incorrect sampling rate?"
  • Rewrite commit history

@nyanpasu64 nyanpasu64 force-pushed the wasapi branch 2 times, most recently from 8596c28 to 2e3aeb9 Compare April 15, 2022 03:02
@nyanpasu64
Copy link
Collaborator Author

nyanpasu64 commented Apr 15, 2022

WASAPI stuttering fixed!

Pushed a new commit 2e3aeb9. With this change, I can get nearly stutter-free audio on Windows and Wine, with latency set to 1ms, 10ms, 40ms, 100ms!

Note that latencies below ~22ms actually act like ~22ms!

Why is that? On my computer (with speakers at 48000 hz) WASAPI always allocates 22ms of buffer even if you specify a lower buffer size. This is because WASAPI shared mode actually has a minimum latency floor (since I'm not using IAudioClient3 right now). In fact, the API docs say you're supposed to set latency to 0ms and let WASAPI pick the minimum buffer size (link). In practice, WASAPI clamps all buffer sizes below the minimum up to the minimum.

Fixing DirectSound stuttering?

The old DirectSound backend is built around writing full pages of audio at a time. It may or may not be possible to change it to support partial buffer writes.

Discussion

Commit 2e3aeb9 changes the code to write all remaining audio at the end of every engine frame, even if it's not enough audio to fill the remaining room in the WASAPI buffer. This goes against "audio principles" (mimicking ALSA/jackd's equally-sized periods/pages, and splitting up audio synthesis based off buffer period boundaries), but it works well in practice.

Additionally, shared-mode WASAPI is built for "write what you have" rather than ALSA/jackd's equally-sized buffer pages, since it exposes randomly-sized portions of the buffer for the app to write. Wrapping it in an "equally sized periods" API (which is what RtAudio does) adds extra complexity. The simple approach which adds latency (taken by RtAudio) requires either buffering a full period of data from the callback, handing it out to WASAPI in chunks, and asking the callback for more when it runs out (which adds latency). The PipeWire-like approach (taken by PortAudio? PortAudio/portaudio#303) is waiting (somehow, with a high-resolution timer?) for half of the buffer to be writable before asking the app for data to fill the gap.

Additionally FamiTracker is ill-suited for RtAudio and other callback-based APIs. Rewriting the code to divide work into callbacks is difficult and involves changing audio meter behavior (link). I'm deferring it to a later date (or possibly abandoning it entirely). If I do implement it, it may involve reverting the changes (partial buffer writes) in this commit.

I haven't explored making FT wait for available room before generating audio (which would improve sound latency slightly, but it's good enough as-is).

@nyanpasu64 nyanpasu64 force-pushed the wasapi branch 5 times, most recently from 295df07 to 89a86fb Compare April 18, 2022 07:02
@nyanpasu64
Copy link
Collaborator Author

can i create MMDeviceEnumerator on one thread and access on another, with sync barriers?
https://github.com/nyanpasu64/Dn-FamiTracker/blob/wasapi/Source/SoundInterface.cpp#L47-L61=

RustAudio/cpal#597

To dive into the detail, the entry point of WASAPI, MMDeviceEnumerator, is registered with "both" threading model, which means that COM objects are created with whatever the thread's concurrency model is set to. This raises the concern that when STA is used, marshaling might make audio buffer operations block on the main thread, breaking continuous audio processing. However, the implementation actually uses free-threaded marshaller for interfaces dealing with buffer operations, which effectively bypasses COM's compatibility marshaling behavior and perform any API calls on the caller's thread instead. Therefore, the interfaces would operate just fine on either concurrency model.

>MMDeviceEnumerator

oh, the same class I'm Sending, so it should be fine

@nyanpasu64 nyanpasu64 marked this pull request as ready for review May 18, 2022 03:37
@nyanpasu64 nyanpasu64 changed the title [wip] Replace DirectSound backend with WASAPI Replace DirectSound backend with WASAPI May 18, 2022
nyanpasu64 added 10 commits May 19, 2022 01:59
This fixes stuttering at low buffer sizes.
CSoundInterface previously called CoInitializeEx and CoUninitialize in
its constructor and destructor. However this failed on Windows 7: On
startup or after changing configuration, CSoundInterface::OpenChannel
() failed to call IAudioClient::Initialize(), and it instead returned
0x800401f0 CO_E_NOTINITIALIZED. This is because CSoundInterface()
called CoInitializeEx() on the GUI thread, while
CSoundInterface::OpenChannel() is called on the audio thread. This is a
program bug, and Windows 10 and Wine happened to mask this error for
some reason.

To avoid this issue, call CoInitializeEx and CoUninitialize on the
GUI/audio threads' startup/teardown functions (CFamiTrackerApp and
CSoundGen::InitInstance/ExitInstance()), rather than the
CSoundInterface object's constructor and destructor.

Hopefully there aren't problems with accessing COM objects on the wrong
thread. It doesn't matter where IMMDeviceEnumerator/
IMMDeviceCollection/IMMDevice are constructed and accessed, since those
types are only accessed in startup/teardown(which don't need to be
real-time). All real-time logic talks to IAudioClient and
IAudioRenderClient, which are created on the audio thread in
CSoundInterface::OpenChannel() (and hopefully belong to the audio
thread, and if not they use a free-threaded marshaller anyway) and only
accessed on the audio thread.

COM threading is hard. RustAudio/cpal#597,
mozilla/cubeb#534
With this change, the audio thread no longer needs to allocate memory or
acquire locks to send graph data to the visualizer thread.
Unfortunately, the audio thread still needs to lock the document so the
sequencer and driver can read it.

This also eliminates visualizer discontinuities which occur during lag
(the old code treated a double buffer as a queue).
Now if you set the wrong sampling rate, you get two error dialogs upon
startup. I don't really care.
Fixes stuttering on Linux kwin_x11 Wine, except when Xorg is overloaded
by Register View repainting.

Mac Wine still stutters, but I think that's Wine's fault.
@nyanpasu64 nyanpasu64 merged commit 6690fbb into Dn-Programming-Core-Management:master May 24, 2022
@nyanpasu64 nyanpasu64 deleted the wasapi branch May 24, 2022 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant