Glitching in WASAPI in shared mode #303
Some observations:
I made a git bisect to determine which commit caused the glitching behavior in WASAPI. The breaking commit, committed on Sep 6 2018, is:
edit: Added recording observation |
I think I have found evidence that this is related to the sleep timing setup that pa_win_wasapi.c implements. The change that causes WASAPI to start glitching is in the sleep times returned from: portaudio/src/hostapi/wasapi/pa_win_wasapi.c Line 855 in d7a4cb4
While this does not work and produces glitching: portaudio/src/hostapi/wasapi/pa_win_wasapi.c Line 896 in c3fee03
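(For reference, the conversion at the heart of those two lines is just frames-to-milliseconds. A minimal sketch of that idea is below; it is not the exact GetFramesSleepTime() from pa_win_wasapi.c, which may apply additional scaling such as the /2 discussed later in this thread.)
/* Illustrative only: convert a frame count into a sleep duration in milliseconds.
   The real GetFramesSleepTime() in pa_win_wasapi.c may scale this further. */
static unsigned FramesToSleepMs(unsigned frames, unsigned sampleRate)
{
    if (sampleRate == 0)
        return 0;
    /* e.g. 528 frames at 48000 Hz -> ~11 ms, 1056 frames -> ~22 ms */
    return (unsigned)(((double)frames * 1000.0) / sampleRate);
}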
My test setup runs at 48kHz, with
Observations:
|
I have implemented this workaround in my own PortAudio build, which I use from my Python application, and it works fine: in essence it reverts the thread sleep delay to the same values as in the stable version of PA. |
@sveinse, thank you for raising this issue. In my setup I do not experience any glitching, but maybe it happens when the latency is at the minimum for Shared mode, e.g. as in your case, 21 ms (glitch), or there could be specific behavior of the underlying audio drivers. Your workaround for |
@dmitrykos I created PR #305 with this fix. I'm not sure what makes one system (like yours) not glitch while others glitch a lot. With the old solution it does not matter what the
Yet, I'm not completely convinced that this fix is complete. I ran a series of tests after the PR and I still experience glitching in some ranges of
In my system any |
@sveinse I merged the fix as a quick solution for now. Although it fixes the problem you observed, we still need to find the reason for the underruns/glitches. The line:

sleep_ms_out = GetFramesSleepTime(stream->out.framesPerBuffer, stream->out.wavex.Format.nSamplesPerSec);

will cause sleeps for the duration of the user buffer, e.g. we will always sleep less than the duration of the host (WASAPI-side) buffer, whose size is reflected in

// Sleep for half the buffer duration.
Sleep((DWORD)(hnsActualDuration/REFTIMES_PER_MILLISEC/2));

so our implementation (before this fix) had to be stable in terms of absence of underruns. In your failed tests with this fix we will sleep for:

and it does not provide any clue why glitches still exist, because the host's buffer duration is 22 ms, so we have plenty of time. Could it be due to jitter of the system timer/Sleep? |
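(A quick way to check the timer-jitter hypothesis is to measure how long Sleep() actually blocks. A rough standalone sketch is below; note that without timeBeginPeriod(1) the default scheduler granularity is roughly 15.6 ms, which alone could explain overshooting an 11 ms sleep. Link with winmm.lib for timeBeginPeriod/timeEndPeriod.)
#include <windows.h>
#include <stdio.h>

/* Rough sketch: measure how long Sleep(11) really blocks, to estimate timer jitter. */
int main(void)
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    timeBeginPeriod(1); /* request 1 ms timer resolution; comment out to compare */
    for (int i = 0; i < 20; ++i)
    {
        QueryPerformanceCounter(&t0);
        Sleep(11); /* roughly the per-cycle sleep for a 528-frame user buffer at 48 kHz */
        QueryPerformanceCounter(&t1);
        printf("%.3f ms\n", (t1.QuadPart - t0.QuadPart) * 1000.0 / freq.QuadPart);
    }
    timeEndPeriod(1);
    return 0;
}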
@sveinse, would you please run an experiment and replace line 3859 with:

bufferMode = paUtilBoundedHostBufferSize;

and test with
Let's check that we do not have any problem inside the rendering loop, specifically this area starting at line 5934:

if (stream->bufferMode == paUtilFixedHostBufferSize)
{
while (frames >= stream->out.framesPerBuffer)
{
if ((hr = ProcessOutputBuffer(stream, processor, stream->out.framesPerBuffer)) != S_OK)
{
LogHostError(hr);
goto thread_error;
}
frames -= stream->out.framesPerBuffer;
}
}

Here I got worried that if this condition |
@dmitrykos setting
BTW I'm monitoring and printing whether any over- or underrun flags are reported to the user callback function, and they are not flagged in any of the glitches. So either WASAPI is oblivious to the underruns or it's missing an implementation of reporting this to the user callback function. I haven't checked what it does. |
@sveinse, thank you for the test! Even if stability is improved, there are still glitches left.
What if you make another experiment and hardcode |
Given the apparent host native size of 1056, I tried to hardcode
In my post above, the table seems to hint that usec sleep is what resolves the glitching when
edit FYI: It seems the host frame size of 1056 is a constant on my system. All devices, from HDMI to built-in sound cards to external USB headsets, report the same host frame size. It doesn't change when the sample rate config is changed in the control panel either. |
As for the

if (stream->bufferMode == paUtilFixedHostBufferSize)
{
UINT32 count = 0;
while (frames >= stream->out.framesPerBuffer)
{
count++;
if ((hr = ProcessOutputBuffer(stream, processor, stream->out.framesPerBuffer)) != S_OK)
{
LogHostError(hr);
goto thread_error;
}
frames -= stream->out.framesPerBuffer;
}
printf("%d,", count);
} (Don't flog me for the printf() within real-time-ish code, but it does work sufficiently to highlight the results :D)
So it definitely hits the while loop without enough data available, resulting in no callbacks to
edit: I made an additional test to see if getting no iterations in the while-loop was the cause of the glitches. It's not the sole cause. The glitching occurs even if all iterations pass through the while loop. E.g. setting |
Thank you both for your work. Non-reported glitches are my biggest concern about using the current WASAPI driver in PortAudio. |
I've looked a little bit at the code now, and I'm probably wrong, or maybe not, but as far as I can see, the audio processing itself (called in the function "WaspiHostProcessingLoop") happens in the same thread as the one that sleeps (the function "ProcThreadPoll"). Doesn't this mean that when you spend half the time sleeping you also can't use more than 50% CPU to process audio? |
@sveinse thank you for your experiment. In my view we can have underruns happening here:
e.g. after having no frames to process, on the next cycle we get 2 buffers to fill (1056 / 528 = 2), which equals a full host buffer, and that is a problem; I would rather expect the sequence
I did some more experiments and tried busy waiting once we get 0, like this (starts at line 5981):

UINT32 next_sleep = sleep_ms;
UINT32 repeat = 0;
// Processing Loop
while (WaitForSingleObject(stream->hCloseRequest, next_sleep) == WAIT_TIMEOUT)
{
for (i = 0; i < S_COUNT; ++i)
{
// Process S_INPUT/S_OUTPUT
switch (i)
{
// Input stream
case S_INPUT: {
if (stream->captureClient == NULL)
break;
if ((hr = ProcessInputBuffer(stream, processor)) != S_OK)
{
LogHostError(hr);
goto thread_error;
}
break; }
// Output stream
case S_OUTPUT: {
UINT32 frames;
if (stream->renderClient == NULL)
break;
// Get available frames
if ((hr = _PollGetOutputFramesAvailable(stream, &frames)) != S_OK)
{
LogHostError(hr);
goto thread_error;
}
// Output data to the user callback
if (stream->bufferMode == paUtilFixedHostBufferSize)
{
UINT32 count = 0;
while (frames >= stream->out.framesPerBuffer)
{
count++;
if ((hr = ProcessOutputBuffer(stream, processor, stream->out.framesPerBuffer)) != S_OK)
{
LogHostError(hr);
goto thread_error;
}
frames -= stream->out.framesPerBuffer;
}
if (count == 0)
{
repeat++;
next_sleep = 0;
continue;
}
else
{
//printf("%d,", repeat);
repeat = 0;
}
}
else
if (frames != 0)
{
if ((hr = ProcessOutputBuffer(stream, processor, frames)) != S_OK)
{
LogHostError(hr);
goto thread_error;
}
}
break; }
}
}
// Get next sleep time
if (sleep_ms == 0)
next_sleep = ThreadIdleScheduler_NextSleep(&scheduler);
else
next_sleep = sleep_ms;
}

Explanation: Once we get 0 we should not sleep again (maybe it was a corner case and the buffer is already ready), as it can result in an underrun; instead we do a short busy wait by clearing next_sleep.

framesPerUser[ 300 ] framesPerHost[ 1056 ] latency[ 22.00ms ] exclusive[ NO ] wow64_fix[ NO ] mode[ POLL ]
// 1,2,2,2,2,2,2,2,1,2,1,2,1,1,2

Could you please replace your implementation with this piece of code and check if it improves anything on your machine? |
@kmatheussen, your assumption is correct, but we need to get some space in WASAPI's host buffer to write into, and thus the code must sleep/wait for some piece of the host buffer to become available for writing. To solve the problem you are mentioning there should be a second audio rendering thread that does the audio processing on the app's side; it would write to its own temporary buffer(s), and once WASAPI's host buffer is available the PA callback function would simply copy the ready audio samples prepared by that second processing thread. With such an approach you can utilize the CPU effectively, but it is outside PA's scope and is rather a design decision of the application using PA as its audio API. |
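(For anyone following along, a minimal sketch of the design @dmitrykos describes, using PortAudio's pa_ringbuffer utility from src/common on the application side. The producer/render thread that fills the ring is omitted, and names such as SetupRing and PaCallback are made up for the illustration.)
#include <string.h>
#include "portaudio.h"
#include "pa_ringbuffer.h"   /* from PortAudio's src/common */

#define RING_FRAMES 4096                 /* must be a power of two */
static float ringData[RING_FRAMES];      /* mono float32 for simplicity */
static PaUtilRingBuffer ring;

/* Call once before Pa_StartStream(); the app's own render thread then keeps
   writing prepared samples into the ring with PaUtil_WriteRingBuffer(). */
void SetupRing(void)
{
    PaUtil_InitializeRingBuffer(&ring, sizeof(float), RING_FRAMES, ringData);
}

/* PA callback: no heavy DSP here, only copy out what the render thread prepared. */
int PaCallback(const void *in, void *out, unsigned long frames,
               const PaStreamCallbackTimeInfo *timeInfo,
               PaStreamCallbackFlags statusFlags, void *userData)
{
    ring_buffer_size_t got = PaUtil_ReadRingBuffer(&ring, out, (ring_buffer_size_t)frames);
    if ((unsigned long)got < frames)     /* app-side underrun: pad with silence */
        memset((float *)out + got, 0, (frames - (unsigned long)got) * sizeof(float));
    (void)in; (void)timeInfo; (void)statusFlags; (void)userData;
    return paContinue;
}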
@kmatheussen, @dmitrykos, with respect to the amount of wall-time available for user processing/callback of audio: how is this implemented for the other host APIs? Do they also sleep equal amounts? I guess, from a principle point of view, this is the price of polling. One has to wait, check, wait and so on. You will lose granularity on this. The other alternative is to wake up the thread when there is data available -- which is what the event-based variant is all about, right? Could the sleep be dynamic? In principle the outer loop is aware of when data should be available (being more or less monotonic with respect to wall-time). Could the sleep after the user callback be dynamic in the sense that it sleeps the remainder of the timeslot rather than a fixed pre-calculated amount? |
I think so. We could calculate how much time does |
@sveinse I quickly made the change you proposed that subtracts the time spent in the processing from the

INT32 next_sleep = sleep_ms;
UINT32 repeat = 0;
// Processing Loop
while (WaitForSingleObject(stream->hCloseRequest, next_sleep) == WAIT_TIMEOUT)
{
double start_time = PaUtil_GetTime();
for (i = 0; i < S_COUNT; ++i)
{
// Process S_INPUT/S_OUTPUT
switch (i)
{
// Input stream
case S_INPUT: {
if (stream->captureClient == NULL)
break;
if ((hr = ProcessInputBuffer(stream, processor)) != S_OK)
{
LogHostError(hr);
goto thread_error;
}
break; }
// Output stream
case S_OUTPUT: {
UINT32 frames;
if (stream->renderClient == NULL)
break;
// Get available frames
if ((hr = _PollGetOutputFramesAvailable(stream, &frames)) != S_OK)
{
LogHostError(hr);
goto thread_error;
}
// Output data to the user callback
if (stream->bufferMode == paUtilFixedHostBufferSize)
{
UINT32 count = 0;
while (frames >= stream->out.framesPerBuffer)
{
count++;
if ((hr = ProcessOutputBuffer(stream, processor, stream->out.framesPerBuffer)) != S_OK)
{
LogHostError(hr);
goto thread_error;
}
frames -= stream->out.framesPerBuffer;
}
if (count == 0)
{
repeat++;
next_sleep = 0;
continue;
}
else
{
//printf("%d,", repeat);
repeat = 0;
}
}
else
if (frames != 0)
{
if ((hr = ProcessOutputBuffer(stream, processor, frames)) != S_OK)
{
LogHostError(hr);
goto thread_error;
}
}
break; }
}
}
// Get next sleep time
if (sleep_ms == 0)
next_sleep = ThreadIdleScheduler_NextSleep(&scheduler);
else
next_sleep = sleep_ms;
// Update next sleep time dynamically depending on how much time we spent in processing
next_sleep -= (INT32)((PaUtil_GetTime() - start_time) * 1000);
if (next_sleep < 0)
next_sleep = 0;
} |
@dmitrykos I quickly tested them both (with and without dynamic sleep), and unfortunately neither of them resolves the glitching issues. I'm curious whether it's my HW environment that makes this behave differently from yours, or whether it's related to the type of callback system I use (i.e. Python). It would be really great if you could download the standalone executable https://github.com/sveinse/elns-release/releases/download/v1.1-b3/elns-1.1-b3-standalone.exe and test if WASAPI glitches on your system. Select "WASAPI" as the audio system, find a suitable wav or flac file as input, select an appropriate output audio device, select the "Passthrough" function and then press play. (I hope you're not put off by my executable; it's an application I author and maintain. It is a Python application including bundled PortAudio binaries.) |
@sveinse I tested your app and can hear glitches at all latencies except 128. I used a 1 kHz sine in WAV as it makes glitches very easy to hear. Nevertheless, the work that was done above is not useless, and I intend to include it in the actual implementation:
In my tests I noticed that with the fix you proposed |
@dmitrykos I made a brute force test by setting
This tells me it seems to be purely timing related. Anyway, I hope this is useful input. |
(continuing discussion from #305 here) |
@dmitrykos You wrote: "you could do audio rendering not inside the PA's thread but in some other thread which would deliver ready buffers/data into the PA's callback thread" Are you serious? Of course you have to do processing in a separate thread if your own code doesn't behave in a proper RT manner, but you should definitely not have to do it because PA doesn't behave in a proper RT manner. |
(Wrong line.
sleepMsecOut = GetFramesSleepTime(stream->out.framesPerBuffer,
stream->out.wavex.Format.nSamplesPerSec);
)
|
We are sleeping for the duration of the user buffer, which is always at most half the size of the host (WASAPI) buffer (e.g. with a 528-frame user buffer at 48 kHz we sleep about 11 ms, while the 1056-frame host buffer holds 22 ms); if the user specifies the desired latency, the host buffer will be the size of the requested latency. There is no mistake in the implementation.
On every cycle we get the number of available frames with |
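(For reference, in Shared mode the writable space on each poll is the difference between the total buffer size and the current padding. A sketch of what such a helper boils down to, assuming client is the stream's initialized IAudioClient:)
#define COBJMACROS
#include <windows.h>
#include <audioclient.h>

/* Sketch: frames writable in shared mode = total buffer size - current padding. */
static HRESULT GetOutputFramesAvailable(IAudioClient *client, UINT32 *framesAvailable)
{
    UINT32 bufferFrames = 0, padding = 0;
    HRESULT hr = IAudioClient_GetBufferSize(client, &bufferFrames);
    if (FAILED(hr))
        return hr;
    hr = IAudioClient_GetCurrentPadding(client, &padding); /* frames queued but not yet played */
    if (FAILED(hr))
        return hr;
    *framesAvailable = bufferFrames - padding;
    return S_OK;
}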
@dmitrykos Sorry for the delay, I got put off by the rebase in winrt and had to remerge my changes (I have to patch the MSVC build files to be able to build it). Running PR #307 as is, with
You mentioned you'd like to test my app on your HW. I created this special build of ELNS using the PortAudio version from this PR. It can be downloaded from https://github.com/sveinse/elns-portaudio-test/releases/tag/1. It has console output enabled, so running it from a command prompt will show the WASAPI logging. This test version loads the input file into memory prior to playback to rule out any internal timing issues. I test by pressing play and stop rather quickly. What is interesting to note is that it never glitches on rapid pause/play operations. Perhaps that suggests this is related to the initial startup of a stream? |
@dmitrykos btw, to test your statement in PR #307 about "...time hungry user-side processing", I added the following code:

LARGE_INTEGER a, b;
QueryPerformanceCounter(&b);
processor[S_OUTPUT].processor(NULL, 0, data, frames, processor[S_OUTPUT].userData);
QueryPerformanceCounter(&a);
printf("_%lld_,", a.QuadPart-b.QuadPart);
This one is a glitch on startup. It can seem like the user callback spends much longer in the first invocation, but the numbers are very variable. They are equally variable and high on playbacks that don't glitch, and even with lower buffer sizes. I plan to collect a better histogram of these, but just wanted to mention that I'm looking into whether the root cause of the startup glitching is excessive user callback times. |
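(Side note: the QuadPart deltas above are raw QPC ticks, so they are hard to compare against the 11/22 ms budgets. The same snippet with the delta converted to microseconds via QueryPerformanceFrequency; the surrounding variables are the ones from pa_win_wasapi.c as in the snippet above:)
LARGE_INTEGER freq, a, b;
QueryPerformanceFrequency(&freq);        /* ticks per second, constant at runtime */
QueryPerformanceCounter(&b);
processor[S_OUTPUT].processor(NULL, 0, data, frames, processor[S_OUTPUT].userData);
QueryPerformanceCounter(&a);
printf("_%.1f us_,", (a.QuadPart - b.QuadPart) * 1e6 / freq.QuadPart);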
On Thu, Oct 1, 2020 at 8:03 PM Dmitry Kostjuchenko ***@***.***> wrote:

> @kmatheussen <https://github.com/kmatheussen>
> It looks like you are sleeping the entire buffer.
> We are sleeping for the duration of the user buffer which is always at least 2x times smaller than host (WASAPI) buffer, if user specifies the desired latency the host buffer will be of the size of the requested latency. There is no mistake in the implementation.

Thanks for clearing that up Dmitry. I suggest adding this line "// the user buffer which is always at least 2x times smaller than host (WASAPI) buffer" to the code.
|
@sveinse thank you for the test, and I am glad that we managed to get to an (almost) stable version.
I probably misunderstood, but I could not test your application unfortunately as it throws:

Starting
Using input device index number: -1
Using output device index number: 4
WASAPI: IAudioClient2 set properties: IsOffload = 0, Category = 0, Options = 0
WASAPI ERROR HRESULT: 0x88890008 : AUDCLNT_E_UNSUPPORTED_FORMAT
[FUNCTION: CreateAudioClient FILE: \elns-wasapi\portaudio-pr307\src\hostapi\wasapi\pa_win_wasapi.c {LINE: 3132}]
WASAPI ERROR PAERROR: -9997 : Invalid sample rate
[FUNCTION: ActivateAudioClientOutput FILE: \elns-wasapi\portaudio-pr307\src\hostapi\wasapi\pa_win_wasapi.c {LINE: 3449}]
WASAPI ERROR PAERROR: -9996 : Invalid device
[FUNCTION: OpenStream FILE: \elns-wasapi\portaudio-pr307\src\hostapi\wasapi\pa_win_wasapi.c {LINE: 2143}]
An error occured while using the portaudio stream
Error number: -9996
Error message: Invalid device
Traceback:
File "elns\core.py", line 797, in tick_step
File "elns\audio.py", line 556, in open
File "pyaudio.py", line 754, in open
File "pyaudio.py", line 445, in __init__
OSError: [Errno -9996] Invalid device
ERROR: Opening audio device failed. OSError: [Errno -9996] Invalid device |
@dmitrykos This is actually another bug (or feature) in WASAPI. The root cause is that you're most probably trying to play an audio file with a different sample rate than the one the Windows audio device is configured for (48k). In this version I do not use the |
@sveinse yes, you were right, matching the sample rate made it work (I tried to match them yesterday but maybe something went wrong) and I went through all the latencies. I could hear some very occasional glitching when using 128 ms, but you enabled logging with
It is not a bug but how WASAPI works in Shared mode; Microsoft decided to pass the resampling task to the application level. Can we conclude that we managed to reach stability and the current implementation shall be merged, or do you wish to make some other experiments first? |
@dmitrykos Please stand by, I'm going to run some tests now without printf.
The sample rate matching error is correct; that's not what I meant. I meant that WASAPI does identify this as a sample rate error (the -9997 error), but it later becomes a -9996 invalid device error, and the latter is the error reported back to the user layer. It would be better if the error was -9997 Invalid sample rate. -- That said, I plan to query for a compatible format before starting the stream in the next version, so I can catch this and inform the user in a more useful manner, and even give the user the option to use autoconvert. |
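(A sketch of that pre-check idea using the portable Pa_IsFormatSupported() call; outputDeviceIndex and fileSampleRate are placeholder names for whatever the app has selected, and this assumes Pa_Initialize() has already been called:)
/* Probe the device/format combination before opening the stream, so a
   shared-mode sample-rate mismatch can be reported as such to the user. */
PaStreamParameters outParams;
outParams.device = outputDeviceIndex;                  /* placeholder */
outParams.channelCount = 2;
outParams.sampleFormat = paFloat32;
outParams.suggestedLatency =
    Pa_GetDeviceInfo(outputDeviceIndex)->defaultLowOutputLatency;
outParams.hostApiSpecificStreamInfo = NULL;

PaError err = Pa_IsFormatSupported(NULL, &outParams, fileSampleRate);
if (err != paFormatIsSupported)
    printf("Format not supported (%s): offer auto-convert or resample\n",
           Pa_GetErrorText(err));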
@sveinse I see, sorry for the confusion. I checked the related code and it is actually a bug in the implementation at line 3842 |
This is a great technical discussion. But the thread is very long. So when this issue is closed, could someone please add a very short summary paragraph describing the issue and the final solution. That will help others in the future. Thanks. |
@dmitrykos I rebuilt PortAudio in Release with no printfs. Unfortunately the glitching does not go away. The test results are as follows: Devices:
Solo = started without any audio running on that device. Shared = another audio playback running on the same device
edit PS! None of the other host APIs (DS, MME and WDM-KS) glitch with the same setup |
@sveinse thank you for a very detailed test! Did it become worse when you do not define
According to the time-slot logging results, on the application/PA side all seems to be fine and we never exceed 1/2 of the host buffer latency, so I am puzzled. edit:
I changed multiple variables in the test above (no printf and Release build), so I had to rerun the tests. Setting
Is it Windows that requires a larger buffer fill in the first invocations to be able to handle internal warm-up? I am considering logging/tracing the absolute time when e.g.
The other worry I have is that these glitches are unobserved. There is no flag being raised and sent back to the user indicating the glitch/underrun. That bothers me.
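(On the logging/tracing idea: one way to timestamp the poll loop without putting printf in the real-time path is to record into a preallocated array and dump it after the stream has stopped. A rough sketch with made-up names, reusing PaUtil_GetTime() from pa_util.h:)
#include <windows.h>
#include <stdio.h>
#include "pa_util.h"

#define TRACE_MAX 4096
static double traceTime[TRACE_MAX];    /* PaUtil_GetTime() at each output pass */
static UINT32 traceFrames[TRACE_MAX];  /* frames delivered on that pass */
static volatile LONG traceCount = 0;

/* Called from the processing loop instead of printf(). */
static void TracePoll(UINT32 frames)
{
    LONG i = InterlockedIncrement(&traceCount) - 1;
    if (i < TRACE_MAX)
    {
        traceTime[i]   = PaUtil_GetTime();
        traceFrames[i] = frames;
    }
}

/* Called once after the stream has stopped. */
static void DumpTrace(void)
{
    LONG n = traceCount < TRACE_MAX ? traceCount : TRACE_MAX;
    for (LONG i = 0; i < n; ++i)
        printf("%.6f s : %u frames\n", traceTime[i], (unsigned)traceFrames[i]);
}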
I understand your point, but at the same time I think it's not good if the WASAPI host API underperforms compared to the other host APIs. So my opinion and objective is that we need to find a config and method that works reliably for the users. If that mode is Event mode, that's fine, except the current Event mode is unavailable except for Exclusive mode. Exclusive mode, which uses Event, is rock solid with every frame size I can throw at it, even very low values such as 12 samples. One question: in Shared mode I see that the host buffer sizes orient around 1056 samples on my system. I cannot find any such "converging" values in Exclusive mode. It seems to want to use a host buffer size a couple of hundred samples bigger than the user buffer size. Why is that? |
There are no docs about how many frames should be preloaded to guarantee stable startup, but there is headroom for improvement by filling the whole host buffer with user data before the stream is started: you could modify the block starting at line 5992 to:

while (frames >= stream->out.framesPerBuffer)
{
if ((hr = ProcessOutputBuffer(stream, processor, stream->out.framesPerBuffer)) != S_OK)
{
LogHostError(hr); // not fatal, just log
break;
}
frames -= stream->out.framesPerBuffer;
}
Yes, it may be the case that the problem is somewhere inside WASAPI's mixer/buffer handling, because the time-slot logging showed that no underruns are happening on our side and the polling time is always smaller than 1/2 of the host buffer. This problem happens when the lowest possible (22 ms) host buffer is used in Shared mode. In your tests you got stable playback when the host buffer has 44 ms latency (2112).
I agree, but here in my view we cannot compare performance 1 to 1; for example, using your application and selecting DS as the host API, I can see in the logs that the DS latency is much higher than what we are trying to achieve with WASAPI:
e.g. it is 125 ms when I selected latency 256. MME's latency will be high too, and WDM-KS is the equivalent of WASAPI's Exclusive mode. On the contrary, in the case of the WASAPI host API we are trying to play with 22 ms latency in Shared mode. Maybe it is too low for Shared mode, while 44 ms becomes stable, which is still almost 3 times lower than DS's latency anyway. If you could experiment -- force Exclusive mode for WASAPI and test the range of latencies -- then we could understand whether Shared mode itself is buggy/unstable when the latency is 22 ms, or whether the problem is in the PA WASAPI implementation (I can see no corrupted logic in our implementation). If playback is stable then we probably cannot achieve absolute success with WASAPI Shared mode at latencies below 44 ms.
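(For that experiment, Exclusive mode can be forced per stream through the WASAPI-specific stream info; a sketch with only the relevant flag, where outParams is the PaStreamParameters being prepared for Pa_OpenStream:)
#include <string.h>
#include "pa_win_wasapi.h"

PaWasapiStreamInfo wasapiInfo;
memset(&wasapiInfo, 0, sizeof(wasapiInfo));
wasapiInfo.size = sizeof(PaWasapiStreamInfo);
wasapiInfo.hostApiType = paWASAPI;
wasapiInfo.version = 1;
wasapiInfo.flags = paWinWasapiExclusive;   /* force Exclusive mode for this stream */

outParams.hostApiSpecificStreamInfo = &wasapiInfo;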
It is the value provided from WASAPI's API by |
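(On where that number comes from: in Shared mode the audio engine chooses the final buffer size, and after IAudioClient::Initialize() it can be read back with GetBufferSize(); the engine's default/minimum periods come from GetDevicePeriod(). A sketch using the same COBJMACROS C bindings as the earlier sketch, assuming client is the initialized IAudioClient:)
REFERENCE_TIME defaultPeriod = 0, minimumPeriod = 0;   /* 100-ns units */
UINT32 hostBufferFrames = 0;

IAudioClient_GetDevicePeriod(client, &defaultPeriod, &minimumPeriod);
IAudioClient_GetBufferSize(client, &hostBufferFrames); /* valid only after Initialize() */

printf("default period %.2f ms, min period %.2f ms, host buffer %u frames\n",
       defaultPeriod / 10000.0, minimumPeriod / 10000.0, (unsigned)hostBufferFrames);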
@dmitrykos I did not do a very thorough test, but I get no glitching with any of the buffer size values I've tested before. Looking good. This test was in debug and with printf, so let me do a more detailed test. |
@dmitrykos I have now tested the latest proposal extensively:

Shared mode: Works perfectly without any glitching. Tested with 3 different audio devices (built-in, HDMI and USB), solo and mixed with other playback on the same device, and with every permutation of buffer sizes 2112, 1056, 528, 264, 132 and 66. All OK.

Exclusive mode: Works great, but I get some glitching at the lower buffer sizes (132 and below). This is probably due to excessive user processing time and is at some point to be expected.

Forced event mode: In this mode I force the playback to use Event mode. It is not stable at all except for 528. At all the other values it creates a repetitive, monotonic stream of glitching audio (as opposed to random, intermittent glitching).

Auto convert mode: Didn't do as detailed a test as above, but using this option doesn't seem to affect stability. If I revert the fix above, I get the same type of glitching regardless of whether the auto convert feature is used or not.
I agree, DS and MME are not a fair comparison due to their very long buffering schemes, and they can't be compared 1 to 1. However, my overall point is seen from the user's perspective: the user most often doesn't care which host API is used, just that something works. DS and MME are old legacy APIs (which use WASAPI behind the scenes). I would like to be able to recommend that my users use WASAPI, as it is the preferred, modern Windows audio system. That's all. |
PS! I hit the bug/feature in https://github.com/PortAudio/portaudio/blob/winrt/src/hostapi/wasapi/pa_win_wasapi.c#L2079-L2085. Long story short: I have a Focusrite Scarlett 6i6 USB Pro sound card, and it wanted a firmware update. For some reason that firmware update completely broke audio on my computer. If I selected that audio card as output, other audio apps, such as Spotify, crashed. I had planned to use this sound card for the tests above, but had to give it up. When I ran my application, PortAudio crashed due to this:
This is OT for this issue, but I wanted to mention it. If you like I could create a new issue for it. |
@sveinse I am glad to hear we have some light at the end of the tunnel. Do I understand correctly that the proposed modification starting from line 5992 improved the situation with glitches?
Yes, if you do not mind creating 2 issues, one for this issue and another for |
Affirmative. That fix resolves any startup glitches. |
@dmitrykos Since the initial buffer fill fix was found last, I wanted to verify if the /2 fix was necessary. I did this by commenting out portaudio/src/hostapi/wasapi/pa_win_wasapi.c Lines 5841 to 5846 in aecafb1
|
I've added a special build of my end-user app for Windows with the proposals in PR #307, if anyone else wants to test. It can be downloaded from here: https://github.com/sveinse/elns-portaudio-test/releases/tag/2 |
@sveinse thank you very much for testing and for all your proposals, I am really glad that we finally nailed the bug! :) This hard work resulted in a large improvement in my view. To summarize (as per @philburk's advice) what we have done with #307 to get rid of audible glitches in Polling Shared mode:
Will proceed with merging into the master. |
I'm experiencing significant glitching when using WASAPI from master in shared mode. If using exclusive mode or a full-duplex audio stream, the audio doesn't glitch.
I'm using PA from Python via the interface library pyaudio. I'm not excluding performance issues from the Python environment, but all the other host APIs work smoothly without glitching.
The glitching typically manifests as bursts of 4096 samples played back successfully with pauses of 220-230 samples in between. The pause interval varies. PA is called with framesPerBuffer of 1024. Increasing this number reduces how often the glitching occurs, but it doesn't remove the glitching altogether, even with very large values such as 65536.

For reference my environment is:
Windows 10, 2004 version
Compiled with MS VS 2019 Community Edition
Applied patch for building (for MSVC 2019 and to remove ASIO): https://gist.github.com/sveinse/7a3442d6f8444b95c4084a7172ec5fdb
Compile command (from git bash):