Scaletempo2, the new default for adjusting audio playback speed, sounds noticeably worse in some situations #8705

Open
varenc opened this issue Apr 7, 2021 · 56 comments · Fixed by #13748

Comments

@varenc
Contributor

varenc commented Apr 7, 2021

Important Information

Provide following Information:

  • mpv version: freshly compiled latest from master (mpv 0.33.0-109-gd0c530919d)
  • macOS

This is relevant because scaletempo2 was changed to the default from scaletempo in #8376

Reproduction steps

Try listening to some 5.1 audio at 0.95x speed using the now default scaletempo2 filter.
$ mpv --speed=0.95 --af=scaletempo2 some_audio
Listen for the poor quality in some situations.

Now add the old default, scaletempo, to the af filter chain and listen for the better quality.
$ mpv --speed=0.95 --af=scaletempo some_audio

Also listen to the recorded sample files below. You can use the original_source.mkv file included to reproduce the samples I recorded.

Expected behavior

The default should not make things worse.

Actual behavior

scaletempo2 is worse for minor speed changes in the 0.85x - 1.2x range. It's MUCH better for the big speed changes though, and I really appreciate it for that.

While I do appreciate scaletempo2 for big adjustments, I usually only make minor speed changes, so for me it's not a good default. I suspect that playback speed adjustments in the 0.9-1.2x range are much more common amongst users. This comment is where another user seems to have been caught up in this default change.

Sample files

@Hrxn
Contributor

Hrxn commented Apr 7, 2021

I suggest voting with +1 and -1 on the original post to vote on changing the default.
+1 to vote for changing the default back to scaletempo (unless there is some fix)
-1 to vote for keeping the current default.

Edit: yes, can reproduce

@CounterPillow
Contributor

Suggestion: scaletempo3, which uses scaletempo for speeds between 0.80 to 1.3 and scaletempo2 for speeds outside of that range.
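
The dispatch idea above can be sketched in a few lines. This is only an illustration of the suggestion: `pick_tempo_filter` is a hypothetical helper, not an mpv function, and treating the proposed thresholds as inclusive is an assumption.

```python
def pick_tempo_filter(speed: float, low: float = 0.80, high: float = 1.3) -> str:
    """Return the filter a hypothetical 'scaletempo3' would delegate to.

    Assumption: the 0.80 and 1.3 bounds from the suggestion are inclusive.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    # Small speed changes go to the old filter, large ones to the new one.
    if low <= speed <= high:
        return "scaletempo"
    return "scaletempo2"


if __name__ == "__main__":
    for s in (0.5, 0.95, 1.1, 2.0):
        print(s, pick_tempo_filter(s))
```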

@TiGR

TiGR commented Apr 8, 2021

Or maybe have it configurable separately as we have it with scale algorithms.

@varenc
Contributor Author

varenc commented Apr 10, 2021

The best solution would just be to make scaletempo2 work better even at minor speed changes! Chrome's own audio scaling, which scaletempo2 is a port of, seems to work fine with minor adjustments.

@DorianRudolph, perhaps you might have an idea of why scaletempo2 performs worse than Chrome does at 0.95x speed? Is there any hope for just tweaking it to handle this use case? That would of course be the ideal solution!

My thinking is that if scaletempo is going to be restored as the default, that should happen soon to avoid further confusion for people. Also, I'm basing this on the assumption that minor speed changes in the 0.85x - 1.2x range are far more common amongst MPV users, like they are for me, though I'm not sure if that's true. No matter the outcome, I'll also submit a PR for a docs update which adds a section explaining to users how to easily change the default to another audio scaler.

(@TiGR I do think the way mpv lets you choose your "audio scaling" filter is a bit idiosyncratic and hard to discover, but I think that's for a different discussion!)

@DanOscarsson
Contributor

If it works fine in Chrome it may be because some versions of their code switched to resampling between speed 0.95 - 1.06. Personally I prefer to use resampling when close to normal speed, like when playing a 25 Hz movie on a 24 Hz display. And mpv can do sync to vsync with resampling and be configured to do that so 25 Hz movies are automatically resampled to 24 Hz.
My only need for preserving pitch is when playing at a fast speed like > 1.5.
And that is what I would have expected most users need scaletempo2/scaletempo for.
Apparently that may not be true, but it cannot be determined without asking a lot of users.
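
To put a number on the resampling case: when audio is resampled instead of time-stretched, pitch moves together with speed. A minimal sketch of that arithmetic, using the standard relation of 12 semitones per octave (the function name is made up for illustration):

```python
import math


def pitch_shift_semitones(speed: float) -> float:
    """Pitch change, in semitones, when audio is resampled to play at `speed`."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    # One octave (a doubling of speed) is 12 semitones.
    return 12.0 * math.log2(speed)


if __name__ == "__main__":
    # A 25 Hz movie shown at 24 Hz plays at 24/25 of its original speed.
    print(round(pitch_shift_semitones(24 / 25), 2))
```

For the 25 Hz to 24 Hz case this comes out at roughly 0.7 semitones downward.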

As I have started working on some fixes to scaletempo2 (not related to speeds near 1), it would be good to quickly decide which scaletempo version to use (there is one more, atempo, in ffmpeg), as maintaining several WSOLA implementations will just be confusing for users and additional work for maintainers. But it may be needed if one implementation cannot meet all users' needs.

@realnc

realnc commented May 29, 2021

When I built mpv from git, the first thing I noticed were rather severe audio glitches when listening to audiobooks at speeds 0.9 and 1.1 (depending on whether the narrator is too fast or too slow.) It sounds like a scratched CD where the CD player is skipping.

scaletempo produces perfect results at these playback speeds. You can't even tell the sound is slowed down or sped up. It really sounds like the narrator is just reading slower or faster.

There doesn't seem to be an option to tell mpv which filter to use, so I had to put af-add=scaletempo in my config. Unfortunately, this disabled mpv's automatic filter removal when the filter is not needed. The filter is always active and shows up in the OSD all the time.

Something like an --audio-speed-filter option would be very nice to have instead of hardcoding scaletempo2 in the mpv source code.

@garoto
Contributor

garoto commented May 29, 2021

[	 no-osd af add "@tempo:scaletempo" ; no-osd add speed "-0.1"
]	 no-osd af add "@tempo:scaletempo" ; no-osd add speed "+0.1"
BS	 no-osd af remove @tempo ; no-osd set speed 1.0

@realnc

realnc commented May 29, 2021

[	 no-osd af add "@tempo:scaletempo" ; no-osd add speed "-0.1"
]	 no-osd af add "@tempo:scaletempo" ; no-osd add speed "+0.1"
BS	 no-osd af remove @tempo ; no-osd set speed 1.0

I can't see what speed I'm setting.

@avih
Member

avih commented May 29, 2021

@realnc please file a new issue, with logs and everything else which the template requests

If you can bisect it to find the exact first commit where the issue happens, it would be great info to add.

@avih
Member

avih commented May 31, 2021

It sounds like a scratched CD where the CD player is skipping

@realnc could you please open a new issue for this? All the reports we have so far are about subjective quality, but what you're describing is new, and could very well be an actual bug - which none of us is able to reproduce.

So please file a new issue, with logs, preferably sample files, bisect if you can, etc. It would help us identify a yet-unknown bug.

@kevin-stuart

I will not be too helpful commenting here, but I just want to confirm this report.

I upgraded to 0.34 and wondered why voices sound robotic at speed 1.1 until I figured out that apparently the default was changed to scaletempo2. I added af=scaletempo as an option in mpv 0.34 and apparently things went back to normal.

Unfortunately, I can't offer any samples and it may be subjective, but to me it was clear as day that something had changed and voices sounded very robotic with a lot of videos (but not all!). There seem to be some exceptions, but for me, scaletempo2 is way worse.

At least please don't remove scaletempo, for me scaletempo2 is very hard to bear for many files. I can try to see if I notice some kind of regularity such as audio codecs, but for me, there is something very wrong with scaletempo2.

@richardpl
Contributor

Use atempo instead.

@kevin-stuart

I tried atempo. It sounds similar to scaletempo2 to me (i.e. robotic). It is also not documented in the mpv manual, so I did not get the idea to use this ffmpeg filter. For me scaletempo sounds best. Is it possible that there is some kind of bug in mpv that makes scaletempo2 or atempo sound much worse for only some people?

@varenc
Contributor Author

varenc commented Nov 27, 2021

@kevin-stuart I don't think there's any reason why the exact same media played with the exact same version of MPV would result in any difference in sound between people. That said, I opened this issue because I observed that 6 channel audio with scaletempo2 seemed to give worse results than scaletempo when there's a very minor speed adjustment. But the issue went away with most stereo audio. I suspect you're experiencing the same issue. If you can post a small sample that'll help people confirm.

Also, I agree that atempo performs well, but atempo isn't fully supported by mpv and it will eventually lead to out-of-sync audio and video. But if you're just playing audio you might not care. I described the atempo issue and some very janky workarounds here: #4418 (comment) For me, scaletempo2 removes my need for atempo.

Given how long scaletempo2 has been the default at this point, unless a lot more people find this issue and concur, I think leaving it the default will be the least disruptive for the most folks. In the meantime, just making it easy to switch back to scaletempo is an easy solution. Maybe adding that to the default input.conf could help. (though tough to decide on the key)

(I use $ af toggle scaletempo in my input.conf to make the $ key toggle it)

@kevin-stuart

kevin-stuart commented Nov 27, 2021

You are right, I observed my problems with scaletempo2 with 6 channel audio. I mainly use 1.1 as speedup and scaletempo2 and atempo sound bad for me with this setup. I have set scaletempo in my config. I just hope that scaletempo2 is improved in the future and that scaletempo is not removed until then. For me, scaletempo2 became the new default only very recently when I upgraded to 0.34

@dardoor

dardoor commented Jan 9, 2022

I also noticed occasionally very bad sound with scaletempo2.
Here's an example from a movie with 2 channel audio, comparing scaletempo and scaletempo2 at 1.1x and 1.21x speeds:
scaletempo mpv test.zip

@christoph-heinrich
Contributor

christoph-heinrich commented Aug 5, 2022

You might want to try out --af=scaletempo2=search-interval=50:window-size=40.
I've tried the example from @dardoor (original (1x).opus) and it sounds great at various speeds (>1).

@realnc

realnc commented Aug 5, 2022

You might want to try out --af=scaletempo2=search-interval=50:window-size=40. I've tried the example from @dardoor (original (1x).opus) and it sounds great at various speeds.

It sounds horrible to me with speech at a speed of 0.94. Some words sound robotic, metallic and choppy. As a quick test, I was listening to this podcast:

https://www.youtube.com/watch?v=cnFubyqJ3Ro

A prime example is at the very beginning (0:00:45) where he says "that the community left for us". If you set the speed to 0.94, scaletempo2 is atrocious. scaletempo is perfect.

Whether I use your parameters or not doesn't change anything for me in this regard.

@christoph-heinrich
Contributor

mpv --no-config --start=44 --speed=0.94 --af=<filter> 'https://www.youtube.com/watch?v=cnFubyqJ3Ro'
I don't hear a problem with scaletempo2, but maybe I'm so used to it that I don't even notice it anymore.
test.zip

Admittedly I never actually listen to anything at <1 speed, so maybe I would have noticed something at some point if I did.
(videos are always >=1.25 speed for me, but I also tested with smaller values >=1)

@dardoor

dardoor commented Oct 9, 2022

scaletempo2=search-interval=50:window-size=40 does sound good on the sample I posted, at 1.1 and 1.2 speeds, even a bit better than scaletempo, I think.

But it sounds bad on that last sample at 0.94, at least the "basically we" part.
scaletempo2 with no parameters sounds better, and scaletempo even better.

(I also mostly play media at faster speeds and I would guess that's true for most people too.)

@mars4science

Interestingly:
After I changed scaletempo2 to scaletempo in f_auto_filters.c:
p->sub.filter = mp_create_user_filter(f, MP_OUTPUT_CHAIN_AUDIO, "scaletempo", NULL);

With --af=scaletempo=speed=<value>, the values none, both and tempo sound about the same, which is how I expect tempo to sound. af=scaletempo=speed=pitch works as expected. But when I commented out that line, sound was played at 1x speed regardless of video speed.
It seems the none and both values of the speed option do not work as described in the man page.

    both
        Scale both tempo and pitch.
    none
        Ignore speed changes.

@llyyr
Contributor

llyyr commented Sep 25, 2023

Is this issue still valid on builds from current master? Also, please try the rubberband build from #12479.

christoph-heinrich added a commit to christoph-heinrich/mpv that referenced this issue Mar 20, 2024
The signal energy was used for the similarity calculation in the search
for the optimal overlap position. Its usage led to worse results with
increased channel count. I was not able to find a situation
where the inclusion of signal energy produced better results than
without it.

Without signal energy this effectively turns into a cross-correlation.

Fixes mpv-player#8705 (comment)
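
A minimal sketch of what an overlap search based on plain cross-correlation looks like. This is not the code from the commit; the block layout (`block[channel][sample]`) and the function names are assumptions made for the example.

```python
from typing import List, Sequence

Block = Sequence[Sequence[float]]  # assumed layout: block[channel][sample]


def cross_correlation(target: Block, candidate: Block) -> float:
    """Sum of per-channel dot products, with no energy normalization."""
    return sum(
        a * b
        for ch_t, ch_c in zip(target, candidate)
        for a, b in zip(ch_t, ch_c)
    )


def best_overlap_offset(target: Block, search: Block) -> int:
    """Offset into `search` whose block correlates best with `target`."""
    block_len = len(target[0])
    n_offsets = len(search[0]) - block_len + 1
    scores: List[float] = []
    for off in range(n_offsets):
        candidate = [ch[off:off + block_len] for ch in search]
        scores.append(cross_correlation(target, candidate))
    return max(range(n_offsets), key=scores.__getitem__)


if __name__ == "__main__":
    target = [[1.0, -1.0, 1.0]]
    search = [[0.0, 0.0, 1.0, -1.0, 1.0, 0.0]]
    print(best_overlap_offset(target, search))
```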
christoph-heinrich added a commit to christoph-heinrich/mpv that referenced this issue Mar 21, 2024
The old formula worked well for stereo, but the results got worse with
increased channel count.

The taxicab distance works just as well for stereo, while not falling
apart as the channel count grows.

The downside is increased CPU usage. Maybe someone can try and vectorize
this one like the old one was. The performance still isn't bad, so there
is no pressing need for it.

Fixes mpv-player#8705 (comment)
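
The taxicab (L1) distance mentioned in the commit message can be illustrated like this. It is a sketch under assumed names and an assumed `block[channel][sample]` layout, not the commit's implementation; note that the best match is the offset with the smallest distance, not the largest score.

```python
from typing import List, Sequence

Block = Sequence[Sequence[float]]  # assumed layout: block[channel][sample]


def taxicab_distance(target: Block, candidate: Block) -> float:
    """Sum of absolute sample differences over all channels."""
    return sum(
        abs(a - b)
        for ch_t, ch_c in zip(target, candidate)
        for a, b in zip(ch_t, ch_c)
    )


def best_overlap_offset_l1(target: Block, search: Block) -> int:
    """Offset into `search` whose block is closest to `target` in L1 terms."""
    block_len = len(target[0])
    n_offsets = len(search[0]) - block_len + 1
    distances: List[float] = []
    for off in range(n_offsets):
        candidate = [ch[off:off + block_len] for ch in search]
        distances.append(taxicab_distance(target, candidate))
    return min(range(n_offsets), key=distances.__getitem__)


if __name__ == "__main__":
    # One loud channel and one quiet channel.
    target = [[1.0, -1.0, 1.0], [0.01, 0.0, -0.01]]
    search = [
        [0.0, 0.0, 1.0, -1.0, 1.0, 0.0],
        [0.0, 0.0, 0.01, 0.0, -0.01, 0.0],
    ]
    print(best_overlap_offset_l1(target, search))
```

Because every sample difference is added with the same weight, a quiet channel contributes little to the total, which is one plausible reading of why this measure holds up as channels are added.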
ferreum added a commit to ferreum/mpv that referenced this issue Mar 22, 2024
Playback with many audio channels could be distorted when using
scaletempo2. This was most noticeable when there were a lot of quiet
channels and few louder channels.

Fix this by increasing the weight of louder channels in relation to
quieter channels. Each channel's target block energy is factored
into the usual similarity measure. To prevent bias towards louder
blocks, the result is divided by the total energy across all channels.

This should have very little effect on very correlated channels (such as
most stereo media), as the division by total energy reverses the effect
of the channel-wise factorization if all channels have similar energy.

See-Also: mpv-player#8705
See-Also: mpv-player#13737
ferreum added a commit to ferreum/mpv that referenced this issue Mar 22, 2024
Playback with many audio channels could be distorted when using
scaletempo2. This was most noticeable when there were a lot of quiet
channels and few louder channels.

Fix this by increasing the weight of louder channels in relation to
quieter channels. Each channel's target block energy is factored into
the usual similarity measure.

This should have very little effect on very correlated channels (such as
most stereo media), as the factors are very similar for all channels.

See-Also: mpv-player#8705
See-Also: mpv-player#13737
ferreum added a commit to ferreum/mpv that referenced this issue Mar 22, 2024
Playback with many audio channels could be distorted when using
scaletempo2. This was most noticeable when there were a lot of quiet
channels and few louder channels.

Fix this by increasing the weight of louder channels in relation to
quieter channels. Each channel's target block energy is factored into
the usual similarity measure.

This should have little effect on very correlated channels (such as most
stereo media), where the factors are very similar for all channels.

See-Also: mpv-player#8705
See-Also: mpv-player#13737
@mesvam

mesvam commented Apr 1, 2024

I'm not convinced scaletempo2 is actually better than the original scaletempo at any speed. The problem is that mpv's default parameters for scaletempo give suboptimal results, so when comparing each filter at default settings, scaletempo2 comes out ahead. But properly configured, scaletempo still beats scaletempo2. I have scaletempo=stride=15:overlap=1:search=15 and it gives nearly perfect playback quality from speeds 1 to 4, and I've never heard any artifacts on a variety of audio. CPU usage may be a bit higher with these settings, but at reasonable speeds on reasonably recent hardware, the load is negligible, especially compared to video decoding.

Meanwhile, for scaletempo2, no combination of parameters can guarantee artifact-free audio at any speed. And the artifacts can actually be quite severe. scaletempo2 has audible pitch shifting of as much as a semitone on drone notes in the background music, which sounds like wrong notes being played, which is really distracting. The subjective quality improvement at higher speeds is simply due to the artifacts being harder to hear since they go by so quickly, but they're still there.

For speeds < 1, scaletempo2 sounds similar to scaletempo, but WSOLA-type algorithms are all a bit of a crapshoot. FFT methods are better for that IMO.

@christoph-heinrich
Contributor

@mesvam here is a little excerpt from a song with your scaletempo parameters
1.12x speed.webm
1x speed.webm

@mesvam

mesvam commented Apr 2, 2024

@christoph-heinrich ok I stand corrected. That timbre was worse than I expected.

I will say though, that even in that worst case, it's still better than when scaletempo2 goes wonky. Here is an example of background music going crazy with scaletempo2
excerpt.webm
excerpt-scaletempo2-1.06.webm

What's worse is that the artifacts in your excerpt are mainly due to the bass frequencies, which can be fixed by increasing stride/search to 30 or higher (scaletempo=stride=30:overlap=1:search=30), with some sacrifices when it comes to other content. I could not find any settings for scaletempo2 that would make my audio listenable, and there aren't even heavy bass frequencies in there!

Dudemanguy pushed a commit that referenced this issue Apr 12, 2024
Playback with many audio channels could be distorted when using
scaletempo2. This was most noticeable when there were a lot of quiet
channels and few louder channels.

Fix this by increasing the weight of louder channels in relation to
quieter channels. Each channel's target block energy is factored into
the usual similarity measure.

This should have little effect on very correlated channels (such as most
stereo media), where the factors are very similar for all channels.

See-Also: #8705
See-Also: #13737
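
A rough sketch of the weighting idea described in the commit message. The exact formula here is an assumption and has not been checked against the mpv source: each channel's normalized correlation is multiplied by the energy of that channel's target block, so loud channels dominate the vote. All names are hypothetical.

```python
import math
from typing import Sequence

Block = Sequence[Sequence[float]]  # assumed layout: block[channel][sample]


def energy(samples: Sequence[float]) -> float:
    return sum(v * v for v in samples)


def similarity(target: Block, candidate: Block,
               weight_by_energy: bool = True, eps: float = 1e-12) -> float:
    """Per-channel normalized correlation, optionally weighted by target energy."""
    total = 0.0
    for ch_t, ch_c in zip(target, candidate):
        dot = sum(a * b for a, b in zip(ch_t, ch_c))
        e_t, e_c = energy(ch_t), energy(ch_c)
        normalized = dot / math.sqrt(e_t * e_c + eps)
        # Assumed weighting: scale each channel's vote by its target energy.
        total += normalized * e_t if weight_by_energy else normalized
    return total


if __name__ == "__main__":
    loud = [1.0, -1.0, 1.0, -1.0]
    quiet = [0.001, 0.001, -0.001, -0.001]
    target = [loud, quiet, quiet]
    neg = lambda xs: [-x for x in xs]
    fits_loud = [loud, neg(quiet), neg(quiet)]   # matches only the loud channel
    fits_quiet = [neg(loud), quiet, quiet]       # matches only the quiet channels
    for weighted in (False, True):
        a = similarity(target, fits_loud, weighted)
        b = similarity(target, fits_quiet, weighted)
        print(weighted, "prefers loud match" if a > b else "prefers quiet match")
```

In this toy case the unweighted measure lets two near-silent channels outvote the loud one, while the weighted measure follows the loud channel, which matches the situation the commit message describes (many quiet channels, few loud ones).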
@Dudemanguy
Member

Well #13748 improved this but I don't think it's necessarily fixed judging by the comments so reopening.

@Dudemanguy Dudemanguy reopened this Apr 12, 2024
@fideliochan

Is there any way to fix the desync of atempo? Because it's still the best one imo.

@richardpl
Contributor

Not really. The atempo filter changes timestamps, and that causes the desync. A workaround is to add some hack that rescales those timestamps back to the original values that mpv expects.

@fideliochan

What do you mean by hacks? Something like this?
-vf setpts='PTS/1.15' -af atempo=1.15

@richardpl
Contributor

Yes, something like that, hardcoded to keep A/V sync, but that breaks seeking to the right spot...
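
The timestamp arithmetic behind this exchange can be sketched as follows. It assumes a simplified model in which a tempo filter emits timestamps divided by the tempo factor, as `setpts='PTS/1.15'` does for video. The helper names are hypothetical, and this is not how mpv tracks A/V sync internally.

```python
def rescale_pts(pts: float, tempo: float) -> float:
    """Timestamp after a filter that speeds playback up by `tempo` (assumed model)."""
    if tempo <= 0:
        raise ValueError("tempo must be positive")
    return pts / tempo


def restore_pts(pts: float, tempo: float) -> float:
    """Inverse of rescale_pts: map a filtered timestamp back to source time."""
    return pts * tempo


def av_gap(source_time: float, tempo: float) -> float:
    """Gap between an untouched video PTS and the rescaled audio PTS."""
    return source_time - rescale_pts(source_time, tempo)


if __name__ == "__main__":
    # After one minute of source material at tempo 1.15:
    print(round(av_gap(60.0, 1.15), 2), "seconds apart if only audio is rescaled")
```

Under this model, rescaling both streams keeps them aligned with each other, but a position on the filtered timeline no longer equals the position in the source file, which is consistent with the seeking problem mentioned above.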

@richardpl
Contributor

I have developed a prototype filter that can stretch audio by a 2x factor, using autocorrelation by RDFT to find similar periods, plus interpolating the found periods with an equal-power cross-fade that makes use of the normalized cross-correlation factor between two periods. The output is much better than scaletempo(2) or atempo. I need to do something similar for the 1/2 factor for a 2x speed gain.
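
For readers unfamiliar with the term, a plain equal-power cross-fade can be sketched as below. This is a generic textbook version with made-up names, not the prototype filter's code, and it leaves out the cross-correlation factor mentioned above.

```python
import math
from typing import List, Sequence, Tuple


def crossfade_gains(n: int) -> List[Tuple[float, float]]:
    """(fade-out, fade-in) gain pairs whose squares always sum to 1."""
    if n < 2:
        raise ValueError("need at least two samples")
    gains = []
    for i in range(n):
        angle = (i / (n - 1)) * (math.pi / 2)
        gains.append((math.cos(angle), math.sin(angle)))
    return gains


def equal_power_crossfade(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Blend period `a` into period `b` with constant summed power."""
    if len(a) != len(b):
        raise ValueError("periods must have the same length")
    return [
        g_out * x + g_in * y
        for (g_out, g_in), x, y in zip(crossfade_gains(len(a)), a, b)
    ]


if __name__ == "__main__":
    print(equal_power_crossfade([1.0] * 5, [0.0] * 5))
```

The cosine and sine gains keep the summed power constant across the fade, which is why this shape is usually preferred over a linear fade when the two segments are not strongly correlated.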

@richardpl
Contributor

Got the 0.5 and 2.0 ratios working well and fast. Maybe I will add support for arbitrary ratios. If anybody is interested in taking a look at it, I can push the filter into librempeg.

@richardpl
Contributor

Looks like nobody is interested in high quality. I removed the code and moved on to other stuff.

@richardpl
Contributor

Anybody interested in a fixed-tempo filter for 0.5 and 2.0 can try the ascale filter.

@BergmannAtmet

Anybody interested in a fixed-tempo filter for 0.5 and 2.0 can try the ascale filter.

yes. especially for 2.0.

@kasper93
Contributor

kasper93 commented Oct 4, 2024

I don't know if Paul provides any builds, but you can use the ones from #14977.

@richardpl
Contributor

Well, this will work only for audio-only playback; otherwise runtime A/V adjustment within mpv is not possible. Maybe one can hack some new filter into mpv once I figure out how to do arbitrary scaling and fix the not-so-small issues with >=2 channels and solo high-volume, bass-rich and treble-poor audio.

@richardpl
Contributor

Now all tempo values from 0.5 to 2.0 should be supported, and it sounds fine to me. Still, >=2 channels and heavy-bass audio with < 1.0 tempo remain to be fixed.
