af_scaletempo2: prioritize louder channels for similarity measure #13748

ferreum · 2024-03-22T15:37:44Z

This is an attempt to fix #8705 and is the result of the ideas I commented on in #13737.

The idea here is to increase the weight of louder channels. Some details are in the commit message. Should have very little effect on stereo audio and none on mono.

I tested it with the sample from the issue, with the test file from the discussion, and with various other videos I've watched; my impression is pretty good so far.

github-actions · 2024-03-22T15:51:59Z

Download the artifacts for this pull request:

Windows

macOS

christoph-heinrich

I gave it a quick test and it sounds good so far 👍

I'll be using it in my mpv build.

christoph-heinrich · 2024-03-22T15:57:28Z

audio/filter/af_scaletempo2_internals.c

@@ -99,11 +99,13 @@ static float multi_channel_similarity_measure(
 {
    const float epsilon = 1e-12f;
    float similarity_measure = 0.0f;
+    float total_energy = epsilon;


Removing that shouldn't make any difference to the result.

The target block doesn't change with the offset, therefore this sum will always be the same for each iteration of the optimal index search, so all calculated similarities are changed by the same factor, thus not changing the location of the maximum.
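The argument above can be checked with a small sketch: scaling every candidate's similarity by one shared positive factor leaves the index of the maximum unchanged. The helper names `argmax_similarity` and `scale_all` are hypothetical and not part of mpv.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from mpv): index of the largest similarity. */
static size_t argmax_similarity(const float *similarity, size_t count)
{
    assert(count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; ++i) {
        if (similarity[i] > similarity[best])
            best = i;
    }
    return best;
}

/* Hypothetical helper: scale all values by one shared factor, as a term
 * that depends only on the target block would do during one search. */
static void scale_all(float *similarity, size_t count, float factor)
{
    for (size_t i = 0; i < count; ++i)
        similarity[i] *= factor;
}
```

This only holds for a positive factor; a negative one would turn the maximum into the minimum.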

Hmm, that's true. I was thinking of the candidate blocks here. On the other hand, this relies on the fact that the target block is always passed first, and this function currently doesn't say which one the target block is (though I did phrase my commit message that way). I'll remove it.

How about changing the parameters to make the intention apparent? I'm thinking of this:

 static float multi_channel_similarity_measure(
-    const float* dot_prod_a_b,
-    const float* energy_a, const float* energy_b,
+    const float* dot_prod,
+    const float* energy_target, const float* energy_candidate,
     int channels)
 {
     const float epsilon = 1e-12f;
     float similarity_measure = 0.0f;
     for (int n = 0; n < channels; ++n) {
-        similarity_measure += dot_prod_a_b[n] * energy_a[n]
-            / sqrtf(energy_a[n] * energy_b[n] + epsilon);
+        similarity_measure += dot_prod[n] * energy_target[n]
+            / sqrtf(energy_target[n] * energy_candidate[n] + epsilon);
     }
     return similarity_measure;
 }

Now that the target and candidate aren't treated the same anymore in that function, it does make sense to give them more descriptive names. I'd be fine with those.

BTW if you start the block with ```diff it gets some colors.

Playback with many audio channels could be distorted when using scaletempo2. This was most noticeable when there were a lot of quiet channels and few louder channels. Fix this by increasing the weight of louder channels in relation to quieter channels. Each channel's target block energy is factored into the usual similarity measure. This should have little effect on very correlated channels (such as most stereo media), where the factors are very similar for all channels.

See-Also: mpv-player#8705
See-Also: mpv-player#13737

na-na-hi · 2024-03-23T00:24:41Z

I haven't found obvious regressions with a few samples so far, and the principle seems sound.

But this PR still doesn't eliminate some problems which scaletempo doesn't have. Here is a sample I have played at 1.1x speed, where scaletempo2 has some kind of pitch shifting artifact which scaletempo doesn't have:

This PR: (the same artifact also happens with master and #13737, and I think they sound worse than this PR)

PR13748.1.1.webm

scaletempo:

scaletempo.1.1.webm

Original sample:

sample.zip

Anyway, this sounds better than the status quo for a few samples I tested, so if more users can test this and also find a net improvement I think this can go through, even though this doesn't completely solve the problem.

christoph-heinrich

I don't understand why dot_prod(target, candidate) * sqrt(energy_target) / sqrt(energy_candidate) works, but it does and that's all that really matters.

christoph-heinrich · 2024-03-23T23:27:05Z

> But this PR still doesn't eliminate some problems which scaletempo doesn't have. Here is a sample I have played at 1.1x speed, where scaletempo2 has some kind of pitch shifting artifact which scaletempo doesn't have:

scaletempo uses very different default parameters to scaletempo2. If you use af=scaletempo2=search-interval=14:window-size=60 (which is kind of what scaletempo uses, except scaletempo2 has no way of replicating overlap=0.2) then there is no noticeable pitch shift anymore.

Whichever parameters you choose, it will always be a compromise in some regard.
Maybe dynamic parameters could work well for "all" situations, where search-interval and window-size change automatically based on playback speed.

na-na-hi · 2024-03-24T00:38:57Z

> I don't understand why dot_prod(target, candidate) * sqrt(energy_target) / sqrt(energy_candidate) works, but it does and that's all that really matters.

My understanding is that the original similarity measure being unitless means that the input signals are essentially normalized, so quiet channels get essentially boosted even though they shouldn't.

> Whichever parameters you choose, it will always be a compromise in some regard.
> Maybe dynamic parameters could work well for "all" situations, where search-interval and window-size change automatically based on playback speed.

True. The parameters could probably also be content-dependent - lots of commercial DAW plugins use similar approaches to automatically adapt parameters based on certain signal features.

christoph-heinrich · 2024-03-24T00:52:04Z

> My understanding is that the original similarity measure being unitless means that the input signals are essentially normalized, so quiet channels get essentially boosted even though they shouldn't.

I should have been more specific. I do understand why this change in isolation is an improvement over the status quo, but I don't understand why multiplying the dot product by sqrt(energy_target) / sqrt(energy_candidate) makes it better.
How come that if the target is louder than the candidate the similarity gets raised over when they are the same loudness,
and then if the candidate is louder than the target it gets decreased.
How does that make the result better? Evidently it does, but it makes no sense to me.

na-na-hi · 2024-03-24T01:19:47Z

> How come that if the target is louder than the candidate the similarity gets raised over when they are the same loudness,
> and then if the candidate is louder than the target it gets decreased.

Note that the candidate's signal level is also represented in the dot_prod term, so raising it shouldn't change the similarity figure by much because that's still normalized. The similarity can still be decreased if the candidate is significantly louder than the target, but this is probably desired to reduce possible mismatches of transients of different strength.

For the former point though, I suspect that the sqrt(energy_target) term can be dropped and it would be still better than the dot product alone. (untested intuition, probably wrong)
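For reference, the per-channel term discussed above can be rewritten as follows. This is a sketch that ignores epsilon and assumes "energy" means the sum of squared samples of a block, i.e. its squared norm. The symbols $t$, $c$, $E_t$, $E_c$ and $\theta$ (the angle between target and candidate block) are introduced here for illustration and do not appear in the thread.

```latex
\begin{aligned}
s &= \frac{\langle t, c\rangle \, E_t}{\sqrt{E_t E_c}}
   = \langle t, c\rangle \, \frac{\sqrt{E_t}}{\sqrt{E_c}},
   \qquad E_t = \lVert t\rVert^2,\quad E_c = \lVert c\rVert^2 \\
  &= \lVert t\rVert \, \lVert c\rVert \cos\theta \cdot \frac{\lVert t\rVert}{\lVert c\rVert}
   = E_t \cos\theta
\end{aligned}
```

Under these assumptions the term is the normalized correlation $\cos\theta$ weighted by the target block's energy, and a pure gain change of the candidate cancels out; only epsilon breaks this for near-silent blocks.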

Dudemanguy

I don't use this extensively, but if everyone agrees this has better results, I don't see why not.

christoph-heinrich suggested changes Mar 22, 2024

ferreum force-pushed the scaletempo2-similarity-measure branch from db72edb to 06c61d7 Compare March 22, 2024 16:07

ferreum force-pushed the scaletempo2-similarity-measure branch from 06c61d7 to b4b1172 Compare March 22, 2024 16:23

christoph-heinrich approved these changes Mar 23, 2024

christoph-heinrich mentioned this pull request Mar 26, 2024

af_scaletempo2: improve signal similarity metric #13737

Closed

Dudemanguy approved these changes Apr 9, 2024

Dudemanguy merged commit 096d35d into mpv-player:master Apr 12, 2024
14 checks passed

Dudemanguy mentioned this pull request Apr 12, 2024

Scaletempo2, the new default for adjusting audio playback speed, sounds noticeably worse in some situations #8705

Open
