af_scaletempo2: prioritize louder channels for similarity measure #13748
I gave it a quick test and it sounds good so far 👍
I'll be using it in my mpv build.
@@ -99,11 +99,13 @@ static float multi_channel_similarity_measure(
{
    const float epsilon = 1e-12f;
    float similarity_measure = 0.0f;
    float total_energy = epsilon;
Removing that shouldn't make any difference to the result.
The target block doesn't change with the offset, so this sum is the same for every iteration of the optimal index search. All calculated similarities are therefore scaled by the same factor, which doesn't change the location of the maximum.
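The scaling argument can be checked with a small sketch. The helper names `argmax` and `scale_all` are hypothetical and not part of af_scaletempo2; they only illustrate that dividing every similarity by the same positive constant leaves the best index where it was.

```c
#include <stddef.h>

/* Hypothetical helper: index of the largest value in scores[0..n-1]. */
static size_t argmax(const float *scores, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; ++i) {
        if (scores[i] > scores[best])
            best = i;
    }
    return best;
}

/* Hypothetical helper: divide every score by the same positive constant,
 * as a shared normalization term (such as a total target energy) would. */
static void scale_all(float *scores, size_t n, float divisor)
{
    for (size_t i = 0; i < n; ++i)
        scores[i] /= divisor;
}
```

Calling `argmax` before and after `scale_all` on the same array should return the same index as long as the divisor is positive.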
Hmm, that's true. I was thinking of the candidate blocks here. On the other hand, this relies on the fact that the target block is always passed first, and this method currently doesn't say which one the target block is (though I did phrase my commit message that way). I'll remove it.
How about changing the parameters to make the intention apparent? I'm thinking of this:
 static float multi_channel_similarity_measure(
-    const float* dot_prod_a_b,
-    const float* energy_a, const float* energy_b,
+    const float* dot_prod,
+    const float* energy_target, const float* energy_candidate,
     int channels)
 {
     const float epsilon = 1e-12f;
     float similarity_measure = 0.0f;
     for (int n = 0; n < channels; ++n) {
-        similarity_measure += dot_prod_a_b[n] * energy_a[n]
-            / sqrtf(energy_a[n] * energy_b[n] + epsilon);
+        similarity_measure += dot_prod[n] * energy_target[n]
+            / sqrtf(energy_target[n] * energy_candidate[n] + epsilon);
     }
     return similarity_measure;
 }
Now that the target and candidate aren't treated the same anymore in that function, it does make sense to give them more descriptive names. I'd be fine with those.
BTW if you start the block with ```diff it gets some colors.
Force-pushed: db72edb to 06c61d7
Playback with many audio channels could be distorted when using scaletempo2. This was most noticeable when there were a lot of quiet channels and few louder channels. Fix this by increasing the weight of louder channels in relation to quieter channels.

Each channel's target block energy is factored into the usual similarity measure. This should have little effect on very correlated channels (such as most stereo media), where the factors are very similar for all channels.

See-Also: mpv-player#8705
See-Also: mpv-player#13737
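For readers unfamiliar with the terms, the per-channel quantities the commit message refers to could be computed roughly as in the sketch below. The function names and the flat `float` array layout are assumptions made for illustration, not the filter's actual code.

```c
/* Hypothetical helper: energy of one channel's block, taken here as the
 * sum of squared samples. */
static float block_energy(const float *samples, int count)
{
    float energy = 0.0f;
    for (int i = 0; i < count; ++i)
        energy += samples[i] * samples[i];
    return energy;
}

/* Hypothetical helper: dot product of a target block and a candidate
 * block of the same channel and length. */
static float block_dot_product(const float *target, const float *candidate,
                               int count)
{
    float dot = 0.0f;
    for (int i = 0; i < count; ++i)
        dot += target[i] * candidate[i];
    return dot;
}
```

A quiet channel yields a small `block_energy` for its target block, which is what lets the measure give it less weight.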
Force-pushed: 06c61d7 to b4b1172
I haven't found obvious regressions with a few samples so far, and the principle seems sound. But this PR still doesn't eliminate some problems which scaletempo doesn't have. Here is a sample I played at 1.1x speed, where scaletempo2 has some kind of pitch-shifting artifact which scaletempo doesn't have:

This PR (the same artifact also happens with master and #13737, and I think they sound worse than this PR): PR13748.1.1.webm

scaletempo: scaletempo.1.1.webm

Original sample: (attachment missing)

Anyway, this sounds better than the status quo for a few samples I tested, so if more users can test this and also find a net improvement I think this can go through, even though this doesn't completely solve the problem.
I don't understand why dot_prod(target, candidate) * sqrt(energy_target) / sqrt(energy_candidate)
works, but it does and that's all that really matters.
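One way to read that expression, ignoring the small epsilon in the denominator and assuming a nonzero target energy: it is the per-channel term from the proposed code, rearranged. The symbols $d$, $E_t$, and $E_c$ are shorthand introduced for this note, standing for `dot_prod[n]`, `energy_target[n]`, and `energy_candidate[n]`.

$$
\frac{d \, E_t}{\sqrt{E_t E_c}} \;=\; d \, \frac{\sqrt{E_t}}{\sqrt{E_c}} \;=\; E_t \cdot \frac{d}{\sqrt{E_t E_c}}
$$

The last form is the usual normalized correlation multiplied by the target block's energy, so channels with a louder target block contribute more to the sum.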
scaletempo uses very different default parameters to scaletempo2. If you use … Whichever parameters you choose, it will always be a compromise in some regard.
My understanding is that the original similarity measure being unitless means that the input signals are effectively normalized, so quiet channels get boosted even though they shouldn't.
True. The parameters can probably be content-dependent too; lots of commercial DAW plugins use similar approaches to auto-adapt parameters based on certain signal features.
I should have been more specific. I do understand why this change in isolation is an improvement over the status quo, but I don't understand why multiplying the dot product with …
Note that the candidate's signal level is also represented in the … For the former point though, I suspect that the …
I don't use this extensively, but if everyone agrees this has better results, I don't see why not.
This is an attempt to fix #8705 and is the result of the ideas I commented on in #13737.
The idea here is to increase the weight of louder channels. Some details are in the commit message. Should have very little effect on stereo audio and none on mono.
I tested it with the sample from the issue, with the test file in the discussion, and in random other videos I've watched, and my impression is pretty good so far.