[YouTube] Fixes for n param deobfuscation function #1253
Thank you!
// Pattern.compile(SINGLE_CHAR_VARIABLE_REGEX + "=\"nn\"\\[\\+" + MULTIPLE_CHARS_REGEX
// + "\\." + MULTIPLE_CHARS_REGEX + "]," + MULTIPLE_CHARS_REGEX + "\\("
// + MULTIPLE_CHARS_REGEX + "\\)," + MULTIPLE_CHARS_REGEX + "="
// + MULTIPLE_CHARS_REGEX + "\\." + MULTIPLE_CHARS_REGEX + "\\["
// + MULTIPLE_CHARS_REGEX + "]\\|\\|null\\).+\\|\\|(" + MULTIPLE_CHARS_REGEX
// + ")\\(\"\"\\)"),
Why did you remove the previous regex? It's better if you keep it, so in case YouTube reverts something it's already there and it works.
1. This pattern is prone to finding the wrong function.
Imagine this:
- NewPipeExtractor searches for the n deobfuscation function
- the first regex doesn't work so it uses the second one (the one I commented out)
- the (commented out) regex finds the wrong function
- NewPipeExtractor runs the wrong function and gets the wrong n parameter
- User gets 403
- No exception is thrown in NewPipeExtractor so it is harder to find the correct place which needs fixing
2. The regex which comes after the commented-out one already finds a very similar pattern, only with different groups and more specific.
In this example:
a.D&&(b="nn"[+a.D],WL(a),c=a.j[b]||null)&&(c=SDa[0](c),a.set(b,c),SDa.length||Wma("")
Looking at different versions of the player code it seems more often correct to use the function which is in SDa[0] but using Wma in this case would find the wrong function in newer versions.
Going for SDa[0] (and searching for what is in the array afterwards) seems more robust across multiple versions. The next regex already catches that.
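To make the difference concrete, here is a minimal sketch that runs two simplified patterns against the snippet quoted above. The pattern strings, the class name and the helper are made up for illustration and are not the regexes used in NewPipeExtractor; they only show that one style of pattern captures the array name (SDa) while the other captures the name called with an empty string (Wma).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NParamCaptureSketch {
    // Player snippet quoted in the comment above.
    static final String SNIPPET =
            "a.D&&(b=\"nn\"[+a.D],WL(a),c=a.j[b]||null)&&(c=SDa[0](c),a.set(b,c),SDa.length||Wma(\"\")";

    // Hypothetical pattern: captures NAME in "=NAME[0](arg)".
    static final Pattern ARRAY_CALL =
            Pattern.compile("=([a-zA-Z0-9$_]+)\\[0\\]\\(\\w+\\)");

    // Hypothetical pattern: captures NAME in "||NAME("")".
    static final Pattern EMPTY_STRING_CALL =
            Pattern.compile("\\|\\|([a-zA-Z0-9$_]+)\\(\"\"\\)");

    // Returns the first capture group of the first match, or null if there is none.
    static String firstGroup(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstGroup(ARRAY_CALL, SNIPPET));        // SDa
        System.out.println(firstGroup(EMPTY_STRING_CALL, SNIPPET)); // Wma
    }
}
```

Both patterns match the same line, so which name gets used depends entirely on which pattern is tried first.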
From what I have seen, I think that if the regex which I commented out is added back in, it should be swapped with the next one to hopefully prevent false positives more often.
(Also it might be a good idea to add some additional checks to validate it found the correct function but I consider this out of scope for this PR.)
But I have to say I'm very new to this YT/NewPipeExtractor stuff and have only looked at some, not all, player versions, so I might lack some experience with YouTube's past, which means I might be wrong. I can only talk about what I have seen so far, so if you know better, feel free to correct me.
My suggestion for now:
I will add it back in but swap it with the next one.
What do you think?
(Also it might be a good idea to add some additional checks to validate it found the correct function but I consider this out of scope for this PR.)
That's a pretty good idea. However, if you are using other clients that do not require deobfuscation, as is currently the case (we do not use HTML5 clients for now except for age-restricted videos, but this is a broken workaround that will be removed soon), you would be preventing stream extraction. This already happened in the past. A general error or logging warning is what would fit best.
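A rough sketch of the "warn instead of fail" idea, assuming the deobfuscation step can be modelled as a plain string-to-string function. The class, the method name and the log messages are hypothetical and do not reflect the extractor's actual API; the point is only that a failed or empty result is logged and the original parameter is kept, so extraction is not blocked for clients that never needed deobfuscation.

```java
import java.util.function.UnaryOperator;
import java.util.logging.Logger;

final class LenientDeobfuscationSketch {
    private static final Logger LOG =
            Logger.getLogger(LenientDeobfuscationSketch.class.getName());

    // Applies the given (hypothetical) deobfuscator to the n parameter.
    // If it throws or yields nothing, a warning is logged and the
    // original value is returned instead of propagating the failure.
    static String deobfuscateOrKeep(String n, UnaryOperator<String> deobfuscator) {
        try {
            String result = deobfuscator.apply(n);
            if (result == null || result.isEmpty()) {
                LOG.warning("n deobfuscation returned no value; keeping original parameter");
                return n;
            }
            return result;
        } catch (RuntimeException e) {
            LOG.warning("n deobfuscation failed: " + e.getMessage());
            return n;
        }
    }
}
```

The trade-off is the one raised earlier in the thread: a silent fallback keeps streams working, but a wrong n parameter then only shows up as a 403, so the warning is what points to the right place to fix.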
My suggestion for now:
I will add it back in but swap it with the next one.
What do you think?
Keep the current third regex as it is. Move the current first one, which potentially finds the wrong function, to third place, and move the regexes currently in that position through the last one after the new third regex. Let me know if I am not clear.
What I have done now compared to the current code on the dev branch:
- swap the first two regexes
- insert the new regex in third place
Is this correct?
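For context on why the position in the list matters, here is a minimal sketch of a first-match-wins fallback loop. It assumes the patterns are tried in list order and the first hit is returned; the class, the two toy patterns and the toy input are invented for illustration and are not taken from YoutubeThrottlingParameterUtils.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OrderedFallbackSketch {
    // Tries each pattern in order and returns group 1 of the first one that matches.
    static String firstMatch(List<Pattern> patterns, String playerJs) {
        for (Pattern pattern : patterns) {
            Matcher matcher = pattern.matcher(playerJs);
            if (matcher.find()) {
                return matcher.group(1);
            }
        }
        return null; // no pattern matched
    }

    public static void main(String[] args) {
        // Toy patterns: a more specific one and a looser one.
        Pattern specific = Pattern.compile("=(\\w+)\\[0\\]\\(\\w+\\)");
        Pattern loose = Pattern.compile("\\|\\|(\\w+)\\(\"\"\\)");
        String toyJs = "x=Abc[0](y);z||Def(\"\")";

        System.out.println(firstMatch(Arrays.asList(specific, loose), toyJs)); // Abc
        System.out.println(firstMatch(Arrays.asList(loose, specific), toyJs)); // Def
    }
}
```

With such a loop, a looser pattern placed early can shadow a more specific one whenever both match, which is the reason for discussing the order at all.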
...main/java/org/schabi/newpipe/extractor/services/youtube/YoutubeThrottlingParameterUtils.java
Is it really needed to add a new regex matching partially what the current second one does? I think it could increase the chances to extract the wrong function, as there is nothing specific about the
The old regexes don't find any matches and the current player js does not contain the
The chance of extracting the wrong function in the future will probably never be zero.
Is this good to go now?
I see the merging is blocked. Is it possible to advance it? AFAIK some downstreams depend on it. Also, @gechoto, could you please rebase it on top of the latest release? Thanks guys, your work is hard to overestimate for many, many ppl!
I merged the latest changes from dev but I guess I messed it up a bit. Now we have 20 commits in this PR.
Yes, this is an important fix for some apps and I don't want to maintain a fork for this indefinitely. I hope this gets merged asap, but we have to wait until someone from the team has time. @litetex was active lately. Maybe you can review this one?
thanks a lot
Mmmh, I don't see the point of not putting the most recent regex in first place. YouTube is unlikely to roll back to the previous player version.
…ixup function to prevent early return
… is a function which takes one parameter
The new regex is in third place because it is less specific as it doesn't contain the "nn" string anymore.
Yes, the point is that every regex is made specifically for one player version from YouTube, and the current version only matches the third regex (and none of the others), so it doesn't make sense not to put that one regex in first position. It's not like YouTube is going to roll back to the previous player (I mean, it happened once I think, and that's why we have the other fallback regexes in the list).
Never mind, we can always change it later.
Fixes #1252
Changes: