Windows Unicode (utf16) not handled in filenames ? #5094

Open
Steveland opened this issue Mar 1, 2015 · 23 comments

Comments

@Steveland

Hello,

I wonder if you can add support for UTF-16 in filenames. I'm trying to pass this command line from a Windows C++ program:
youtube-dl.exe -o output_美.mp4 -v https://www.youtube.com/watch?v=HEwDZ8KtRMg

and I get this output:
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-v', '-o', 'C:\Users\Alan\Desktop\youtube-dl\output_?.mp4', 'https://www.youtube.com/watch?v=HEwDZ8KtRMg']
[debug] Encodings: locale cp1252, fs mbcs, out None, pref cp1252
[debug] youtube-dl version 2015.02.28
[debug] Python version 2.7.8 - Windows-7-6.1.7601-SP1
[debug] exe versions: ffmpeg N-68881-ga79ac73, ffprobe N-68881-ga79ac73, rtmpdump 2.4
[debug] Proxy map: {}
[youtube] HEwDZ8KtRMg: Downloading webpage
[youtube] HEwDZ8KtRMg: Extracting video information
[youtube] HEwDZ8KtRMg: Downloading DASH manifest
[debug] Invoking downloader on u'https://r1---sn-25g7snee.googlevideo.com/videoplayback?source=youtube&mime=video%2Fmp4&expire=1425224794&itag=18&fexp=904844%2C905657%2C907263%2C927622%2C931392%2C934954%2C9406140%2C9406861%2C943917%2C947225%2C947240%2C948124%2C951703%2C952302%2C952605%2C952612%2C952620%2C952901%2C955301%2C957201%2C959701&sparams=dur%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Cmime%2Cmm%2Cms%2Cmv%2Cpl%2Cratebypass%2Crequiressl%2Csource%2Cupn%2Cexpire&dur=126.525&mm=31&ms=au&mv=m&mt=1425203133&ipbits=0&ip=2.14.183.224&key=yt5&upn=CUV2UYbWCXs&id=o-ALMUGUmQ1J1TmoV9FOrSLSrPdyF3XR07mSjPEJ51kId_&ratebypass=yes&initcwndbps=1465000&requiressl=yes&sver=3&signature=32631A825116B8DFB0F8F040ACEECB8A897BD008.758F1F62B48F4ABDE4CFA6F51A37C5FFF1990E51&pl=16'
Traceback (most recent call last):
File "__main__.py", line 19, in <module>
File "youtube_dl\__init__.pyo", line 397, in main
File "youtube_dl\__init__.pyo", line 387, in _real_main
File "youtube_dl\YoutubeDL.pyo", line 1442, in download
File "youtube_dl\YoutubeDL.pyo", line 654, in extract_info
File "youtube_dl\YoutubeDL.pyo", line 700, in process_ie_result
File "youtube_dl\YoutubeDL.pyo", line 1143, in process_video_result
File "youtube_dl\YoutubeDL.pyo", line 1375, in process_info
File "youtube_dl\YoutubeDL.pyo", line 1350, in dl
File "youtube_dl\downloader\common.pyo", line 339, in download
File "youtube_dl\downloader\http.pyo", line 158, in real_download
File "youtube_dl\utils.pyo", line 258, in sanitize_open
File "ntpath.pyo", line 64, in join
File "ntpath.pyo", line 114, in splitdrive
TypeError: object of type 'generator' has no len()

@Steveland
Author

It seems the Windows executable youtube-dl.exe is using an old version of Python (2.7.8) that does not support Unicode. Could you please update to the latest Python version (3.4.3)? If that is not possible, please explain why.

This is an important feature because its absence can lead to serious bugs. For example, the default output folder for downloads on Windows is "C:\Users\Username\Downloads"; if the username contains UTF-16 characters (Chinese, Russian, etc.), the program will fail.

@phihag
Contributor

phihag commented Mar 6, 2015

No offense, but this sounds a lot like an unfounded conspiracy theory - bear in mind that extraordinary claims do require extraordinary proof.

First of all, the error message is unrelated to UTF-16 in the first place!

Secondly, Python 2.x supports Unicode just fine, at least as far as I am aware.

The executable is generated using py2exe, which does not support Python 3. I'd gladly see a way to build 3.x. Since I only use Windows to fix reported bugs in our Windows port, I see other issues as much more pressing. You are very welcome though to suggest code to build an exe for Python 3.x on Windows.
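
On the point that the error message itself is not a UTF-16 error: the traceback ends in ntpath.splitdrive, reached through ntpath.join, complaining about a 'generator'. The sketch below reproduces that class of failure with plain ASCII paths. It assumes (the traceback does not show this) that the path parts were passed to join as a single generator object rather than as separate arguments. `join_parts` is a hypothetical helper for illustration, not youtube-dl code, and the message printed on Python 3 differs from the Python 2.7 one quoted above.

```python
import ntpath


def join_parts(parts):
    """Join an iterable of path parts by unpacking it into separate arguments."""
    return ntpath.join(*parts)


parts = ["C:\\Users\\Alan", "output.mp4"]

try:
    # Passing the generator itself makes ntpath treat it as the first path.
    ntpath.join(p for p in parts)
except TypeError as exc:
    print("TypeError:", exc)

# Unpacking the same generator gives join the individual strings it expects.
print(join_parts(p for p in parts))
```

No non-ASCII character is involved here, which is consistent with the TypeError being a separate problem from the '?' substitution seen in the debug line.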

@Steveland
Author

Hi Philipp,

Thanks for taking the time to reply.

I said that the problem is related to Unicode for several reasons:

  • If I remove the Chinese character (UTF-16) from the input, everything works fine.
  • The Chinese character is replaced by '?' in the [debug] line.
  • The [debug] Encodings line shows "locale cp1252, fs mbcs, out None, pref cp1252"; I guess it should be UTF-16 or UTF-8. (I also tried --encoding UTF-16 but got garbage output.)
  • I saw several posts pointing out Unicode problems in Python 2.x.
  • By default, youtube-dl.exe deletes the Unicode characters in output filenames.
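
The '?' in the [debug] line is what a lossy conversion to a legacy code page typically produces. The sketch below imitates that with Python's own cp1252 codec. It is an assumption that the conversion in the report behaves the same way, and it most likely happens before youtube-dl ever sees the arguments. `ansi_round_trip` is a hypothetical helper name.

```python
def ansi_round_trip(text, codepage="cp1252"):
    """Encode text to a legacy code page and back, replacing unmappable characters."""
    return text.encode(codepage, errors="replace").decode(codepage)


# The Chinese character has no cp1252 representation, so it becomes '?'.
print(ansi_round_trip("output_美.mp4"))
```

Characters that cp1252 does contain, such as 'é', survive the same round trip unchanged, which would explain why only some filenames are affected.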

I was not aware of this py2exe tool, I guess it adds another variable to the problem.

If anyone manages to get Unicode (UTF-16 or UTF-8) output filenames using youtube-dl.exe on Windows, please give me the answer. I have been trying for several days to make this work, but no luck so far...

I also downloaded the latest Python version and tried to run the script YoutubeDL.py, but still no luck. Is there a tutorial on how to run youtube-dl on Windows with Python installed?

I'm sorry, I'm just a Windows C++ developer and I'm not familiar with Linux and command-line stuff.

@jaimeMF
Collaborator

jaimeMF commented Mar 7, 2015

We could look into cx_Freeze, I used it once with a python3 program and it worked fine, but I don't know if it can generate a single exe file instead of an installer.

About running youtube-dl with Python on Windows: if you have installed the latest Python version and you haven't unselected pip during the installation (I think it's selected by default), you can run pip install -U youtube_dl or pip install -e . from the source code directory to use the version from the repo.
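
Once the package is installed with pip, it can be started through the Python interpreter instead of youtube-dl.exe. A minimal sketch, assuming a Python 3 interpreter with youtube_dl installed; `build_command` is a hypothetical helper, while the -o and -v options and the URL are taken from the original report.

```python
import sys


def build_command(url, output_template):
    """Return an argument list that runs the pip-installed youtube_dl package."""
    return [sys.executable, "-m", "youtube_dl", "-o", output_template, "-v", url]


cmd = build_command("https://www.youtube.com/watch?v=HEwDZ8KtRMg", "output_美.mp4")

# To actually start the download, pass the list to subprocess:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list keeps the non-ASCII filename as a Python string all the way to process creation, rather than routing it through a shell command line.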

@Steveland
Author

Hello Jaime,

Thanks a lot for your help. Your method (pip install -U youtube_dl) is working fine. I got youtube-dl to work on Windows with the latest version of Python, and Chinese characters are now displayed properly in output filenames.

Philipp mentioned that Python has support for Unicode, so I guess the problem comes from py2exe messing up the encoding.

Anyway, I lost too much time on this, I'll find a workaround and move on.

Thanks again.

@jaimeMF
Collaborator

jaimeMF commented Mar 13, 2015

@phihag It seems that py2exe (0.9.2.2) supports python 3.4:

$ pip show py2exe
---
Name: py2exe
Version: 0.9.2.2
Location: c:\python34\lib\site-packages
Requires: 

C:\Users\jaime\Desktop>youtube-dl.exe "http://www.youtube.com/watch?v=OIYeCPUIL1E" -v -x --audio-format mp3 > log.txt 2>&1 produces:

[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['http://www.youtube.com/watch?v=OIYeCPUIL1E', '-v', '-x', '--audio-format', 'mp3']
[debug] Encodings: locale cp1252, fs mbcs, out cp1252, pref cp1252
[debug] youtube-dl version 2015.03.09
[debug] Python version 3.4.2 - Windows-7-6.1.7600
[debug] exe versions: ffmpeg 1.2, ffprobe 1.2
[debug] Proxy map: {}
[youtube] OIYeCPUIL1E: Downloading webpage
[youtube] OIYeCPUIL1E: Extracting video information
[youtube] OIYeCPUIL1E: Downloading DASH manifest
[debug] Invoking downloader on 'https://r6---sn-h5q7dnez.googlevideo.com/videoplayback?<...>'
[download] Resuming download at byte 195767
[download] Destination:  ' -   ()-OIYeCPUIL1E.m4a

[download]   3.7% of 5.03MiB at Unknown speed ETA Unknown ETA
[download]   3.8% of 5.03MiB at 750.01KiB/s ETA 00:06        
[download]   3.9% of 5.03MiB at 700.00KiB/s ETA 00:07        
[download]   4.0% of 5.03MiB at 254.22KiB/s ETA 00:19        
[download]   4.3% of 5.03MiB at 364.68KiB/s ETA 00:13        
[download]   4.9% of 5.03MiB at 459.83KiB/s ETA 00:10        
[download]   6.2% of 5.03MiB at 477.42KiB/s ETA 00:10        
[download]   8.7% of 5.03MiB at 561.64KiB/s ETA 00:08        
[download]  13.6% of 5.03MiB at 587.32KiB/s ETA 00:07        
[download]  23.6% of 5.03MiB at 661.24KiB/s ETA 00:05        
[download]  38.3% of 5.03MiB at 721.46KiB/s ETA 00:04        
[download]  54.3% of 5.03MiB at 718.17KiB/s ETA 00:03        
[download]  68.1% of 5.03MiB at 673.52KiB/s ETA 00:02        
[download]  78.8% of 5.03MiB at 659.33KiB/s ETA 00:01        
[download]  90.1% of 5.03MiB at 644.71KiB/s ETA 00:00        
[download] 100.0% of 5.03MiB at 632.75KiB/s ETA 00:00        
[download] 100% of 5.03MiB in 00:07                          
[ffmpeg] Correcting container in " ' -   ()-OIYeCPUIL1E.m4a"
[debug] ffmpeg command line: ffmpeg -y -i ' '"'"' -   ()-OIYeCPUIL1E.m4a' -c copy -f mp4 ' '"'"' -   ()-OIYeCPUIL1E.temp.m4a'
[debug] ffmpeg command line: ffprobe -show_streams ' '"'"' -   ()-OIYeCPUIL1E.m4a'
[ffmpeg] Destination:  ' -   ()-OIYeCPUIL1E.mp3
[debug] ffmpeg command line: ffmpeg -y -i ' '"'"' -   ()-OIYeCPUIL1E.m4a' -vn -acodec libmp3lame -q:a 5 ' '"'"' -   ()-OIYeCPUIL1E.mp3'
Deleting original file  ' -   ()-OIYeCPUIL1E.m4a (pass -k to keep)

@Steveland
Author

It seems that the unicode characters are deleted in your output filename. Did you manage to get the Hebrew characters in the output filename?

@jaimeMF
Collaborator

jaimeMF commented Mar 14, 2015

On the desktop it showed the Hebrew characters; I don't know why they don't appear in the output.
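
One possible explanation, offered as an assumption and not checked against youtube-dl's source: the file is created with its full Unicode name, but the log text is encoded for a cp1252 output stream with unrepresentable characters dropped. The sketch below shows that effect on a made-up Hebrew title; `strip_for_console` is a hypothetical helper.

```python
def strip_for_console(text, encoding="cp1252"):
    """Drop characters that the console encoding cannot represent."""
    return text.encode(encoding, errors="ignore").decode(encoding)


# A made-up title: two Hebrew words separated by " - ".
title = "\u05e9\u05dc\u05d5\u05dd - \u05e2\u05d5\u05dc\u05dd.m4a"

# Only the ASCII punctuation and the extension survive the encoding step.
print(repr(strip_for_console(title)))
```

The result has the same shape as the log lines above, where only spaces, punctuation, and the video ID remain.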

@Steveland
Author

Great! Let's hope this py2exe version will be used for future releases of youtube-dl.exe.

@mbnoimi

mbnoimi commented Apr 13, 2015

+1

I have the same bug with UTF-8 names (Arabic characters).

@mbnoimi

mbnoimi commented Apr 13, 2015

BTW, I tried to build youtube-dl using the new py2exe but unfortunately I failed :(

D:\PortableApps\YouTube-dl\youtube-dl>python setup.py py2exe
C:\Python 3.5\lib\site-packages\setuptools\dist.py:283: UserWarning: The version specified requires normalization, consider using '2015.4.9' instead of '2015.04.09'.
  self.metadata.version,
running py2exe
running build_py

  9 missing Modules
  ------------------
? HTMLParser                          imported from youtube_dl.compat
? cookielib                           imported from youtube_dl.compat
? netbios                             imported from uuid
? readline                            imported from cmd, code, pdb
? urllib.urlretrieve                  imported from youtube_dl.compat
? win32api                            imported from platform
? win32con                            imported from platform
? win32wnet                           imported from uuid
? xattr                               imported from youtube_dl, youtube_dl.downloader.http, youtube_dl.postprocessor.xattrpp
Building '.\youtube-dl.exe'.
error: [Errno 2] No such file or directory: 'C:\\Python 3.5\\lib\\site-packages\\py2exe\\run-py3.5-win-amd64.exe'

D:\PortableApps\YouTube-dl\youtube-dl>pip install HTMLParser
Collecting HTMLParser
  Downloading HTMLParser-0.0.2.tar.gz
Installing collected packages: HTMLParser
  Running setup.py install for HTMLParser
Successfully installed HTMLParser-0.0.2

D:\PortableApps\YouTube-dl\youtube-dl>pip install cookielib netbios readline urllib.urlretrieve win32api win32con win32wnet xattr
Collecting cookielib
  Could not find a version that satisfies the requirement cookielib (from versions: )
  No matching distribution found for cookielib

D:\PortableApps\YouTube-dl\youtube-dl>

@yan12125
Collaborator

@mbnoimi The problem you encountered is not directly related to this issue. Feel free to open a new issue. By the way, your problem seems to be related to py2exe itself rather than youtube-dl. If I'm correct, there's no official Python 3.5 support in the latest py2exe release.

@mbnoimi

mbnoimi commented Apr 13, 2015

@yan12125 No, my problem is exactly what occurs here... please read more.

your problem seems to be related to py2exe itself rather than youtube-dl

No. I just tested building binaries for youtube-dl using py2exe, the same as @jaimeMF did.

@mbnoimi

mbnoimi commented Apr 13, 2015

After many tests I was able to build a youtube-dl binary using py2exe 0.9.2.2 and Python 3.5.0a3.
Now youtube-dl.exe can handle UTF-8 without any problem.

It's up to you guys to create a new distribution that supports Unicode.

Thank you all.

@mbnoimi

mbnoimi commented Apr 13, 2015

BTW, the recent Windows binary (2015.04.09) doesn't support UTF-8.

@karatchov

Just my 2 cents on working around the Unicode problem on Windows:

  • either use the script along with Python 3
  • or build a binary based on Python 3. Unfortunately I failed to get it done with py2exe, but Nuitka 0.5.13 + Python 3.4 x86 (along with an installed MSVC 2010 Express) can get the job done with no hassle (and with a few warnings) with a simple:
    nuitka --standalone youtube_dl

The Nuitka-built binary seems to be slightly slower to start, but it does work correctly.
For the lazy, I uploaded my build:
https://drive.google.com/file/d/0B1T_XhgV8nOjRzN1RHU3ekpfVFE/view?usp=sharing

@videonerd

Thanks karatchov, this is the first public build that fixes the unicode issue on Windows. Would really appreciate it if the devs can incorporate a working build for the windows binary release. Thank you!

@karatchov

A new build based on today's master, using Nuitka 0.5.16 & Python 3.5 x86 & MSVC 2010 express:
https://drive.google.com/file/d/0B1T_XhgV8nOjTlBNX3dRUk1Zam8/view?usp=sharing

@videonerd

Thank you karatchov. Whilst your efforts are much appreciated, I would urge the devs please to fix the official win32 binary release for the benefit of the wider userbase of the win32 version.

Thank you!

@GoTop

GoTop commented Sep 4, 2016

@karatchov

Do you have the newest version of youtube-dl built with Python 3?

@yan12125
Collaborator

yan12125 commented Sep 4, 2016

@GoTop FYI: Currently the official .exe is built against Python 3.4

@GoTop

GoTop commented Sep 4, 2016

@yan12125

That's really cool!

I use the official .exe, and it solves the Unicode filename problem.

Thanks!

@yan12125
Collaborator

yan12125 commented Sep 4, 2016

@jaimeMF's result may be a Python bug/missing feature. See PEP 528.
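
PEP 528 concerns the encoding Python uses for the Windows console. To see what a given interpreter reports, the values behind a line like "[debug] Encodings: locale cp1252, fs mbcs, out cp1252, pref cp1252" can be inspected directly. A minimal sketch using only the standard library; `report_encodings` is a hypothetical helper, and the mapping of its keys to youtube-dl's debug fields is an assumption.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python reports for locale, filesystem, and stdout."""
    return {
        "locale": locale.getpreferredencoding(),
        "fs": sys.getfilesystemencoding(),
        # stdout may lack an encoding, or report None, when it is replaced or redirected.
        "out": getattr(sys.stdout, "encoding", None),
    }


print(report_encodings())
```

Running this once in a console window and once with output redirected to a file shows whether the "out" value changes between the two cases on a given setup.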
