-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow unicode in test names #30
Conversation
Any idea how I can reproduce the issue? I tried running # encoding: utf-8
import pytest
@pytest.mark.parametrize('val', ['〈'])
def test_foo(val):
pass but that passes without any trouble. |
@The-Compiler here you go: # -*- coding: utf-8 -*-
import pytest
@pytest.mark.parametrize('val', ['foo'], ids=['〈'])
def test_unicode(val):
pass |
So what I think is happening is that pytest passes through the IDs verbatim, i.e. as a python2 Then when you try to get I think the proper solution would be to do something similar to what pytest does: if _PY3:
import codecs
def _escape_bytes(val):
"""
If val is pure ascii, returns it as a str(), otherwise escapes
into a sequence of escaped bytes:
b'\xc3\xb4\xc5\xd6' -> u'\\xc3\\xb4\\xc5\\xd6'
note:
the obvious "v.decode('unicode-escape')" will return
valid utf-8 unicode if it finds them in the string, but we
want to return escaped bytes for any byte, even if they match
a utf-8 string.
"""
if val:
# source: http://goo.gl/bGsnwC
encoded_bytes, _ = codecs.escape_encode(val)
return encoded_bytes.decode('ascii')
else:
# empty bytes crashes codecs.escape_encode (#1087)
return ''
else:
def _escape_bytes(val):
"""
In py2 bytes and str are the same type, so return it unchanged if it
is a full ascii string, otherwise escape it into its binary form.
"""
try:
return val.decode('ascii')
except UnicodeDecodeError:
return val.encode('string-escape') pytest seems to not do that when @nicoddemus - sorry for all the highlights today, but I'd like your opinion on this 😉 Does this sound correct? Should pytest core call |
Oh, here is the stacktrace of @davehunt's example:
|
Hmm if you convert user-provided not ascii test ids into "escaped" ascii, won't that be surprising to users? Because in the end the HTML output will contain escaped bytes instead of their UTF-8 encoded strings... on the other hand, they did provide a utf-8-encoded Does this work? # -*- coding: utf-8 -*-
import pytest
@pytest.mark.parametrize('val', ['foo'], ids=[u'〈'])
def test_unicode(val):
pass |
I see your point @nicoddemus, unfortunately your suggestion gives a different exception:
|
Thanks @davehunt. It was less a suggestion and more curiosity if that would work... but I didn't have my hopes up. I think we should try to implement escaping the bytes as @The-Compiler suggested. |
Just to clarify, are you saying this should be done in pytest core? |
Hmmm I think so, not sure if it is possible to fix in |
@nicoddemus I'd be happy to raise an issue or even submit a patch for a test (possibly even a fix) but I'd need a little bit of guidance. I'm not too familiar with the pytest codebase, and it's not clear to me how best to expose/address this issue. |
Sure @davehunt, please open up an issue. @The-Compiler any chance you can tackle this? I'm pretty busy this week, unfortunately... |
I can't promise anything right now - busy with preparing a pytest talk for next week and a pytest training for in 3 weeks. 😉 |
Raised as pytest-dev/pytest#1351 |
This should be fixed in pytest core. |
@The-Compiler unicode hurts my head! I encountered a test with
〈
as a parameter. This caused the report to fail due toUnicodeDecodeError
. This patch fixes it, but would you mind taking a look to see if there's a smarter approach? I also found it difficult to write a test for this use case.