Allow unicode in test names #30

davehunt · 2016-01-27T11:34:32Z

@The-Compiler unicode hurts my head! I encountered a test with 〈 as a parameter. This caused the report to fail due to UnicodeDecodeError. This patch fixes it, but would you mind taking a look to see if there's a smarter approach? I also found it difficult to write a test for this use case.

The-Compiler · 2016-01-28T16:53:06Z

Any idea how I can reproduce the issue? I tried running py.test --html foo.html test_foo.py with this file and python 2:

# encoding: utf-8

import pytest

@pytest.mark.parametrize('val', ['〈'])
def test_foo(val):
    pass

but that passes without any trouble.

davehunt · 2016-01-28T16:57:31Z

It's this test and the parameter value is here. I'll see if I can create a simplified test case.

davehunt · 2016-01-28T18:27:41Z

@The-Compiler here you go:

# -*- coding: utf-8 -*-

import pytest

@pytest.mark.parametrize('val', ['foo'], ids=['〈'])
def test_unicode(val):
    pass

The-Compiler · 2016-01-28T19:18:18Z

So what I think is happening is that pytest passes through the IDs verbatim, i.e. as a python2 str (i.e. bytes with a particular encoding, utf-8 in this case).

Then when you try to get unicode from the HTML document, py.xml internally calls unicode() (without an encoding) on it, which fails for non-ascii chars.

I think the proper solution would be to do something similar to what pytest does:

if _PY3:
    import codecs

    def _escape_bytes(val):
        """
        If val is pure ascii, returns it as a str(), otherwise escapes
        into a sequence of escaped bytes:
        b'\xc3\xb4\xc5\xd6' -> u'\\xc3\\xb4\\xc5\\xd6'
        note:
           the obvious "v.decode('unicode-escape')" will return
           valid utf-8 unicode if it finds them in the string, but we
           want to return escaped bytes for any byte, even if they match
           a utf-8 string.
        """
        if val:
            # source: http://goo.gl/bGsnwC
            encoded_bytes, _ = codecs.escape_encode(val)
            return encoded_bytes.decode('ascii')
        else:
            # empty bytes crashes codecs.escape_encode (#1087)
            return ''
else:
    def _escape_bytes(val):
        """
        In py2 bytes and str are the same type, so return it unchanged if it
        is a full ascii string, otherwise escape it into its binary form.
        """
        try:
            return val.decode('ascii')
        except UnicodeDecodeError:
            return val.encode('string-escape')

pytest seems to not do that when ids is given (only when generating them).

@nicoddemus - sorry for all the highlights today, but I'd like your opinion on this 😉 Does this sound correct? Should pytest core call _escape_bytes even if bytes are given via ids?

The-Compiler · 2016-01-28T19:18:42Z

Oh, here is the stacktrace of @davehunt's example:

foo.py .Traceback (most recent call last):
  File "/home/florian/.venv2/bin/py.test", line 11, in <module>
    sys.exit(main())
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/config.py", line 48, in main
    return config.hook.pytest_cmdline_main(config=config)
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 724, in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 338, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 333, in <lambda>
    _MultiCall(methods, kwargs, hook.spec_opts).execute()
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 596, in execute
    res = hook_impl.function(*args)
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/main.py", line 115, in pytest_cmdline_main
    return wrap_session(config, _main)
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/main.py", line 110, in wrap_session
    exitstatus=session.exitstatus)
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 724, in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 338, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 333, in <lambda>
    _MultiCall(methods, kwargs, hook.spec_opts).execute()
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 595, in execute
    return _wrapped_call(hook_impl.function(*args), self.execute)
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 249, in _wrapped_call
    wrap_controller.send(call_outcome)
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/terminal.py", line 361, in pytest_sessionfinish
    outcome.get_result()
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 279, in get_result
    _reraise(*ex)  # noqa
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 264, in __init__
    self.result = func()
  File "/home/florian/.venv2/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py", line 596, in execute
    res = hook_impl.function(*args)
  File "/home/florian/.venv2/lib/python2.7/site-packages/pytest_html/plugin.py", line 269, in pytest_sessionfinish
    unicode_doc = doc.unicode(indent=2)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 69, in unicode
    HtmlVisitor(l.append, indent, shortempty=False).visit(self)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 126, in visit
    visitmethod(node)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 158, in Tag
    self.visit(x)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 126, in visit
    visitmethod(node)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 158, in Tag
    self.visit(x)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 126, in visit
    visitmethod(node)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 158, in Tag
    self.visit(x)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 126, in visit
    visitmethod(node)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 141, in list
    self.visit(elem)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 126, in visit
    visitmethod(node)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 158, in Tag
    self.visit(x)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 126, in visit
    visitmethod(node)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 158, in Tag
    self.visit(x)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 126, in visit
    visitmethod(node)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 141, in list
    self.visit(elem)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 126, in visit
    visitmethod(node)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 158, in Tag
    self.visit(x)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 126, in visit
    visitmethod(node)
  File "/home/florian/.venv2/lib/python2.7/site-packages/py/_xmlgen.py", line 132, in __object
    self.write(escape(unicode(obj)))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 21: ordinal not in range(128)

nicoddemus · 2016-01-28T21:10:47Z

@The-Compiler

pytest seems to not do that when ids is given (only when generating them).

Hmm if you convert user-provided not ascii test ids into "escaped" ascii, won't that be surprising to users? Because in the end the HTML output will contain escaped bytes instead of their UTF-8 encoded strings... on the other hand, they did provide a utf-8-encoded bytes instance to ids= instead of a unicode instance, and that in Python 2 is asking for trouble. 😁

Does this work?

# -*- coding: utf-8 -*-

import pytest

@pytest.mark.parametrize('val', ['foo'], ids=[u'〈'])
def test_unicode(val):
    pass

davehunt · 2016-01-29T08:48:22Z

I see your point @nicoddemus, unfortunately your suggestion gives a different exception:

../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/runner.py:149: in __init__
    self.result = func()
../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/main.py:431: in _memocollect
    return self._memoizedcall('_collected', lambda: list(self.collect()))
../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/main.py:311: in _memoizedcall
    res = function()
../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/main.py:431: in <lambda>
    return self._memoizedcall('_collected', lambda: list(self.collect()))
../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/python.py:604: in collect
    return super(Module, self).collect()
../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/python.py:458: in collect
    res = self.makeitem(name, obj)
../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/python.py:470: in makeitem
    collector=self, name=name, obj=obj)
../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py:724: in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py:338: in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py:333: in <lambda>
    _MultiCall(methods, kwargs, hook.spec_opts).execute()
../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py:595: in execute
    return _wrapped_call(hook_impl.function(*args), self.execute)
../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py:249: in _wrapped_call
    wrap_controller.send(call_outcome)
../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/python.py:329: in pytest_pycollect_makeitem
    res = list(collector._genfunctions(name, obj))
../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/python.py:500: in _genfunctions
    subname = "%s[%s]" %(name, callspec.id)
../../.virtualenvs/tmp-dd105f0f1432d97c/lib/python2.7/site-packages/_pytest/python.py:863: in id
    return "-".join(map(str, filter(None, self._idlist)))
E   UnicodeEncodeError: 'ascii' codec can't encode character u'\u3008' in position 0: ordinal not in range(128)

nicoddemus · 2016-01-29T10:21:18Z

Thanks @davehunt.

It was less a suggestion and more curiosity if that would work... but I didn't have my hopes up. ☺️

I think we should try to implement escaping the bytes as @The-Compiler suggested.

davehunt · 2016-01-29T16:23:45Z

I think we should try to implement escaping the bytes as @The-Compiler suggested.

Just to clarify, are you saying this should be done in pytest core?

nicoddemus · 2016-01-29T18:44:50Z

Hmmm I think so, not sure if it is possible to fix in pytest-html

davehunt · 2016-02-01T11:04:15Z

@nicoddemus I'd be happy to raise an issue or even submit a patch for a test (possibly even a fix) but I'd need a little bit of guidance. I'm not too familiar with the pytest codebase, and it's not clear to me how best to expose/address this issue.

nicoddemus · 2016-02-01T11:30:26Z

Sure @davehunt, please open up an issue. @The-Compiler any chance you can tackle this? I'm pretty busy this week, unfortunately...

The-Compiler · 2016-02-01T12:07:53Z

I can't promise anything right now - busy with preparing a pytest talk for next week and a pytest training for in 3 weeks. 😉

davehunt · 2016-02-01T13:11:02Z

Raised as pytest-dev/pytest#1351

davehunt · 2016-02-18T18:00:22Z

This should be fixed in pytest core.

Allow unicode in test names

baabd43

davehunt mentioned this pull request Feb 1, 2016

explicitly passed parametrize ids do not get escaped to ascii pytest-dev/pytest#1351

Closed

davehunt closed this Feb 18, 2016

davehunt deleted the allow-unicode branch November 28, 2016 09:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow unicode in test names #30

Allow unicode in test names #30

davehunt commented Jan 27, 2016

The-Compiler commented Jan 28, 2016

davehunt commented Jan 28, 2016

davehunt commented Jan 28, 2016

The-Compiler commented Jan 28, 2016

The-Compiler commented Jan 28, 2016

nicoddemus commented Jan 28, 2016

davehunt commented Jan 29, 2016

nicoddemus commented Jan 29, 2016

davehunt commented Jan 29, 2016

nicoddemus commented Jan 29, 2016

davehunt commented Feb 1, 2016

nicoddemus commented Feb 1, 2016

The-Compiler commented Feb 1, 2016

davehunt commented Feb 1, 2016

davehunt commented Feb 18, 2016

Allow unicode in test names #30

Allow unicode in test names #30

Conversation

davehunt commented Jan 27, 2016

The-Compiler commented Jan 28, 2016

davehunt commented Jan 28, 2016

davehunt commented Jan 28, 2016

The-Compiler commented Jan 28, 2016

The-Compiler commented Jan 28, 2016

nicoddemus commented Jan 28, 2016

davehunt commented Jan 29, 2016

nicoddemus commented Jan 29, 2016

davehunt commented Jan 29, 2016

nicoddemus commented Jan 29, 2016

davehunt commented Feb 1, 2016

nicoddemus commented Feb 1, 2016

The-Compiler commented Feb 1, 2016

davehunt commented Feb 1, 2016

davehunt commented Feb 18, 2016