bug in printing invalid char #17271

randy3k · 2016-07-04T21:10:17Z

b"\x90" here is not a valid char. However, since there is no conversion error, I guess it does make sense to print something like � instead.

julia> String(b"\x90");

julia> String(b"\x90")
"Error showing value of type String:
ERROR: BoundsError: attempt to access 1-element Array{UInt8,1} at index [0]
 in endof(::String) at ./strings/string.jl:40
 in done at ./strings/basic.jl:0 [inlined]
 in escape_string(::IOContext{Base.Terminals.TTYTerminal}, ::String, ::String) at ./strings/io.jl:137
 in print_quoted at ./strings/io.jl:156 [inlined]
 in show at ./strings/io.jl:0 [inlined]
 in show at ./replutil.jl:0 [inlined]
 in display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::MIME{Symbol("text/plain")}, ::String) at ./REPL.jl:114
 in display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::String) at ./REPL.jl:117
 in display(::String) at ./multimedia.jl:143
 in print_response(::Base.Terminals.TTYTerminal, ::Any, ::Void, ::Bool, ::Bool, ::Void) at ./REPL.jl:134
 in print_response(::Base.REPL.LineEditREPL, ::Any, ::Void, ::Bool, ::Bool) at ./REPL.jl:121
 in (::Base.REPL.##18#19{Bool,Base.REPL.##29#38{Base.REPL.LineEditREPL,Base.REPL.REPLHistoryProvider},Base.REPL.LineEditREPL,Base.LineEdit.Prompt})(::Base.LineEdit.MIState, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Bool) at ./REPL.jl:631
 in run_interface(::Base.Terminals.TTYTerminal, ::Base.LineEdit.ModalInterface) at ./LineEdit.jl:1570
 in run_frontend(::Base.REPL.LineEditREPL, ::Base.REPL.REPLBackendRef) at ./REPL.jl:882
 in run_repl(::Base.REPL.LineEditREPL, ::Base.##598#599) at ./REPL.jl:167
 in _start() at ./client.jl:364

by the way, it is julia 0.5-dev

The text was updated successfully, but these errors were encountered:

TotalVerb · 2016-07-04T22:17:31Z

Do not use String(::Array{UInt8, 1}) unless you're sure the input data is valid UTF-8. The constructor does not do any checking.

Also, I disagree that � should be printed. It is better to raise an error early then to silently allow invalid UTF-8.

randy3k · 2016-07-04T22:28:30Z

I suggested the above since String(b"\xce") prints �

nalimilan · 2016-07-05T08:04:16Z

I suggested the above since String(b"\xce") prints �

@randy3k That's because it's not a valid continuation byte, so endof returns immediately the last index in the underlying array. OTOH, String(b"\x90") contains only a continuation byte, which makes endof go past the beginning of the array. I have a fix at #17276: with that change, String(b"\x90") prints as an empty string (since it contains no valid Unicode character).

Also, I disagree that � should be printed. It is better to raise an error early then to silently allow invalid UTF-8.

@TotalVerb I think the approach retained in Julia is that you're responsible for ensuring strings are valid on input (as you said). But then we do our best to accept broken strings, which is needed e.g. to work correctly with file paths (which are not necessarily valid Unicode). Anyway, checking strings everywhere would be too expensive (for example, endof would be much slower if it had to check whether String(b"\xce") contains at least one valid character).

StefanKarpinski · 2016-07-06T00:29:47Z

Whether String objects can hold arbitrary data or not is in a bit of a limbo state since the changes in #16107 have ended up spanning the 0.5 and 0.6 releases, so for now @nalimilan's fix is the way to go.

randy3k · 2016-07-06T03:31:57Z

Great. It solves the issue.

tkelman added the unicode Related to unicode characters and encodings label Jul 4, 2016

randy3k closed this as completed Jul 6, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug in printing invalid char #17271

bug in printing invalid char #17271

randy3k commented Jul 4, 2016 •

edited

Loading

TotalVerb commented Jul 4, 2016 •

edited

Loading

randy3k commented Jul 4, 2016 •

edited

Loading

nalimilan commented Jul 5, 2016

StefanKarpinski commented Jul 6, 2016

randy3k commented Jul 6, 2016

bug in printing invalid char #17271

bug in printing invalid char #17271

Comments

randy3k commented Jul 4, 2016 • edited Loading

TotalVerb commented Jul 4, 2016 • edited Loading

randy3k commented Jul 4, 2016 • edited Loading

nalimilan commented Jul 5, 2016

StefanKarpinski commented Jul 6, 2016

randy3k commented Jul 6, 2016

randy3k commented Jul 4, 2016 •

edited

Loading

TotalVerb commented Jul 4, 2016 •

edited

Loading

randy3k commented Jul 4, 2016 •

edited

Loading