glob("*") does not support matching non-utf8 filenames #11916

lilyball · 2014-01-29T23:12:35Z

glob::glob() currently has no support for matching non-utf8 filenames. Not only are its patterns restricted to strings, but it also explicitly skips any non-utf8 filename it encounters, even though such a filename should at least match a * pattern.
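Here is a minimal sketch of why such names get skipped. It uses present-day Rust rather than the 2014 Path API and assumes a Unix target. It relies on std::ffi::OsStr and std::os::unix::ffi::OsStrExt, and the helper names name_as_text and name_as_bytes are hypothetical. A name consisting of the single byte 0xFF has no string view at all, but it always has a byte view.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;

/// What a string-only matcher sees: `None` for any name that is not valid
/// UTF-8. That leaves nothing to compare against the pattern, so the
/// name ends up skipped.
fn name_as_text(name: &OsStr) -> Option<&str> {
    name.to_str()
}

/// What a byte-oriented matcher sees: on Unix every filename is just a
/// byte string, so this view always exists.
fn name_as_bytes(name: &OsStr) -> &[u8] {
    name.as_bytes()
}
```

For example, given `OsStr::from_bytes(b"\xFF")`, the first helper returns `None` and the second returns the one-byte slice `[0xFF]`.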

Tasks that need to be done:

glob() needs to accept both strings and byte vectors. It can do this using std::path::BytesContainer.
glob() needs to process its pattern as a byte vector instead of a string, so that it can match filenames as byte vectors. This includes matching non-utf8 filenames against the * and ? tokens. For ?, matching a single byte is acceptable; ideally, it would consume exactly the bytes that the Unicode standard says are replaced by a single U+FFFD REPLACEMENT CHARACTER.
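Here is a minimal sketch of the byte-oriented matching these tasks describe, written in present-day Rust. It makes several assumptions:

- AsRef<[u8]> stands in for std::path::BytesContainer, so both string and byte patterns are accepted.
- glob_match, match_bytes and unit_len are hypothetical names.
- Only * and ? are handled. There are no [...] classes and no escaping.

Each ? and each step of * consumes one "character". That is either a complete UTF-8 scalar value or, failing that, the invalid sequence reported by Utf8Error::error_len, which is what std's lossy decoding replaces with one U+FFFD.

```rust
/// Byte-oriented glob matcher supporting `*` and `?`. Every other byte
/// matches itself literally, so a pattern can name a specific non-UTF-8
/// byte sequence.
fn glob_match<P: AsRef<[u8]>, N: AsRef<[u8]>>(pattern: P, name: N) -> bool {
    match_bytes(pattern.as_ref(), name.as_ref())
}

fn match_bytes(pat: &[u8], name: &[u8]) -> bool {
    match pat.split_first() {
        // An exhausted pattern only matches an exhausted name.
        None => name.is_empty(),
        // `*` tries every split point, advancing one character at a time.
        Some((&b'*', rest)) => {
            let mut i = 0;
            loop {
                if match_bytes(rest, &name[i..]) {
                    return true;
                }
                if i == name.len() {
                    return false;
                }
                i += unit_len(&name[i..]);
            }
        }
        // `?` consumes exactly one character, valid UTF-8 or not.
        Some((&b'?', rest)) => !name.is_empty() && match_bytes(rest, &name[unit_len(name)..]),
        // Any other byte must match literally.
        Some((&c, rest)) => name.first() == Some(&c) && match_bytes(rest, &name[1..]),
    }
}

/// Byte length of the first character of a non-empty `s`: a whole UTF-8
/// scalar value if one starts here, otherwise the invalid sequence that
/// lossy decoding would replace with a single U+FFFD.
fn unit_len(s: &[u8]) -> usize {
    // A UTF-8 scalar value is at most 4 bytes, so 4 bytes of lookahead suffice.
    let window = &s[..s.len().min(4)];
    match std::str::from_utf8(window) {
        Ok(text) => text.chars().next().map_or(0, char::len_utf8),
        Err(e) if e.valid_up_to() > 0 => {
            // The prefix up to `valid_up_to` is valid UTF-8, so this cannot fail.
            let text = std::str::from_utf8(&window[..e.valid_up_to()]).unwrap();
            text.chars().next().map_or(0, char::len_utf8)
        }
        // Invalid at the first byte: consume the invalid sequence, or all
        // remaining bytes if the input ends in a truncated sequence.
        Err(e) => e.error_len().unwrap_or(window.len()),
    }
}
```

With this, `glob_match("*", b"\xFF")` and `glob_match("?", b"\xFF")` both match. A pattern such as `b"a\xFF*"` can also deliberately target a name that starts with the bytes `a`, `0xFF`. Stepping * by characters rather than bytes keeps it from splitting a valid multi-byte character. The naive recursion is exponential in the worst case for patterns with many * tokens, so a real implementation would want memoization or the usual backtracking loop.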

This is a sub-task of #9639.


lilyball · 2014-01-29T23:14:29Z

An alternative approach is to wait until std::str::from_utf8() can replace invalid byte sequences with U+FFFD REPLACEMENT CHARACTER, and then simply use that to match the decoded filename against the string pattern. This is deficient in two ways:

You cannot write a pattern that deliberately matches a particular non-utf8 sequence, and
Patterns that embed a literal U+FFFD REPLACEMENT CHARACTER would match arbitrary non-utf8 sequences, which they should not.

For this reason, the approach outlined in the issue description is recommended.
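Both deficiencies can be seen directly with std's String::from_utf8_lossy: distinct raw names collapse to the same text, so no text pattern can tell them apart. This is a minimal sketch, and lossy_text and collide are hypothetical helpers.

```rust
/// The rejected alternative: decode a raw filename lossily, so that every
/// invalid sequence becomes U+FFFD, and match against the resulting text.
fn lossy_text(name: &[u8]) -> String {
    String::from_utf8_lossy(name).into_owned()
}

/// True when two different raw names decode to the same text.
fn collide(a: &[u8], b: &[u8]) -> bool {
    a != b && lossy_text(a) == lossy_text(b)
}
```

The bytes `0xFF` and `0xFE` both decode to `"\u{FFFD}"`, and so does a name that genuinely contains U+FFFD. A pattern can therefore neither single out one of the invalid names nor use a literal U+FFFD without also matching the others.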

lilyball · 2014-01-29T23:24:07Z

Test case from @flaper87:

#[test]
#[cfg(not(windows))]
fn test_non_utf8_glob() {
    let dir = tempfile::TempDir::new("").unwrap();
    let p = dir.path().join(&[0xFFu8]);
    fs::mkdir(&p, S_IRWXU as u32);

    let pat = p.with_filename("*");
    assert_eq!(glob(pat.as_str().expect("tmpdir is not utf-8")).collect::<~[Path]>(), ~[p])
}

This also needs to be disabled on OS X, whose filesystems reject filenames that are not valid UTF-8; perhaps we should instead do the opposite and enable it only on Linux.
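For comparison, here is a std-only sketch of the test's setup in present-day Rust, restricted to Linux as suggested above. It does not call glob. It only checks the precondition the test relies on: a directory named by the single byte 0xFF is created, listed by read_dir, and has no &str form. The module and function names are hypothetical, and the scratch directory under std::env::temp_dir() stands in for tempfile::TempDir.

```rust
#[cfg(target_os = "linux")]
mod non_utf8_fs {
    use std::ffi::OsStr;
    use std::fs;
    use std::io;
    use std::os::unix::ffi::OsStrExt;

    /// Hypothetical helper: creates a scratch directory holding one
    /// subdirectory named by the byte 0xFF, lists it, and returns each
    /// entry's raw name bytes plus whether the name converted to `&str`.
    pub fn list_non_utf8_dir() -> io::Result<Vec<(Vec<u8>, bool)>> {
        // Scratch path under the system temp dir, unique per process.
        let root = std::env::temp_dir().join(format!("glob-non-utf8-{}", std::process::id()));
        // Clear leftovers from an earlier run that reused this process id.
        let _ = fs::remove_dir_all(&root);
        fs::create_dir_all(&root)?;
        fs::create_dir(root.join(OsStr::from_bytes(&[0xFF])))?;

        let mut names = Vec::new();
        for entry in fs::read_dir(&root)? {
            let name = entry?.file_name();
            names.push((name.as_bytes().to_vec(), name.to_str().is_some()));
        }
        fs::remove_dir_all(&root)?;
        Ok(names)
    }
}
```

On a typical Linux filesystem this yields a single entry, `[0xFF]`, flagged as not convertible to `&str`. A string-only glob would skip exactly this name.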

flaper87 · 2014-01-29T23:47:59Z

@kballard Thanks for putting this together. As discussed on IRC, I'm setting the mentor tag on you 😄

flaper87 · 2014-01-30T21:50:51Z

@kballard I'll work on this

alexcrichton · 2015-01-05T08:27:18Z

cc @nick29581, could this move to rust-lang/glob?

rust-highfive · 2015-01-05T08:46:07Z

This issue has been moved to the glob repo: rust-lang/glob#23

alexcrichton · 2015-01-05T08:47:54Z

Thanks!

This was referenced Jan 29, 2014

Clients of the new path need to be updated for non-utf8 paths #9639 (closed)

extra::test::test_lots_of_files appears to fail if there are any non-utf8 file names in the top 4 sublevels of / #9406 (closed)

lilyball mentioned this issue Jan 29, 2014

Add test cases for non-utf8 paths #11872 (closed)

ghost assigned flaper87 Jan 30, 2014

flaper87 mentioned this issue Feb 1, 2014

Add non-ut8 support to glob #11972 (closed)

pzol added the A-unicode label Feb 26, 2014

flaper87 removed their assignment Apr 6, 2014

rust-highfive mentioned this issue Jan 5, 2015

glob("*") does not support matching non-utf8 filenames rust-lang/glob#23 (open, 2 tasks)

rust-highfive closed this as completed Jan 5, 2015
