Better handling of the UnicodeDecodeError exception. #102

TunaCici · 2023-07-05T11:07:05Z

The function decodeString(...) does not handle the UnicodeDecodeError exception very well. Although trying again with errors ignored might work in theory, in practice it does NOT. Some special characters (e.g. ü, ş, ç, ö) raise another exception, which fails the whole process. Instead of trying to ignore the error, we should try to fix it.

I was getting errors similar to those mentioned in #81 and #35. The fix by @faisal-hameed in #81 uses regex to "filter out" any non-ASCII characters. The idea is good, but regex is heavy (CPU time and memory). When I tried it with Python 2.7.18 on my Windows 10 VM (Intel Xeon E-2236, 16 GiB of memory), the program ran for a few seconds and then crashed. I believe this is due to how the re regex library in Python 2.7.x works.

The ClearCase view I was trying is relatively old (3-5 years), so the encodedstr is pretty long and Python 2.7.x's regex just can't keep up with it on my host machine.

A relatively "better" solution is to use Python's native join(...) operation and reduce encodedstr to ASCII characters only. It works by checking each character's decimal value using ord(...). If the value is less than 128 (i.e. within the ASCII range 0-127), the character is kept. If not, we just drop it.
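The per-character filter described above could look roughly like the sketch below. This is an illustration, not the exact patch: the helper name to_ascii is hypothetical, and it assumes encodedstr is a string whose elements ord(...) accepts (a str in Python 2.7, or a text str in Python 3).

```python
def to_ascii(encodedstr):
    """Return encodedstr with every non-ASCII character dropped.

    Hypothetical helper for illustration only. A character is kept
    when its code point is below 128, i.e. in the ASCII range 0-127.
    """
    return "".join(ch for ch in encodedstr if ord(ch) < 128)


# Characters such as u-umlaut (\u00fc) and s-cedilla (\u015f) are removed,
# plain ASCII characters pass through unchanged.
print(to_ascii(u"g\u00fcne\u015f"))
print(to_ascii(u"plain ASCII text"))
```

Note that this drops the offending characters rather than transliterating them, so "ü" is removed, not turned into "u".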

This way we are manually converting ANY Unicode string to an ASCII string. I'm sure there are better ways to handle the UnicodeDecodeError exception, but this seemed the simplest solution and it just Works™.

If anyone is experiencing the same error mentioned in #35 and #81, try using this patch.

Hope that this helps <3

charleso · 2023-07-07T06:22:40Z

Thanks @TunaCici! I'm sorry my Python knowledge and decoding was so poor!

Convert to ASCII char-by-char instead of trying again /w errors ignored.

f3b2329

charleso merged commit 4ff7f90 into charleso:master Jul 7, 2023
