Better handling of the UnicodeDecodeError exception. #102

Merged: 1 commit, Jul 7, 2023

@TunaCici commented Jul 5, 2023

The function decodeString(...) does not handle the UnicodeDecodeError exception very well. Although retrying with errors ignored might work in theory, in practice it does NOT. Some special characters (e.g. ü, ş, ç, ö) raise another exception, which fails the whole process. Instead of trying to ignore the error, we should try to fix it.

I was getting errors similar to those mentioned in #81 and #35. The fix by @faisal-hameed in #81 uses a regex to "filter out" any non-ASCII characters. The idea is good, but regex is heavy (CPU time and memory). When I tried it with Python 2.7.18 on my Windows 10 VM with an Intel Xeon E-2236 and 16 GiB of memory, the program ran for a few seconds and then crashed. I believe this is due to how the re library in Python 2.7.x works.

The ClearCase view I was trying is relatively old (3-5 years), so the encodedstr is pretty long and Python 2.7.x's regex just can't keep up with it on my host machine.

A relatively "better" solution is to use Python's native join(...) operation and keep only the ASCII characters in encodedstr. It works by checking each character's code point with ord(...). If the value is less than 128 (ASCII covers 0 to 127), the character is kept. If not, it is dropped.

This way we manually reduce ANY Unicode string to an ASCII string. I'm sure there are better ways to handle the UnicodeDecodeError exception, but this seemed the simplest solution and it just Works™.
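
For anyone who wants to see the idea in isolation, here is a minimal sketch of the join(...) and ord(...) filter described above. It is not the exact code from the patch: the function name strip_non_ascii and the sample string are made up for illustration, and it assumes the input is a text string (a str in Python 2.7, or a str in Python 3).

```python
def strip_non_ascii(text):
    """Return text with every character outside the ASCII range removed.

    Hypothetical helper for illustration; the patched decodeString(...)
    may be structured differently.
    """
    # ord(ch) < 128 keeps code points 0..127, i.e. the ASCII range.
    # Anything else (for example ü, ş, ç, ö) is silently dropped.
    return "".join(ch for ch in text if ord(ch) < 128)


# Sample input is an assumption: "güneş çöz" written with escapes so the
# source file itself stays pure ASCII.
sample = u"g\u00fcne\u015f \u00e7\u00f6z"
print(strip_non_ascii(sample))  # prints "gne z"
```

Note that this drops the characters rather than transliterating them, so "ü" does not become "u". If the input is a bytes object in Python 3, iterating yields integers and ord(...) would fail; in that case data.decode("ascii", errors="ignore") gives the same result.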

If anyone is experiencing the same error mentioned in #35 and #81, try using this patch.

Hope that this helps <3

@charleso charleso merged commit 4ff7f90 into charleso:master Jul 7, 2023

charleso commented Jul 7, 2023

Thanks @TunaCici! I'm sorry my Python knowledge and decoding were so poor!
