Writer/Xls/Parser::advance Should Parse by Character #4344

Merged
merged 3 commits into from
Feb 7, 2025

Conversation

oleibman
Collaborator

@oleibman oleibman commented Feb 7, 2025

The parser currently advances byte by byte, which is incorrect in a UTF-8 system. See the discussion at the bottom of PR #4203. That PR turns out not to be the change that led to this problem; that was PR #4323, which came about because using Composer/Pcre revealed bugs in several regular expressions used in Writer/Xls/Parser. This PR comes about because more bugs were revealed in the same module. The `advance` method needs to process formulas character by character, but instead processes them byte by byte. It is changed to advance by characters, and tests for non-ASCII characters are added.
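To illustrate the difference between the two ways of advancing, here is a minimal sketch in Python. It is not the PhpSpreadsheet code from this PR; the function names and the sample formula are hypothetical and chosen only to show why stepping through a UTF-8 string one byte at a time splits a non-ASCII character into fragments that are not valid on their own.

```python
# Illustrative only: hypothetical helpers, not the Writer/Xls/Parser implementation.

# Sample formula fragment containing one non-ASCII character (e with acute accent).
FORMULA = '"\u00e9"&A1'


def tokens_by_byte(formula: str) -> list[bytes]:
    """Step through the UTF-8 encoding one byte at a time."""
    data = formula.encode("utf-8")
    return [data[i:i + 1] for i in range(len(data))]


def tokens_by_char(formula: str) -> list[str]:
    """Step through the string one character (code point) at a time."""
    return [formula[i] for i in range(len(formula))]


def is_valid_utf8(chunk: bytes) -> bool:
    """Return True if the chunk decodes as UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    by_byte = tokens_by_byte(FORMULA)
    by_char = tokens_by_char(FORMULA)
    print(len(by_byte), len(by_char))
    # Any byte-sized step that lands inside a multi-byte character is invalid alone.
    print([chunk for chunk in by_byte if not is_valid_utf8(chunk)])
```

Stepping by character yields one unit per character, while stepping by byte yields one extra unit here because the accented letter occupies two bytes in UTF-8, and neither of those bytes is a valid character by itself.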

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

oleibman added a commit that referenced this pull request Feb 7, 2025
oleibman added a commit that referenced this pull request Feb 7, 2025
@siarheipashkevich

@oleibman please create a 1.29.10 release that includes this PR.

@oleibman oleibman enabled auto-merge February 7, 2025 16:57
@oleibman oleibman added this pull request to the merge queue Feb 7, 2025
Merged via the queue into PHPOffice:master with commit 1be70e9 Feb 7, 2025
13 of 14 checks passed
@oleibman oleibman deleted the parseutf8 branch February 7, 2025 17:19
2 participants