Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Percentages stored as strings are not converted during formula calculations as they are in Excel #3155

Closed
1 of 8 tasks
fdjohnston opened this issue Nov 3, 2022 · 3 comments

Comments

@fdjohnston
Copy link
Contributor

This is:

- [x] a bug report
- [ ] a feature request
- [x] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

When converting a spreadsheet to an array using toArray(), formulas that reference a cell that contains a percentage stored as a string should still calculate, like they do in Excel.

What is the current behavior?

PHPSpreadsheet returns a #VALUE! error instead of converting the string to a float.

What are the steps to reproduce?

<?php

require __DIR__ . '/vendor/autoload.php';

//Create reader
$xlsxReader = \PhpOffice\PhpSpreadsheet\IOFactory::createReader('Xlsx');
$xlsxReader->setReadDataOnly(true);

//Load worksheet
$workbook = $xlsxReader->load('Book1.xlsx');
$worksheet = $workbook->getSheetByName('Sheet1');

//Convert to array
$data = $worksheet->toArray(null, true, false, false);

var_dump($data);
die();

Output:

array(2) {
  [0]=>
  array(2) {
    [0]=>
    string(7) "#VALUE!"
    [1]=>
    NULL
  }
  [1]=>
  array(2) {
    [0]=>
    int(100)
    [1]=>
    string(2) "2%"
  }
}

Expected output:

array(2) {
  [0]=>
    [0]=>
    float(2)
    [1]=>
    NULL
  }
  [1]=>
  array(2) {
    [0]=>
    int(100)
    [1]=>
    string(2) "2%"
  }
}

I have attached an XLSX test file.
Book1.xlsx

What features do you think are causing the issue

  • Reader
  • Writer
  • Styles
  • Data Validations
  • Formula Calculations
  • Charts
  • AutoFilter
  • Form Elements

The validateBinaryOperand() function in src/PhpSpreadsheet/Calculation/Calculation.php already supports converting fractions stored as strings to numbers. I think adding a function here to convert percentages stored as strings to numbers could solve this problem.

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

Only tested on XLSX files, but based on my reading of the code this will likely affect all spreadsheet formats.

Which versions of PhpSpreadsheet and PHP are affected?

PHPSpreadsheet 1.25.2

I would be happy to submit a PR to address this issue if the maintainers agree it is a bug and should be fixed.

@fdjohnston fdjohnston changed the title [Bug] Percentages stored as string are not converted during formula calculations as they are in Excel [Bug] Percentages stored as strings are not converted during formula calculations as they are in Excel Nov 3, 2022
@MarkBaker
Copy link
Member

Yes, this is a "bug", but it's a lot more complicated once you start to look at it in detail.

This applies to a lot more than just percentages: Excel also behaves that way with strings containing currencies, dates/times, fractions, etc. And for float values, it's locale-aware. Nor is it completely consistent. And it's also incredibly complicated, and a performance overhead when it's necessary to test every string value encountered in a calculation to see if it could match an Excel number format, and to convert it to a number if it can be converted.

It's also not something that occurs frequently, so I made a decision many years ago that I wouldn't try implementing the logic for it: given the complexity, and the more frequent requests to improve performance, add new features, improve handline for Excel functions, add charts and tables, support more file formats, etc, so those changes have always had priority.
In 16 years of PHPExcel and PhpSpreadsheet, you're the first person to notice.

@MarkBaker
Copy link
Member

MarkBaker commented Nov 4, 2022

Feel free to submit a PR, but the logic for handling this in validateBinaryOperand() should probably be factored out into a dedicated helper class (perhaps in Calculation\Engine), the logic for convertToNumberIfFraction() moved out from the Shared String Helper, which then makes it easier to build on that in the future.

@fdjohnston
Copy link
Contributor Author

Thanks for the feedback, and all your work on PHPSpreadsheet. I didn't consider the other cases as this is the particular case that I was running into. I couldn't figure out why the values I was getting out of toArray() weren't lining up with what I was seeing in the spreadsheet. One of the rare times where Excel "magic" actually helps instead of randomly breaking things.
I wasn't trying to be critical when calling it a bug; I guess feature request might have been better. This is my first time trying to contribute to this project and I was trying to follow the Contribution guidelines, perhaps a little too carefully.

In the patch I've written I added a convertToNumberIfPercent() function to the shared string helper and chained it in the same elseif with convertToNumberIfFraction(). Probably not the ideal solution but it works for this case.

I'll submit the PR and if you have some thoughts on how it could be improved or refactored I'd love to hear them.

MarkBaker added a commit that referenced this issue Dec 21, 2022
### Added

- Extended flag options for the Reader `load()` and Writer `save()` methods
- Apply Row/Column limits (1048576 and XFD) in ReferenceHelper [PR #3213](#3213)
- Allow the creation of In-Memory Drawings from a string of binary image data, or from a stream. [PR #3157](#3157)
- Xlsx Reader support for Pivot Tables [PR #2829](#2829)
- Permit Date/Time Entered on Spreadsheet to be calculated as Float [Issue #1416](#1416) [PR #3121](#3121)

### Changed

- Nothing

### Deprecated

- Direct update of Calculation::suppressFormulaErrors is replaced with setter.
- Font public static variable defaultColumnWidths replaced with constant DEFAULT_COLUMN_WIDTHS.
- ExcelError public static variable errorCodes replaced with constant ERROR_CODES.
- NumberFormat constant FORMAT_DATE_YYYYMMDD2 replaced with existing identical FORMAT_DATE_YYYYMMDD.

### Removed

- Nothing

### Fixed

- Fixed handling for `_xlws` prefixed functions from Office365 [Issue #3245](#3245) [PR #3247](#3247)
- Conditionals formatting rules applied to rows/columns are removed [Issue #3184](#3184) [PR #3213](#3213)
- Treat strings containing currency or accounting values as floats in Calculation Engine operations [Issue #3165](#3165) [PR #3189](#3189)
- Treat strings containing percentage values as floats in Calculation Engine operations [Issue #3155](#3155) [PR #3156](#3156) and [PR #3164](#3164)
- Xlsx Reader Accept Palette of Fewer than 64 Colors [Issue #3093](#3093) [PR #3096](#3096)
- Use Locale-Independent Float Conversion for Xlsx Writer Custom Property [Issue #3095](#3095) [PR #3099](#3099)
- Allow setting AutoFilter range on a single cell or row [Issue #3102](#3102) [PR #3111](#3111)
- Xlsx Reader External Data Validations Flag Missing [Issue #2677](#2677) [PR #3078](#3078)
- Reduces extra memory usage on `__destruct()` calls [PR #3092](#3092)
- Additional properties for Trendlines [Issue #3011](#3011) [PR #3028](#3028)
- Calculation suppressFormulaErrors fix [Issue #1531](#1531) [PR #3092](#3092)
- Permit Date/Time Entered on Spreadsheet to be Calculated as Float [Issue #1416](#1416) [PR #3121](#3121)
- Incorrect Handling of Data Validation Formula Containing Ampersand [Issue #3145](#3145) [PR #3146](#3146)
- Xlsx Namespace Handling of Drawings, RowAndColumnAttributes, MergeCells [Issue #3138](#3138) [PR #3136](#3137)
- Generation3 Copy With Image in Footer [Issue #3126](#3126) [PR #3140](#3140)
- MATCH Function Problems with Int/Float Compare and Wildcards [Issue #3141](#3141) [PR #3142](#3142)
- Fix ODS Read Filter on number-columns-repeated cell [Issue #3148](#3148) [PR #3149](#3149)
- Problems Formatting Very Small and Very Large Numbers [Issue #3128](#3128) [PR #3152](#3152)
- XlsxWrite preserve line styles for y-axis, not just x-axis [PR #3163](#3163)
- Xlsx Namespace Handling of Drawings, RowAndColumnAttributes, MergeCells [Issue #3138](#3138) [PR #3137](#3137)
- More Detail for Cyclic Error Messages [Issue #3169](#3169) [PR #3170](#3170)
- Improved Documentation for Deprecations - many PRs [Issue #3162](#3162)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants