Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent Writer/Xlsx/Chart.php corrupting Excel scattercharts (& maybe other number type charts) #2817

Closed
1 of 8 tasks
bridgeplayr opened this issue May 9, 2022 · 3 comments · Fixed by #2841
Closed
1 of 8 tasks
Labels

Comments

@bridgeplayr
Copy link

This is:

- [x] a bug report
- [ ] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

produce xlsx Excel charts via Phpspreadsheet methods which Excel can read

What is the current behavior?

Excel refuses to display scattercharts, but will offer to open a spreadsheet after removing all charts. A pop-up window from Excel states something like "We found a problem with some content in 'xxxxx.xlsx'. Do you want us to try to recover as much as
we can? If you trust the source of this workbook, click Yes."
If you click on 'yes', another pop-up box states in the banner "Excel was able to open the file by repairing or removing the unreadable content", and this message in the window, "Removed Part: /xl/drawings/drawing1.xml part. (Drawing shape)"
An accompanying logfile 'C:\Users\xxx\AppData\Local\Temp\error101760_nn.xml'
contains (xml delimiters removed)

xml version="1.0" encoding="UTF-8" standalone="yes"
recoveryLog
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
logFileName>error101760_09.xml</logFileName
summary>Errors were detected in file 'H:\Documents\bridge\BridgeDB\code\4934180_Charts.xlsx'</summary
removedParts
removedPart>Removed Part: /xl/drawings/drawing1.xml part. (Drawing shape)</removedPart
/removedParts
/recoveryLog

What are the steps to reproduce?

Execute phpspreadsheet/samples/Chart/33_Chart_create_scatter.php and try to open the spreadsheet in Excel.

What features do you think are causing the issue

  • Reader
  • Writer
  • Styles
  • Data Validations
  • Formula Calulations
  • Charts
  • AutoFilter
  • Form Elements

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

xlsx scattercharts affected

Which versions of PhpSpreadsheet and PHP are affected?

1.23.0
7.4.29

@bridgeplayr
Copy link
Author

bridgeplayr commented May 9, 2022

Modify Phpspreadsheet/Writer/Xlsx/Chart.php:
I failed in my attempts to insert the code snippets. Attached file contains my code suggestions.
HowToPreventPhpSpreadsheetFromCorruptingCharts-File-WriterXlsxChart.txt

The problem is simple. The x-axis in some charts contains string ("category") data, while the x-axis in other charts (viz. scattercharts and bubblecharts) contains numeric ("value") data. The distinction is important in function writeCategoryAxis(), which must insert characteristic 'c:catAx' for string data, but 'c:valAx' for numeric data. (N.B. you must examine the last parameter, "$yaxis" because this function is called when writing BOTH the xAxis AND the yAxis in function writePlotGroup()). I suggest examining the FORMAT_CODE_type of the axis, but I am at a loss to suggest what to do if the data format is "General". The default of 'c:catAx' is probably ok.

The issue of handling both string and numeric data must also be decided correctly in function writePlotGroup() where the choice of string data vs value data requires inserting either 'c:xVal' or 'c:cat'. This distinction requires additional testing of the X-axis (Category) label's dataseriesvalues datatype :

$CategoryDatatype = $plotSeriesCategory->getDataType();
If the type is numeric then
$this->writePlotSeriesValues($plotSeriesCategory, $objWriter, $groupType, 'num');
otherwise,
$this->writePlotSeriesValues($plotSeriesCategory, $objWriter, $groupType, 'str');

@oleibman
Copy link
Collaborator

oleibman commented May 9, 2022

I don't see a corruption problem with 33_Chart_create_scatter - same release of PhpSpreadsheet and Php as you, Excel 365 Apps for Enterprise (see attached image). We are aware that there are some formatting problems with scatter charts, which might or might not apply to 33*, and which might or might not be addressed by your proposed fix - please let me know if you think either is the case. Also, please elaborate on "failed in my attempt to insert the code snippets"; perhaps we can figure out what you need to do to get your code changes and tests to succeed.
image

@oleibman oleibman added the charts label May 9, 2022
@bridgeplayr
Copy link
Author

You are correct. [Blush] Here is a modified version of the 33_Chart_create_scatter which demonstrates the issue. (NOW I see how to insert code.) I changed the data table to demonstrate a trend chart of 3 measurements taken at 4 dates. I changed the output spreadsheet to '33_Chart_create_scatterModified.xlsx'. Each 'metric' measurement (yAxis) now has an accompanying X-axis date - so there are 3 xAxis dataseries instead of only 1. Each data point now has two coordinates - one xAxis value and one yAxis value. (This matches my idea of a scatterchart.)

The chart '33_Chart_create_scatterModified.xlsx' produced in the modified script will be removed by Excel as described in my first post. Adding my proposed code changes to Phpspreadsheet/Writer/Xlsx/Chart.php causes the following entries in the spreadsheet's "xl/charts/chart1.xml" file and makes it acceptable to Excel:

WAS                        IS
<c:xVal>               (same)
<c:strRef>           <c:numRef>
<c:f>Worksheet:$A2:$A5   (same)
</c:f>                  (same)
<c:strCache>      <c:numCache>

(The matching terminating entities also are changed from 'str' to 'num'.)

Each of the 3 xAxis date-datasets are identically mischaracterized as 'str' instead of 'num'.
(Excel still refuses to display xAxis as dates until you manually change axis datatype from General to Date. Grr. Another issue to address... My code change somehow ignores the xAxis FORMAT_CODE_DATE designation. Sigh.)

Here is the modified 33_Chart_create_scatter.php

$spreadsheet = new Spreadsheet();
$worksheet = $spreadsheet->getActiveSheet();
// changed data to simulate a trend chart - Xaxis are dates; Yaxis are 3 meausurements from each date
$worksheet->fromArray(
    [
        ['',              'metric1', 'metric2', 'metric3'],
        ['=DATEVALUE("1/1/2021")', 12.1, 15.1, 21.1],
        ['=DATEVALUE("1/4/2021")', 56.2, 73.2, 86.2],
        ['=DATEVALUE("1/7/2021")', 52.2, 61.2, 69.2],
        ['=DATEVALUE("1/10/2021")', 30.2, 32.2, 0.2],
    ]
);

// Set the Labels for each data series we want to plot
//     Datatype
//     Cell reference for data
//     Format Code
//     Number of datapoints in series
//     Data values
//     Data Marker
$dataSeriesLabels = [
    new DataSeriesValues(DataSeriesValues::DATASERIES_TYPE_STRING, 'Worksheet!$B$1', null, 1), // was 2010
    new DataSeriesValues(DataSeriesValues::DATASERIES_TYPE_STRING, 'Worksheet!$C$1', null, 1), // was 2011
    new DataSeriesValues(DataSeriesValues::DATASERIES_TYPE_STRING, 'Worksheet!$D$1', null, 1), // was 2012
];
// Set the X-Axis Labels
// changed from STRING to NUMBER
// added 2 additional x-axis values associated with each of the 3 metrics
// added FORMATE_CODE_NUMBER
$xAxisTickValues = [
    //new DataSeriesValues(DataSeriesValues::DATASERIES_TYPE_STRING, 'Worksheet!$A$2:$A$5', null, 4), // Q1 to Q4
    new DataSeriesValues(DataSeriesValues::DATASERIES_TYPE_NUMBER, 'Worksheet!$A$2:$A$5', Properties::FORMAT_CODE_DATE, 4),
    new DataSeriesValues(DataSeriesValues::DATASERIES_TYPE_NUMBER, 'Worksheet!$A$2:$A$5', Properties::FORMAT_CODE_DATE, 4),
    new DataSeriesValues(DataSeriesValues::DATASERIES_TYPE_NUMBER, 'Worksheet!$A$2:$A$5', Properties::FORMAT_CODE_DATE, 4),
];
// Set the Data values for each data series we want to plot
//     Datatype
//     Cell reference for data
//     Format Code
//     Number of datapoints in series
//     Data values
//     Data Marker
// added FORMAT_CODE_NUMBER
$dataSeriesValues = [
    new DataSeriesValues(DataSeriesValues::DATASERIES_TYPE_NUMBER, 'Worksheet!$B$2:$B$5', Properties::FORMAT_CODE_NUMBER, 4),
    new DataSeriesValues(DataSeriesValues::DATASERIES_TYPE_NUMBER, 'Worksheet!$C$2:$C$5', Properties::FORMAT_CODE_NUMBER, 4),
    new DataSeriesValues(DataSeriesValues::DATASERIES_TYPE_NUMBER, 'Worksheet!$D$2:$D$5', Properties::FORMAT_CODE_NUMBER, 4),
];
  // Added so that Xaxis shows dates instead of Excel-equivalent-year1900-numbers
$xAxis = new Axis();
$xAxis->setAxisNumberProperties(Properties::FORMAT_CODE_DATE );

// Build the dataseries
$series = new DataSeries(
    DataSeries::TYPE_SCATTERCHART, // plotType
    null, // plotGrouping (Scatter charts don't have any grouping)
    range(0, count($dataSeriesValues) - 1), // plotOrder
    $dataSeriesLabels, // plotLabel
    $xAxisTickValues, // plotCategory
    $dataSeriesValues, // plotValues
    null, // plotDirection
    null, // smooth line
    //DataSeries::STYLE_LINEMARKER  // plotStyle
    DataSeries::STYLE_MARKER  // plotStyle
);

// Set the series in the plot area
$plotArea = new PlotArea(null, [$series]);
// Set the chart legend
$legend = new ChartLegend(ChartLegend::POSITION_TOPRIGHT, null, false);

$title = new Title('Test Scatter Trend Chart');
$yAxisLabel = new Title('Value ($k)');

// Create the chart
$chart = new Chart(
    'chart1', // name
    $title, // title
    $legend, // legend
    $plotArea, // plotArea
    true, // plotVisibleOnly
    DataSeries::EMPTY_AS_GAP, // displayBlanksAs
    null, // xAxisLabel
    $yAxisLabel,  // yAxisLabel
    // added xAxis for correct date display
    $xAxis, // xAxis
);

// Set the position where the chart should appear in the worksheet
$chart->setTopLeftPosition('A7');
$chart->setBottomRightPosition('H20');

// Add the chart to the worksheet
$worksheet->addChart($chart);

// Save Excel 2007 file
$filename = '33_Chart_create_scatterModified.xlsx';
$writer = IOFactory::createWriter($spreadsheet, 'Xlsx');
$writer->setIncludeCharts(true);
$callStartTime = microtime(true);
$writer->save($filename);
`

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue May 20, 2022
Fix PHPOffice#2817, where @bridgeplayr gives an excellent description of the problem and how it should be solved.
@oleibman oleibman mentioned this issue May 20, 2022
7 tasks
oleibman added a commit that referenced this issue May 21, 2022
* More Chart Fixes

Taking up where #2828 left off. Most of the following changes are demonstrated in 32readwriteChartWithImages1:
- Adds support for "scheme" colors (because rgb, theme, and index colors just weren't enough for Excel) for DataSeriesValues. See issue #2299.
- For chart titles (including axis labels), rather than a font name, Excel supplies a 3-fold series of font names for Latin, East Asian, and Complex Scripts. New properties `latin`, `eastAsian`, and `complexScript` are added to the Font class. I frankly have no idea how, or even if, you can set these in Excel; my test case (sample 32readwriteScatterChart7) is a result of manually editing the XML.
- Add support for subscript/superscript to chart titles. This requires a new property `baseLine` in Font (positive=superscript negative=subscript baseline value says how high/low).
- Support for underscore with different scheme color than its text, using a new string property `uSchemeClr` in Font.
- Support for extra options for strikethrough, using a new string property `strikeType` in Font.
- Support for extra options for underscore type, using the existing string property `underline` in Font.
- I do not anticipate that any of the new Font properties will be used except for chart titles.
- If no default font overrides are found for a Rich Text element in chart titles, and no explicit font overrides are found for a Run under such an element, the font element of the Run is set to null.
- PhpSpreadsheet will always write a tag `a:pPr` and, underneath that, an empty tag `a:defRPr`, for default font settings for chart titles and axis labels. Combined with the previous bullet item, this will prevent PhpSpreadsheet from inadvertently overriding the Excel defaults (18 point bold Calibri for chart title, 10 point bold Calibri for axis labels).
- Axis labels will now be written to XML in the same manner as chart titles. Among other considerations, this means that they can now have colors. Fix #2700. Supersedes PR #2701. Demonstrated in sample 32readwriteStockChart5.

* Fix Some Chart Corruption

Fix #2817, where @bridgeplayr gives an excellent description of the problem and how it should be solved.

* Fix Bubble Charts

Sample produced corrupt output - see issue #2763. After a lot of research, solution was just re-ordering of parameters in a single function call.

Bubble 3D had not been supported at all. It is now.

Surface Charts remain corrupted.
MarkBaker added a commit that referenced this issue Jul 9, 2022
Note that this will be the last 1.x branch release before the 2.x release. We will maintain both branches in parallel for a time; but users are requested to update to version 2.0 once that is fully available.

### Added

- Added `removeComment()` method for Worksheet [PR #2875](https://github.com/PHPOffice/PhpSpreadsheet/pull/2875/files)
- Add point size option for scatter charts [Issue #2298](#2298) [PR #2801](#2801)
- Basic support for Xlsx reading/writing Chart Sheets [PR #2830](#2830)

  Note that a ChartSheet is still only written as a normal Worksheet containing a single chart, not as an actual ChartSheet.

- Added Worksheet visibility in Ods Reader [PR #2851](#2851) and Gnumeric Reader [PR #2853](#2853)
- Added Worksheet visibility in Ods Writer [PR #2850](#2850)
- Allow Csv Reader to treat string as contents of file [Issue #1285](#1285) [PR #2792](#2792)
- Allow Csv Reader to store null string rather than leave cell empty [Issue #2840](#2840) [PR #2842](#2842)
- Provide new Worksheet methods to identify if a row or column is "empty", making allowance for different definitions of "empty":
  - Treat rows/columns containing no cell records as empty (default)
  - Treat cells containing a null value as empty
  - Treat cells containing an empty string as empty

### Changed

- Modify `rangeBoundaries()`, `rangeDimension()` and `getRangeBoundaries()` Coordinate methods to work with row/column ranges as well as with cell ranges and cells [PR #2926](#2926)
- Better enforcement of value modification to match specified datatype when using `setValueExplicit()`
- Relax validation of merge cells to allow merge for a single cell reference [Issue #2776](#2776)
- Memory and speed improvements, particularly for the Cell Collection, and the Writers.

  See [the Discussion section on github](#2821) for details of performance across versions
- Improved performance for removing rows/columns from a worksheet

### Deprecated

- Nothing

### Removed

- Nothing

### Fixed

- Xls Reader resolving absolute named ranges to relative ranges [Issue #2826](#2826) [PR #2827](#2827)
- Null value handling in the Excel Math/Trig PRODUCT() function [Issue #2833](#2833) [PR #2834](#2834)
- Invalid Print Area defined in Xlsx corrupts internal storage of print area [Issue #2848](#2848) [PR #2849](#2849)
- Time interval formatting [Issue #2768](#2768) [PR #2772](#2772)
- Copy from Xls(x) to Html/Pdf loses drawings [PR #2788](#2788)
- Html Reader converting cell containing 0 to null string [Issue #2810](#2810) [PR #2813](#2813)
- Many fixes for Charts, especially, but not limited to, Scatter, Bubble, and Surface charts. [Issue #2762](#2762) [Issue #2299](#2299) [Issue #2700](#2700) [Issue #2817](#2817) [Issue #2763](#2763) [Issue #2219](#2219) [Issue #2863](#2863) [PR #2828](#2828) [PR #2841](#2841) [PR #2846](#2846) [PR #2852](#2852) [PR #2856](#2856) [PR #2865](#2865) [PR #2872](#2872) [PR #2879](#2879) [PR #2898](#2898) [PR #2906](#2906) [PR #2922](#2922) [PR #2923](#2923)
- Adjust both coordinates for two-cell anchors when rows/columns are added/deleted. [Issue #2908](#2908) [PR #2909](#2909)
- Keep calculated string results below 32K. [PR #2921](#2921)
- Filter out illegal Unicode char values FFFE/FFFF. [Issue #2897](#2897) [PR #2910](#2910)
- Better handling of REF errors and propagation of all errors in Calculation engine. [PR #2902](#2902)
- Calculating Engine regexp for Column/Row references when there are multiple quoted worksheet references in the formula [Issue #2874](#2874) [PR #2899](#2899)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

2 participants