
mime_content_type for CSV can return 'text/html' #811

Closed
nevigen opened this issue Dec 12, 2018 · 2 comments


nevigen commented Dec 12, 2018

This is:

- [x] a bug report

When a CSV file contains fields with HTML tags (for example "<a href=""..."">link</a>"), mime_content_type returns 'text/html', not 'text/plain'. 'text/html' needs to be added to the $supportedTypes array in \PhpOffice\PhpSpreadsheet\Reader\Csv.php:

        $type = mime_content_type($pFilename);
        $supportedTypes = [
            'text/html',
            'text/csv',
            'text/plain',
            'inode/x-empty',
        ];

        return in_array($type, $supportedTypes, true);

Which versions of PhpSpreadsheet and PHP are affected?

PHP 7 and the latest PhpSpreadsheet

PowerKiKi (Member) commented

Seems to duplicate #564.

If your file has a csv extension it will be read; otherwise we try our best to guess, but it's only that: a guess. If you know it's a CSV, then create the reader yourself instead of relying on IOFactory guesswork.

dmitrystas commented

The file has a CSV extension, but mime_content_type returns 'text/html'.

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue May 22, 2024
Fix PHPOffice#4036. The issue was originally reported as PHPOffice#564 (and PHPOffice#811) and fixed for the most part, but this is a variation that was not covered by the original fix. Cells with html fragments can cause `mime_content_type` to identify the file as `text/html`. The original fix was to ignore `mime_content_type` when the file extension is 'csv' or 'tsv'. However, if the file does not have one of those extensions, it will be rejected by Csv Reader as an invalid mimetype. This PR adds `text/html` to the list of valid mimetypes.
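
For readers following along, the check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's actual code: the function name `looksLikeCsv` is hypothetical, and the mime type is passed in as a parameter (in practice it would come from `mime_content_type`) so the logic can be followed without depending on what libmagic reports for a given file.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper illustrating the described behaviour:
 * trust a csv/tsv extension, otherwise fall back to a mime type allow-list.
 * Not taken from PhpSpreadsheet itself.
 */
function looksLikeCsv(string $filename, string $mimeType): bool
{
    // A csv or tsv extension is accepted regardless of the detected mime type.
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    if (in_array($extension, ['csv', 'tsv'], true)) {
        return true;
    }

    // Without such an extension, the detected mime type must be on the list.
    // 'text/html' is the entry this change adds.
    $supportedTypes = [
        'text/csv',
        'text/plain',
        'inode/x-empty',
        'text/html',
    ];

    return in_array($mimeType, $supportedTypes, true);
}
```

A caller would typically pass `mime_content_type($path)` as the second argument. With the extra entry, a file named `export.dat` that is detected as `text/html` is accepted, while one detected as `application/zip` is still rejected.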

I imagine that this type of problem might occur for other mimetypes. If any of those are reported in future, it might be better to just add a "suppress mimetype" check option, rather than extending the list forever. Html is unusual in that its rules are so lax, which is why it seems appropriate to add it here.
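
One possible shape for such a "suppress mimetype" option is sketched below. Everything here is an assumption made for illustration: the class `CsvTypeCheck` and the method `setCheckMimeType` are hypothetical names and do not exist in PhpSpreadsheet.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of an opt-out for the mime type check.
 * Not part of PhpSpreadsheet.
 */
final class CsvTypeCheck
{
    /** @var bool Whether the mime type allow-list is enforced. */
    private $checkMimeType = true;

    public function setCheckMimeType(bool $check): self
    {
        $this->checkMimeType = $check;

        return $this;
    }

    public function accepts(string $mimeType): bool
    {
        // With the check suppressed, any detected type is accepted.
        if (!$this->checkMimeType) {
            return true;
        }

        return in_array($mimeType, ['text/csv', 'text/plain', 'inode/x-empty'], true);
    }
}
```

The trade-off is that a caller who turns the check off takes responsibility for knowing the file really is a CSV, which matches the earlier advice in this thread to create the reader explicitly when the format is known.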

Note that IOFactory may still identify a file as Html even when intended as Csv. The sample associated with this issue does not fall into this category, but one of the unit tests on this ticket does. The file will still be read correctly by Csv Reader, but IOFactory load may cause it to use Html Reader instead.