Non-US number formats: delim = ',' is not handled well #1150

Open
roland-KA opened this issue Dec 17, 2024 · 3 comments

@roland-KA

In most European countries (and elsewhere), the comma is used as the decimal separator, while the dot is used as the thousands separator. So you typically see numbers in the following form (especially for currency values):

5.840,72
5.860,72
4.593,58
5.335,72
5.981,53

If you export such numbers from Excel to a CSV file, this exact format is retained. If that CSV file is read by CSV.jl with `delim = ','`, the values are not recognised as numbers (they are read as strings), and you have to replace the separators manually and convert all the data to numbers. As this is such a common use case, that is really annoying.
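The manual workaround described above can be sketched in plain Julia, without CSV.jl. `parse_european` is a hypothetical helper written only for illustration (it is not a CSV.jl function), and it assumes every value uses `.` only for grouping and `,` only as the decimal mark.

```julia
# Hypothetical helper, not part of CSV.jl: parse a European-style number
# such as "5.840,72" ('.' = thousands separator, ',' = decimal mark).
function parse_european(s::AbstractString)
    without_groups = replace(s, "." => "")          # "5.840,72" -> "5840,72"
    us_style = replace(without_groups, "," => ".")  # "5840,72"  -> "5840.72"
    return parse(Float64, us_style)
end

# The sample values from the issue, as they arrive when read as strings.
amounts = ["5.840,72", "5.860,72", "4.593,58", "5.335,72", "5.981,53"]
parsed = parse_european.(amounts)  # broadcast the helper over the vector
println(parsed)
```

This has to be applied column by column after loading, which is exactly the extra step the issue is about.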

CSV.jl should therefore be more "intelligent" in this case (as similar CSV packages in other programming languages are) and recognise these values as numbers when `delim = ','` is used.

@quinnj
Member

quinnj commented Dec 17, 2024

It's not entirely clear what you're proposing be changed; in the example you provided, it seems to me just as likely that the format is American, with 2 columns: the first being float numbers (5.840) and the second being integers (72). How is CSV.jl supposed to be "more intelligent" and know whether things should be interpreted as European vs. American format in this case? Especially when the delimiter is provided as `,`, which would suggest 2 columns.
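A minimal illustration of that ambiguity in plain Julia, independent of CSV.jl: if a value such as `5.840,72` is treated as a line of a comma-delimited file, splitting on `,` yields two fields that both parse cleanly under US conventions, so nothing in the text itself signals a European number.

```julia
# One value from the issue, treated as a line of a comma-delimited file.
line = "5.840,72"

fields = split(line, ',')              # two fields: "5.840" and "72"
first_col = parse(Float64, fields[1])  # 5.84, with '.' as the decimal mark
second_col = parse(Int, fields[2])     # 72

println(first_col, " | ", second_col)  # prints "5.84 | 72"
```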

You can provide the `decimal` and `groupmark` keyword arguments (which default to `.` and `,`, respectively) to control the parsing of values here.

@roland-KA
Author

Ah, I see. I should have explained that in this context (i.e. `delim = ','`) the field separator is always a semicolon (that's exactly why a comma isn't used!). That's the standard convention in Europe and elsewhere (and it is what Excel produces, for example). So it would be wonderful to have this situation covered by CSV.jl (I've just tried Kotlin dataframes, which have no problems reading CSV files in this format).

@roland-KA
Author

And I've now tried the `groupmark` parameter. That's exactly what I was missing.

Perhaps it is just a matter of documentation. It would be helpful to list this common combination (`delim = ';', decimal = ',', groupmark = '.'`) in the examples section.
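A minimal sketch of such an example, assuming a CSV.jl version that supports the `groupmark` keyword is installed. The column names `id` and `amount` and the inline data are illustrative only (the amounts are the ones from the issue); the file is read from an in-memory `IOBuffer` rather than from disk.

```julia
using CSV

# Illustrative European-style data: ';' between fields,
# '.' as thousands separator, ',' as decimal mark.
# The column names `id` and `amount` are made up for this sketch.
data = """
id;amount
1;5.840,72
2;5.860,72
3;4.593,58
4;5.335,72
5;5.981,53
"""

# delim separates fields, decimal marks the fractional part,
# and groupmark is ignored when parsing numbers.
f = CSV.File(IOBuffer(data); delim = ';', decimal = ',', groupmark = '.')

# Print the parsed `amount` column.
println(f.amount)
```

Passing all three keywords together matters: `delim = ';'` keeps each amount in a single field, and `decimal`/`groupmark` then tell the number parser how to read it.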
