Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-67230: add quoting rules to csv module #29469

Merged
merged 20 commits into from
Apr 12, 2023

Conversation

smontanaro
Copy link
Contributor

@smontanaro smontanaro commented Nov 8, 2021

Add two quoting styles for csv dialects.
They will help to work with certain databases in particular.

Automerge-Triggered-By: GH:merwok

@smontanaro
Copy link
Contributor Author

I looked to try and find the real people behind the BPO accounts "samwyse" and "krypten" but couldn't. I have no idea how to verify at least krypten has signed a CLA.

@github-actions
Copy link

github-actions bot commented Dec 9, 2021

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Dec 9, 2021
@ned-deily
Copy link
Member

I looked to try and find the real people behind the BPO accounts "samwyse" and "krypten" but couldn't. I have no idea how to verify at least krypten has signed a CLA.

@smontanaro, FWIW, according to the BPO user database, both of those users are registered as having signed the PSF CLA.

@MaxwellDupre
Copy link
Contributor

You don't have empty quote '' in either of your tests:
self._write_test(['a',None,1], '"a",,1', quoting = csv.QUOTE_STRINGS)

self._write_test(['a',None,1], '"a",,"1"', quoting = csv.QUOTE_NOTNULL)

Hence I don't know if the code change produces the correct result.
I note that you can transpose the QUOTE_STRINGS and QUOTE_NOTNULL in these test and the affect is the same, and to be obvious you would only need one!
Also, since this is a little confusing (as to what is expected) could you add an example to the docs, please?
For each new quote so that users can see clearly whats expected and how they differ.

@samwyse
Copy link
Contributor

samwyse commented May 3, 2022

@smontanaro, Is the CLA Signing waiting on me?

@AlexWaygood AlexWaygood closed this May 3, 2022
@AlexWaygood AlexWaygood reopened this May 3, 2022
@AlexWaygood
Copy link
Member

@smontanaro, Is the CLA Signing waiting on me?

Nope, all good, it just needed a new CI run to be retriggered, which I've just done by closing and reopening the PR :)

@AlexWaygood AlexWaygood changed the title bpo-23041: update proposed changes to csv module. gh-67230: update proposed changes to csv module. Nov 1, 2022
@smontanaro
Copy link
Contributor Author

We need to fish or cut bait on this PR.

@msetina
Copy link

msetina commented Apr 9, 2023

Why is merging blocked. These quoting types are essential for Microsoft warehousing CSV loading. Is there anyone form MS to push this to get better support for their loaders. CSV is broad format and these quoting types would fill some holes for special loaders without enough parameters to allow CSV prepared with Python.

@arhadthedev
Copy link
Member

Why is merging blocked

Because only core developers can merge, and the one responsible for the csv module seems to go on vacation.

@msetina
Copy link

msetina commented Apr 10, 2023

Is there any chance for this to land in 3.12?
I can elaborate why these additional quoting is needed:
As Python has a special value None It would be nice to have capability to control serialization of this to CSV. Currently it serializes into empty string quoted or not. But in some string columns/fields it is required on the parsing side that None is repesented differently than an empty string. If the reception side does not support parsing a special character set into None and it expects empty field without quoting even if field is a string and is quoted. This case is currently not supported in quoting possibilities, but is possible with new quoting rule. All of this is in realm of the RFC that should be supported by CSV module.
I mentioned Microsoft for this case as It is Synapse that has this kind of parser. Without additional postprocessing Synapse will not load CSV produced by Python CSV module. When working with Synapse team they stand by the RFC that their parser supports. I did express a whish to them that they support this change request, but...
I do not understand why it was not added in 3.11.

@smontanaro
Copy link
Contributor Author

Sorry @msetina. All I can do is update the PR. I'll do what I can in that regard, but someone else will have to push to get it merged. @Mariatta do you have any suggestions?

@samwyse
Copy link
Contributor

samwyse commented Apr 11, 2023

I realize that this is over a year old, but only noticed it today...

You don't have empty quote '' in either of your tests: self._write_test(['a',None,1], '"a",,1', quoting = csv.QUOTE_STRINGS)

self._write_test(['a',None,1], '"a",,"1"', quoting = csv.QUOTE_NOTNULL)

Change the test_csv.py to the following:

       self._write_test(['a','',None,1], '"a","",,1',
                        quoting = csv.QUOTE_STRINGS)
       self._write_test(['a','',None,1], '"a","",,"1"',
                        quoting = csv.QUOTE_NOTNULL)

I note that you can transpose the QUOTE_STRINGS and QUOTE_NOTNULL in these test and the affect is the same, and to be obvious you would only need one!

Not so! They differ in the treatment of numeric values. QUOTE_STRINGS would not quote numbers, so 1 is output as 1, while with QUOTE_NOTNULL it is output as "1".

Also, since this is a little confusing (as to what is expected) could you add an example to the docs, please? For each new quote so that users can see clearly whats expected and how they differ.

Let me see what I can do.

samwyse added 2 commits April 11, 2023 10:02
Improve tests of QUOTE_STRINGS and QUOTE_NOTNULL
Improve descriptions of QUOTE_NOTNULL and QUOTE_STRINGS with writer objects. Add descriptions for reader objects.
@merwok merwok added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 12, 2023
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @merwok for commit ba74857 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 12, 2023
@miss-islington
Copy link
Contributor

@smontanaro: Status check is done, and it's a failure or timed out ❌.

@merwok merwok removed the stale Stale PR or inactive for long period of time. label Apr 12, 2023
@merwok merwok closed this Apr 12, 2023
@merwok merwok reopened this Apr 12, 2023
@miss-islington
Copy link
Contributor

@smontanaro: Status check is done, and it's a failure or timed out ❌.

1 similar comment
@miss-islington
Copy link
Contributor

@smontanaro: Status check is done, and it's a failure or timed out ❌.

@miss-islington
Copy link
Contributor

Status check is done, and it's a success ✅.

@miss-islington miss-islington merged commit 330a942 into python:main Apr 12, 2023
aisk pushed a commit to aisk/cpython that referenced this pull request Apr 18, 2023
Add two quoting styles for csv dialects.
They will help to work with certain databases in particular.

Automerge-Triggered-By: GH:merwok
@smontanaro smontanaro deleted the fix_issue_23041 branch January 18, 2024 13:25
smontanaro added a commit to smontanaro/cpython that referenced this pull request Jan 31, 2024
@GPHemsley pointed out that with the recent changes in python#29469, two new data items failed to get `versionadded` notes.
terryjreedy pushed a commit that referenced this pull request Feb 1, 2024
…114816)

As @GPHemsley pointed out, #29469 omitted `versionadded` notes for the 2 new items.
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 1, 2024
…RINGS (pythonGH-114816)

As @GPHemsley pointed out, pythonGH-29469 omitted `versionadded` notes for the 2 new items.
(cherry picked from commit 586057e)

Co-authored-by: Skip Montanaro <[email protected]>
terryjreedy pushed a commit that referenced this pull request Feb 1, 2024
…TRINGS (GH-114816) (#114840)

As @GPHemsley pointed out, GH-29469 omitted `versionadded` notes for the 2 new items.
(cherry picked from commit 586057e)

Co-authored-by: Skip Montanaro <[email protected]>
aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
…RINGS (python#114816)

As @GPHemsley pointed out, python#29469 omitted `versionadded` notes for the 2 new items.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.