Configure collation/charset for DB connection #100

Merged
12 commits merged on Dec 18, 2022

@Dakad (Contributor) commented Dec 12, 2021

Resolves #46

Context

Allow the collation used for the DB connection to be configured when calling DB.open.
Just add the encoding parameter to the connection string URI, e.g. &encoding=utf8mb4_unicode_520_ci
The charset is deduced from the value of encoding (the collation name).
Only collations whose ID fits in 1 byte can be used, and some collations are not supported by MySQL.
See: https://dev.mysql.com/doc/refman/5.7/en/charset-connection.html#charset-connection-impermissible-client-charset
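
As a rough illustration of the deduction described above, here is a minimal sketch in plain Crystal. The helpers `collation_from` and `charset_of` are hypothetical names, not the driver's actual code, and the host in the URI is a placeholder. The sketch relies on the MySQL naming convention that a collation name starts with the name of its character set.

```crystal
require "uri"

# Hypothetical helper (not the driver's code): read the `encoding` query
# parameter from a connection URI, falling back to a default collation.
def collation_from(uri : URI, default = "utf8_general_ci") : String
  uri.query_params["encoding"]? || default
end

# Hypothetical helper: MySQL collation names start with the charset name,
# so the charset is taken as the part before the first underscore.
def charset_of(collation : String) : String
  collation.partition('_')[0]
end

# The host below is a placeholder for illustration only.
uri = URI.parse("mysql://crystal_test:secret@localhost/crystal_mysql_test?encoding=utf8mb4_unicode_520_ci")
collation = collation_from(uri)
puts collation             # utf8mb4_unicode_520_ci
puts charset_of(collation) # utf8mb4
```

A real implementation would also need to map the collation name to its numeric ID and reject IDs that do not fit in one byte; that lookup is left out here.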

@bcardiff (Member)

If we use a different collation, should the strings sent as arguments be encoded differently? We would need a couple of specs for this change. Without them, I fear support for other collations could break.

@straight-shoota (Member)

@Dakad Would it be possible to add some specs for this?

Also note the question in #100 (comment)

@Dakad (Contributor Author) commented Sep 19, 2022

@straight-shoota @bcardiff My bad, I completely forgot about this 😞 I will add some specs.

As for the question, I'm not sure I fully understand it. I expect the ?encoding= param to be in UTF-8, since it's a URI::Params.

}

def self.default_collation
  "utf8_general_ci"

@Dakad (Contributor Author) Sep 19, 2022

I reverted the default collation to utf8_general_ci as before.

@@ -30,13 +30,28 @@ describe Driver do
  DB.open "mysql://crystal_test:secret@#{database_host}/crystal_mysql_test" do |db|
    db.scalar("SELECT DATABASE()").should eq("crystal_mysql_test")
    db.scalar("SELECT CURRENT_USER()").should match(/^crystal_test@/)
    db.scalar("SELECT @@collation_connection").should eq("utf8_general_ci")

@Dakad (Contributor Author)

I added these 2 assertions to check that, when no encoding is provided in the connection string, the DB connection's collation is utf8_general_ci.

I'm not sure it's good practice to have that, though. I assume the DB server could have a different collation set in its config.ini

Member

I'd say this is probably fine.
It might be better to break them out into a separate example and add some description of the expectation. That would make it easier to understand the reasons if this happens to fail.
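
The suggestion above could look roughly like the following sketch. It assumes a reachable MySQL test server with the same credentials and database as the existing spec; the `database_host` helper is redefined here as a stand-in for the one the spec suite already uses, the `DATABASE_HOST` variable name is an assumption, and the second example assumes the feature behaves as the PR description states. It has not been run against this branch.

```crystal
require "spec"
require "mysql"

# Stand-in for the helper used by the existing specs (assumed definition).
def database_host
  ENV.fetch("DATABASE_HOST", "localhost")
end

describe "connection collation" do
  it "defaults to utf8_general_ci when no encoding is given" do
    DB.open "mysql://crystal_test:secret@#{database_host}/crystal_mysql_test" do |db|
      db.scalar("SELECT @@collation_connection").should eq("utf8_general_ci")
    end
  end

  it "uses the collation passed in the encoding param" do
    DB.open "mysql://crystal_test:secret@#{database_host}/crystal_mysql_test?encoding=utf8mb4_unicode_520_ci" do |db|
      db.scalar("SELECT @@collation_connection").should eq("utf8mb4_unicode_520_ci")
    end
  end
end
```

Separate examples with descriptive names make it clear which expectation failed, for instance if a server is configured to override the connection collation.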

@Dakad (Contributor Author)

I added another section to the README.md

@beta-ziliani (Member) left a comment

Minor fix. I'll commit it directly so as not to bother the contributor with such a menial task.

@beta-ziliani requested a review from bcardiff on December 2, 2022 14:22

@bcardiff (Member) left a comment

I don't think we need a default_charset function. Seems it's not used.

The specs do not verify that a string column can store and retrieve a string with special chars. If this is already working, we can add the feature now and add those specs later. But as it stands, nothing seems to prevent us from breaking this in the future.
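
A round-trip spec of the kind described above might be sketched as follows. This is an untested outline, not part of the PR: the table name `collation_roundtrip`, the sample string, and the `database_host` stand-in (with its `DATABASE_HOST` variable) are all assumptions, and it needs a live MySQL test server with the credentials used by the existing specs.

```crystal
require "spec"
require "mysql"

# Stand-in for the helper used by the existing specs (assumed definition).
def database_host
  ENV.fetch("DATABASE_HOST", "localhost")
end

describe "string round trip with a non-default collation" do
  it "stores and retrieves special characters unchanged" do
    text = "héllo wörld, ñandú, 日本語, 😀"
    url = "mysql://crystal_test:secret@#{database_host}/crystal_mysql_test?encoding=utf8mb4_unicode_520_ci"

    DB.open url do |db|
      # Hypothetical scratch table; utf8mb4 is needed to hold the emoji.
      db.exec "DROP TABLE IF EXISTS collation_roundtrip"
      db.exec "CREATE TABLE collation_roundtrip (v VARCHAR(100)) CHARACTER SET utf8mb4"
      db.exec "INSERT INTO collation_roundtrip (v) VALUES (?)", text
      db.scalar("SELECT v FROM collation_roundtrip").should eq(text)
      db.exec "DROP TABLE collation_roundtrip"
    end
  end
end
```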

@bcardiff (Member)

I checked other native driver implementations, and it seems they don't do anything to the payload if the collation/charset is different. So an extra 👍 in that regard.

@straight-shoota (Member)

All right, then let's merge it 👍

@bcardiff (Member)

Yup. I would drop the unneeded defs I pointed out above and merge it. I can take care of it in the following days.

@bcardiff merged commit 7689c58 into crystal-lang:master on Dec 18, 2022

@bcardiff (Member)

Thanks @Dakad for the contribution, and my apologies for the long delay on this PR.

Successfully merging this pull request may close these issues.

Connect to database with collation/character other than UTF8