Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-36859: Use sqlite3_stmt_readonly API when possible to determine if statement is DML.K #13216

Closed
wants to merge 9 commits into from

Conversation

coleifer
Copy link

@coleifer coleifer commented May 9, 2019

Implements fix for https://bugs.python.org/issue36859

The stdlib sqlite3 module attempts to detect if a statement is data-modifying, which it currently does by stripping whitespace and looking at the start of the SQL string. There are numerous issues with this approach:

  • If the sql string starts with a comment it is marked as not DML, even though it may be.
  • Common table expressions can be used with INSERT/UPDATE/DELETE queries and will not be caught using the current logic.
  • Possibly other stuff?

In sqlite 3.7.11, the sqlite3_stmt_readonly API was added which implements this logic correctly.

pysqlite has weeeird quirks with how it handles DDL (create/drop), which the above api correctly identifies as writing to the db, but for our purposes should not be considered DML. Same for begin exclusive and begin immediate transaction controls.

So this patch:

  • for old sqlite (prior to 3.7.11) the current behavior is retained, else
  • use sqlite3_stmt_readonly instead and then be sure that we don't mark create/drop/begin as being data-modifying.
  • adds tests.
  • is backwards-compatible but more robust

Sqlite docs for sqlite3_stmt_readonly.

https://bugs.python.org/issue36859

@coleifer coleifer requested a review from berkerpeksag as a code owner May 9, 2019 04:44
@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

Copy link
Member

@berkerpeksag berkerpeksag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great improvement, thank you for your PR!

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@coleifer
Copy link
Author

coleifer commented May 9, 2019

I have made the requested changes; please review again.

@bedevere-bot
Copy link

Thanks for making the requested changes!

@berkerpeksag: please review the changes made to this pull request.

@coleifer
Copy link
Author

coleifer commented May 9, 2019

Oh thanks for pushing the markup fix, my bad! So what happens now? :)

@berkerpeksag
Copy link
Member

I'm testing it locally now. Hopefully, I'm going to merge it tonight :)

@coleifer
Copy link
Author

coleifer commented May 9, 2019

Apologies for the new commit, but I was just thinking and realized that sqlite supports a limited subset of ALTER TABLE. So ALTER, plus VACUUM/ANALYZE/REINDEX have been added to the list. These are queries that may write to the database (so sqlite3_stmt_readonly returns false), but which would not have been subject to the transactional logic the Python sqlite3 module currently uses.

@berkerpeksag
Copy link
Member

Good catch! We had a test for VACUUM and apparently I forgot to add it back after I removed it in https://hg.python.org/cpython/rev/284676cf2ac8#l3.23

Also, looking at old commits and issues, it seems to me that we can move new tests into the existing test case TransactionalDDL:

class TransactionalDDL(unittest.TestCase):

What do you think?

Also add additional assertion covering ALTER statements.
@coleifer
Copy link
Author

I agree, and have moved the tests into the transactions module. I also added an assertion covering the ALTER statement to an existing transactional DDL case.

@ghost
Copy link

ghost commented May 11, 2019

Hi, I have two worries.

1, Fallback when sqlite doesn't have sqlite3_stmt_readonly(), then the same version Python may have diferent behaviors on different environments, this sounds not very good.

2, using PyOS_strnicmp() to determine statement is not very reliable, see: https://bugs.python.org/issue35398
So this patch brings more uncertainty (a little).

BTW, sqlite3_stmt_readonly() was added in SQLite 3.7.4, not 3.7.11
https://sqlite.org/releaselog/3_7_4.html

@ghost
Copy link

ghost commented May 11, 2019

How about this proposal.

Don't change the determine logic of is_dml, if a user is executing a complex statement, such as Charles' sample:

rc = conn.execute(
    'with c(k, v) as (select key, value + 10 from kv) '
    'update kv set value=(select v from c where k=kv.key)')

We encourage the user to set conn.isolation_level = None, let he/she manipulate transaction manually. This advice can be written in document.

Add a new parameter count_row=False to executeXX() functions, if this parameter is True, always accumulate rowcount:

/* in file \Modules\_sqlite\cursor.c */

-   if (self->statement->is_dml) {
+   if (self->statement->is_dml || count_row) {
        self->rowcount += (long)sqlite3_changes(self->connection->db);
    } else {
        self->rowcount= -1L;
    }

The idea is to let user manipulate complex statement manually, in order to round the historical issues.

@coleifer
Copy link
Author

then the same version Python may have diferent behaviors on different environments, this sounds not very good.

As long as you aren't using a 9 year old sqlite, the behavior will be correct. The behavior is currently buggy.

using PyOS_strnicmp() to determine statement is not very reliable

That patch is a much worse implementation of what this patch fixes. That patch was not merged.

The use of strncmp is sketchy, I agree, but please let's not throw the baby out with the bathwater.

Currently there are some real issues with the logic. The crux of the issue is that the sqlite lib needs to enter a txn if the user issues a data modifying statement... except ddl is not included in this.

My sense is that this patch is a real improvement. But is it perfect? No.

Nice catch I somehow misread the version.

@coleifer
Copy link
Author

The idea is to let user manipulate complex statement manually, in order to round the historical issues.

Ultimately it's up to the core team. This feels cheesey to me, tho.

Regarding isolation level: yes! For myself, I always always always set to none, because pysqlites implementation is so problematic.

This patch isn't trying to change the existing behavior, though. Simply make it more reliable.

Again,it's just a patch, it's up to the core devs to decide. I've already got a standalone pysqlite3 package and have merged these changes over there.

@berkerpeksag
Copy link
Member

I'm thinking about introducing a mode parameter to sqlite3.connect(). Its default value would be sqlite3.Mode.DBAPI2. Users who would like to leave transaction management to SQLite and/or control themselves can set mode to sqlite3.Mode.AUTOCOMMIT. We could then think about deprecating the default mode and (maybe) create an updated version of PEP 249.

This is similar to what we do in email package: https://docs.python.org/3/library/email.policy.html#module-email.policy It would also help us to fix other quirks of sqlite3 module without breaking backwards compatibility.

@coleifer
Copy link
Author

Rethinking the whole transaction handling is great, but obviously not something I had been trying to undertake.

If such a plan moves forward then this patch is not needed.

@coleifer
Copy link
Author

coleifer commented May 11, 2019

So thinking about this some more, and also looking at the interest and motivation for #10913 -- I do think there's sufficient interest in solving this particular bug, which I summarize as:

pysqlite does not reliably detect DML-type queries.

Ramifications: transaction controls may not reliably cover certain queries (CTEs, preceding comments), which also leads to the cursor not exposing the rowcount attribute.

This patch certainly fixes this, and fixes it much better than the linked patch.

Reasons why this patch is not perfect:

  • sqlite versions which are 9+ years old will not benefit from this patch. (Note that ubuntu 12.04 and newer all include a recent-enough version, for what it's worth. Looks like centos6, too.)
  • implementation still requires reliance on string comparisons to ensure that certain DDL queries are not treated as data-modifying, e.g. sqlite3_stmt_readonly() correctly says that CREATE TABLE queries are not read-only, but since pysqlite's transactional behavior does not cover DDL, we need to skip automatically starting a txn for these queries.

Options, as I see them:

  • do not merge this patch
    • pursue a documentation strategy
    • force cursor rowcount using a new parameter
    • new mode strategy, which has db-api2 pep consequences
  • merge this patch

@berkerpeksag
Copy link
Member

I'm definitely going to merge this PR. The mode idea still needs to be discussed on python-ideas. It can only be added 3.9. It would be nice to have less bug prone DML detection code for 3.7 and 3.8.

Feature added in sqlite 3.7.4, not .11.
@ghost
Copy link

ghost commented May 12, 2019

It would be nice to remove sqlite3's warts by adding a new mode.

I'm not strongly against this patch, because my proposal is not very beautiful.

For this patch, issue35398's problem is more serious, some statements that begin with a comment may abort the program:

>>> conn.execute('/* comment */ vacuum') # this will start a transaction
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.OperationalError: cannot VACUUM from within a transaction

It seems we should detect leading comments like PR #10913.

RHEL6 (Product retirement at November 30, 2020) is still alive, it's shipped with SQLite 3.6.x.

How about emit a RuntimeWarning when importing sqlite3 module if SQLite version < 3.7.4? Switch to a different behavior silently is disconcerting.

@coleifer
Copy link
Author

coleifer commented May 12, 2019

Switch to a different behavior silently is disconcerting.

Why don't you grep the pysqlite for ifdefs. You'll find that there are a number of differences that are dependant on the sqlite version.

In the case of an old version the original behavior is retained, so I don't understand why raising a runtime error makes sense.

Similarly, this patch is intended to do away with the necessity to parse comments. This patch fixes the issue you referenced.

@ghost
Copy link

ghost commented May 14, 2019

Why don't you grep the pysqlite for ifdefs. You'll find that there are a number of differences that are dependant on the sqlite version.

In the case of an old version the original behavior is retained, so I don't understand why raising a runtime error makes sense.

Search SQLITE_VERSION_NUMBER in \Modules\_sqlite\ folder, basically these are these cases:

  • use new API for better performance
  • synchronize the behavior
  • use new feature of SQLite

The first two don't introduce different behaviors. The third one is documented clearly, and if run a code on old SQLite, the program will abort.

While with this patch, please imagine this scene: the same program, under the same version Python, can get different results silently.

If this happened, it's very hard to find where the bug is, the programmer may get insane. Even worse, no one noticed the difference at all, but polluted results may break things (at that time or in the future) like a specter.

Moreover, IMO sqlite3_stmt_readonly() is not suitable for detecting statements, it seems it was designed for telling if a statement will changed the database file.

Similarly, this patch is intended to do away with the necessity to parse comments. This patch fixes the issue you referenced.

With this patch, this statement can't be executed:
conn.execute('/* comment */ vacuum')

is_dml is true, then a transaction is begun implicitly before the executing, but VACUUM can't execute inside a transaction, so the error raised:
sqlite3.OperationalError: cannot VACUUM from within a transaction

You still have to detect leading comments.

@coleifer
Copy link
Author

You still have to detect leading comments

Do you? I don't know. The error message is clear, the solution simple?

@coleifer
Copy link
Author

These are all good points. I don't see a good way forward that is both backward-compatible and can maintain compatibility across potentially different libsqlite3 installs.

Edge-cases with current implementation:

  • DML statements that begin with comments
  • DML statements that use CTEs
  • ?

Edge-cases with this patch:

  • Potentially fail to commit before DDL, VACUUM, ANALYZE in the event they are prefixed with comments.
  • DML may result in rowcount=-1 for old sqlite or correct result for sqlite 3.7.4+, in the event dml has comment/cte.
  • ?

@ghost
Copy link

ghost commented May 14, 2019

You still have to detect leading comments

Do you? I don't know. The error message is clear, the solution simple?

Due to the leading comment, this simple code will abort with this patch:

import sqlite3
conn = sqlite3.connect(':memory:')
conn.execute('/* comment */ vacuum')

I don't see a good way forward that is both backward-compatible and can maintain compatibility across potentially different libsqlite3 installs.

It's a troublesome issue. If must pick a solution, I vote for this order:
1, add an AUTOCOMMIT mode.
2, set minimum version to 3.7.4.
3, other solutions.

IMO sqlite3_stmt_readonly() is not very suitable for detecting statements, maybe some corner cases will give a surprise.

  • Before this patch, the is_dml logic is incomplete.
  • When use sqlite3_stmt_readonly(), more uncertainty is introduced.

Hard to decide for me, let the core developers choose. :)

@coleifer
Copy link
Author

coleifer commented Jun 4, 2019

What's the status on this?

@erlend-aasland
Copy link
Contributor

After #24106, this PR can be greatly simplified.

Its not for me to decide, but I think this is an improvement, and I'm in favour of merging it (after a rebase onto current master).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants