Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add write queue #736

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

feat: add write queue #736

wants to merge 4 commits into from

Conversation

marco6
Copy link
Collaborator

@marco6 marco6 commented Mar 4, 2025

This PR adds a write queue for the execution of statements.

To add that I had to change a lot the execution model which was, depending on the state, either sync or async. It is now async only. This is a good thing IMHO as the handling of the result can be done always in the callback while before this change it had to be managed both from the caller (in case of LEADER_NOT_ASYNC) and in the callback, spreading the handling logic and creating weird dependencies in the caller logic and in the tests.

This PR also depends on sqlite3_txn_state which appeared in sqlite 3.34. Given that 20.04 is stuck with version 3.31, this is dropping the tests for that version. This is not a big deal as I see unlikely that 20.04 will ever get dqlite 1.18 as it would be a new version in a frozen release.

@marco6
Copy link
Collaborator Author

marco6 commented Mar 4, 2025

Draft as there is no way for now to set any timeout other than 0ms and there are no tests that confirm the queue behavior.

@marco6 marco6 force-pushed the write-queue branch 8 times, most recently from d650c94 to 4c88c58 Compare March 5, 2025 14:15
@marco6
Copy link
Collaborator Author

marco6 commented Mar 5, 2025

This PR is ready to be tested and benchmarked.

@marco6 marco6 marked this pull request as ready for review March 5, 2025 15:04
@marco6 marco6 requested a review from just-now March 5, 2025 15:04
Copy link
Contributor

@just-now just-now left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first pass.

I read all the code. I could understand the goal of changing the flow of exec request.

IMHO, the following is missing:

  • A short but concise description of the nature of change. Motivation why do we need this new feature, what wasn't so efficient before having it, main technical decisions made.

  • I'm not convinced that exec request shall start from WAITING state however I could be wrong and if it's not ENQUEUED or INITIED I'm OK with WAITING but please explain why.

  • Please make a few comments about changes around barriers, so far I don't fully understand this part.

size_t n;
char *cursor;
failure.code = (uint64_t)code;
failure.message = message;
struct response_failure failure = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice.

@@ -71,7 +72,7 @@ static void gateway_handle_cb(struct handle *req,
return;
}

if (status != 0) {
if (status == SQLITE_IOERR_NOT_LEADER || status == SQLITE_IOERR_LEADERSHIP_LOST) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem an equivalent code to if (status != 0). There're certain reasons not to change this check this way, for example, in case if someone will fire or add a new SQLITE_IOERR_ this will lead to wrong handing in this place. Still, if you want to enforce that error types observed here are limited to SQLITE_IOERR_NOT_LEADER | SQLITE_IOERR_LEADERSHIP_LOST, it's better to put an assert inside the condition:

if (status != 0) {
     assert(status == SQLITE_IOERR_NOT_LEADER || status == SQLITE_IOERR_LEADERSHIP_LOST);

return 0;
}

/**
* State machine for exec requests.
*/
enum {
EXEC_WAITING,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This state semantically looks like EXEC_INITED. Curious, what am I missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide more docs about the new state and its interaction with other states here. It'll be curious to read about the purpose of introduced queue, the use of the timer and how it reflects on request state transitions.

* but also start a timer as this statement should not sit
* in the queue for too long. In the case the timer expires
* the statement will just fail with SQLITE_BUSY. */
err = raft_timer_start(l->raft, &req->timer, l->db->config->busy_timeout, 0, exec_timeout_cb);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is too long. Please try to fit it in 80 chars.

* in the queue for too long. In the case the timer expires
* the statement will just fail with SQLITE_BUSY. */
err = raft_timer_start(l->raft, &req->timer, l->db->config->busy_timeout, 0, exec_timeout_cb);
if (err != RAFT_RESULT_OK) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ignore][sigh] Also, I can't believe that our time_start() function allocates.

t = raft_malloc(sizeof *t);
assert(t != NULL);
t->type = TIMER;
t->completion_time = *io->time + (timeout ? timeout : 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just + timeout?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants