PKI PR Feedback. #3191
@@ -176,7 +178,7 @@ type ParticipationRegistry interface {
 	Record(account basics.Address, round basics.Round, participationType ParticipationAction) error

 	// Flush ensures that all changes have been written to the underlying data store.
-	Flush() error
+	Flush(duration time.Duration) error
duration -> timeout
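For readers following along, here is a minimal sketch of what a `Flush` that takes a timeout could look like. It is not the PR's implementation: the `flusher` type, its `requests` channel, `errFlushTimeout`, and the duration constants are all hypothetical names, and the database write is simulated with a sleep.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errFlushTimeout is a hypothetical sentinel error for this sketch.
var errFlushTimeout = errors.New("flush timed out")

// Illustrative durations only; none of these values come from the PR.
const (
	fastWrite       = time.Millisecond
	slowWrite       = 200 * time.Millisecond
	tightTimeout    = 10 * time.Millisecond
	generousTimeout = 2 * time.Second
)

// flusher is a hypothetical stand-in for the participation registry.
type flusher struct {
	requests chan chan struct{}
}

// newFlusher starts a writer goroutine that acknowledges each flush
// request after writeDelay, which simulates the database write.
func newFlusher(writeDelay time.Duration) *flusher {
	f := &flusher{requests: make(chan chan struct{}, 16)}
	go func() {
		for done := range f.requests {
			time.Sleep(writeDelay)
			close(done)
		}
	}()
	return f
}

// Flush waits until the writer acknowledges the request or the timeout
// elapses, whichever comes first. One timer covers both the enqueue
// and the wait.
func (f *flusher) Flush(timeout time.Duration) error {
	done := make(chan struct{})
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case f.requests <- done:
	case <-timer.C:
		return errFlushTimeout
	}
	select {
	case <-done:
		return nil
	case <-timer.C:
		return errFlushTimeout
	}
}

func main() {
	fmt.Println("fast writer:", newFlusher(fastWrite).Flush(generousTimeout))
	err := newFlusher(slowWrite).Flush(tightTimeout)
	fmt.Println("slow writer timed out:", errors.Is(err, errFlushTimeout))
}
```

Naming the parameter `timeout` matches how it is used here: it bounds how long the caller is willing to wait, rather than describing how long the flush runs.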
	flushDone      *sync.Cond
	flushesPending int

	Timeout time.Duration
Timeout -> flushTimeout (it doesn't look like there is a reason to make it public, right?)
It's not used, but I was thinking it might make sense to have it be configurable. I don't mind making it private.
		flush: true,
	}
	db.flushesPending++
	db.writeQueue <- partDBWriteRecord{
The write to db.writeQueue (db.writeQueue <- partDBWriteRecord), which is done while holding the mutex, could deadlock with the flush thread trying to take the same lock (db.mutex.Lock()). In fact, if you call Flush many times in quick succession, it's very likely to happen.
Great catch, I think there were a few opportunities for this to happen with other functions as well.
I reproduced one of the deadlocks with a unit test and added a separate mutex for the specific variable used by the flush thread.
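To illustrate the hazard discussed above, here is a minimal sketch of one way to avoid it: update the counter under the lock, release the lock, and only then send on the channel. This is a simplified variant, not the fix in the PR (which adds a separate mutex); the `registry`, `writeRecord`, `hammer`, and `pendingCount` names are hypothetical, and a per-request `done` channel replaces the `sync.Cond` from the diff.

```go
package main

import (
	"fmt"
	"sync"
)

// writeRecord is a hypothetical stand-in for partDBWriteRecord.
type writeRecord struct {
	flush bool
	done  chan struct{}
}

// registry is a hypothetical, stripped-down participation registry.
type registry struct {
	mu         sync.Mutex // guards pending
	pending    int
	writeQueue chan writeRecord
}

func newRegistry() *registry {
	// Unbuffered on purpose: a send blocks until the writer receives.
	r := &registry{writeQueue: make(chan writeRecord)}
	go r.writer()
	return r
}

// writer takes the lock for every record it processes. If a sender
// held the same lock while blocked on the channel send, the writer
// could never reach its next receive and both sides would stall.
func (r *registry) writer() {
	for rec := range r.writeQueue {
		r.mu.Lock()
		r.pending--
		r.mu.Unlock()
		close(rec.done)
	}
}

// Flush bumps the counter under the lock, then releases the lock
// BEFORE the channel send so the writer is never starved.
func (r *registry) Flush() {
	done := make(chan struct{})
	r.mu.Lock()
	r.pending++
	r.mu.Unlock()
	r.writeQueue <- writeRecord{flush: true, done: done}
	<-done
}

// pendingCount reads the counter under the lock.
func (r *registry) pendingCount() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.pending
}

// hammer calls Flush from n goroutines at once, mimicking the
// "Flush very quickly many times" scenario from the review.
func hammer(r *registry, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.Flush()
		}()
	}
	wg.Wait()
}

func main() {
	r := newRegistry()
	hammer(r, 500)
	fmt.Println("pending after flushes:", r.pendingCount())
}
```

Moving the send inside the locked region of `Flush` recreates the problem: the second caller takes the lock and blocks on the send while the writer is stuck waiting for that same lock.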
Codecov Report
@@ Coverage Diff @@
## feature/partkey #3191 +/- ##
===================================================
- Coverage 47.70% 47.68% -0.02%
===================================================
Files 367 367
Lines 59079 59082 +3
===================================================
- Hits 28184 28176 -8
- Misses 27650 27663 +13
+ Partials 3245 3243 -2
I'd like to suggest a slightly different approach to the
This model doesn't really require any
	// PKI TODO: pick a better timeout, this is just something short. This could also be removed if we change
	// POST /v2/participation and DELETE /v2/participation to return "202 OK Accepted" instead of waiting and getting
	// the error message.
	err = node.accountManager.Registry().Flush(500 * time.Millisecond)
It would be nice to have the 500 * time.Millisecond as a constant, but we don't need this to be done in this PR.
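A minimal sketch of the suggested follow-up, assuming a package-level constant. The name `participationFlushTimeout` is hypothetical, and `flush` is only a stand-in for the real `Registry().Flush` call so the example runs on its own.

```go
package main

import (
	"fmt"
	"time"
)

// participationFlushTimeout is a hypothetical name for the value that
// is currently written inline as 500 * time.Millisecond.
const participationFlushTimeout = 500 * time.Millisecond

// flush is a stand-in for Registry().Flush; it only validates its
// argument so that the sketch is self-contained.
func flush(timeout time.Duration) error {
	if timeout <= 0 {
		return fmt.Errorf("invalid flush timeout %v", timeout)
	}
	return nil
}

func main() {
	if err := flush(participationFlushTimeout); err != nil {
		fmt.Println("flush failed:", err)
		return
	}
	fmt.Println("flushed with timeout", participationFlushTimeout)
}
```

A named constant gives the TODO about picking a better value a single place to land when the timeout is revisited.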
looks good. thanks for the changes.