Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PKI PR Feedback. #3191

Merged

Conversation

winder
Copy link
Contributor

@winder winder commented Nov 5, 2021

Summary

  • misc nits, clarifications, typos, organization
  • Close() now blocks (with timeout) until write queue completes.
  • Flush() now has a timeout.
  • REST API returns 404 for no-op deletes and missing keys

Test Plan

@@ -176,7 +178,7 @@ type ParticipationRegistry interface {
Record(account basics.Address, round basics.Round, participationType ParticipationAction) error

// Flush ensures that all changes have been written to the underlying data store.
Flush() error
Flush(duration time.Duration) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duration -> timeout


flushDone *sync.Cond
flushesPending int

Timeout time.Duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeout -> flushTimeout ( it doesn't look like there is a reason to make it public, right ? )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not used, but I was thinking it might make sense to have it be configurable. I don't mind making it private.

flush: true,
}
db.flushesPending++
db.writeQueue <- partDBWriteRecord{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the writing of db.writeQueue <- partDBWriteRecord which is done under mutex lock, could deadlock with the flush thread trying to take the lock (db.mutex.Lock()). In fact, if you'll try to Flush very quickly many time, it's very likely to happen.

Copy link
Contributor Author

@winder winder Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, I think there were a few opportunities for this to happen with other functions as well.

I reproduced one of the deadlocks with a unit test and added a separate mutex for the specific variable used by the flush thread.

@winder winder requested a review from tsachiherman November 8, 2021 15:59
@codecov-commenter
Copy link

codecov-commenter commented Nov 8, 2021

Codecov Report

Merging #3191 (b924a84) into feature/partkey (0ca20bd) will decrease coverage by 0.01%.
The diff coverage is 45.78%.

Impacted file tree graph

@@                 Coverage Diff                 @@
##           feature/partkey    #3191      +/-   ##
===================================================
- Coverage            47.70%   47.68%   -0.02%     
===================================================
  Files                  367      367              
  Lines                59079    59082       +3     
===================================================
- Hits                 28184    28176       -8     
- Misses               27650    27663      +13     
+ Partials              3245     3243       -2     
Impacted Files Coverage Δ
agreement/abstractions.go 50.00% <ø> (ø)
agreement/agreementtest/keyManager.go 100.00% <ø> (ø)
daemon/algod/api/server/v2/handlers.go 0.00% <0.00%> (ø)
daemon/algod/api/server/v2/test/helpers.go 74.81% <ø> (ø)
data/accountManager.go 0.00% <0.00%> (ø)
node/node.go 23.89% <20.00%> (ø)
data/account/participationRegistry.go 81.29% <59.32%> (-0.84%) ⬇️
agreement/pseudonode.go 72.15% <100.00%> (ø)
network/wsPeer.go 65.83% <0.00%> (-2.50%) ⬇️
network/requestTracker.go 70.25% <0.00%> (-0.87%) ⬇️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0ca20bd...b924a84. Read the comment docs.

@tsachiherman
Copy link
Contributor

I'd like to suggest a slightly different approach to the participationDB.Flush implementation that, I believe, would make it much simpler and more robust; the pseudocode goes like that:

func (db *participationDB) Flush(timeout time.Duration) error {
        resultChannel := make(chan error, 1)
        timeoutCh := time.After(timeout)
	select {
        case db.writeQueue <- partDBWriteRecord{
			flush: true,
                        writeError: resultChannel,
		}
	case <-db.writeQueue:
		return err
	case <-timeoutCh:
		return fmt.Errorf("timeout while flushing changes, check results manually")
	}

	select {
        case err := <- resultChannel:
               // todo : log error here if we want to.
               return err
	case <-db.writeQueue:
		return err
	case <-timeoutCh:
		return fmt.Errorf("timeout while flushing changes, check results manually")
	}
}

This model doesn't really require any Cond variable, and given that the output channel is allocated per Flush call, there is no risk of the channel being full.

@winder winder marked this pull request as ready for review November 9, 2021 21:37
@winder winder requested a review from tsachiherman November 10, 2021 01:29
// PKI TODO: pick a better timeout, this is just something short. This could also be removed if we change
// POST /v2/participation and DELETE /v2/participation to return "202 OK Accepted" instead of waiting and getting
// the error message.
err = node.accountManager.Registry().Flush(500 * time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice to have the 500 * time.Millisecond as a constant, but we don't need this to be done in this PR.

Copy link
Contributor

@tsachiherman tsachiherman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. thanks for the changes.

@winder winder merged commit 28fd5db into algorand:feature/partkey Nov 12, 2021
@winder winder deleted the will/partkey-to-master-feedback branch November 12, 2021 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants