-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
peer: send reestablish, shutdown messages before starting writeHandler #8186
Conversation
7986a59
to
94e5402
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to have a test for this, to ensure we don't cause a similar breakage in the future.
If that's too difficult, second-best is probably some comments on writeMessage
explaining that it should only be used by writeHandler
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM🙏 and +1 for detailed documentation.
This is to avoid a potential race on WriteMessage and Flush internals. Because there is no locking on WriteMessage and Flush, if we allow writeMessage calls in Start after the writeHandler has started, the writeMessage calls may call WriteMessage/Flush at the same time that writeMessage calls from the writeHandler does. Since there is no locking, internals like b.nextHeaderSend can race and cause panics.
94e5402
to
7556402
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -703,6 +703,23 @@ func (p *Brontide) Start() error { | |||
|
|||
p.startTime = time.Now() | |||
|
|||
// Before launching the writeHandler goroutine, we send any channel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we abstract out loadActiveChannels
, or split it into: load chans and load messages, then we can write a unit test here to try to trigger the panic. IIUC, it relies on concurrent access to the brontide state machine, so the two goroutines need to line up directly. This may be easier to trigger with the -race
flag on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a unit test that detects the race. Will send a PR once I clean it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will make the release branch, merge this in, and add release notes. Goal here is a fast follow minor release with just this fix, if we can tighten up uni tests here than that's also ideal. Hopefully we're finally putting this saga to bed, and can devote resources to refactoring/re-writing this portion to get away from the mutex issues that started the saga in the first place. |
This is to avoid a race on
WriteMessage
andFlush
internals. Because there is no locking onWriteMessage
orFlush
, if we allowwriteMessage
calls inStart
after thewriteHandler
has started, thewriteMessage
calls may callWriteMessage
/Flush
at the same time thatwriteMessage
calls from thewriteHandler
does. Since there is no locking, internals likeb.nextHeaderSend
can race and cause panics.Fixes #8184