stop reconnect_on_network_failure rescue/retry loop on session#close #149
925db94 to 13b98e9
Took another stab at this with the intention of making the logic in the code easier to understand. I also made the behavior change more explicit by raising an exception.
@@ -318,7 +322,8 @@ def disable_automatic_recovery
   # Begins automatic connection recovery (typically only used internally
   # to recover from network failures)
   def automatically_recover
     @logger.debug("session: begin automatic connection recovery")
+    raise ConnectionClosedException if @was_explicitly_closed
I tried to make this new behavior more obvious by raising an exception if `automatically_recover` is called after an explicit close.
@@ -583,6 +588,7 @@ def converting_rjc_exceptions_to_ruby(&block)
   def reconnecting_on_network_failures(interval_in_ms, &fn)
     @logger.debug("session: reconnecting_on_network_failures")
     begin
+      return if @was_explicitly_closed
It somehow seems better to have the loop exit here, at the top of the `begin` block, instead of in the `rescue`, but the effect is basically the same.
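To illustrate why the two placements behave alike, here is a minimal sketch using a toy class. `RetryLoopSession` and `NetworkFailure` are stand-ins invented for this example and are not MarchHare code; only the method name and the `@was_explicitly_closed` flag come from the diff. Because `retry` re-runs the whole `begin` block, a check at the top of the block is evaluated again after every failed attempt.

```ruby
# Toy stand-in for a session (hypothetical, NOT the MarchHare implementation).
class RetryLoopSession
  # Hypothetical error type standing in for a network-level failure.
  class NetworkFailure < StandardError; end

  attr_reader :attempts

  def initialize
    @was_explicitly_closed = false
    @attempts = 0
  end

  def close
    @was_explicitly_closed = true
  end

  # Retries the block until it succeeds or the session has been closed.
  def reconnecting_on_network_failures(interval_in_ms)
    begin
      # Evaluated on the first pass and again after every `retry`.
      return if @was_explicitly_closed
      @attempts += 1
      yield
    rescue NetworkFailure
      sleep(interval_in_ms / 1000.0)
      retry
    end
  end
end

session = RetryLoopSession.new
result = session.reconnecting_on_network_failures(1) do
  # Simulate an explicit close that happens while the broker is unreachable.
  session.close if session.attempts >= 3
  raise RetryLoopSession::NetworkFailure, "broker unreachable"
end
```

In this sketch the block fails on every attempt, and the loop ends on the first pass that follows the call to `close`.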
@@ -10,7 +10,7 @@

   def close_all_connections!
     # wait for stats to refresh, make sure to run bin/ci/before_build.sh as well!
-    sleep 0.7
+    sleep 1.7
I had huge issues getting 2 specs here to work (lines 318 and 333). Increasing this value made them stable for me.
This script updates management plugin settings so that changes are observable over HTTP API earlier. Was it executed?
I did run the script, although I had to modify it to work with the RabbitMQ instance I have running inside a podman/docker container. I can revert this and see if it works on Travis; that would be fine for me. I also didn't try many values here. I just added 1 extra second, and then the specs ran stably locally.
It's fine. Let's keep the value that works for you.
@@ -61,6 +71,7 @@
 end
 c = MarchHare.connect(executor_factory: factory, network_recovery_interval: 0)
 c.close
+c.instance_variable_set(:@was_explicitly_closed, false)
To make these tests work I now override the internal instance variable. Not ideal, for sure. I also don't quite understand the intention behind calling `close` followed by `automatically_recover` in these tests. Maybe you can shed some light on why.
- calling `automatically_recover` after explicit `close` will fail now
- introduce `reopen` method to enable reconnect after explicit close
- make explicit close handling more obvious on recovery
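The contract described in this commit message could look roughly like the sketch below. It is a toy model only: `ClosableSession` is invented for illustration, the `ConnectionClosedException` defined here is a local stand-in, and the body of `reopen` is an assumption, since the actual implementation is not shown in this conversation.

```ruby
# Local stand-in for the exception raised in the diff (hypothetical definition).
class ConnectionClosedException < StandardError; end

# Toy model of the close/reopen contract; NOT the MarchHare implementation.
class ClosableSession
  attr_reader :recoveries

  def initialize
    @was_explicitly_closed = false
    @recoveries = 0
  end

  def close
    @was_explicitly_closed = true
  end

  # Assumption: reopening clears the flag so that recovery is allowed again.
  def reopen
    @was_explicitly_closed = false
    self
  end

  def automatically_recover
    raise ConnectionClosedException if @was_explicitly_closed
    @recoveries += 1 # placeholder for the real recovery work
  end
end

session = ClosableSession.new
session.close

# Recovery after an explicit close is refused with an exception.
refused = begin
  session.automatically_recover
  false
rescue ConnectionClosedException
  true
end

# After an explicit reopen, recovery is permitted again.
session.reopen.automatically_recover
```

With a method like this, a spec that needs `close` followed by `automatically_recover` could call `reopen` in between rather than overriding the instance variable directly.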
13b98e9 to 08574c0
Thank you!
Any plans on creating a release with this?
Awesome, thanks
I noticed that our processes sometimes get stuck on shutdown after we call `close` on the MarchHare::Session.
While debugging this I noticed that it happens during a network outage, because in that case march_hare is "stuck" in a never-ending rescue/retry loop in `reconnecting_on_network_failures`.
To solve this I now track whether `close` was called and, if so, exit the automatic recovery.
Honestly, I'm not entirely sure how to test this automatically. I can manually reproduce the issue and confirm that this change fixes it, but I'm not sure I can turn it into a spec, because it more or less involves killing RabbitMQ.