stop reconnect_on_network_failure rescue/retry loop on session#close #149

andreaseger · 2020-03-04T12:32:47Z

I noticed that our processes sometimes get stuck on shutdown after we call close on the MarchHare::Session.

While debugging I found that this happens during a network outage: in that case march_hare is "stuck" in a never-ending rescue/retry loop in reconnecting_on_network_failures.

To solve this I now track whether close was called and, if so, exit the automatic recovery.

Honestly, I'm not entirely sure how to test this automatically. I can reproduce the issue manually and confirm that this change fixes it, but I'm not sure I can turn that into a spec, because it involves killing rabbitmq.
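The idea described above can be sketched in isolation, without a broker. This is a toy model and not march_hare's actual code: `ToySession`, `SimulatedNetworkError`, and the `attempts` counter are hypothetical names, and only `@was_explicitly_closed` and `reconnecting_on_network_failures` mirror names from this PR. A block that always raises stands in for the network outage.

```ruby
# Toy model of a session whose reconnect loop stops after an explicit close.
# ToySession and SimulatedNetworkError are hypothetical, for illustration only.
class SimulatedNetworkError < StandardError; end

class ToySession
  attr_reader :attempts

  def initialize
    @was_explicitly_closed = false
    @attempts = 0
  end

  # An explicit close only flips the flag in this toy model.
  def close
    @was_explicitly_closed = true
  end

  # Retries the block until it succeeds or the session is explicitly closed.
  def reconnecting_on_network_failures(interval_in_s)
    begin
      return if @was_explicitly_closed
      @attempts += 1
      yield
    rescue SimulatedNetworkError
      sleep interval_in_s
      retry
    end
  end
end

session = ToySession.new

# Simulate a permanent outage: every connection attempt fails.
worker = Thread.new do
  session.reconnecting_on_network_failures(0.01) { raise SimulatedNetworkError }
end

sleep 0.1                         # let the loop spin a few times
session.close                     # the explicit close should end the loop
finished = !worker.join(2).nil?   # Thread#join returns nil on timeout

puts "loop exited: #{finished}, attempts: #{session.attempts}"
```

Without the `return if @was_explicitly_closed` guard, the worker thread in this sketch would never finish, which is the shutdown hang described above.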

lib/march_hare/session.rb

andreaseger

Took another stab at this, with the intention of making the logic in the code easier to understand. Also made the behavior change more explicit via a raised exception.

andreaseger · 2020-03-06T12:25:00Z

lib/march_hare/session.rb

@@ -318,7 +322,8 @@ def disable_automatic_recovery
    # Begins automatic connection recovery (typically only used internally
    # to recover from network failures)
    def automatically_recover
-      @logger.debug("session: begin automatic connection recovery")
+      raise ConnectionClosedException if @was_explicitly_closed


I tried to make this new behavior more obvious by raising an exception if automatically_recover is called after an explicit close.
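A minimal sketch of that guard, using stand-in names: `GuardedSession` and `ToyConnectionClosedError` are hypothetical and are not the classes from lib/march_hare/session.rb. The real method goes on to perform recovery; the sketch just returns a symbol.

```ruby
# Hypothetical stand-ins for illustration; not march_hare's own classes.
class ToyConnectionClosedError < StandardError; end

class GuardedSession
  def initialize
    @was_explicitly_closed = false
  end

  def close
    @was_explicitly_closed = true
  end

  # Refuses to recover once the session has been closed on purpose.
  def automatically_recover
    raise ToyConnectionClosedError, "session was explicitly closed" if @was_explicitly_closed
    :recovered
  end
end

session = GuardedSession.new
first = session.automatically_recover   # session is still open, so this succeeds

session.close
error = begin
  session.automatically_recover
  nil
rescue ToyConnectionClosedError => e
  e
end

puts first.inspect
puts error.class
```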

andreaseger · 2020-03-06T12:26:11Z

lib/march_hare/session.rb

@@ -583,6 +588,7 @@ def converting_rjc_exceptions_to_ruby(&block)
    def reconnecting_on_network_failures(interval_in_ms, &fn)
      @logger.debug("session: reconnecting_on_network_failures")
      begin
+        return if @was_explicitly_closed


It somehow seems better to have the loop exit in there instead of in the rescue, but it's basically the same.
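To illustrate the two placements side by side, here is a small sketch. The helper names `guard_in_begin` and `guard_in_rescue`, the `closed` lambda, and `SimulatedNetworkError` are all made up for this comparison. As far as this toy model goes, the only visible difference is whether one attempt is still made when the session is already closed before the loop starts.

```ruby
# Hypothetical helpers comparing where the "explicitly closed" check sits.
class SimulatedNetworkError < StandardError; end

# Variant A: check the flag at the top of the begin block.
def guard_in_begin(closed)
  attempts = 0
  begin
    return attempts if closed.call(attempts)
    attempts += 1
    raise SimulatedNetworkError   # every attempt fails in this sketch
  rescue SimulatedNetworkError
    retry
  end
end

# Variant B: check the flag in the rescue clause, just before retrying.
def guard_in_rescue(closed)
  attempts = 0
  begin
    attempts += 1
    raise SimulatedNetworkError
  rescue SimulatedNetworkError
    return attempts if closed.call(attempts)
    retry
  end
end

closed_after_three = ->(attempts) { attempts >= 3 }  # "close" arrives mid-loop
already_closed     = ->(_attempts) { true }          # closed before the loop starts

a_mid   = guard_in_begin(closed_after_three)
b_mid   = guard_in_rescue(closed_after_three)
a_early = guard_in_begin(already_closed)
b_early = guard_in_rescue(already_closed)

puts "closed mid-loop:   begin=#{a_mid} rescue=#{b_mid}"
puts "closed beforehand: begin=#{a_early} rescue=#{b_early}"
```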

andreaseger · 2020-03-06T12:27:06Z

spec/higher_level_api/integration/connection_recovery_spec.rb

@@ -10,7 +10,7 @@

  def close_all_connections!
    # wait for stats to refresh, make sure to run bin/ci/before_build.sh as well!
-    sleep 0.7
+    sleep 1.7


I had huge issues getting 2 specs here to work (lines 318 and 333); increasing this made them stable for me.

michaelklishin · 2020-03-07

This script updates the management plugin settings so that changes are observable over the HTTP API sooner. Was it executed?

andreaseger · 2020-03-07

I did run the script, although I had to modify it to work with the rabbitmq I have running inside a podman/docker container. I can revert this and see if it works on travis; that would be fine for me. I also didn't try many values here: I just added 1 extra second, and then the specs were stable locally.

michaelklishin · 2020-03-07

It's fine. Let's keep the value that works for you.

andreaseger · 2020-03-06T12:29:39Z

spec/higher_level_api/integration/connection_spec.rb

@@ -61,6 +71,7 @@
    end
    c = MarchHare.connect(executor_factory: factory, network_recovery_interval: 0)
    c.close
+    c.instance_variable_set(:@was_explicitly_closed, false)


To make these tests work I now override the internal instance variable. Not ideal, for sure. I also don't quite understand the intention behind the close + automatically_recover in these tests. Maybe you can shed some light on why.

- calling `automatically_recover` after explicit `close` will fail now
- introduce `reopen` method to enable reconnect after explicit close
- make explicit close handling more obvious on recovery
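The body of `reopen` is not shown in this thread, so the following is only a guess at its shape: clear the flag, then run the normal recovery path. `ReopenableSession`, `ReopenableClosedError`, and `open?` are hypothetical names for this sketch. It shows how a spec could reconnect after `close` without reaching into the instance variable.

```ruby
# Hypothetical sketch; the real reopen in march_hare may differ.
class ReopenableClosedError < StandardError; end

class ReopenableSession
  def initialize
    @was_explicitly_closed = false
    @open = true
  end

  def open?
    @open
  end

  def close
    @was_explicitly_closed = true
    @open = false
  end

  def automatically_recover
    raise ReopenableClosedError, "session was explicitly closed" if @was_explicitly_closed
    @open = true
  end

  # Assumed behavior: forget the explicit close, then recover as usual.
  def reopen
    @was_explicitly_closed = false
    automatically_recover
  end
end

c = ReopenableSession.new
c.close

# Recovery is refused after an explicit close...
refused = begin
  c.automatically_recover
  false
rescue ReopenableClosedError
  true
end

# ...but an explicit reopen is allowed.
c.reopen
puts "recovery refused: #{refused}, open after reopen: #{c.open?}"
```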

michaelklishin · 2020-03-07T18:20:38Z

Thank you!

andreaseger · 2020-05-06T11:08:25Z

any plans on creating a release with this?

michaelklishin · 2020-05-06T13:22:13Z

4.2.0 is out

andreaseger · 2020-05-06T13:44:58Z

Awesome thanks

michaelklishin requested changes Mar 4, 2020

andreaseger force-pushed the end_network_recovery_on_session_close branch from 925db94 to 13b98e9 Compare March 6, 2020 12:23

andreaseger commented Mar 6, 2020

andreaseger requested a review from michaelklishin March 6, 2020 13:57

Stop reconnect_on_network_failure loop on session#close

08574c0

andreaseger force-pushed the end_network_recovery_on_session_close branch from 13b98e9 to 08574c0 Compare March 6, 2020 21:39

michaelklishin approved these changes Mar 7, 2020

michaelklishin merged commit b6eb022 into ruby-amqp:master Mar 7, 2020

andreaseger deleted the end_network_recovery_on_session_close branch March 7, 2020 20:27
