The python tornado integration causes the ioloop to stall #191
Comments
@krish203 "TIME_WAIT indicates that the local endpoint (your side) has closed the connection. The connection is being kept around so that any delayed packets can be matched to the connection and handled appropriately. The connections will be removed when they time out within four minutes."
As this is a function of how TCP works, I do not believe this is caused by Bugsnag's notifier; it looks like the result of a high number of errors. I can think of two possible solutions: one would be to set SO_REUSEADDR to reuse the TIME_WAIT connections, and another would be to load balance the application so that fewer requests are handled on each server. Do you think either of these would help?
Also, just to clarify, you mentioned ~6300 connections were made. I assume this is because 6300 errors occurred in the app and Bugsnag reported each of them to notify.bugsnag.com. Does that sound correct?
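For reference, here is a minimal sketch of what setting SO_REUSEADDR looks like on a plain Python socket. The helper name `make_reusable_socket` is hypothetical, and whether the HTTP client used by the notifier exposes its socket in a way that allows this is an assumption, not something established in this thread. Note also that SO_REUSEADDR mainly affects `bind()`, so its effect on outbound connections that use ephemeral ports may be limited.

```python
import socket


def make_reusable_socket() -> socket.socket:
    """Create a TCP socket with SO_REUSEADDR enabled (illustrative helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set the option before bind(): it lets bind() succeed on a local
    # address that still has connections lingering in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return sock


if __name__ == "__main__":
    s = make_reusable_socket()
    # A non-zero value means the option is enabled on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0)
    s.close()
```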
Closing the issue for now as we haven’t heard back. If you wish to follow up, please comment below and we’ll reopen the issue.
Sorry, I was on parental leave and could not follow up. I did try to reuse the TIME_WAIT connections, but that did not help. For whatever reason, when I removed the Bugsnag handler from the logger, the ioloop stalling problem went away. Is there a way to debug the bugsnag library internally to figure out what might be going on?
Yes, that's correct.
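On the debugging question above: one library-agnostic way to check whether something is blocking the event loop is asyncio's debug mode, which logs any callback that runs longer than `slow_callback_duration`. The sketch below assumes Tornado 5 or later, where the ioloop runs on top of asyncio. `blocking_work` is a hypothetical stand-in for a handler doing synchronous I/O; it is not Bugsnag code, and the helper names are illustrative only.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collects log messages so they can be inspected afterwards."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def blocking_work() -> None:
    # Hypothetical stand-in for a log handler that blocks on network I/O.
    time.sleep(0.2)


async def main() -> None:
    loop = asyncio.get_running_loop()
    loop.set_debug(True)               # enable asyncio's debug checks
    loop.slow_callback_duration = 0.1  # warn about callbacks slower than 100 ms
    loop.call_soon(blocking_work)      # runs on the loop thread and blocks it
    await asyncio.sleep(0.5)


def find_slow_callbacks() -> list:
    """Run the demo and return asyncio's slow-callback warnings."""
    handler = ListHandler()
    logger = logging.getLogger("asyncio")
    logger.addHandler(handler)
    try:
        asyncio.run(main())
    finally:
        logger.removeHandler(handler)
    return [m for m in handler.messages if m.startswith("Executing")]


if __name__ == "__main__":
    for message in find_slow_callbacks():
        print(message)
```

If the same kind of warning shows up in the real application while errors are being reported, that would suggest the loop thread is being blocked by synchronous work rather than by the TIME_WAIT sockets themselves.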
@krish203 Thanks for the additional information. As far as I can tell everything is behaving as expected. The fact that so many errors are occurring per second is causing the TCP connections in TIME_WAIT to stall your app. My best suggestion would be to load balance the web app across additional servers to reduce load. Closing for now as I cannot see any evidence that the Bugsnag notifier has an issue, but please comment and we can reopen should you have reason to believe that not to be the case.
Description
Recently, we have been facing an issue where a large number of TIME_WAIT connections causes the tornado ioloop to stall.
Issue
When the application starts, it initially runs fine, but when an error gets reported, a large number of connections start getting queued in TIME_WAIT state to this IP: 35.186.205.6. A netstat command output gave me ~6300 connections in this state. This eventually caused the ioloop of the tornado application to lock up, causing timeouts in other parts of the application where we make use of the tornado Async HTTP Client.
Environment
Library versions:
Output of the command below was ~6300:
Are bugsnag.tornado.BugsnagRequestHandler and BugsnagHandler async, or do they need to be called from an Executor instance?
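To illustrate what calling a blocking function from an Executor instance would look like, here is a minimal sketch using the standard library. It assumes Tornado 5 or later, where the ioloop is backed by asyncio. `blocking_notify` is a hypothetical stand-in for a synchronous delivery call and is not part of the Bugsnag API; whether the Bugsnag handlers actually block is the open question, not something this sketch answers.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_notify(message: str) -> str:
    # Hypothetical stand-in for a synchronous, blocking delivery call.
    time.sleep(0.05)
    return f"delivered: {message}"


# A small, bounded pool caps how many deliveries run at the same time.
executor = ThreadPoolExecutor(max_workers=2)


async def notify_without_blocking(message: str) -> str:
    loop = asyncio.get_running_loop()
    # Hand the blocking call to a worker thread so the loop stays responsive.
    return await loop.run_in_executor(executor, blocking_notify, message)


async def main() -> list:
    # Both calls run in worker threads; gather returns results in call order.
    return await asyncio.gather(
        notify_without_blocking("error 1"),
        notify_without_blocking("error 2"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Bounding the pool size is a deliberate choice here: during an error burst it limits the number of simultaneous outbound connections instead of opening one per error.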