
Clear wnb after close to prevent memory leak to Gateways. #5212

Closed
wants to merge 1 commit into from

Conversation

davidzhao

We are seeing a memory leak whenever a gateway connection fails (and reconnects).

The sequence of events goes like this:

  1. the local NATS cluster has a large number of subscriptions (>1M)
  2. that cluster tries to send its subscription interest to a remote region over a link with limited bandwidth
  3. the write times out and the gateway reconnects
  4. memory grows by a few hundred MB with each iteration
  5. the NATS server OOMs

The problem is that even after the gateway reconnects, the buffers from the previous flush attempt are still retained, including writes that were too large and timed out. While the bandwidth limitation persists, each reconnection attempt increases memory usage by hundreds of MB.
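For context, the fix in this one-commit PR amounts to releasing those buffers when the connection closes. Below is a minimal standalone sketch of the idea; only the out.wnb field name comes from this discussion, the surrounding types are simplified stand-ins, and the real server returns each buffer to a pool rather than just dropping it.

```go
package main

import "fmt"

// Simplified stand-ins for the server's types; only the out.wnb field
// name is taken from this PR, the rest is illustrative.
type outbound struct {
	wnb [][]byte // pending write buffers, grown when flushes fail
}

type client struct {
	out outbound
}

// closeConnection sketches the proposed fix: drop every buffer still
// referenced by wnb so a retained client does not keep hundreds of MB
// alive across gateway reconnects. (The actual PR returns the buffers
// to the server's buffer pool.)
func (c *client) closeConnection() {
	for i := range c.out.wnb {
		c.out.wnb[i] = nil
	}
	c.out.wnb = nil
}

func main() {
	c := &client{}
	// Simulate a failed flush leaving a large buffer behind.
	c.out.wnb = append(c.out.wnb, make([]byte, 1<<20))
	c.closeConnection()
	fmt.Println(c.out.wnb == nil) // true
}
```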

Signed-off-by: [email protected]

@davidzhao davidzhao requested a review from a team as a code owner March 13, 2024 06:53
@derekcollison derekcollison requested a review from kozlovic March 13, 2024 10:46
@kozlovic (Member) left a comment

LGTM. Note, I would not consider this a memory leak, but if it is, then we should get to the bottom of what is holding a reference to this connection. I know that we keep a rotation of closed connections, but that should not play a part in keeping a reference to this buffer. Also, this is not specific to gateways; it affects gateways in your setup because the traffic goes over a poor link, but it would apply to any other type of connection as well.
Lastly, are you using GOMEMLIMIT? If not, maybe that is why your server OOM'ed (again, assuming this is not a memory leak but just delayed GC).

@davidzhao (Author)

we should get to the bottom of what is holding a reference to this connection. I know that we keep a rotation of closed connections, but that should not play a part in keeping a reference to this buffer.

It looks to me like the connection goes into reconnect when timeout errors are encountered. Is it supposed to be closed entirely so that a new one opens up (when the destination is a gateway)?

@neilalexander (Member)

Need to be especially careful with wnb as it is used outside of the lock during writev calls, so if there is a writev in progress, another goroutine setting wnb = nil will create a data race. Are you sure the flusher is not running at this point?

@kozlovic (Member)

@davidzhao The reconnect() call will attempt to re-create a connection, depending on the type of connection we are dealing with. But in the end, it is a brand new object, so the current connection *client should no longer be referenced.

@neilalexander As to the use of wnb without the lock, you are right. However, flushAndClose() is only invoked from the writeLoop when it detects that the connection is marked "closed", unless the writeLoop was never started. So I think it should be safe.

@kozlovic (Member)

@derekcollison Let me experiment a bit more before we merge. I want to make sure that @neilalexander's concerns are addressed.

@kozlovic (Member)

@davidzhao We should backtrack a bit. I had assumed that you found from profiling that the memory used was because of wnb, but is that the case? Since you say there are over 1M subscriptions, I wonder if the memory increase you see at each reconnect is due to the fact that the server saves "closed connections" in a ring buffer, which includes the list of subscriptions, and that may be the reason for the memory usage.
I am going to check the impact of setting c.out.wnb to nil, etc., to make sure that there is no race as @neilalexander fears, but the point is that making those changes would be pointless if this were not the reason for the memory growth. So if you could go into more detail on the findings that led you to submit this PR, I would appreciate it.

@kozlovic (Member)

@davidzhao Hmm... never mind, after checking it looks like we save closed connections only for client and leaf connections, not other types. Digging more...

@kozlovic (Member) left a comment

Ok, so I have verified that it is safe to access c.out.wnb from here because once a writeLoop is started, server code should always enqueue protocols for the writeLoop to send. The writeLoop is the one invoking flushOutbound and only if it has detected that the connection is closed will it call flushAndClose().
What I am not sure about is why not returning these buffers to the pool causes the memory growth. I mean, should not the GC reclaim those? I am not contesting that they should be returned to the pool (this is the right thing to do), but why is the memory not reclaimed if the connection is no longer referenced?
@neilalexander Is there anything particular about nbPoolGet that explains this?
@derekcollison I am approving this PR but would rather wait for @neilalexander to comment/approve.

@derekcollison (Member)

The GC will not reclaim memory if it does not think it needs to.

So in container-based environments it's critical to set GOMEMLIMIT to ~75% of the memory provisioned in the environment. If the provisioned memory is large, say 32GB, I suggest 16-20GB for the server's GOMEMLIMIT.

@davidzhao (Author)

@kozlovic Here's the profile from our production environment. This instance had high memory use and was close to getting killed.

sfo-nats3-heap

But in the end, it is a brand new object, so the current connection *client should no longer be referenced.

I'm not seeing this path. Tracing the code, reconnect would reuse the same object when connecting to remote gateways.

What I am not sure about is why not returning these buffers to the pool causes the memory growth.

I could be wrong, but to me c.out.wnb is still holding onto these buffers. If the transmission fails and the client object is reused, it'll keep appending to this slice. Clearing it out during close seems to be the sensible thing to do.

@neilalexander (Member)

@kozlovic Yes, pools are emptied on a GC run, but if the GC doesn't think it needs to run yet, it's possible for them to build up in size.

The problem may not be the pooled buffers themselves, though; it may be the length of the c.out.wnb slice itself. In that case wiping it on shutdown is fine, but we may also want to check cap(c.out.wnb) more regularly (perhaps in flushOutbound) to make sure that one spike doesn't mean we hold onto a large underlying array potentially forever.
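The capacity check could look something like the following. The names (maxHeldBufs, shrinkAfterFlush) and the threshold value are illustrative, not the server's, and cap here counts buffer slots in the [][]byte, not bytes:

```go
package main

import "fmt"

// maxHeldBufs is an illustrative threshold on the number of buffer
// slots retained between flushes; the real server would pick its own.
const maxHeldBufs = 1024

// shrinkAfterFlush would run at the end of a flush: if a traffic spike
// left wnb with an oversized backing array, drop it so the GC can
// reclaim it, instead of the connection holding it for its lifetime.
func shrinkAfterFlush(wnb [][]byte) [][]byte {
	if cap(wnb) > maxHeldBufs {
		return nil // next flush allocates a right-sized slice
	}
	return wnb[:0] // keep the modest backing array for reuse
}

func main() {
	spike := make([][]byte, 0, 1<<16)
	fmt.Println(shrinkAfterFlush(spike) == nil) // true: released

	small := make([][]byte, 4, 8)
	fmt.Println(cap(shrinkAfterFlush(small))) // 8: retained for reuse
}
```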

@davidzhao Can you please supply the full memory profile from /allocs?debug=0, since the attached SVG looks to be empty?

@kozlovic (Member)

kozlovic commented Mar 14, 2024

I'm not seeing this path. Tracing the code, reconnect would reuse the same object when connecting to remote gateways.

@davidzhao, no, look at the reconnect code. In the case of an outbound GW connection, we end up invoking this:

srv.startGoRoutine(func() { srv.reconnectGateway(gwCfg) })

which invokes:

func (s *Server) reconnectGateway(cfg *gatewayCfg) {

then:

func (s *Server) solicitGateway(cfg *gatewayCfg, firstConnect bool) {

and finally:

func (s *Server) createGateway(cfg *gatewayCfg, url *url.URL, conn net.Conn) {

where a new connection (*client) is created:

c := &client{srv: s, nc: conn, start: now, last: now, kind: GATEWAY}

Starting at the first link I posted, there is no longer any reference to the original connection *client.

@davidzhao (Author)

davidzhao commented Mar 14, 2024

@davidzhao, no, look at the reconnect code. In the case of an outbound GW connection, we end up invoking this:

Thank you for keeping me honest. That's the thread that I missed.

So the observed "leak" might just be that the GC isn't reclaiming the memory. And I think the PR will still help reduce allocations by returning these buffers to the pool.

@davidzhao (Author)

Update: we've set GOMEMLIMIT and observed that the GC is working much harder. However, it's still not reclaiming the memory held in wnb. Does this mean that something is holding on to the old client object? Updated profiles attached.

updated-profile.zip

@kozlovic (Member) left a comment

Unfortunately, @neilalexander is right that this is "racy". I have seen a test fail with a data race when an internalSendLoop for the system account calls flushClients()->flushOutbound(), which races with accessing wnb from here.

However, I have spent a lot of time figuring out what is holding a reference to the connection. I may have found it, but I need more time to investigate and see if that would prevent the memory usage without doing the cleaning here.

kozlovic added a commit that referenced this pull request Mar 25, 2024
The main issue for router/gateway connections was that they would
be registered with the global account but never removed from there,
which would cause the connection to be retained. The memory leak
was apparent in one user setup with a large number of subscriptions
that were transferred to a slow gateway, causing accumulation of
partial buffers before the connection was dropped.

For gateway connections, we also needed to clean the outbound's
map since it can contain a sublist referencing its own connection.
Same for subscriptions in c.subs.

Another found cause of a retained connection is with leafnodes,
where we use a subscription referencing the leafnode connection
that was registered in a global sublist but never removed.

Also, the connection's c.out.sg (a `*Cond`) needed to be set to `nil`
when the connection is closed. Without that, it seems that the connection
would not be released (at least based on experimentation). We now
make sure that c.out.sg is not nil before sending a signal.

The bottom line is that it looks like having an object reference
itself in some way prevents the GC from releasing the object, even
if the "top level" object is no longer reachable. For instance,
something like `obj.someField = obj` seems to prevent the GC from
releasing `obj`, even if `obj` itself is no longer referenced.

The original issue/PR (#5212) simply cleaned c.out.wnb,
but the way it was done was racy since c.out.wnb is used without
the lock in flushOutbound. Once the retain/release issue is fixed,
cleaning this buffer is not really required (but good practice,
especially if not running with GOMEMLIMIT), so we take care
of cleaning this up, but under the protection of the flushOutbound
flag. If set, flushAndClose() will not do the cleaning; flushOutbound
will do it.

Relates to #5212

Signed-off-by: Ivan Kozlovic <[email protected]>
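The flag handoff described in the commit message can be sketched roughly as follows. The names here (conn, flushing, writev) are hypothetical; in the server the flag lives in c.flags and the lock is the client lock:

```go
package main

import (
	"fmt"
	"sync"
)

// conn is an illustrative stand-in: whichever side is NOT mid-flush
// is the one that cleans wnb.
type conn struct {
	mu       sync.Mutex
	flushing bool // stands in for the flushOutbound flag
	closed   bool
	wnb      [][]byte
}

// writev stands in for the socket write performed outside the lock.
func writev(bufs [][]byte) {}

func (c *conn) flushOutbound() {
	c.mu.Lock()
	c.flushing = true
	bufs := c.wnb
	c.wnb = nil
	c.mu.Unlock()

	writev(bufs) // the racy window: buffers used without the lock held

	c.mu.Lock()
	c.flushing = false
	if !c.closed {
		c.wnb = bufs[:0] // keep the backing array for reuse
	} // if close happened mid-flush, we own the cleanup: leave wnb nil
	c.mu.Unlock()
}

func (c *conn) flushAndClose() {
	c.mu.Lock()
	c.closed = true
	if !c.flushing {
		c.wnb = nil // safe: no concurrent flush is touching the buffers
	} // otherwise flushOutbound cleans up when it finishes
	c.mu.Unlock()
}

func main() {
	c := &conn{wnb: [][]byte{make([]byte, 8)}}
	c.flushAndClose()
	fmt.Println(c.wnb == nil) // true: no flush in progress, cleaned here
}
```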
@kozlovic (Member)

@davidzhao I have submitted PR #5244 to address the issue in a more systemic way. You should be referenced in the release notes should my PR be accepted.

@davidzhao (Author)

@davidzhao I have submitted PR #5244 to address the issue in a more systemic way. You should be referenced in the release notes should my PR be accepted.

That's fantastic news! I'm looking forward to trying it out once it makes it into a release. In the meantime, we have been running a patched build that clears wnb, and it has improved the memory footprint in our case. But you are right that it addresses the symptom and not the root cause.

derekcollison added a commit that referenced this pull request Mar 26, 2024
@kozlovic (Member)

@davidzhao Let's close this PR; the other one has already been merged to main and should be available in nightly builds. If you have a chance to test it out, that would be great.

@kozlovic kozlovic closed this Mar 26, 2024
derekcollison pushed a commit that referenced this pull request Mar 29, 2024
neilalexander pushed a commit that referenced this pull request Apr 4, 2024
neilalexander pushed a commit that referenced this pull request Apr 4, 2024
neilalexander pushed a commit that referenced this pull request Apr 4, 2024