File descriptor leak under high concurrency #145

ThePumpingLemma · 2023-10-30T20:32:20Z

Describe the bug
With our PoC at
https://github.com/kevink-sq/jetty-concurrency-issue-poc/tree/575469b3453e7b09d9d60f6460cdab8117b8e773, we noticed that there is unbounded growth of file descriptors when running our simple load test:

seq 1 15000 | xargs -P1500 -I{} curl --unix-socket ./junix-ingress.sock http://localhost/process

Leading to timeouts from curl and timeout exceptions in jetty:

java.util.concurrent.TimeoutException: Idle timeout expired: 32422/30000 ms
        at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:170)
        at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:112)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)

To Reproduce
Steps to reproduce the behavior (assuming macOS):

Checkout https://github.com/kevink-sq/jetty-concurrency-issue-poc/tree/575469b3453e7b09d9d60f6460cdab8117b8e773
Run mvn compile exec:java -Dexec.mainClass="org.example.Main"
Run seq 1 15000 | xargs -P1500 -I{} curl --unix-socket ./junix-ingress.sock http://localhost/process

Expected behavior
All curl calls should succeed and there should be no timeouts in jetty.

Output/Screenshots
N/A

Please make sure to test the problem still occurs on the latest version of junixsocket
Tested on v2.8.2

Notes
Proposed fix: #144

The text was updated successfully, but these errors were encountered:

kevink-sq · 2023-11-03T22:38:32Z

Hi @kohlschuetter, would it be possible to look into this issue?

kohlschuetter · 2023-11-06T12:05:22Z

Your commands to reproduce may not work on all platforms (e.g., Alpine Linux) and may return the wrong process ("grep" itself on macOS).

I'm trying to verify with

mvn clean compile exec:java -Dexec.mainClass="org.example.Main"

and

sudo apk add lsof
lsof -p $(jps -v | grep jetty-concurrency | cut -f1 -d' ') | grep junix-ingress

The commit you recommend to reproduce does not work for me. When I manually remove the monkey-patched files, everything still works OK on macOS, but I see connection errors on Alpine Linux. The monkey patched code (assuming it's the same as in the PR), indeed seems to fix that bug, but I'm not seeing the timeout error messages.

ThePumpingLemma · 2023-11-06T14:44:48Z

The commit you recommend to reproduce does not work for me. When I manually remove the monkey-patched files, everything still works OK on macOS, but I see connection errors on Alpine Linux. The monkey patched code (assuming it's the same as in the PR), indeed seems to fix that bug, but I'm not seeing the timeout error messages.

I didn't test on Alpine; @kevin-sq did reproduce on Linux?

Not sure if macOS version matters in this case:

Darwin redacted.local 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct  9 21:27:24 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6000 arm64

kohlschuetter · 2023-11-06T16:33:45Z

@ThePumpingLemma :

lsof -p $(ps aux | grep "org.example.Main" | awk {'print$2'} | head -n 1) | grep junix-ingress.sock | wc -l

may actually get the PID of the grep command ,which the jps approach prevents. You could also add another grep for "java", and remove the head -n 1 to raise a red flag should there still be more than one line:

lsof -p $(ps aux | grep "org.example.Main" | grep java | awk {'print$2'}) | grep junix-ingress.sock | wc -l

Also beware that lsof on Alpine is busybox-based, so it will actually show all PIDs unless we install the proper binary via sudo apk add lsof

kohlschuetter · 2023-11-06T16:35:36Z

I've merged the PR, however I'm not really happy that we don't have a proper test case. However, I can see these errors in curl go away with the fix.

kevink-sq · 2023-11-06T18:19:31Z

I'm able to reproduce the leaking junix file descriptor in linux via multipass against this baseline branch (which is the same as the above 575469b3453e7b09d9d60f6460cdab8117b8e773 but removed debug logging and other monkey patch files).

Linux info:

Welcome to Ubuntu 22.04.3 LTS (GNU/Linux 5.15.0-87-generic aarch64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Mon Nov  6 09:45:35 PST 2023

  System load:             0.1953125
  Usage of /:              10.9% of 24.05GB
  Memory usage:            8%
  Swap usage:              0%
  Processes:               127
  Users logged in:         1

In one terminal started monitoring the fd count of junix-ingress.sock

while sleep 1; do echo $(lsof -p $(ps aux | grep "org.example.Main" | awk {'print$2'} | head -n 1) | grep junix-ingress.sock | wc -l); done

In another terminal, started the server against baseline branch

mvn compile exec:java -Dexec.mainClass="org.example.Main"

In another terminal started calling the junix uds:

seq 1 300000 | xargs -P1500 -I{} curl --unix-socket /tmp/junix-ingress.sock http://localhost/process

I could verify the monitor count started climbing but not for the native endpoint. Screenshots of the above process:

Server started with monitoring. Junix uds not being called yet:

Started calling junix uds

Running against native endpoint does not leak

kevink-sq · 2023-11-06T18:27:47Z

Repeating the above with the monkey patched fix (baseline branch patched) results in junix behaving the same as native.

kohlschuetter · 2023-11-06T18:34:38Z

Can you please verify with the latest 2.8.3-SNAPSHOT, and all monkey-patched code removed?

Make sure the following section is in your POM:

            <repositories>
                <repository>
                    <id>snapshots-repo</id>
                    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
                    <releases>
                        <enabled>false</enabled>
                    </releases>
                    <snapshots>
                        <enabled>true</enabled>
                    </snapshots>
                </repository>
            </repositories>

kevink-sq · 2023-11-06T18:45:53Z

I can verify the fix works against the snapshot with all of the monkey-patched code removed: baseline branch 2.8.3-SNAPSHOT

ThePumpingLemma · 2023-11-06T20:09:43Z

I've merged the PR, however I'm not really happy that we don't have a proper test case. However, I can see these errors in curl go away with the fix.

It's definitely possible to stress test this, so we could make integration tests to that effect.

However, I don't see why we couldn't write the code in such a way that these routines are unit testable; then we could mock selection keys and see how the cleanup code handles it. I could take a pass at a refactor to facilitate with unit testing if you would like?

kohlschuetter · 2023-11-06T22:44:36Z

Yes that would be great, thank you!

ThePumpingLemma · 2023-11-07T20:16:32Z

@kohlschuetter I think I want to approach it by first making some broader fixes before I break it down into unit testable functions. The first thing I want to attack is locking, where I noticed some inconsistencies in the way locks are managed. Here is a PR where I do some renaming followed by lock normalization: #146

I would like to follow this up with reworking how PollFd objects are managed; I think it's a bit hard to follow and I also think there is redundant code in initPollFd (which is the code we need to unit test).

Clients may iterate upon AFSelector.selectedKeys() while the selected keys are modified in another thread. Change the underlying datastructure to be thread-safe (use ConcurrentHashMap), and add all "ready" selectors even if they're marked invalid (they will be removed later). We reimplement the fix as the original patch caused a FileDescriptor leak and TIPC test failures. #142 #145

kohlschuetter · 2023-11-09T14:30:34Z

Before we continue with improvements to AFSelector, please verify that the updated patch (5e8a6e6) resolves all concurrency issues / FileDescriptor leaks for you.

I reverted the original patch attempts because we really introduced the regression in 2.8.2, and even with the latest changes, a TIPC test would fail on Linux.

Please also verify by running the selftest jar on Linux with the TIPC kernel module loaded, and a bearer enabled:

sudo modprobe tipc
sudo tipc bearer enable media eth device eth0
java -ea -jar junixsocket-selftest-2.8.3-SNAPSHOT-jar-with-dependencies.jar

ThePumpingLemma · 2023-11-09T15:04:33Z

Before we continue with improvements to AFSelector, please verify that the updated patch (5e8a6e6) resolves all concurrency issues / FileDescriptor leaks for you.

It seems to be working okay on macOS from a preliminary test.

However, this patch doesn't implement the locking in the way that the documentation suggests we should:

A selector's selected-key set is not, in general, safe for use by
multiple concurrent threads. If such a thread might modify the set directly
then access should be controlled by synchronizing on the set itself. The
iterators returned by the set's {@link java.util.Set#iterator() iterator}
methods are fail-fast: If the set is modified after the iterator is
created, in any way except by invoking the iterator's own {@link
java.util.Iterator#remove() remove} method, then a {@link
java.util.ConcurrentModificationException} will be thrown.

IMO, I think it would make sense to implement things as similar to Java's standard library as possible, but I'll leave that up to you.

I don't have a Linux test setup handy, maybe @kevink-sq can try to verify the change?

kohlschuetter · 2023-11-09T15:14:27Z

Yeah, it's really odd. When you replace that ConcurrentHashMap with HashMap, jetty throws a ConcurrentModificationException, which means that either two threads are accessing the structure concurrently, or modifications happen after the iterator has been created. Maybe there's something else we're missing?

kohlschuetter · 2023-11-09T15:26:10Z

So I think it's the .clear() that's throwing off jetty. In the original SelectorImpl, they check if the key already exists (processReadyEvents):

            if (selectedKeys.contains(ski)) {
                if (ski.translateAndUpdateReadyOps(rOps)) {
                    return 1;
                }
            } else {
                ski.translateAndSetReadyOps(rOps);
                if ((ski.nioReadyOps() & ski.nioInterestOps()) != 0) {
                    selectedKeys.add(ski);
                    return 1;
                }
            }

Since we already have a ConcurrentHashMap, let's make use of the value, which now indicates the "selected" state. Introduce MapValueSet, which is a view over elements of the "keysRegistered" map, and precisely only those elements that have a certain value. For each call to select, we increment the expected value, and then set only the actually selected entries to that value, so we don't have to clear the entire map. #145

kohlschuetter · 2023-11-09T18:44:45Z

@ThePumpingLemma I was able to remove the second ConcurrentHashMap with 38cda27. Can you please take another look?

Since we already have a ConcurrentHashMap, let's make use of the value, which now indicates the "selected" state. Introduce MapValueSet, which is a view over elements of the "keysRegistered" map, and precisely only those elements that have a certain value. For each call to select, we increment the expected value, and then set only the actually selected entries to that value, so we don't have to clear the entire map. #145

kohlschuetter · 2023-11-09T18:48:42Z

(Fixed checkstyle warning)

Fix errors only occurring with -Drelease #145

kevink-sq · 2023-11-09T19:40:33Z

@kohlschuetter are the latest changes (including 38cda27) on 2.8.3-SNAPSHOT?

kohlschuetter · 2023-11-09T19:56:11Z

@kevink-sq Uploaded just now! 20231109.194633-3
Be sure to run mvn with -U to update snapshots.

ThePumpingLemma · 2023-11-09T20:20:35Z

I reviewed the changes and have some thoughts:

I think this is a lot more complicated than synchronizing on this and selectedKeysPublic. My preference would be to keep the locking + data structures simple and try to mirror the reference implementation (i.e. Java standard library) to minimize surprises.
MapValueSet could be thread-safe, but I'm not convinced without doing a really thorough analysis (I looked pretty closely at the iterator, but still need to chew on the rest). In contrast, it is really easy to look at synchronized blocks and convince yourself that all of the state mutation is under a lock and that there are no races.
MapValueSet is designed to be very generic (e.g. removedSentinel is optional), but I'm not sure it's worth the additional complexity. If you decide to stick with this approach, consider simplifying where possible.

All in all, I think it is possible to prove your approach safe, but I'm not personally comfortable with it.

kohlschuetter · 2023-11-09T20:48:07Z

MapValueSet wraps the ConcurrentHashMap, so it is as safe as that, and may throw the ConcurrentModificationException only in certain narrow circumstances with iterator(), which is in line with the AFSelector documentation (selectedKeys() sets don't have to be thread-safe). I actually wanted to build this a long time ago, since having a single map for all keys simplifies things quite a bit, so I'm very comfortable with it.

Once we had the ConcurrentHashMap for registered keys, it was clear that simply ignoring the value was a waste. Adding another ConcurrentHashMap resulted in simpler code but higher memory requirements. I'm happy that we got that out of the way.

kohlschuetter · 2023-11-09T20:50:22Z

As a next step, could you take a look at creating some unit tests around the selected keys behavior? Once we have good test coverage, we can make more confident optimizations without introducing further regressions.

Thanks for your help, @ThePumpingLemma!

kevink-sq · 2023-11-09T22:28:09Z

Reran the above tests against junixsocket-jetty-2.8.3-20231109.194633-3.jar artifact and the concurrency issue and leaking file descriptors are still fixed.

kevink-sq · 2023-11-10T00:16:43Z

I'm also for @ThePumpingLemma's approach to mirroring the reference implementation in the Java standard library. Things like the concurrency issue and the file descriptor leaks were resolved from following their examples and straying from them with novel implementations may have setbacks simply because they haven't been battle tested as much. Unless there are clear benefits to the approach, I'm not sure what is gained.

Furthermore, JEP-380 implements kqueue/epoll under the hood which gives a performance edge over junix. If junix were to follow suit, I think it'd be easier to make the changes if references were taken from the Java standard library beforehand.

kohlschuetter · 2023-11-10T01:19:55Z

Once Java supports all sorts of obscure socket types like AF_UNIX, AF_TIPC, AF_VSOCK, etc., there will probably be little need for junixsocket, which would actually be a good thing.

Until then, junixsocket remains a standalone project, under a different license, so copying code from the standard library just isn't an easy option.

Moreover, unlike commercial endeavors like Square/Cash app, this is my personal project. I'm not being paid to develop it, which means I can deliberately spend some extra time on polishing bits that I would otherwise leave as-is.

I hear your concerns. I don't want to sound harsh. However, please accept my explanation, and decision.

In this instance, removing the second Set is, in my view, the right thing to do, as it removes unnecessary allocation operations in a hot path (polling sockets). It also obviously fixes the issue you initially reported, without introducing any obvious regressions (unlike the earlier attempts).

Since I see you care about improving junixsocket, I hope you continue to contribute by filing tickets and PRs. Abstract, high-level critique as in "not battle tested" or "what about kqueue/epoll" is not very constructive for this present ticket. Please be very specific with your concerns, and, ideally, provide solutions for these concerns. As I said, I'm not being paid for this project, so I have to manage my attention wisely.

If you find that junixsocket is slower/worse than some other approach, please file a bug with some data/code, ideally with a PR that you accept may end up not the way it started.

As I explained earlier, the best way to contribute to the very specific situation of AFSelector is to improve the unit test coverage around selectors. When we have good coverage, we can look into taking more inspiration from the standard library if and only if there is a measurable benefit. With good test coverage, the conversation automatically becomes more detail-oriented, more data-informed, and less opinion-based.

You're very welcome to use JEP-380, just as you're welcome to help improve junixsocket. For what it's worth, I'm seeing higher throughput with junixsocket, compared to JEP-380, and I hope you will find too that junixsocket is the better choice, at least for the foreseeable future.

Clients may iterate upon AFSelector.selectedKeys() while the selected keys are modified in another thread. Change the underlying datastructure to be thread-safe (use ConcurrentHashMap), and add all "ready" selectors even if they're marked invalid (they will be removed later). We reimplement the fix as the original patch caused a FileDescriptor leak and TIPC test failures. #142 #145

Since we already have a ConcurrentHashMap, let's make use of the value, which now indicates the "selected" state. Introduce MapValueSet, which is a view over elements of the "keysRegistered" map, and precisely only those elements that have a certain value. For each call to select, we increment the expected value, and then set only the actually selected entries to that value, so we don't have to clear the entire map. #145

Fix errors only occurring with -Drelease #145

kohlschuetter · 2023-11-12T13:44:32Z

2.8.3 has just been released, including this fix. Please verify and re-open this ticket if necessary. Thanks again for reporting and working with me on this issue.

kohlschuetter added a commit that referenced this issue Nov 9, 2023

AFSelector: Bring back Java 7 compatibility

047c9a8

Fix errors only occurring with -Drelease #145

kohlschuetter added a commit that referenced this issue Nov 9, 2023

AFSelector: Bring back Java 7 compatibility

a7c56f6

Fix errors only occurring with -Drelease #145

kohlschuetter closed this as completed Nov 10, 2023

kohlschuetter added this to the 2.8.3 milestone Nov 10, 2023

kohlschuetter self-assigned this Nov 10, 2023

kohlschuetter added bug The issue describes a bug in the code merged A fix has been merged to main labels Nov 10, 2023

kohlschuetter reopened this Nov 10, 2023

kohlschuetter modified the milestones: 2.8.3, 2.9.0 Nov 10, 2023

kohlschuetter added a commit that referenced this issue Nov 12, 2023

AFSelector: Bring back Java 7 compatibility

efacb5a

Fix errors only occurring with -Drelease #145

kohlschuetter modified the milestones: 2.9.0, 2.8.3 Nov 12, 2023

kohlschuetter closed this as completed Nov 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

File descriptor leak under high concurrency #145

File descriptor leak under high concurrency #145

ThePumpingLemma commented Oct 30, 2023

kevink-sq commented Nov 3, 2023

kohlschuetter commented Nov 6, 2023

ThePumpingLemma commented Nov 6, 2023

kohlschuetter commented Nov 6, 2023 •

edited

Loading

kohlschuetter commented Nov 6, 2023

kevink-sq commented Nov 6, 2023 •

edited

Loading

kevink-sq commented Nov 6, 2023

kohlschuetter commented Nov 6, 2023

kevink-sq commented Nov 6, 2023

ThePumpingLemma commented Nov 6, 2023 •

edited

Loading

kohlschuetter commented Nov 6, 2023

ThePumpingLemma commented Nov 7, 2023

kohlschuetter commented Nov 9, 2023 •

edited

Loading

ThePumpingLemma commented Nov 9, 2023

kohlschuetter commented Nov 9, 2023

kohlschuetter commented Nov 9, 2023

kohlschuetter commented Nov 9, 2023

kohlschuetter commented Nov 9, 2023

kevink-sq commented Nov 9, 2023

kohlschuetter commented Nov 9, 2023

ThePumpingLemma commented Nov 9, 2023

kohlschuetter commented Nov 9, 2023

kohlschuetter commented Nov 9, 2023

kevink-sq commented Nov 9, 2023

kevink-sq commented Nov 10, 2023

kohlschuetter commented Nov 10, 2023

kohlschuetter commented Nov 12, 2023

File descriptor leak under high concurrency #145

File descriptor leak under high concurrency #145

Comments

ThePumpingLemma commented Oct 30, 2023

kevink-sq commented Nov 3, 2023

kohlschuetter commented Nov 6, 2023

ThePumpingLemma commented Nov 6, 2023

kohlschuetter commented Nov 6, 2023 • edited Loading

kohlschuetter commented Nov 6, 2023

kevink-sq commented Nov 6, 2023 • edited Loading

kevink-sq commented Nov 6, 2023

kohlschuetter commented Nov 6, 2023

kevink-sq commented Nov 6, 2023

ThePumpingLemma commented Nov 6, 2023 • edited Loading

kohlschuetter commented Nov 6, 2023

ThePumpingLemma commented Nov 7, 2023

kohlschuetter commented Nov 9, 2023 • edited Loading

ThePumpingLemma commented Nov 9, 2023

kohlschuetter commented Nov 9, 2023

kohlschuetter commented Nov 9, 2023

kohlschuetter commented Nov 9, 2023

kohlschuetter commented Nov 9, 2023

kevink-sq commented Nov 9, 2023

kohlschuetter commented Nov 9, 2023

ThePumpingLemma commented Nov 9, 2023

kohlschuetter commented Nov 9, 2023

kohlschuetter commented Nov 9, 2023

kevink-sq commented Nov 9, 2023

kevink-sq commented Nov 10, 2023

kohlschuetter commented Nov 10, 2023

kohlschuetter commented Nov 12, 2023

kohlschuetter commented Nov 6, 2023 •

edited

Loading

kevink-sq commented Nov 6, 2023 •

edited

Loading

ThePumpingLemma commented Nov 6, 2023 •

edited

Loading

kohlschuetter commented Nov 9, 2023 •

edited

Loading