Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File descriptor leak under high concurrency #145

Closed
ThePumpingLemma opened this issue Oct 30, 2023 · 27 comments
Closed

File descriptor leak under high concurrency #145

ThePumpingLemma opened this issue Oct 30, 2023 · 27 comments
Assignees
Labels
bug The issue describes a bug in the code merged A fix has been merged to main
Milestone

Comments

@ThePumpingLemma
Copy link
Contributor

Describe the bug
With our PoC at
https://github.com/kevink-sq/jetty-concurrency-issue-poc/tree/575469b3453e7b09d9d60f6460cdab8117b8e773, we noticed that there is unbounded growth of file descriptors when running our simple load test:

seq 1 15000 | xargs -P1500 -I{} curl --unix-socket ./junix-ingress.sock http://localhost/process

Leading to timeouts from curl and timeout exceptions in jetty:

java.util.concurrent.TimeoutException: Idle timeout expired: 32422/30000 ms
        at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:170)
        at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:112)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)

To Reproduce
Steps to reproduce the behavior (assuming macOS):

  1. Checkout https://github.com/kevink-sq/jetty-concurrency-issue-poc/tree/575469b3453e7b09d9d60f6460cdab8117b8e773
  2. Run mvn compile exec:java -Dexec.mainClass="org.example.Main"
  3. Run seq 1 15000 | xargs -P1500 -I{} curl --unix-socket ./junix-ingress.sock http://localhost/process

Expected behavior
All curl calls should succeed and there should be no timeouts in jetty.

Output/Screenshots
N/A

Please make sure to test the problem still occurs on the latest version of junixsocket
Tested on v2.8.2

Notes
Proposed fix: #144

@kevink-sq
Copy link
Contributor

Hi @kohlschuetter, would it be possible to look into this issue?

@kohlschuetter
Copy link
Member

Your commands to reproduce may not work on all platforms (e.g., Alpine Linux) and may return the wrong process ("grep" itself on macOS).

I'm trying to verify with

mvn clean compile exec:java -Dexec.mainClass="org.example.Main"

and

sudo apk add lsof
lsof -p $(jps -v | grep jetty-concurrency | cut -f1 -d' ') | grep junix-ingress

The commit you recommend to reproduce does not work for me. When I manually remove the monkey-patched files, everything still works OK on macOS, but I see connection errors on Alpine Linux. The monkey patched code (assuming it's the same as in the PR), indeed seems to fix that bug, but I'm not seeing the timeout error messages.

@ThePumpingLemma
Copy link
Contributor Author

The commit you recommend to reproduce does not work for me. When I manually remove the monkey-patched files, everything still works OK on macOS, but I see connection errors on Alpine Linux. The monkey patched code (assuming it's the same as in the PR), indeed seems to fix that bug, but I'm not seeing the timeout error messages.

I didn't test on Alpine; @kevin-sq did reproduce on Linux?

Not sure if macOS version matters in this case:

Darwin redacted.local 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct  9 21:27:24 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6000 arm64

@kohlschuetter
Copy link
Member

kohlschuetter commented Nov 6, 2023

@ThePumpingLemma :

lsof -p $(ps aux | grep "org.example.Main" | awk {'print$2'} | head -n 1) | grep junix-ingress.sock | wc -l 

may actually get the PID of the grep command ,which the jps approach prevents. You could also add another grep for "java", and remove the head -n 1 to raise a red flag should there still be more than one line:

lsof -p $(ps aux | grep "org.example.Main" | grep java | awk {'print$2'}) | grep junix-ingress.sock | wc -l 

Also beware that lsof on Alpine is busybox-based, so it will actually show all PIDs unless we install the proper binary via sudo apk add lsof

@kohlschuetter
Copy link
Member

I've merged the PR, however I'm not really happy that we don't have a proper test case. However, I can see these errors in curl go away with the fix.

@kevink-sq
Copy link
Contributor

kevink-sq commented Nov 6, 2023

I'm able to reproduce the leaking junix file descriptor in linux via multipass against this baseline branch (which is the same as the above 575469b3453e7b09d9d60f6460cdab8117b8e773 but removed debug logging and other monkey patch files).

Linux info:

Welcome to Ubuntu 22.04.3 LTS (GNU/Linux 5.15.0-87-generic aarch64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Mon Nov  6 09:45:35 PST 2023

  System load:             0.1953125
  Usage of /:              10.9% of 24.05GB
  Memory usage:            8%
  Swap usage:              0%
  Processes:               127
  Users logged in:         1

In one terminal started monitoring the fd count of junix-ingress.sock

while sleep 1; do echo $(lsof -p $(ps aux | grep "org.example.Main" | awk {'print$2'} | head -n 1) | grep junix-ingress.sock | wc -l); done

In another terminal, started the server against baseline branch

mvn compile exec:java -Dexec.mainClass="org.example.Main"

In another terminal started calling the junix uds:

seq 1 300000 | xargs -P1500 -I{} curl --unix-socket /tmp/junix-ingress.sock http://localhost/process

I could verify the monitor count started climbing but not for the native endpoint. Screenshots of the above process:

Server started with monitoring. Junix uds not being called yet:

Screenshot 2023-11-06 at 10 12 12 AM

Started calling junix uds

Screenshot 2023-11-06 at 10 18 10 AM

Running against native endpoint does not leak

Screenshot 2023-11-06 at 10 16 20 AM

Screenshot 2023-11-06 at 10 16 53 AM

@kevink-sq
Copy link
Contributor

Repeating the above with the monkey patched fix (baseline branch patched) results in junix behaving the same as native.

@kohlschuetter
Copy link
Member

Can you please verify with the latest 2.8.3-SNAPSHOT, and all monkey-patched code removed?

Make sure the following section is in your POM:

            <repositories>
                <repository>
                    <id>snapshots-repo</id>
                    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
                    <releases>
                        <enabled>false</enabled>
                    </releases>
                    <snapshots>
                        <enabled>true</enabled>
                    </snapshots>
                </repository>
            </repositories>

@kevink-sq
Copy link
Contributor

I can verify the fix works against the snapshot with all of the monkey-patched code removed: baseline branch 2.8.3-SNAPSHOT

@ThePumpingLemma
Copy link
Contributor Author

ThePumpingLemma commented Nov 6, 2023

I've merged the PR, however I'm not really happy that we don't have a proper test case. However, I can see these errors in curl go away with the fix.

It's definitely possible to stress test this, so we could make integration tests to that effect.

However, I don't see why we couldn't write the code in such a way that these routines are unit testable; then we could mock selection keys and see how the cleanup code handles it. I could take a pass at a refactor to facilitate with unit testing if you would like?

@kohlschuetter
Copy link
Member

Yes that would be great, thank you!

@ThePumpingLemma
Copy link
Contributor Author

@kohlschuetter I think I want to approach it by first making some broader fixes before I break it down into unit testable functions. The first thing I want to attack is locking, where I noticed some inconsistencies in the way locks are managed. Here is a PR where I do some renaming followed by lock normalization: #146

I would like to follow this up with reworking how PollFd objects are managed; I think it's a bit hard to follow and I also think there is redundant code in initPollFd (which is the code we need to unit test).

kohlschuetter added a commit that referenced this issue Nov 9, 2023
Clients may iterate upon AFSelector.selectedKeys() while the selected
keys are modified in another thread.

Change the underlying datastructure to be thread-safe (use
ConcurrentHashMap), and add all "ready" selectors even if they're marked
invalid (they will be removed later).

We reimplement the fix as the original patch caused a FileDescriptor
leak and TIPC test failures.

#142
#145
@kohlschuetter
Copy link
Member

kohlschuetter commented Nov 9, 2023

Before we continue with improvements to AFSelector, please verify that the updated patch (5e8a6e6) resolves all concurrency issues / FileDescriptor leaks for you.

I reverted the original patch attempts because we really introduced the regression in 2.8.2, and even with the latest changes, a TIPC test would fail on Linux.

Please also verify by running the selftest jar on Linux with the TIPC kernel module loaded, and a bearer enabled:

sudo modprobe tipc
sudo tipc bearer enable media eth device eth0
java -ea -jar junixsocket-selftest-2.8.3-SNAPSHOT-jar-with-dependencies.jar 

@ThePumpingLemma
Copy link
Contributor Author

Before we continue with improvements to AFSelector, please verify that the updated patch (5e8a6e6) resolves all concurrency issues / FileDescriptor leaks for you.

It seems to be working okay on macOS from a preliminary test.

However, this patch doesn't implement the locking in the way that the documentation suggests we should:

A selector's selected-key set is not, in general, safe for use by
multiple concurrent threads. If such a thread might modify the set directly
then access should be controlled by synchronizing on the set itself. The
iterators returned by the set's {@link java.util.Set#iterator() iterator}
methods are fail-fast: If the set is modified after the iterator is
created, in any way except by invoking the iterator's own {@link
java.util.Iterator#remove() remove} method, then a {@link
java.util.ConcurrentModificationException} will be thrown.

IMO, I think it would make sense to implement things as similar to Java's standard library as possible, but I'll leave that up to you.

I don't have a Linux test setup handy, maybe @kevink-sq can try to verify the change?

@kohlschuetter
Copy link
Member

Yeah, it's really odd. When you replace that ConcurrentHashMap with HashMap, jetty throws a ConcurrentModificationException, which means that either two threads are accessing the structure concurrently, or modifications happen after the iterator has been created. Maybe there's something else we're missing?

@kohlschuetter
Copy link
Member

So I think it's the .clear() that's throwing off jetty. In the original SelectorImpl, they check if the key already exists (processReadyEvents):

            if (selectedKeys.contains(ski)) {
                if (ski.translateAndUpdateReadyOps(rOps)) {
                    return 1;
                }
            } else {
                ski.translateAndSetReadyOps(rOps);
                if ((ski.nioReadyOps() & ski.nioInterestOps()) != 0) {
                    selectedKeys.add(ski);
                    return 1;
                }
            }

kohlschuetter added a commit that referenced this issue Nov 9, 2023
Since we already have a ConcurrentHashMap, let's make use of the value,
which now indicates the "selected" state.

Introduce MapValueSet, which is a view over elements of the
"keysRegistered" map, and precisely only those elements that have a
certain value.

For each call to select, we increment the expected value, and then set
only the actually selected entries to that value, so we don't have to
clear the entire map.

#145
@kohlschuetter
Copy link
Member

@ThePumpingLemma I was able to remove the second ConcurrentHashMap with 38cda27. Can you please take another look?

kohlschuetter added a commit that referenced this issue Nov 9, 2023
Since we already have a ConcurrentHashMap, let's make use of the value,
which now indicates the "selected" state.

Introduce MapValueSet, which is a view over elements of the
"keysRegistered" map, and precisely only those elements that have a
certain value.

For each call to select, we increment the expected value, and then set
only the actually selected entries to that value, so we don't have to
clear the entire map.

#145
@kohlschuetter
Copy link
Member

(Fixed checkstyle warning)

kohlschuetter added a commit that referenced this issue Nov 9, 2023
Fix errors only occurring with -Drelease

#145
kohlschuetter added a commit that referenced this issue Nov 9, 2023
Fix errors only occurring with -Drelease

#145
@kevink-sq
Copy link
Contributor

@kohlschuetter are the latest changes (including 38cda27) on 2.8.3-SNAPSHOT?

@kohlschuetter
Copy link
Member

@kevink-sq Uploaded just now! 20231109.194633-3
Be sure to run mvn with -U to update snapshots.

@ThePumpingLemma
Copy link
Contributor Author

I reviewed the changes and have some thoughts:

  • I think this is a lot more complicated than synchronizing on this and selectedKeysPublic. My preference would be to keep the locking + data structures simple and try to mirror the reference implementation (i.e. Java standard library) to minimize surprises.
  • MapValueSet could be thread-safe, but I'm not convinced without doing a really thorough analysis (I looked pretty closely at the iterator, but still need to chew on the rest). In contrast, it is really easy to look at synchronized blocks and convince yourself that all of the state mutation is under a lock and that there are no races.
  • MapValueSet is designed to be very generic (e.g. removedSentinel is optional), but I'm not sure it's worth the additional complexity. If you decide to stick with this approach, consider simplifying where possible.

All in all, I think it is possible to prove your approach safe, but I'm not personally comfortable with it.

@kohlschuetter
Copy link
Member

MapValueSet wraps the ConcurrentHashMap, so it is as safe as that, and may throw the ConcurrentModificationException only in certain narrow circumstances with iterator(), which is in line with the AFSelector documentation (selectedKeys() sets don't have to be thread-safe). I actually wanted to build this a long time ago, since having a single map for all keys simplifies things quite a bit, so I'm very comfortable with it.

Once we had the ConcurrentHashMap for registered keys, it was clear that simply ignoring the value was a waste. Adding another ConcurrentHashMap resulted in simpler code but higher memory requirements. I'm happy that we got that out of the way.

@kohlschuetter
Copy link
Member

As a next step, could you take a look at creating some unit tests around the selected keys behavior? Once we have good test coverage, we can make more confident optimizations without introducing further regressions.

Thanks for your help, @ThePumpingLemma!

@kevink-sq
Copy link
Contributor

Reran the above tests against junixsocket-jetty-2.8.3-20231109.194633-3.jar artifact and the concurrency issue and leaking file descriptors are still fixed.

@kevink-sq
Copy link
Contributor

I'm also for @ThePumpingLemma's approach to mirroring the reference implementation in the Java standard library. Things like the concurrency issue and the file descriptor leaks were resolved from following their examples and straying from them with novel implementations may have setbacks simply because they haven't been battle tested as much. Unless there are clear benefits to the approach, I'm not sure what is gained.

Furthermore, JEP-380 implements kqueue/epoll under the hood which gives a performance edge over junix. If junix were to follow suit, I think it'd be easier to make the changes if references were taken from the Java standard library beforehand.

@kohlschuetter
Copy link
Member

Once Java supports all sorts of obscure socket types like AF_UNIX, AF_TIPC, AF_VSOCK, etc., there will probably be little need for junixsocket, which would actually be a good thing.

Until then, junixsocket remains a standalone project, under a different license, so copying code from the standard library just isn't an easy option.

Moreover, unlike commercial endeavors like Square/Cash app, this is my personal project. I'm not being paid to develop it, which means I can deliberately spend some extra time on polishing bits that I would otherwise leave as-is.

I hear your concerns. I don't want to sound harsh. However, please accept my explanation, and decision.

In this instance, removing the second Set is, in my view, the right thing to do, as it removes unnecessary allocation operations in a hot path (polling sockets). It also obviously fixes the issue you initially reported, without introducing any obvious regressions (unlike the earlier attempts).

Since I see you care about improving junixsocket, I hope you continue to contribute by filing tickets and PRs. Abstract, high-level critique as in "not battle tested" or "what about kqueue/epoll" is not very constructive for this present ticket. Please be very specific with your concerns, and, ideally, provide solutions for these concerns. As I said, I'm not being paid for this project, so I have to manage my attention wisely.

If you find that junixsocket is slower/worse than some other approach, please file a bug with some data/code, ideally with a PR that you accept may end up not the way it started.

As I explained earlier, the best way to contribute to the very specific situation of AFSelector is to improve the unit test coverage around selectors. When we have good coverage, we can look into taking more inspiration from the standard library if and only if there is a measurable benefit. With good test coverage, the conversation automatically becomes more detail-oriented, more data-informed, and less opinion-based.

You're very welcome to use JEP-380, just as you're welcome to help improve junixsocket. For what it's worth, I'm seeing higher throughput with junixsocket, compared to JEP-380, and I hope you will find too that junixsocket is the better choice, at least for the foreseeable future.

@kohlschuetter kohlschuetter added this to the 2.8.3 milestone Nov 10, 2023
@kohlschuetter kohlschuetter self-assigned this Nov 10, 2023
@kohlschuetter kohlschuetter added bug The issue describes a bug in the code merged A fix has been merged to main labels Nov 10, 2023
@kohlschuetter kohlschuetter reopened this Nov 10, 2023
@kohlschuetter kohlschuetter modified the milestones: 2.8.3, 2.9.0 Nov 10, 2023
kohlschuetter added a commit that referenced this issue Nov 12, 2023
Clients may iterate upon AFSelector.selectedKeys() while the selected
keys are modified in another thread.

Change the underlying datastructure to be thread-safe (use
ConcurrentHashMap), and add all "ready" selectors even if they're marked
invalid (they will be removed later).

We reimplement the fix as the original patch caused a FileDescriptor
leak and TIPC test failures.

#142
#145
kohlschuetter added a commit that referenced this issue Nov 12, 2023
Since we already have a ConcurrentHashMap, let's make use of the value,
which now indicates the "selected" state.

Introduce MapValueSet, which is a view over elements of the
"keysRegistered" map, and precisely only those elements that have a
certain value.

For each call to select, we increment the expected value, and then set
only the actually selected entries to that value, so we don't have to
clear the entire map.

#145
kohlschuetter added a commit that referenced this issue Nov 12, 2023
Fix errors only occurring with -Drelease

#145
@kohlschuetter kohlschuetter modified the milestones: 2.9.0, 2.8.3 Nov 12, 2023
@kohlschuetter
Copy link
Member

2.8.3 has just been released, including this fix. Please verify and re-open this ticket if necessary. Thanks again for reporting and working with me on this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The issue describes a bug in the code merged A fix has been merged to main
Projects
None yet
Development

No branches or pull requests

3 participants