Set TCP_KEEPIDLE=60 if possible
Cap'n Proto connections tend to be long-lived and we therefore turn on
the `SO_KEEPALIVE` option. However, the default keepalive timeout of 2
hours is much too long for some networks. In particular, Docker's
libnetwork silently drops idle connections after about 10 minutes.

The OCaml standard library doesn't provide a way to control the timeout,
but extunix does.
talex5 committed Nov 2, 2020
1 parent 7a04a67 commit 9612f12
Showing 4 changed files with 32 additions and 1 deletion.
21 changes: 21 additions & 0 deletions README.md
@@ -28,6 +28,7 @@ See [LICENSE.md](LICENSE.md) for details.
* [Summary](#summary)
* [Further reading](#further-reading)
* [FAQ](#faq)
* [Why does my connection stop working after 10 minutes?](#why-does-my-connection-stop-working-after-10-minutes)
* [How can I return multiple results?](#how-can-i-return-multiple-results)
* [Can I create multiple instances of an interface dynamically?](#can-i-create-multiple-instances-of-an-interface-dynamically)
* [Can I get debug output?](#can-i-get-debug-output)
@@ -1006,6 +1007,26 @@ Congratulations! You now know how to:

## FAQ

### Why does my connection stop working after 10 minutes?

Cap'n Proto connections are often idle for long periods of time, and some networks automatically close idle connections.
To avoid this, capnp-rpc-unix sets the `SO_KEEPALIVE` option when connecting to another vat,
so that the initiator of the connection will send a TCP keep-alive message at regular intervals.
However, TCP keep-alives are sent after the connection has been idle for 2 hours by default,
and this isn't frequent enough for e.g. Docker's libnetwork,
which silently breaks idle TCP connections after about 10 minutes.

A typical sequence looks like this:

1. A client connects to a server and configures a notification callback.
2. The connection is idle for 10 minutes. libnetwork removes the connection from its routing table.
3. Later, the server tries to send the notification and discovers that the connection has failed.
4. After 2 hours, the client sends a keep-alive message and it too discovers that the connection has failed.
It establishes a new connection and retries.

On some platforms, capnp-rpc-unix (>= 0.9.0) is able to reduce the timeout to 1 minute by setting the `TCP_KEEPIDLE` socket option.
On other platforms, you may have to configure this setting globally (e.g. with `sudo sysctl net.ipv4.tcp_keepalive_time=60`).

### How can I return multiple results?

Every Cap'n Proto method returns a struct, although the examples in this README only use a single field.
1 change: 1 addition & 0 deletions capnp-rpc-unix.opam
@@ -17,6 +17,7 @@ depends: [
"astring"
"fmt" {>= "0.8.4"}
"logs"
"extunix"
"base64" {>= "3.0.0"}
"dune" {>= "2.0"}
"alcotest-lwt" {with-test & >= "1.0.1"}
2 changes: 1 addition & 1 deletion unix/dune
@@ -2,4 +2,4 @@
(name capnp_rpc_unix)
(public_name capnp-rpc-unix)
(libraries lwt.unix astring capnp-rpc-lwt capnp-rpc-net capnp-rpc fmt logs
mirage-crypto-rng.unix cmdliner cstruct-lwt))
mirage-crypto-rng.unix cmdliner cstruct-lwt extunix))
9 changes: 9 additions & 0 deletions unix/network.ml
@@ -68,6 +68,14 @@ let addr_of_host host =
  else
    addr.Unix.h_addr_list.(0)

let have_tcp_keepidle = ExtUnix.All.have_sockopt_int ExtUnix.All.TCP_KEEPIDLE

let try_set_keepalive_idle socket i =
  if have_tcp_keepidle then (
    let socket = Lwt_unix.unix_file_descr socket in
    ExtUnix.All.setsockopt_int socket ExtUnix.All.TCP_KEEPIDLE i
  )

let connect_socket = function
  | `Unix path ->
    Logs.info (fun f -> f "Connecting to %S..." path);
@@ -81,6 +89,7 @@ let connect_socket = function
    Lwt.catch
      (fun () ->
         Lwt_unix.setsockopt socket Unix.SO_KEEPALIVE true;
         try_set_keepalive_idle socket 60;
         Lwt_unix.connect socket (Unix.ADDR_INET (addr_of_host host, port)) >|= fun () ->
         socket
      )
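
The change to `connect_socket` sets the idle time but never reads it back. The following is a minimal sketch, not part of the commit, of how the effective value might be checked on a plain `Unix` socket. It assumes that extunix exposes `getsockopt_int` alongside the `setsockopt_int` used in the commit; `configure_keepalive` is a hypothetical helper name and does not exist in capnp-rpc-unix.

```ocaml
(* Sketch only: link against the unix and extunix libraries,
   e.g. (libraries unix extunix) in a dune file. *)

(* True when extunix supports TCP_KEEPIDLE on this platform (as in the commit). *)
let have_tcp_keepidle = ExtUnix.All.have_sockopt_int ExtUnix.All.TCP_KEEPIDLE

(* Hypothetical helper: enable keep-alives on [fd], try to set the idle time
   to [idle] seconds, and return the value the kernel reports afterwards.
   Returns [None] when TCP_KEEPIDLE is unavailable, in which case the
   system-wide default idle time still applies. *)
let configure_keepalive fd idle =
  Unix.setsockopt fd Unix.SO_KEEPALIVE true;
  if have_tcp_keepidle then begin
    ExtUnix.All.setsockopt_int fd ExtUnix.All.TCP_KEEPIDLE idle;
    (* Assumption: getsockopt_int is the read-side counterpart in extunix. *)
    Some (ExtUnix.All.getsockopt_int fd ExtUnix.All.TCP_KEEPIDLE)
  end else
    None

let () =
  let fd = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Fun.protect ~finally:(fun () -> Unix.close fd) (fun () ->
    match configure_keepalive fd 60 with
    | Some secs -> Printf.printf "TCP_KEEPIDLE is now %d seconds\n" secs
    | None -> print_endline "TCP_KEEPIDLE not available; system default applies")
```

The options are set on an unconnected socket here, mirroring the commit, which also configures the socket before calling `Lwt_unix.connect`.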