Mainnet node connections are breaking - make gRPC endpoint connection configurable #833

Open

iamjpotts opened this issue Sep 3, 2024 · 1 comment
Labels: enhancement (New feature or request)

@iamjpotts (Contributor)

Problem

We are seeing a significant number of TCP and HTTP connection related errors with the gRPC endpoints, such as:

GrpcStatus(Status { code: Unavailable, message: "error trying to connect: tcp connect error: Broken pipe (os error 32)", source: Some(tonic::transport::Error(Transport, hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 32, kind: BrokenPipe, message: "Broken pipe" })))) })

GrpcStatus(Status { code: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", source: Some(tonic::transport::Error(Transport, hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })))) })

GrpcStatus(Status { code: Unavailable, message: "error trying to connect: tcp connect error: deadline has elapsed", source: Some(tonic::transport::Error(Transport, hyper::Error(Connect, ConnectError("tcp connect error", Custom { kind: TimedOut, error: Elapsed(()) })))) })

GrpcStatus(Status { code: Internal, message: "protocol error: received message with invalid compression flag: 60 (valid flags are 0 and 1) while receiving response with status: 503 Service Unavailable", metadata: MetadataMap { headers: {"content-length": "107", "cache-control": "no-cache", "content-type": "text/html"} }, source: None })

In most cases, backing off and retrying after a short delay of 1-2 seconds results in a successful request.

Some, but not all, of these errors could potentially be mitigated or avoided by adjusting timeouts and keep-alives. Alternatively, experimenting with those settings may show empirically that they do not resolve the issue, in which case stability efforts should be directed elsewhere.
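
For reference, the kind of backoff-and-retry wrapper described above looks roughly like the sketch below (assuming a tokio runtime; the helper name, attempt count, and fixed delay are illustrative and not part of the SDK):

    use std::time::Duration;

    // A minimal sketch (not SDK code) of the backoff-and-retry workaround
    // described above. `do_request` stands in for any SDK call that can fail
    // with a transient gRPC status; the attempt count and 2-second delay are
    // illustrative, not recommendations.
    async fn with_retry<T, E, F, Fut>(mut do_request: F) -> Result<T, E>
    where
        F: FnMut() -> Fut,
        Fut: std::future::Future<Output = Result<T, E>>,
    {
        let mut attempts = 0;
        loop {
            match do_request().await {
                Ok(value) => return Ok(value),
                // Back off briefly and retry; 1-2 seconds is usually enough
                // for the node connection to recover.
                Err(_) if attempts < 3 => {
                    attempts += 1;
                    tokio::time::sleep(Duration::from_secs(2)).await;
                }
                Err(e) => return Err(e),
            }
        }
    }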

Solution

In the NodeConnection channel setup, several values are hard-coded:

  • Connect timeout duration
  • Keep alive timeout duration
  • Keep alive while idle flag
  • TCP keep alive duration

https://github.com/hashgraph/hedera-sdk-rust/blob/40cc835628347772ca1cc71b9d860e2a4c3b1214/src/client/network/mod.rs#L594-L597
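
For context, that channel setup builds a tonic Endpoint roughly along these lines (a sketch only; the address and durations are placeholders, the actual hard-coded values are at the permalink above):

    use std::time::Duration;
    use tonic::transport::Endpoint;

    // Illustrative only: the address and durations below are placeholders,
    // not the actual values from the linked source. These are the four
    // settings that are currently fixed at channel-setup time.
    fn example_channel() -> Endpoint {
        Endpoint::from_static("http://node.example.com:50211")
            .connect_timeout(Duration::from_secs(10))     // connect timeout duration
            .keep_alive_timeout(Duration::from_secs(10))  // keep alive timeout duration
            .keep_alive_while_idle(true)                  // keep alive while idle flag
            .tcp_keepalive(Some(Duration::from_secs(10))) // TCP keep alive duration
    }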

We would like to make them configurable at the time a new Client is created.
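
One possible shape for this, purely as a sketch (none of these names exist in the SDK today), is an options struct accepted when the Client is constructed, with defaults mirroring the current hard-coded values:

    use std::time::Duration;

    // Hypothetical configuration surface; none of these names are part of
    // the SDK. The idea is that Client construction accepts per-endpoint
    // channel settings instead of relying on hard-coded constants.
    #[derive(Clone, Debug)]
    pub struct EndpointConfig {
        pub connect_timeout: Duration,
        pub keep_alive_timeout: Duration,
        pub keep_alive_while_idle: bool,
        pub tcp_keepalive: Option<Duration>,
    }

    impl Default for EndpointConfig {
        fn default() -> Self {
            // Defaults would mirror the values currently hard-coded in the
            // NodeConnection channel setup (durations here are placeholders).
            Self {
                connect_timeout: Duration::from_secs(10),
                keep_alive_timeout: Duration::from_secs(10),
                keep_alive_while_idle: true,
                tcp_keepalive: Some(Duration::from_secs(10)),
            }
        }
    }

    // Hypothetical usage (constructor name is illustrative only):
    //
    //     let config = EndpointConfig {
    //         connect_timeout: Duration::from_secs(30),
    //         ..EndpointConfig::default()
    //     };
    //     let client = Client::for_mainnet_with_endpoint_config(config);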

Alternatives

  • Forking the repository and altering the hard-coded values
  • Forking the repository and adding endpoint configuration to ClientBuilder, and making ClientBuilder public