Error while decoding incoming Akka PDU Exception when communicating between Remote Actors with a large number of messages on Linux #3370
Comments
I have simplified this to calling a single remote actor and the problem persists, which means it is probably not an issue with either Clustering or Sharding, but a more general networking problem on Linux.
I've been able to reproduce this using the sample above, so the original sample fails on my machine too. However, when I ported WebCrawler and ran that in a pure Linux environment this week, I had no issues at all. Makes me wonder if the problem might actually be related to the combination of the .NET Core version + Linux.
@IanPattersonMuo so if I look at the Dockerfiles I'm using for WebCrawler, I'm using an older version of the .NET Core 2.0 runtime: https://github.com/petabridge/akkadotnet-code-samples/blob/master/Cluster.WebCrawler/src/WebCrawler.Web/Dockerfile (2.0 vs 2.0.5). But I don't think that's the issue. I'm wondering though if the issue might stem from the way the Docker images are built in your solution. Judging from your What do you think - is that worth trying to rule out as a possibility?
I have tested it natively on multiple Linux hosts, where it is built and run on each host. This failed as well, so I don't think it has to do with the way it is built in Docker. I'll give it a go with an earlier version of the runtime and see if that makes a difference.
@IanPattersonMuo I take that back - the versioning is a bit misleading around these images. `microsoft/aspnetcore:2.0` is actually an image that is updated for each minor revision, meaning that the version itself is mutable. It's not 2.0.0, but rather 2.0.8 or whatever the latest of the 2.0.* branch is. So I'm actually using a newer version of the image.
I have checked it with the 2.1.0 preview images as well, with the same error. I created a branch in the sample code which just uses akka.remoting rather than clustering and sharding, and it fails with the same error. I also created a branch that uses Windows containers, and it works as expected.
@Aaronontheweb I checked the issue building the code within the container itself and it fails with the same error. |
@IanPattersonMuo I'm going to bring this up with the DotNetty folks - sure looks like a message framing inconsistency between platforms. Weird part is that I can't recreate this error using WebCrawler on Akka.Remote / Akka.Cluster in .NET Core on Linux, but I can with your application... |
…e Linux by disabling buffer pooling via HOCON
… disabling buffer pooling via HOCON (#3395)
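For readers landing here from the commits referenced above: the workaround they describe is a HOCON setting that turns off DotNetty buffer pooling. A minimal sketch, assuming the setting is named `akka.remote.dot-netty.tcp.enable-pooling` (the name is not quoted anywhere in this thread, so check the reference configuration for your Akka.NET version before relying on it):

```hocon
akka.remote.dot-netty.tcp {
  # Assumed setting name, not taken from this thread; pooling is on by default.
  # Setting it to false disables DotNetty buffer pooling as a workaround
  # for the "Error while decoding incoming Akka PDU" failure on Linux.
  enable-pooling = false
}
```

Disabling pooling likely costs some throughput, because buffers are allocated per use instead of being reused, so it is best treated as a workaround rather than a default.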
Akka.Net 1.3.5
We get an "Error while decoding incoming Akka PDU" exception when sending a large number of messages from a Shard Entity to another actor in a different process in the same cluster. It manifests when deployed on multiple Linux machines or whilst running in Docker (Linux). The error is seen on the client side of the communication and forces the process to Disassociate from the cluster. The full stack trace from the client process is as follows:
The server also shows a disassociation error
I have created a sample application that replicates the issue.
https://github.com/muo-ltd/Akka-NetCore-DockerClusterWithShards
It creates a cluster with two processes: one acts as a client and the other as a server. The server has a Sharded Entity, to which the client sends a message requesting information. The Sharded Entity then streams a large number of messages back to the client to consume. The messages returned are simple: the payload is a string of 1000 characters, all 1's, and 10000 messages are generated and returned. Using a lower number of messages (e.g. 1000) does not show this error, so it does appear to relate to volume. I have tested it using both the default JSON serialiser and Hyperion. The only difference between the two is that the Hyperion error is slightly different. It contains the error
Full stack trace below
When run locally on Mac, Windows or Linux, it appears to work correctly. If it is deployed to Docker it fails, and if deployed across multiple Linux hosts it also fails. Deploying to multiple Windows hosts appears to work correctly.