Shard coordinator failing on movement to non-shard and proxy nodes. #3277

marrick66 · 2018-01-18T22:21:45Z

For Akka.Cluster.Sharding 1.3.2-beta54, I've noticed that if a non-shard node or proxy node is selected to become the singleton shard coordinator due to failure, it will not start until that new node is restarted and the coordinator is moved back to a node with a shard region started. The two cases I can reproduce are:

1. Movement to a node with no shard region or proxy started. This fails with an error of "Error [Cannot find serializer with id [13]", which matches ClusterShardingMessageSerializer in the code. This would make sense, since it's probably not loaded in the first place on that node.
2. Movement to a node with a shard region proxy started. This shows as the coordinator being moved to the shard, but no persistence recovery is attempted and no new registrations can be completed. No exceptions seem to be thrown, however.

As an example, I have two nodes with shard regions started on them, and a single node that has either a shard proxy or neither a region nor a proxy. If I crash the shard nodes and bring them back up, the coordinator is moved to the single node, and the behavior described above occurs.

Is this a valid use case I'm attempting?

Horusiath · 2018-01-19T06:50:37Z

@marrick66 When you specify the settings used to start a cluster sharding region, you can specify the role that every node hosting regions of that type should have:

var sharding = ClusterSharding.Get(system);
var region = await sharding.StartAsync(
    typeName: nameof(MyActor),
    entityProps: Props.Create<MyActor>(),
    settings: ClusterShardingSettings.Create(system).WithRole(nameof(MyActor) + "-region"),
    messageExtractor: new MessageExtractor());

All nodes with that role are expected to have shard regions started on them.

For a shard region proxy, the same role should also be provided when starting the proxy, but the node that hosts the proxy shouldn't itself have that role.

marrick66 · 2018-01-19T12:29:32Z

I had roles set for the cluster, but not the shards. Adding them to the shard configuration prevents the non-shard nodes from attempting to become the coordinator. Thanks for the help, I appreciate it.
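For anyone landing here later, the difference between "roles set for the cluster" and "roles set for the shards" can be sketched in HOCON roughly as follows. This is only an illustration under assumptions: the role name `MyActor-region` is taken from the snippet above, `frontend` is a made-up role for the proxy-only node, and setting `akka.cluster.sharding.role` is assumed to be the configuration counterpart of calling `WithRole(...)` on the settings in code.

```hocon
# --- Node that hosts shard regions (eligible to host the coordinator) ---
akka.cluster {
  # Cluster-level role: tags this node as a member of the region group.
  roles = ["MyActor-region"]

  # Sharding-level role: restricts sharding (and its coordinator singleton)
  # to nodes that carry this role. Assumed equivalent to WithRole(...) in code.
  sharding.role = "MyActor-region"
}

# --- Proxy-only or non-shard node (separate config file) ---
# akka.cluster {
#   # "frontend" is a hypothetical role name; the point is only that
#   # "MyActor-region" is NOT listed here, so this node is never picked
#   # to host the coordinator.
#   roles = ["frontend"]
# }
```

With only the cluster-level `roles` set and no sharding role, the coordinator singleton can land on any node, which matches the behavior described at the top of this issue.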

marrick66 closed this as completed Jan 19, 2018

ondrejpialek mentioned this issue Mar 9, 2018

DData: Cannot create a shard proxy on a cluster node that is not in the same role as the proxied shard entity #3352

Closed
