In general, setting the proper expected tablet count before restoring the data improves performance, as it lets us avoid tablet splits and migrations. It also looks like we need to disable tablet load balancing during the restore (see scylladb/scylladb#22707), but then we could end up with a very poor data balance if we don't set the proper expected tablet count first.
Per-table tablet option `expected_data_size_in_gb`:
> This option provides a hint for the anticipated table size, before replication. ScyllaDB will generate a tablets topology that matches that expectation (see details below). It can be set when the table is created to allocate more tablets for it, as if it already occupies that size. This will prevent unnecessary tablet splits and tablet migrations during data ingestion. It can also be changed later in the table life cycle to induce tablet splits or merges to fit the new expected size. The minimum tablet count is calculated by dividing the expected data size by the target_tablet_size_in_bytes config option.
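The rule at the end of the quote can be sketched as a small calculation. This is a minimal sketch, assuming that "GB" in the option name means 2^30 bytes and that the division is rounded up; the function name `tablet_count_lower_bound` is hypothetical, and ScyllaDB may adjust the resulting count further when it builds the tablet map.

```python
import math

GIB = 2**30


def tablet_count_lower_bound(expected_data_size_in_gb: float, target_tablet_size_in_bytes: int) -> int:
    """Expected data size divided by the target tablet size, rounded up."""
    if target_tablet_size_in_bytes <= 0:
        raise ValueError("target_tablet_size_in_bytes must be positive")
    # Assumption: "GB" in the option name means 2**30 bytes.
    expected_bytes = expected_data_size_in_gb * GIB
    # Assumption: at least one tablet, even for an empty table.
    return max(1, math.ceil(expected_bytes / target_tablet_size_in_bytes))


# Hypothetical numbers: 100 GB expected, 5 GiB target tablet size (not the ScyllaDB default).
print(tablet_count_lower_bound(100, 5 * GIB))  # prints 20
```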
There are a few considerations when using `expected_data_size_in_gb`:
- It's the data size before replication, so we need to divide the whole backup size by RF.
- It's more about utilized disk space than just the file size, which means that each file size should be rounded up to the disk block size.
Also, what about space amplification? Backed-up data is neither repaired nor compacted, so it might be difficult to reliably estimate `expected_data_size_in_gb` just by looking at the backed-up sstables and schema.
But perhaps exact estimates are not needed, and a simple rule of thumb such as adding 5% gives results that are good enough.
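The two considerations and the rule of thumb above can be combined into a rough estimate. This is a minimal sketch, assuming SM knows the size of every backed-up sstable file of a table, the disk block size (4096 bytes here, an assumption), and the total number of replicas covered by the backup (RF for a single-DC backup; for multi-DC backups, presumably the sum over the backed-up DCs). The function and parameter names are hypothetical, rounding the result up to whole GB is an assumption about what the option accepts, and the 5% headroom is only the rule of thumb above, not a validated figure.

```python
from __future__ import annotations

import math

GIB = 2**30


def estimate_expected_data_size_in_gb(
    sstable_file_sizes: list[int],
    total_replicas: int,
    block_size: int = 4096,
    headroom: float = 0.05,
) -> int:
    if total_replicas < 1 or block_size < 1:
        raise ValueError("total_replicas and block_size must be positive")
    # Utilized disk space: round every file up to a whole number of blocks.
    on_disk = sum(-(-size // block_size) * block_size for size in sstable_file_sizes)
    # The option describes the data size before replication.
    per_replica = on_disk / total_replicas
    # Rule-of-thumb margin, then round up to whole GB (assumed 2**30 bytes).
    return math.ceil(per_replica * (1 + headroom) / GIB)


# Hypothetical table: RF=3, each replica backed up as a single 10 GiB file.
print(estimate_expected_data_size_in_gb([10 * GIB] * 3, total_replicas=3))  # prints 11
```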
cc: @bhalevy
Per-table tablet option `min_tablet_count`:
> Determines the minimum number of tablets to allocate for the table. The hint is based on the deprecated keyspace initial tablets option. Note that the actual number of tablet replicas that are owned by each shard is a function of the tablet count, the replication factor in the datacenter, and the number of nodes and shards in the datacenter. It is recommended to use higher-level options such as expected_data_size_in_gb or min_per_shard_tablet_count instead.
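The relationship in the note above can be made concrete with back-of-the-envelope arithmetic. This is a minimal sketch, assuming every node in the datacenter has the same number of shards and that replicas are spread perfectly evenly (the real placement is decided by the tablet load balancer). Both function names are hypothetical, and the second one is only an illustration of the inverse calculation, not ScyllaDB's rule for `min_per_shard_tablet_count`.

```python
def avg_tablet_replicas_per_shard(tablet_count: int, replication_factor: int, nodes: int, shards_per_node: int) -> float:
    """Average number of tablet replicas owned by each shard in one datacenter."""
    return tablet_count * replication_factor / (nodes * shards_per_node)


def tablets_for_replicas_per_shard(replicas_per_shard: int, replication_factor: int, nodes: int, shards_per_node: int) -> int:
    """Smallest tablet count giving at least `replicas_per_shard` replicas per shard on average."""
    total_shards = nodes * shards_per_node
    # Integer ceiling of replicas_per_shard * total_shards / replication_factor.
    return -(-(replicas_per_shard * total_shards) // replication_factor)


# Hypothetical datacenter: 6 nodes with 8 shards each, RF=3.
print(avg_tablet_replicas_per_shard(64, 3, 6, 8))  # prints 4.0
print(tablets_for_replicas_per_shard(4, 3, 6, 8))  # prints 64
```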
Another approach could be for SM to calculate `min_tablet_count` by reading the metadata of the backed-up sstables. @bhalevy, could you write down exactly how SM should do it?
One final consideration: in the future, SM might want to back up and restore the tablet map itself. This is currently not possible (there is no Scylla API for restoring a tablet map), but if it were possible (and safe), it might be a better alternative to setting the tablet count based on the estimates described above.
Temporarily skipping these tests since there is no clear understanding
of how Manager should behave in such a situation. See details in (1).
This should be revisited once issue (2) is resolved.
refs:
1. scylladb/scylla-manager#4276
2. scylladb/scylla-manager#4275
> Controlling tablet count via keyspace tablet `initial` option is going to be deprecated, so we shouldn't use it.

It's also insufficient, because different tables in the same keyspace can have completely different tablet counts in the backup.
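Because the hint has to be set per table, SM would need to aggregate backup sizes at table granularity rather than per keyspace. This is a minimal sketch, assuming the backup listing can be reduced to hypothetical `(keyspace, table, size_in_bytes)` tuples; it is not the actual Scylla Manager manifest format, and `per_table_backup_bytes` is a hypothetical helper.

```python
from collections import defaultdict


def per_table_backup_bytes(files):
    """Sum backed-up sstable sizes per (keyspace, table)."""
    totals = defaultdict(int)
    for keyspace, table, size in files:
        totals[(keyspace, table)] += size
    return dict(totals)


# Two tables in the same keyspace with very different sizes.
files = [
    ("ks", "big", 40 * 2**30),
    ("ks", "big", 40 * 2**30),
    ("ks", "small", 2**20),
]
print(per_table_backup_bytes(files))
```

Each per-table total can then be turned into its own hint, which a single keyspace-level `initial` value cannot express.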
> Also, what about space amplification? Backed up data is not repaired, nor compacted, so it might be difficult to reliably estimate the `expected_data_size_in_gb` by just looking at the backed up sstables and schema.

The load balancer also measures tablet size on uncompacted/unrepaired data, so that's what we want.
After the backup is restored, the hints should be dropped so that the tablet count can evolve on its own. We should, however, restore any user-provided tablet hints that existed at the time of the backup.
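The ordering discussed in this thread (set hints before loading data, keep load balancing off during the restore, then drop the temporary hints and put back the user's own hints) can be sketched as follows. Every method on `RestoreTarget` is hypothetical and stands in for whatever CQL or REST calls SM would actually make; an empty options dict is assumed to mean "no hints". Re-enabling balancing unconditionally is also a simplification, since SM should arguably restore whatever balancing state it found.

```python
from __future__ import annotations

from typing import Protocol


class RestoreTarget(Protocol):
    """Hypothetical cluster interface; not a real SM or ScyllaDB API."""

    def set_tablet_options(self, table: str, options: dict[str, int]) -> None: ...
    def set_load_balancing(self, enabled: bool) -> None: ...
    def restore_data(self, table: str) -> None: ...


def restore_tablet_tables(
    target: RestoreTarget,
    estimated_hints: dict[str, dict[str, int]],
    user_hints_at_backup: dict[str, dict[str, int]],
) -> None:
    # 1. Apply the estimated hints before loading any data, so tablets are
    #    split up front instead of during ingestion.
    for table, hints in estimated_hints.items():
        target.set_tablet_options(table, hints)

    # 2. Keep load balancing off while data is loaded
    #    (see scylladb/scylladb#22707); turn it back on even if loading fails.
    target.set_load_balancing(False)
    try:
        for table in estimated_hints:
            target.restore_data(table)
    finally:
        target.set_load_balancing(True)

    # 3. Drop the temporary hints, restoring whatever the user had set at
    #    backup time (an empty dict stands for "no hints").
    for table in estimated_hints:
        target.set_tablet_options(table, user_hints_at_backup.get(table, {}))
```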
When restoring tablet keyspaces, SM should:
Controlling tablet count via the keyspace tablet `initial` option is going to be deprecated, so we shouldn't use it. Instead, Scylla 2025.1.0 introduces per-table tablet options (see https://opensource.docs.scylladb.com/master/cql/ddl.html#per-table-tablet-options).
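Assuming per-table tablet options are set with a `tablets = {...}` table property (double-check the exact syntax on the linked page before relying on it), SM could generate the statement per table. This is a minimal sketch; `quote_ident` and `tablet_options_cql` are hypothetical helpers, and option values are assumed to be integers.

```python
from __future__ import annotations


def quote_ident(name: str) -> str:
    """Quote a CQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def tablet_options_cql(keyspace: str, table: str, options: dict[str, int]) -> str:
    # Assumed syntax: ALTER TABLE ks.t WITH tablets = {'option': value, ...}
    # Options are sorted so the generated statement is deterministic.
    body = ", ".join(f"'{name}': {int(value)}" for name, value in sorted(options.items()))
    return f"ALTER TABLE {quote_ident(keyspace)}.{quote_ident(table)} WITH tablets = {{{body}}};"


print(tablet_options_cql("ks", "t", {"expected_data_size_in_gb": 11}))
# prints ALTER TABLE "ks"."t" WITH tablets = {'expected_data_size_in_gb': 11};
```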
SM can control tablet count by either `expected_data_size_in_gb` or `min_tablet_count`; both options, and the difficulty of estimating them from backed-up sstables, are discussed above.
cc: @mykaul @tzach @karol-kokoszka