Set expected tablet count during restore #4275

Open
Michal-Leszczynski opened this issue Feb 27, 2025 · 2 comments

@Michal-Leszczynski
Collaborator

When restoring tablet keyspaces, SM should:

In general, setting proper expected tablet count before restoring the data improves performance, as we are able to avoid tablet splits and migrations. It also looks like we need to disable tablet load balancing during the restore (see scylladb/scylladb#22707), but then we could end up with a really bad data balance if we don't set the proper expected tablet count first.

Controlling tablet count via keyspace tablet initial option is going to be deprecated, so we shouldn't use it.
Instead, Scylla 2025.1.0 is introducing per-table tablet options (see https://opensource.docs.scylladb.com/master/cql/ddl.html#per-table-tablet-options).
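
For illustration only, here is a minimal Python sketch of how SM could build the statement that sets per-table tablet options. The `ALTER TABLE ... WITH tablets = {...}` form and the unquoted numeric values are assumptions to be checked against the linked documentation, and `alter_tablet_options_cql` is a hypothetical helper, not an existing SM or driver function.

```python
def alter_tablet_options_cql(keyspace, table, options):
    """Build a CQL statement that sets per-table tablet options.

    Assumption: the statement shape follows the per-table tablet options
    docs linked above; verify it before relying on this helper.
    """
    if not options:
        raise ValueError("at least one tablet option is required")
    # Sort keys so the generated statement is deterministic.
    rendered = ", ".join(f"'{name}': {value}" for name, value in sorted(options.items()))
    return f'ALTER TABLE "{keyspace}"."{table}" WITH tablets = {{{rendered}}};'


if __name__ == "__main__":
    print(alter_tablet_options_cql("ks", "t", {"expected_data_size_in_gb": 11}))
```

Setting the options per table rather than per keyspace lets each restored table get its own hint.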

SM can control tablet count by either:

  1. expected_data_size_in_gb:
    This option provides a hint for the anticipated table size, before replication. ScyllaDB will generate a tablets topology that matches that expectation (see details below). It can be set when the table is created to allocate more tablets for it, as if it already occupies that size. This will prevent unnecessary tablet splits and tablet migrations during data ingestion. It can also be changed later in the table life cycle to induce tablet splits or merges to fit the new expected size. The minimum tablet count is calculated by dividing the expected data size by the target_tablet_size_in_bytes config option.

There are a few considerations when using expected_data_size_in_gb:

  • it's data size before replication, so we need to divide the whole backup size by RF
  • it's more about utilized disk space, not just the file size, which means that file size should be rounded up to the disk block size

Also, what about space amplification? Backed up data is not repaired, nor compacted, so it might be difficult to reliably estimate the expected_data_size_in_gb by just looking at the backed up sstables and schema.
But perhaps exact estimates are not needed, and a simple rule of thumb such as adding 5% gives results that are good enough.
cc: @bhalevy
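
The considerations above could be sketched roughly as follows. This is an assumption-laden illustration, not an agreed design: the 4096-byte block size, the interpretation of "GB" as 2^30 bytes, and the function name `estimate_expected_data_size_gb` are all hypothetical, and the 5% margin is just the rule of thumb mentioned above.

```python
def estimate_expected_data_size_gb(file_sizes, replication_factor,
                                   block_size=4096, margin=0.05):
    """Estimate a table's pre-replication size in GB from backed-up file sizes.

    file_sizes: sizes in bytes of all backed-up sstable files of the table,
                across all replicas.
    Assumptions: block_size of 4096 bytes and 1 GB == 2**30 bytes.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    # Round every file up to whole disk blocks (utilized space, not file size).
    on_disk = sum(-(-size // block_size) * block_size for size in file_sizes)
    # The option expects the size before replication.
    per_replica = on_disk / replication_factor
    # Add a safety margin for space amplification.
    padded = per_replica * (1 + margin)
    gib = 2 ** 30
    # Round up to a whole number of GB, never below 1.
    return max(1, -(-int(padded) // gib))


if __name__ == "__main__":
    # Three replicas, each holding 10 GiB of sstables.
    print(estimate_expected_data_size_gb([10 * 2 ** 30] * 3, replication_factor=3))
```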

  2. min_tablet_count:
    Determines the minimum number of tablets to allocate for the table. The hint is based on the deprecated keyspace initial tablets option. Note that the actual number of tablet replicas that are owned by each shard is a function of the tablet count, the replication factor in the datacenter, and the number of nodes and shards in the datacenter. It is recommended to use higher-level options such as expected_data_size_in_gb or min_per_shard_tablet_count instead.
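
As a rough illustration, a minimum tablet count can be derived from a size estimate using the division by target_tablet_size_in_bytes quoted earlier for expected_data_size_in_gb. The 5 GiB default and the rounding up to a power of two are assumptions of this sketch (to be checked against the Scylla configuration and tablet allocator), and `min_tablet_count_for` is a hypothetical name.

```python
def min_tablet_count_for(size_bytes, target_tablet_size_bytes=5 * 2 ** 30):
    """Derive a minimum tablet count from a pre-replication size in bytes.

    Assumptions: a default target tablet size of 5 GiB, and tablet counts
    being rounded up to a power of two.
    """
    if target_tablet_size_bytes <= 0:
        raise ValueError("target_tablet_size_bytes must be positive")
    # Ceiling division, with at least one tablet.
    raw = max(1, -(-size_bytes // target_tablet_size_bytes))
    # Round up to the next power of two.
    return 1 << (raw - 1).bit_length()


if __name__ == "__main__":
    print(min_tablet_count_for(100 * 2 ** 30))
```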

Another approach could be for SM to calculate the min_tablet_count by reading the metadata of the backed-up sstables.
@bhalevy could you write down how exactly SM should do it?

One final thing to consider: in the future, SM might want to back up and restore the tablet map. This is currently not possible (there is no Scylla API for restoring a tablet map), but if it were possible (and safe), it might be a better alternative to setting the tablet count from the estimates mentioned above.

cc: @mykaul @tzach @karol-kokoszka

@mykaul
Contributor

mykaul commented Feb 27, 2025

@tgrabiec , @bhalevy - thoughts?

mikliapko added a commit to mikliapko/scylla-cluster-tests that referenced this issue Feb 27, 2025
Temporarily skipping this test since there is no clear understanding
of how Manager should behave in such a situation. See details in (1).

Should be revisited after issue (2) resolution.

refs:
1. scylladb/scylla-manager#4276
2. scylladb/scylla-manager#4275
@tgrabiec

> Controlling tablet count via keyspace tablet initial option is going to be deprecated, so we shouldn't use it.

It's also insufficient, because different tables in the same keyspace can have completely different tablet counts in the backup.

> Also, what about space amplification? Backed up data is not repaired, nor compacted, so it might be difficult to reliably estimate the expected_data_size_in_gb by just looking at the backed up sstables and schema.

The load balancer also measures tablet size on uncompacted/unrepaired data, so it's what we want.

After backup is restored, the hints should be dropped so that the tablet count can live on its own. We should restore user-provided tablet hints which were there at the time of backup.
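
A minimal sketch of that lifecycle, assuming SM has some way to set a table's tablet options: `set_options` below is a hypothetical callback, not an existing SM or Scylla API. Whether the hints should also be put back when the restore fails is a design choice; this sketch does so.

```python
from contextlib import contextmanager


@contextmanager
def restore_time_tablet_hints(set_options, table, restore_hints, backed_up_hints):
    """Apply restore-time tablet hints, then put back the hints from the backup.

    set_options: hypothetical callable (table, options_dict) that applies
                 per-table tablet options.
    backed_up_hints: user-provided hints recorded at backup time; may be
                     empty, which drops the restore-time hints entirely.
    """
    set_options(table, dict(restore_hints))
    try:
        yield
    finally:
        # Runs on success and on failure, so restore-time hints never linger.
        set_options(table, dict(backed_up_hints))


if __name__ == "__main__":
    calls = []
    record = lambda table, options: calls.append((table, options))
    with restore_time_tablet_hints(record, "ks.t", {"min_tablet_count": 32}, {}):
        pass  # the actual data restore would happen here
    print(calls)
```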

@Michal-Leszczynski Michal-Leszczynski self-assigned this Mar 6, 2025