DAOS-14598 object: correct epoch for parity migration #13453
Conversation
Bug-tracker data:
LGTM. No errors found by checkpatch.
Force-pushed from 13bb0ae to 1fd3216
LGTM. No errors found by checkpatch.
ping
LGTM. No errors found by checkpatch.
You may test with "Test-tag: aggregation".
Another point: ds_cont_child_reset_ec_agg_eph_all can reset to sc_ec_agg_eph_boundary rather than to 0.
ok
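A minimal sketch of that suggestion, assuming hypothetical container fields around the two identifiers quoted above (not the real DAOS structures):

```c
#include <stdint.h>

/*
 * Sketch of the reviewer's suggestion: when resetting the per-target EC
 * aggregation epochs, fall back to the already-agreed boundary
 * (sc_ec_agg_eph_boundary) instead of 0.  Everything here other than the
 * two identifiers quoted in the comment above is illustrative.
 */
struct ds_cont_child {
	uint64_t	 sc_ec_agg_eph_boundary; /* agreed global stable boundary */
	uint64_t	*sc_ec_agg_eph;          /* per-target EC agg epochs */
	int		 sc_ec_agg_eph_nr;       /* number of targets */
};

static void
ds_cont_child_reset_ec_agg_eph_all(struct ds_cont_child *cont)
{
	int i;

	/*
	 * Reset to the boundary, not 0: epochs below the boundary are
	 * already aggregated everywhere, so restarting from 0 would only
	 * hold back the global stable-epoch calculation.
	 */
	for (i = 0; i < cont->sc_ec_agg_eph_nr; i++)
		cont->sc_ec_agg_eph[i] = cont->sc_ec_agg_eph_boundary;
}
```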
Use the stable epoch for partial parity updates to make sure these partial updates are not below the stable epoch boundary; otherwise both EC and VOS aggregation might operate on the same recxs at the same time, which can corrupt the data during rebuild.

During EC aggregation, the un-aggregated epoch on non-leader parity shards should be considered as well. Otherwise, if the leader parity fails, it is excluded from the global EC stable epoch calculation immediately; then, before the failed leader parity is rebuilt, the global stable epoch might pass the un-aggregated epoch on the failed target, and these partial updates on the data shards might be aggregated before EC aggregation, which can cause data corruption.

Also, the parity shard with the lowest fseq among all parity shards should be chosen as the aggregation leader, in case the last parity cannot be rebuilt in time.

Required-githooks: true
Test-tag: aggregation pr
Signed-off-by: Di Wang <[email protected]>
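A minimal sketch of the leader-selection and stable-epoch rules from the description above, with hypothetical shard bookkeeping (the struct and field names are illustrative, not the DAOS layout):

```c
#include <stdint.h>

/* Hypothetical per-parity-shard state for illustration only. */
struct ec_pshard {
	uint32_t ps_fseq;      /* fail sequence (assumed: 0 = never failed) */
	uint64_t ps_unagg_eph; /* lowest epoch not yet EC-aggregated here */
};

/*
 * Rule 1: pick the parity shard with the lowest fseq as aggregation
 * leader, so a recently failed parity (highest fseq) that may not be
 * rebuilt in time is not relied on as leader.
 */
static int
ec_agg_choose_leader(struct ec_pshard *parity, int nr)
{
	int i, leader = 0;

	for (i = 1; i < nr; i++)
		if (parity[i].ps_fseq < parity[leader].ps_fseq)
			leader = i;
	return leader;
}

/*
 * Rule 2: the global stable epoch must not pass the un-aggregated epoch
 * of ANY parity shard, leader or not; otherwise partial updates on the
 * data shards could be aggregated away while a failed parity still needs
 * them for EC aggregation.
 */
static uint64_t
ec_agg_stable_eph(struct ec_pshard *parity, int nr, uint64_t stable_eph)
{
	int i;

	for (i = 0; i < nr; i++)
		if (parity[i].ps_unagg_eph < stable_eph)
			stable_eph = parity[i].ps_unagg_eph;
	return stable_eph;
}
```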
Force-pushed from 69b427d to c640a78
Required-githooks: true
Signed-off-by: Di Wang <[email protected]>
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13453/5/execution/node/1318/log
looks good, other than the spelling errors that need to be corrected.
src/object/srv_ec_aggregate.c
Outdated
@@ -123,13 +125,13 @@ struct ec_agg_param {
	struct ec_agg_entry	 ap_agg_entry;     /* entry used for each OID */
	daos_epoch_range_t	 ap_epr;           /* hi/lo extent threshold */
	daos_epoch_t		 ap_filter_eph;    /* Aggregatable filter epoch */
	daos_epoch_t		 ap_min_unagg_eph; /* minum unaggregate epoch */
Need to fix this spelling; otherwise it will show up on every PR.
src/object/srv_ec_aggregate.c
Outdated
	}
}

/* No parity shard is avaible */
same here
correct word spelling

Test-tag: pr aggregation
Required-githooks: true
Signed-off-by: Di Wang <[email protected]>
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13453/6/execution/node/1364/log
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13453/6/execution/node/1504/log
Increase rebuild ec timeout and fix test number.

Required-githooks: true
Signed-off-by: Di Wang <[email protected]>
FYI,
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13453/7/execution/node/1318/log
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13453/7/execution/node/1410/log
failure due to DAOS-14845
Use the stable epoch for partial parity updates to make sure these partial updates are not below the stable epoch boundary; otherwise both EC and VOS aggregation might operate on the same recxs at the same time, which can corrupt the data during rebuild.
Required-githooks: true
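A minimal sketch of that epoch clamp, with hypothetical names (not a real DAOS function):

```c
#include <stdint.h>

/*
 * Hypothetical illustration of the rule above: a partial parity update
 * written during migration is clamped up to the stable epoch, so it never
 * lands below the boundary where EC and VOS aggregation could both touch
 * the same recxs.
 */
static uint64_t
parity_migrate_eph(uint64_t update_eph, uint64_t stable_eph)
{
	/* bump epochs below the boundary up to the stable epoch */
	return update_eph < stable_eph ? stable_eph : update_eph;
}
```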
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: