DAOS-14484 cart: Implement per-context inflight queue #13202
Required-githooks: true Signed-off-by: Alexander A Oganezov <[email protected]>
- Implement RPC inflight quota Required-githooks: true Signed-off-by: Alexander A Oganezov <[email protected]>
Bug-tracker data: |
LGTM. No errors found by checkpatch.
Test stage Unit Test on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13202/1/testReport/ |
Required-githooks: true Signed-off-by: Alexander A Oganezov <[email protected]>
LGTM. No errors found by checkpatch.
Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13202/2/execution/node/1121/log |
Required-githooks: true Signed-off-by: Alexander A Oganezov <[email protected]>
LGTM. No errors found by checkpatch.
src/include/cart/api.h
Outdated
 * failure.
 */
int crt_context_quotas_finalize(crt_context_t crt_ctx);
why are init / finalize public APIs ?
I am still going back and forth on whether we want a runtime control to disable (and subsequently re-enable) quotas. I started with that idea, which is why these are public, but for now quotas are auto-enabled on every context.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13202/3/execution/node/1266/log |
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13202/3/execution/node/1404/log |
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13202/3/execution/node/1358/log |
- add comment to not implemented quotas - set default to 32 inflight Required-githooks: true Signed-off-by: Alexander A Oganezov <[email protected]>
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13202/4/execution/node/1266/log |
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13202/4/execution/node/1404/log |
its quota reservation for any rpc in the list. Required-githooks: true Signed-off-by: Alexander A Oganezov <[email protected]>
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13202/5/execution/node/1266/log |
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13202/5/execution/node/1404/log |
when quota limit is reached. Required-githooks: true Signed-off-by: Alexander A Oganezov <[email protected]>
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13202/6/execution/node/1399/log |
LGTM. No errors found by checkpatch.
@@ -139,6 +139,12 @@ This file lists the environment variables used in CaRT.
It its value exceed 256, then will use 256 for flow control.
Set it to zero means disable the flow control in cart.

. D_QUOTA_RPCS
maybe we should just call it D_RPC_MAX_IN_FLIGHT even if it's a little longer ? :) as I feel D_QUOTA_RPCS might be too generic and introduce a different nomenclature with D_QUOTA (although I know that's what you're trying to do here).
Sure, I am open on naming, but I don't like D_RPC_MAX_IN_FLIGHT as it's long and harder to remember.
I am okay with D_QUOTA_RPCS.
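As a rough illustration of how an environment variable such as D_QUOTA_RPCS could be turned into a limit, here is a minimal sketch. It is not the CaRT implementation: `quota_limit_from_env` and `QUOTA_RPCS_DEFAULT` are hypothetical names, the fallback value is only an example (the thread mentions both 32 and 64), and the meaning of a zero value is left open because the conversation does not specify it.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical fallback; not taken from the actual CaRT sources. */
#define QUOTA_RPCS_DEFAULT 64

/*
 * Read a non-negative integer limit from the environment variable `name`.
 * Returns `dflt` when the variable is unset, empty, negative, out of range,
 * or not a plain decimal number.
 */
static int
quota_limit_from_env(const char *name, int dflt)
{
	const char *val = getenv(name);
	char       *end = NULL;
	long        parsed;

	if (val == NULL || *val == '\0')
		return dflt;

	errno  = 0;
	parsed = strtol(val, &end, 10);
	if (errno != 0 || *end != '\0' || parsed < 0 || parsed > INT_MAX)
		return dflt;

	return (int)parsed;
}
```

A caller would typically invoke `quota_limit_from_env("D_QUOTA_RPCS", QUOTA_RPCS_DEFAULT)` once at context creation and store the result as the per-context limit.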
 * failure.
 */
int crt_context_quota_limit_get(crt_context_t crt_ctx, crt_quota_type_t quota, int *value);

/**
Thinking more about it I'm not certain we should introduce those new APIs unless we already have a specific use for them. Right now I don't think we really have one yet ?
Yeah, I am mixed on this. I was thinking that one use case we have is self-test (or future perf tools?) being able to adjust quotas on the fly, based on command-line arguments, without having to set environment variables.
src/cart/crt_init.c
Outdated
@@ -257,6 +258,8 @@ prov_data_init(struct crt_prov_gdata *prov_data, crt_provider_t provider,
	return DER_SUCCESS;
}

#define CRT_QUOTA_RPCS_DEFAULT 64
I don't think that default should be set in the middle of nowhere but maybe in some place where other defaults are set ?
It's not in the middle of nowhere, but sure :) It was declared just outside the function that used it. I'll find a better place :)
	/** Total count of supported quotas */
	CRT_QUOTA_COUNT,
} crt_quota_type_t;

/** @}
same comment as above, maybe we should not have all this quota API and keep it simple for now.
Yeah, if we decide we don't need public APIs, this will move to an internal header.
@@ -11,6 +11,8 @@
#include "crt_internal.h"

static void crt_epi_destroy(struct crt_ep_inflight *epi);
static int context_quotas_init(crt_context_t crt_ctx);
static int context_quotas_finalize(crt_context_t crt_ctx);
Same comment here, unless we have more, maybe that's not necessary to have quotas init and finalize (is finalize really needed also?)
We have quota mutex to destroy in finalize. Possibly for other quotas we might want to clean any lists, but not needed for rpc list as untrack logic of each rpc will take care of it. But we might need for allocation queues if we ever add those.
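The init/finalize pairing described in this reply can be sketched as follows. This is only an illustration under assumptions: `struct ctx_quotas`, `quotas_init`, and `quotas_finalize` are hypothetical names, plain pthreads stand in for whatever locking wrappers the real code uses, and no wait-queue cleanup is shown because the reply says RPC untracking already handles it.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical quota types, loosely mirroring crt_quota_type_t in the diff. */
enum quota_type {
	QUOTA_RPCS,
	QUOTA_COUNT, /* total count of supported quotas */
};

/* Hypothetical per-context quota state. */
struct ctx_quotas {
	pthread_mutex_t mutex;
	int             limit[QUOTA_COUNT];
	int             current[QUOTA_COUNT];
};

/* Create the mutex and reset all counters; returns 0 or a negative errno. */
static int
quotas_init(struct ctx_quotas *q, int rpc_limit)
{
	int rc;
	int i;

	rc = pthread_mutex_init(&q->mutex, NULL);
	if (rc != 0)
		return -rc;

	for (i = 0; i < QUOTA_COUNT; i++) {
		q->limit[i]   = 0;
		q->current[i] = 0;
	}
	q->limit[QUOTA_RPCS] = rpc_limit;
	return 0;
}

/* Finalize exists mainly to destroy the mutex created in quotas_init(). */
static int
quotas_finalize(struct ctx_quotas *q)
{
	return -pthread_mutex_destroy(&q->mutex);
}
```

If further quota types with their own queues were added later, their cleanup would naturally go into the finalize function as well.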
@@ -1264,6 +1286,10 @@ crt_context_req_track(struct crt_rpc_priv *rpc_priv)
	/* reference taken by d_hash_rec_find or "epi->epi_ref = 1" above */
	D_MUTEX_LOCK(&crt_ctx->cc_mutex);
	d_hash_rec_decref(&crt_ctx->cc_epi_table, &epi->epi_link);

	if (quota_rc == -DER_QUOTA_LIMIT)
		d_list_add_tail(&rpc_priv->crp_waitq_link, &crt_ctx->cc_quotas.rpc_waitq);
why is this not within the block at line 1254 ?
> why is this not within the block at line 1254 ?
So that it is not done under the epi->epi_mutex lock; it needs the context lock instead.
We can reorganize things more nicely once we get rid of the EP credits-related code, which should greatly simplify this and a few other calls.
src/cart/crt_context.c
Outdated
	int rc;

	if (rpc == NULL)
		return;
shouldn't that be an assert ?
Yes, we can just remove it now or change it to an assert. It's a leftover from a previous behavior.
src/cart/crt_context.c
Outdated
	if (tmp_rpc != NULL)
		dispatch_rpc(tmp_rpc);
	else
		crt_context_put_quota_resource(rpc_priv->crp_pub.cr_ctx, CRT_QUOTA_RPCS);
I'm not sure I understand why crt_context_put_quota_resource needs to be invoked all the time ? which acquires/releases another lock ?
once the rpc is done, you either process the next rpc (reusing the existing quota) or you put the quota back if there is nothing else queued
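The hand-off described in this reply can be sketched as below. This is a simplified, single-threaded illustration, not the code from the PR: `struct quota_ctx`, `submit`, `complete`, and the hand-rolled FIFO are hypothetical stand-ins for the context, its wait queue, and the real dispatch path, and locking is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical RPC handle; `dispatched` marks that it went on the wire. */
struct rpc {
	int         id;
	int         dispatched;
	struct rpc *next;
};

/* Hypothetical per-context state: a slot counter plus a FIFO of waiters. */
struct quota_ctx {
	int         limit;
	int         current;
	struct rpc *waitq_head;
	struct rpc *waitq_tail;
};

static void
dispatch(struct rpc *r)
{
	r->dispatched = 1;
}

/* Send immediately if a slot is free, otherwise park the RPC in the queue. */
static void
submit(struct quota_ctx *c, struct rpc *r)
{
	r->next       = NULL;
	r->dispatched = 0;

	if (c->current < c->limit) {
		c->current++;
		dispatch(r);
		return;
	}

	if (c->waitq_tail != NULL)
		c->waitq_tail->next = r;
	else
		c->waitq_head = r;
	c->waitq_tail = r;
}

/* On completion: pass the slot to the next waiter, or give it back. */
static void
complete(struct quota_ctx *c)
{
	struct rpc *next = c->waitq_head;

	if (next != NULL) {
		c->waitq_head = next->next;
		if (c->waitq_head == NULL)
			c->waitq_tail = NULL;
		next->next = NULL;
		dispatch(next); /* slot is reused, so `current` is unchanged */
	} else {
		c->current--;   /* nothing queued, so the slot is released */
	}
}
```

Reusing the slot for the next waiter keeps the counter untouched on the busy path, so it only changes when the queue is empty.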
Test stage Functional Hardware Medium Verbs Provider completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13202/12/testReport/ |
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13202/12/execution/node/1552/log |
- move default to a diff file Required-githooks: true Signed-off-by: Alexander A Oganezov <[email protected]>
LGTM. No errors found by checkpatch.
- Add per-context quotas - Implement RPC inflight quota Signed-off-by: Alexander A Oganezov <[email protected]>
Required-githooks: true Signed-off-by: Alexander A Oganezov <[email protected]>
LGTM. No errors found by checkpatch.
src/cart/crt_context.c
Outdated
	if (ctx->cc_quotas.current[quota] < ctx->cc_quotas.limit[quota])
		ctx->cc_quotas.current[quota]++;
	else {
		D_WARN("Quota limit reached for quota_type=%d\n", quota);
This is overly chatty and should be a debug
NB: "overly chatty" == gigabytes of client logs under load. ;)
done
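A bounded acquire that stays quiet unless debugging is enabled might look roughly like the sketch below. Everything here is an assumption made for illustration: `QUOTA_LIMIT_REACHED` stands in for -DER_QUOTA_LIMIT, `QUOTA_DEBUG` and `quota_debug_enabled` stand in for the project's debug logging, and plain pthreads replace the real locking macros.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for -DER_QUOTA_LIMIT. */
#define QUOTA_LIMIT_REACHED (-1)

/* Hypothetical switch; a real project would use its own log facility. */
static int quota_debug_enabled;

#define QUOTA_DEBUG(...)                                                       \
	do {                                                                   \
		if (quota_debug_enabled)                                       \
			fprintf(stderr, __VA_ARGS__);                          \
	} while (0)

struct quota {
	pthread_mutex_t lock;
	int             limit;
	int             current;
};

/* Take one unit if available; otherwise report the limit at debug level. */
static int
quota_get(struct quota *q)
{
	int rc = 0;

	pthread_mutex_lock(&q->lock);
	if (q->current < q->limit) {
		q->current++;
	} else {
		QUOTA_DEBUG("Quota limit (%d) reached\n", q->limit);
		rc = QUOTA_LIMIT_REACHED;
	}
	pthread_mutex_unlock(&q->lock);

	return rc;
}

/* Return one unit to the quota. */
static void
quota_put(struct quota *q)
{
	pthread_mutex_lock(&q->lock);
	if (q->current > 0)
		q->current--;
	pthread_mutex_unlock(&q->lock);
}
```

Because hitting the limit is an expected condition under load, keeping the message at debug level avoids flooding client logs while the return code still tells the caller to queue the request.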
- crt_req_set/get quota resource shortened and static inline now - changed warning to debug message when exceeding quotas Required-githooks: true Signed-off-by: Alexander A Oganezov <[email protected]>
Required-githooks: true Signed-off-by: Alexander A Oganezov <[email protected]>
Backport of in-flight upstream PR #13202 - Add per-context quotas - Implement RPC inflight quota Required-githooks: true Change-Id: I3fdf77082d66d8009ee7099cc838fcff5da72d4b Signed-off-by: Alexander A Oganezov <[email protected]> Signed-off-by: Jeff Olivier <[email protected]>
- D_QUOTA_RPCS environment variable added. When set, it limits the number of RPCs on the wire being sent out by the process. - RPCs that exceed the quota limit (if set) will now be queued by the sender - Quota support code added to handle and track resources Required-githooks: true Signed-off-by: Alexander A Oganezov <[email protected]>
- D_QUOTA_RPCS environment variable added. When set, it limits the number of RPCs on the wire being sent out by the process. - RPCs that exceed the quota limit (if set) will now be queued by the sender - Quota support code added to handle and track resources Signed-off-by: Alexander A Oganezov <[email protected]>
Signed-off-by: Alexander Oganezov [email protected]