DAOS-15655 control: fix support for non default system name #14170
Features: control
Required-githooks: true
Signed-off-by: Mohamad Chaarawi <[email protected]>
Ticket title is 'Using a different system name (other than daos_server) does not work.'
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14170/1/execution/node/1198/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14170/2/execution/node/701/log
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14170/2/execution/node/747/log
src/include/daos/pool.h
@@ -78,6 +78,9 @@
*/
#define DAOS_POOL_GLOBAL_VERSION 3

+/** Default system name used in srv_pool.c */
+extern char daos_sysname[];
Since daos_sysname is defined in the engine, shouldn't its declaration be in daos_engine.h instead?
I'm not sure that it matters? We only use it in the server-side client pool operations, but I will move it, though that would require including daos_engine.h.
Actually, pool/srv_cli.c already includes that, so it works better. Thanks, I'll update.
@@ -46,7 +44,7 @@ static char modules[MAX_MODULE_OPTIONS + 1];
static unsigned int nr_threads;

/** DAOS system name (corresponds to crt group ID) */
-static char *daos_sysname = DAOS_DEFAULT_SYS_NAME;
+char daos_sysname[DAOS_SYS_NAME_MAX + 1] = DAOS_DEFAULT_SYS_NAME;
[Question] Why is this change to array necessary?
To allow for a system name that is longer than DAOS_DEFAULT_SYS_NAME, up to DAOS_SYS_NAME_MAX.
Since daos_sysname was a pointer, it could already be assigned the address of a string longer than DAOS_DEFAULT_SYS_NAME, couldn't it?
Yeah, then I'm not sure why I made that change :-)
I'll revert it in my follow-on PR for the attach info optimization.
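For illustration, a minimal C sketch of the pointer-versus-array distinction in this thread. SYS_NAME_MAX and its value are hypothetical stand-ins for DAOS_SYS_NAME_MAX, and set_sysname_ptr is a hypothetical helper; "daos_server" is the default name from the ticket. The pointer form accepts a longer name just by being retargeted, while the array form owns a fixed amount of storage.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_SYS_NAME "daos_server" /* default system name from the ticket */
#define SYS_NAME_MAX     15            /* hypothetical stand-in for DAOS_SYS_NAME_MAX */

/* Pointer form (before this PR): only an address is stored, so pointing it
 * at a longer string is fine as long as that string outlives its use.
 * optarg points into argv, which stays valid for the life of the process. */
static const char *sysname_ptr = DEFAULT_SYS_NAME;

/* Array form (this PR): the object owns SYS_NAME_MAX + 1 bytes, so a new
 * name has to be copied in and must fit, NUL terminator included. */
static char sysname_arr[SYS_NAME_MAX + 1] = DEFAULT_SYS_NAME;

/* Hypothetical setter for the pointer form: a plain assignment, with no
 * copy and no length limit. */
static void
set_sysname_ptr(const char *name)
{
	assert(name != NULL);
	sysname_ptr = name;
}
```

The trade-off is ownership: the pointer form depends on the lifetime of whatever it points to, while the array form is self-contained but bounded, which is why copying into it needs a length check.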
src/engine/init.c
@@ -1032,7 +1030,7 @@ parse(int argc, char **argv)
rc = -DER_INVAL;
break;
}
-daos_sysname = optarg;
+strcpy(daos_sysname, optarg);
[Nit?] For this to be safe, we have to assume that optarg won't be too long. That may well be true, but it isn't obvious; we might want to check the length (either with strlen or by using strncpy).
Yes, but if you look at the check right before this strcpy, we already use strnlen for the check and return an error if the name is too long, so this strcpy is safe.
I will update it to use memcpy anyway, to avoid Coverity warnings in case Coverity isn't smart enough to see the check.
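A minimal sketch of the validate-then-copy pattern described in this thread: bound the length with strnlen, reject names that are too long, and only then copy with memcpy. set_sysname is a hypothetical helper, not the engine's actual parse() code; SYS_NAME_MAX is a hypothetical stand-in for DAOS_SYS_NAME_MAX, and -EINVAL stands in for -DER_INVAL.

```c
#define _POSIX_C_SOURCE 200809L /* expose strnlen() */
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SYS_NAME_MAX 15 /* hypothetical stand-in for DAOS_SYS_NAME_MAX */

/* Fixed-size storage, as in the array form of daos_sysname. */
static char sysname[SYS_NAME_MAX + 1] = "daos_server";

/* Hypothetical helper: validate the option argument, then copy it.
 * Returns 0 on success or -EINVAL (standing in for -DER_INVAL). */
static int
set_sysname(const char *arg)
{
	size_t len;

	assert(arg != NULL);
	/* strnlen() examines at most SYS_NAME_MAX + 1 bytes, so a very long
	 * argument is only scanned up to the bound. */
	len = strnlen(arg, SYS_NAME_MAX + 1);
	if (len > SYS_NAME_MAX)
		return -EINVAL; /* too long: keep the current name */

	/* len <= SYS_NAME_MAX, so arg[len] is the terminating NUL; copying
	 * len + 1 bytes fits the buffer and keeps the string terminated. */
	memcpy(sysname, arg, len + 1);
	return 0;
}
```

Because the length is proven before the copy, the memcpy size is explicit at the call site, which is the property a static analyzer can verify without reasoning about the earlier branch.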
Required-githooks: true
Required-githooks: true
Signed-off-by: Mohamad Chaarawi <[email protected]>
I found a few other areas where changes are needed to fully support a non-default system name:
- System property always displays the default instead of the configured name: https://github.com/daos-stack/daos/blob/master/src/control/lib/daos/system_prop.go#L502
- If the request doesn't supply a system name, the agent uses the built-in default system name as the key for caching system GetAttachInfo responses. This isn't very important, since right now we don't support multi-system access from a single agent, but it should be a fairly simple fix: https://github.com/daos-stack/daos/blob/master/src/control/cmd/daos_agent/infocache.go#L348
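A sketch of the fallback suggested in the second bullet, written in C for consistency with the other code in this thread (the agent itself is written in Go). It assumes the fix would key the cache on the agent's configured system name before falling back to the built-in default; cache_key_sysname is a hypothetical helper, not agent code. The thread concluded this was not needed for this PR.

```c
#include <assert.h>
#include <stddef.h>

/* Built-in default system name (the default named in the ticket). */
static const char default_sysname[] = "daos_server";

/* Hypothetical helper: choose the system name used as the attach-info cache
 * key. Prefer the name in the request, then the agent's configured name,
 * and only then the built-in default. */
static const char *
cache_key_sysname(const char *requested, const char *configured)
{
	if (requested != NULL && requested[0] != '\0')
		return requested;
	if (configured != NULL && configured[0] != '\0')
		return configured;
	return default_sysname;
}
```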
I tried:
so it might be taken into account already?
We gave up on the idea of one agent supporting multiple systems, I believe. But in this case attach info would return the system name anyway, so it should not hit the part of the code you indicated?
Oops, yes, I found the place where this is updated. Disregard.
The agent case is looking for a locally-stored value based on the system name requested by the client. If no system name was requested, it tries the built-in default. But yeah, I agree, the single-agent-multi-system case isn't on the table for the foreseeable future, so this isn't important. I'm OK approving as-is.
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14170/3/execution/node/1405/log
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14170/3/execution/node/1503/log
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14170/4/execution/node/1492/log
https://build.hpdd.intel.com/blue/organizations/jenkins/daos-stack%2Fdaos/detail/PR-14170/4/pipeline/267/
Signed-off-by: Mohamad Chaarawi <[email protected]>
…stem (#14318) improve the daos_init() and pool_connect() process to reuse the attach info instead of doing agent drpc upcalls multiple times. Also includes: DAOS-15655 control: fix support for non default system name (#14170) Signed-off-by: Mohamad Chaarawi <[email protected]>
Features: control
Required-githooks: true
Before requesting gatekeeper:
- Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.

Gatekeeper: