Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-15655 control: fix support for non default system name #14170

Merged
merged 4 commits into from
Apr 22, 2024

Conversation

mchaarawi
Copy link
Contributor

Features: control

Required-githooks: true

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follows the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

Features: control

Required-githooks: true

Signed-off-by: Mohamad Chaarawi <[email protected]>
Copy link

github-actions bot commented Apr 16, 2024

Ticket title is 'Using a different system name (other than daos_server) does not work.'
Status is 'Open'
Labels: 'triaged'
https://daosio.atlassian.net/browse/DAOS-15655

@daosbuild1
Copy link
Collaborator

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14170/1/execution/node/1198/log

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14170/2/execution/node/701/log

@mchaarawi mchaarawi marked this pull request as ready for review April 18, 2024 15:11
@mchaarawi mchaarawi requested review from a team as code owners April 18, 2024 15:11
@mchaarawi mchaarawi requested review from liw, tanabarr and kjacque April 18, 2024 15:11
mjmac
mjmac previously approved these changes Apr 18, 2024
tanabarr
tanabarr previously approved these changes Apr 18, 2024
@mchaarawi mchaarawi requested a review from kccain April 18, 2024 15:48
@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14170/2/execution/node/747/log

@mchaarawi
Copy link
Contributor Author

mchaarawi commented Apr 18, 2024

i would still like a review from a @daos-stack/metadata-owners before requesting this to land. @liw or @kccain please :-)

@@ -78,6 +78,9 @@
*/
#define DAOS_POOL_GLOBAL_VERSION 3

/** Default system name used in srv_pool.c */
extern char daos_sysname[];

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since daos_sysname is defined in engine, its declaration should be in daos_engine.h instead, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not sure that it matters? we only use it in the the server client pool operations, but i will move it, which would require though including daos_engine.h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually pool/srv_cli.c already includes that, so that actually works better. thanks ill update.

@@ -46,7 +44,7 @@ static char modules[MAX_MODULE_OPTIONS + 1];
static unsigned int nr_threads;

/** DAOS system name (corresponds to crt group ID) */
static char *daos_sysname = DAOS_DEFAULT_SYS_NAME;
char daos_sysname[DAOS_SYS_NAME_MAX + 1] = DAOS_DEFAULT_SYS_NAME;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Question] Why is this change to array necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to allow for a sys name that might be larger than DAOS_DEFAULT_SYS_NAME and could be up to DAOS_SYS_NAME_MAX

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

daos_sysname was a pointer, it could be assigned the address of a string longer than DAOS_DEFAULT_SYS_NAME, is that not so?

Copy link
Contributor Author

@mchaarawi mchaarawi Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, i'm not sure then why i made that change :-)
I'll revert it back on my follow on PR to the attach info optimization.

@@ -1032,7 +1030,7 @@ parse(int argc, char **argv)
rc = -DER_INVAL;
break;
}
daos_sysname = optarg;
strcpy(daos_sysname, optarg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit?] For this to be safe, we have to assume that optarg won't be too long. Although that may indeed be true, it's not obvious---we might want to check the length (either by strlen or strncpy).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but if you look at the check right before this strcpy, we already use strnlen for the check and error if it's larger. so this strcpy is safe.

Copy link
Contributor Author

@mchaarawi mchaarawi Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update it though to use memcpy to avoid coverity warnings in case it's not smart enough

Required-githooks: true

Signed-off-by: Mohamad Chaarawi <[email protected]>
@mchaarawi mchaarawi dismissed stale reviews from tanabarr and mjmac via 785702a April 19, 2024 17:05
Copy link
Contributor

@kjacque kjacque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a few other areas where changes are needed to fully support a non-default system name:

@mchaarawi
Copy link
Contributor Author

I found a few other areas where changes are needed to fully support a non-default system name:

I tried:

dmg system get-prop -o=/home/mschaara/config/daos.yml
Name                                             Value   
----                                             -----   
Checksum scrubbing threshold (pool_scrub_thresh) 0       
DAOS system name (daos_system)                   my_srv  
DAOS version (daos_version)                      2.5.101 
Checksum scrubbing mode (pool_scrub_mode)        off 

so it might be taken into account already?

we gave up on the idea of 1 agent supporting multiple systems i believe.. but in this case attach info would return the system name anyway so it should not hit that part of the code that you indicated?

@mchaarawi mchaarawi requested review from liw, tanabarr, mjmac and kjacque April 19, 2024 20:40
@kjacque
Copy link
Contributor

kjacque commented Apr 19, 2024

I tried:

dmg system get-prop -o=/home/mschaara/config/daos.yml
Name                                             Value   
----                                             -----   
Checksum scrubbing threshold (pool_scrub_thresh) 0       
DAOS system name (daos_system)                   my_srv  
DAOS version (daos_version)                      2.5.101 
Checksum scrubbing mode (pool_scrub_mode)        off 

so it might be taken into account already?

Oops, yes, I found the place where this is updated. Disregard.

we gave up on the idea of 1 agent supporting multiple systems i believe.. but in this case attach info would return the system name anyway so it should not hit that part of the code that you indicated?

The agent case is looking for a locally-stored value based on the system name requested by the client. If no system name was requested, it tries the built-in default. But yeah, I agree, the single-agent-multi-system case isn't on the table for the foreseeable future, so this isn't important. I'm OK approving as-is.

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14170/3/execution/node/1405/log

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14170/3/execution/node/1503/log

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14170/4/execution/node/1492/log

@@ -46,7 +44,7 @@ static char modules[MAX_MODULE_OPTIONS + 1];
static unsigned int nr_threads;

/** DAOS system name (corresponds to crt group ID) */
static char *daos_sysname = DAOS_DEFAULT_SYS_NAME;
char daos_sysname[DAOS_SYS_NAME_MAX + 1] = DAOS_DEFAULT_SYS_NAME;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

daos_sysname was a pointer, it could be assigned the address of a string longer than DAOS_DEFAULT_SYS_NAME, is that not so?

@mchaarawi mchaarawi requested a review from a team April 22, 2024 16:06
@mchaarawi
Copy link
Contributor Author

https://build.hpdd.intel.com/blue/organizations/jenkins/daos-stack%2Fdaos/detail/PR-14170/4/pipeline/267/
was the last test result before accidently restarting.
the failures there are known: DAOS-15431. DAOS-15662

@mchaarawi mchaarawi merged commit 5bf0d4f into master Apr 22, 2024
49 of 51 checks passed
@mchaarawi mchaarawi deleted the mschaara/15655 branch April 22, 2024 16:12
mjmac pushed a commit that referenced this pull request May 6, 2024
mjmac added a commit that referenced this pull request May 6, 2024
…stem (#14318)

improve the daos_init() and pool_connect() process to reuse the attach info
instead of doing agent drpc upcalls multiple times.

Also includes: DAOS-15655 control: fix support for non default system name (#14170)

Signed-off-by: Mohamad Chaarawi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

7 participants