DAOS-10625 control: Create the tool to collect the logs/config for support purpose #11094

Merged: mjmac merged 38 commits into master from samirrav/Support/DAOS-10625-Final on Jun 14, 2023

Contributor

@ravalsam ravalsam commented Dec 20, 2022

1> Add a dmg support collectlog option to collect the logs/configs from servers.
2> Add a daos_server support collectlog option to get the system and server-side information in case the dmg management layer is not working for some reason.
3> Add a daos_agent support collectlog option to get the client-side logs and configs.
4> Add unit tests for the support lib.
5> Add functional tests for the dmg, daos_agent and daos_server support options.

Signed-off-by: Samir Raval [email protected]

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate watchers.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

@ravalsam ravalsam requested a review from a team as a code owner December 20, 2022 21:49
@ravalsam ravalsam requested review from mjmac and removed request for a team December 20, 2022 21:49

github-actions bot commented Dec 20, 2022

Bug-tracker data:
Ticket title is 'Provide tool which can collect the logs/configuration/matric from the customer sites for support purpose.'
Status is 'In Review'
Labels: 'sustaining_internal'
Errors are Title of PR is too long
https://daosio.atlassian.net/browse/DAOS-10625

Collaborator

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-11094/1/execution/node/1092/log

…pport purpose.

 *** WORK IN PROGRESS ***

1> Add a dmg support collectlog option to collect the logs/configs from servers.
2> Add a daos_server support collectlog option to get the system and server-side information in case
   the dmg management layer is not working for some reason.
3> Add a daos_agent support collectlog option to get the client-side logs and configs.

 --- TO BE DONE ---

1> Unit testing is still in progress.
2> All commands that collect system-related information still need to be added.
3> Functional testing is still to be done for this PR.

Required-githooks: true

Signed-off-by: Samir Raval <[email protected]>
@ravalsam ravalsam force-pushed the samirrav/Support/DAOS-10625-Final branch from 727f0fd to 8353d18 Compare December 21, 2022 16:59
Collaborator

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

Collaborator

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-11094/3/testReport/(root)/

@ravalsam ravalsam requested review from tanabarr and kjacque January 3, 2023 16:59
Samir Raval added 2 commits January 10, 2023 17:07
Required-githooks: true

Signed-off-by: Samir Raval <[email protected]>
Required-githooks: true
Signed-off-by: Samir Raval <[email protected]>
Contributor

@tanabarr tanabarr left a comment

some initial observations (most points refer to multiple occurrences in the PR):

if b == true can be shortened to if b

Initialising to an empty value, i.e. LogCollection["..."] = []string{""}, is rarely necessary: un-initialised zero values are usually good enough (one of the useful things about Go).

If using string constants as keys, define them, e.g. const CopyServerConfigCmd = "CopyServerConfig". I'm not convinced this is the best use of the map, though; you might be able to define an enum for the keys.

progress.Total = progress.Total + 1 can be shortened to progress.Total++

The code for the daos_server and dmg command variants should be consolidated into shared helper functions to reduce duplication.
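
The Go idioms listed above can be sketched roughly as follows. This is only an illustration, not code from the PR: planSteps, progress, and the command strings are hypothetical names, and the real LogCollection map may be shaped differently.

```go
package main

import "fmt"

// progress is a hypothetical stand-in for the PR's progress tracker.
type progress struct {
	Total int
}

// planSteps is a hypothetical helper, not a function from the PR. It records
// one command per key and counts the steps, using the shorter idioms
// suggested in the review.
func planSteps(keys []string, archive bool) (map[string][]string, progress) {
	// No pre-seeding with []string{""}: a missing key yields a nil slice,
	// and append works on a nil slice.
	logCollection := make(map[string][]string)
	var p progress // zero value, so Total starts at 0

	for _, k := range keys {
		logCollection[k] = append(logCollection[k], "cmd-for-"+k)
		p.Total++ // instead of p.Total = p.Total + 1
	}
	if archive { // instead of: if archive == true
		p.Total++
	}
	return logCollection, p
}

func main() {
	m, p := planSteps([]string{"CopyServerConfig", "CollectSystemInfo"}, true)
	fmt.Println(len(m), p.Total)
}
```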

	}
	if len(resp.GetHostErrors()) > 0 {
		var bld strings.Builder
		_ = pretty.PrintResponseErrors(resp, &bld)
Contributor

errors should always be handled
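
A minimal sketch of what handling the error could look like, instead of discarding it with the blank identifier. The printHostErrors function below is a hypothetical stand-in for pretty.PrintResponseErrors (whose real signature takes a response object), and reportHostErrors is likewise an assumed name.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// printHostErrors is a hypothetical stand-in for pretty.PrintResponseErrors;
// the real function takes a response object and has its own signature.
func printHostErrors(hostErrs map[string]string, out io.Writer) error {
	for host, msg := range hostErrs {
		if _, err := fmt.Fprintf(out, "%s: %s\n", host, msg); err != nil {
			return err
		}
	}
	return nil
}

// reportHostErrors checks the returned error instead of assigning it to _.
func reportHostErrors(hostErrs map[string]string) (string, error) {
	var bld strings.Builder
	if err := printHostErrors(hostErrs, &bld); err != nil {
		return "", fmt.Errorf("formatting host errors: %w", err)
	}
	return bld.String(), nil
}

func main() {
	out, err := reportHostErrors(map[string]string{"host1": "connection refused"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Wrapping with %w keeps the original error available to callers that inspect it with errors.Is or errors.As.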

Contributor Author

Done.

}

// Rsync the logs from servers
hostName, _ := support.GetHostName()
Contributor

errors should always be handled

Contributor Author

Done

Required-githooks: true

Signed-off-by: Samir Raval <[email protected]>
Contributor

@kjacque kjacque left a comment

A lot of code! Most of my comments are Go language tips and/or naming suggestions, and can be applied in multiple places.

I haven't tried running it so can't speak to any oddities there.

type collectLogCmd struct {
	configCmd
	cmdutil.LogCmd
	Stop bool `short:"s" long:"Stop" description:"Stop the collectlog command on very first error"`
Contributor

suggestion - Maybe "stop-on-error" for the long name? As it is, this option feels a bit confusing.

Contributor Author

Done

	configCmd
	cmdutil.LogCmd
	Stop         bool   `short:"s" long:"Stop" description:"Stop the collectlog command on very first error"`
	TargetFolder string `short:"t" long:"loglocation" description:"Folder location where log is going to be copied"`
Contributor

suggestion - "dest" or "target" for the long name

Contributor Author

Done

	Stop         bool   `short:"s" long:"Stop" description:"Stop the collectlog command on very first error"`
	TargetFolder string `short:"t" long:"loglocation" description:"Folder location where log is going to be copied"`
	Archive      bool   `short:"z" long:"archive" description:"Archive the log/config files"`
	CustomLogs   string `short:"c" long:"custom-logs" description:"Collect the Logs from given directory"`
Contributor

Not sure I fully understand this option. Is this an extra source dir that the logs will be copied from? "extra-logs-dir" may be a clearer name in that case
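
For illustration, the three naming suggestions on this struct could be applied along the following lines. This is a sketch under assumptions: the field names, descriptions, and the choice of "target" over "dest" are hypothetical, the embedded configCmd and cmdutil.LogCmd are left out so the example stands alone, and the names that actually landed in the PR may differ. The longNames helper only reads the struct tags back with reflection to show the resulting option names.

```go
package main

import (
	"fmt"
	"reflect"
)

// collectLogCmd mirrors the option struct from the diff, with the long names
// the reviewer suggested. Field names and descriptions are hypothetical.
type collectLogCmd struct {
	StopOnError  bool   `short:"s" long:"stop-on-error" description:"Stop the collectlog command on the first error"`
	TargetFolder string `short:"t" long:"target" description:"Folder the logs are copied to"`
	Archive      bool   `short:"z" long:"archive" description:"Archive the log/config files"`
	ExtraLogsDir string `short:"c" long:"extra-logs-dir" description:"Also collect logs from this directory"`
}

// longNames returns the "long" tag declared on each field of a struct value.
func longNames(v interface{}) []string {
	t := reflect.TypeOf(v)
	names := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		names = append(names, t.Field(i).Tag.Get("long"))
	}
	return names
}

func main() {
	fmt.Println(longNames(collectLogCmd{}))
}
```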

Contributor Author

Done

}

// Default 3 steps of log/conf collection.
progress := support.ProgressBar{1, 3, 0, false}
Contributor

A more readable model with Go is to use the struct member names inline.

Suggested change
- progress := support.ProgressBar{1, 3, 0, false}
+ progress := support.ProgressBar{
+ 	Start: 1,
+ 	Total: 3,
+ }

No need to initialize the zeroes. All the members are automatically initialized to 0 unless otherwise specified.

An aside, I am not sure what the 1 means here. Shouldn't we always start at 0?

Contributor Author

Done, and removed the zeros.

	Start      int  // start int number
	Total      int  // end int number
	Steps      int  // Int number be increased per steps
	JsonOutput bool // Json option to skip progress bar if it's enabled
Contributor

name suggestion - "NoDisplay"

Contributor Author

Done

	JsonOutput bool // Json option to skip progress bar if it's enabled
}

type Params struct {
Contributor

name suggestion - CollectLogsParams

Considering we may someday add other support commands.

Contributor Author

Done

	LogCmd string
}

type copy struct {
Contributor

copy of what? I'd suggest a different name.

Contributor Author

Done

}

// Print the progress bar during log collect command
func PrintProgress(progBar *ProgressBar) {
Contributor

suggestion - could make these member functions of ProgressBar. A function signature like:

func (p *ProgressBar) Display(out io.Writer)

It would be easy enough to decide within the function whether to print progress or end status.
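
One way the suggested method could look is sketched below. The struct is deliberately simplified and hypothetical: Current replaces the PR's Start/Steps pair, NoDisplay follows the rename suggested earlier in this review, and the message text is invented for the example.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// ProgressBar is a simplified, hypothetical version of the PR's struct.
type ProgressBar struct {
	Current   int  // steps completed so far
	Total     int  // total number of steps
	NoDisplay bool // suppress output, e.g. when JSON output is requested
}

// line decides between a progress message and the end status.
func (p *ProgressBar) line() string {
	if p.Current >= p.Total {
		return fmt.Sprintf("collect-log: done (%d/%d)", p.Total, p.Total)
	}
	return fmt.Sprintf("collect-log: step %d/%d", p.Current, p.Total)
}

// Display writes the current status to out unless display is disabled.
func (p *ProgressBar) Display(out io.Writer) error {
	if p.NoDisplay {
		return nil
	}
	_, err := fmt.Fprintln(out, p.line())
	return err
}

func main() {
	// Keyed literal: unspecified fields keep their zero values.
	p := ProgressBar{Total: 3}
	for p.Current < p.Total {
		p.Current++
		if err := p.Display(os.Stdout); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```

Taking an io.Writer rather than printing directly makes the method easy to exercise in a unit test with an in-memory buffer.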

Contributor Author

Done

func CollectSupportLog(log logging.Logger, opts ...Params) error {
	switch opts[0].LogFunction {
	case "CopyServerConfig":
		return CopyServerConfig(log, opts...)
Contributor

Do all of these functions need to be exported? Starting with a capital letter in Go means the symbol is exported. If they are solely called from this function I'd suggest unexporting them.
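
As a rough sketch of this point, combined with the earlier suggestion to use an enum instead of string keys: only the dispatcher is exported, while the per-command helpers start with a lowercase letter. The LogFunction type, its constants, and the helper bodies are hypothetical, and the logger and Params arguments from the diff are omitted to keep the example self-contained.

```go
package main

import (
	"fmt"
	"os"
)

// LogFunction is a hypothetical typed key replacing raw strings such as
// "CopyServerConfig".
type LogFunction int

const (
	CopyServerConfigFn LogFunction = iota
	CollectSystemInfoFn
)

// CollectSupportLog is the only exported entry point in this sketch.
func CollectSupportLog(fn LogFunction) (string, error) {
	switch fn {
	case CopyServerConfigFn:
		return copyServerConfig()
	case CollectSystemInfoFn:
		return collectSystemInfo()
	default:
		return "", fmt.Errorf("unsupported log function %d", fn)
	}
}

// The helpers start with a lowercase letter, so they are not exported.
func copyServerConfig() (string, error)  { return "server config copied", nil }
func collectSystemInfo() (string, error) { return "system info collected", nil }

func main() {
	out, err := CollectSupportLog(CopyServerConfigFn)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```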

Contributor Author

Done

@ravalsam
Contributor Author

Thank you for the review comments; they are appreciated. I will update the PR.

Samir Raval added 6 commits January 12, 2023 23:59
Required-githooks: true

Signed-off-by: Samir Raval <[email protected]>
Required-githooks: true

Signed-off-by: Samir Raval <[email protected]>
Required-githooks: true

Signed-off-by: Samir Raval <[email protected]>
Required-githooks: true

Signed-off-by: Samir Raval <[email protected]>
@ravalsam ravalsam requested a review from a team as a code owner January 20, 2023 04:56
Collaborator

@daosbuild1 daosbuild1 left a comment

:avocado: tags=hw,medium
:avocado: tags=basic,control,dmg
:avocado: tags=test_dmg_support_collect_log
"""
Collaborator

(style) trailing whitespace

@daosbuild1
Collaborator

@daosbuild1 daosbuild1 dismissed their stale review January 20, 2023 05:02

Updated patch

Collaborator

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1
Collaborator

Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-11094/43/execution/node/1034/log

phender previously approved these changes May 18, 2023
@ravalsam ravalsam requested a review from mjmac May 19, 2023 15:41
Test-tag: pr control

Required-githooks: true
Collaborator

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-11094/45/execution/node/1236/log

@ravalsam ravalsam requested a review from phender May 31, 2023 19:31
mjmac previously approved these changes Jun 5, 2023
Contributor

@mjmac mjmac left a comment

While this is a huge PR, it seems low risk in that the changes are adding new functionality rather than changing existing code. I'm in support of getting it landed so that we can continue to iterate on it.

kjacque previously approved these changes Jun 5, 2023
}

// Get the system hostname
func GetHostName() (string, error) {
Contributor

not blocking on this, but there is a Go function os.Hostname() that can fetch the hostname without calling out to the shell.

Contributor Author

Thanks for the review. I will test this and update it when I work on the second iteration.

If I recall, I tried os.Hostname() but it returned the full name, including the domain name. So another option is to use that function and strip the domain from the result.
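
The option described here could look roughly like the sketch below: call os.Hostname() and keep only the part before the first dot. The names shortName and getHostName are hypothetical, and whether os.Hostname() returns a fully qualified name depends on how the host is configured.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// shortName returns the part of name before the first dot, or name itself
// if it contains no dot.
func shortName(name string) string {
	if i := strings.IndexByte(name, '.'); i >= 0 {
		return name[:i]
	}
	return name
}

// getHostName fetches the hostname without calling out to the shell and
// strips any domain suffix.
func getHostName() (string, error) {
	name, err := os.Hostname()
	if err != nil {
		return "", fmt.Errorf("getting hostname: %w", err)
	}
	return shortName(name), nil
}

func main() {
	host, err := getHostName()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(host)
}
```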

@ravalsam ravalsam requested a review from a team June 5, 2023 23:48
Test-tag: pr control

Required-githooks: true

Signed-off-by: Samir Raval <[email protected]>
@ravalsam ravalsam dismissed stale reviews from kjacque and mjmac via d09903b June 8, 2023 15:33
Collaborator

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@ravalsam ravalsam requested review from mjmac and kjacque June 8, 2023 15:36
@daosbuild1
Collaborator

Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-11094/47/execution/node/1114/log

@ravalsam ravalsam requested a review from a team June 13, 2023 16:08
@mjmac
Contributor

mjmac commented Jun 13, 2023

@tanabarr: Has @ravalsam addressed your concerns? If so, please approve so we can get this landed, thanks.

Contributor

@tanabarr tanabarr left a comment

I spent a while trying to work out whether the requested changes were addressed, but tbh it's quite hard to tell. Some of them definitely have been, so I will approve.

@mjmac mjmac merged commit 21693c0 into master Jun 14, 2023
@mjmac mjmac deleted the samirrav/Support/DAOS-10625-Final branch June 14, 2023 19:53
cdavis28 added a commit that referenced this pull request May 20, 2024
Although this port copies the code from master to 2.4, there
are issues related to permissions making it non-functional.

This is a partial cherry-pick of the following PR's on master:
DAOS-10625 control: Create the tool to collect the logs/config for support purpose (#11094)
DAOS-13759 control: Update support collect-log tool. (#12906)
DAOS-13763 control: Fix daos_metrics collection for support collect-log. (#12555)
DAOS-13936 support: Collect the specific logs and Time range log for support (#13325)

Change-Id: I168c14e177a5003c4e315595b1bf154e84cef473
cdavis28 added a commit that referenced this pull request May 20, 2024
Although this port copies the code from master to 2.4, there
are issues related to permissions making it non-functional.

This is a partial cherry-pick of the following PR's on master:
DAOS-10625 control: Create the tool to collect the logs/config for support purpose (#11094)
DAOS-13759 control: Update support collect-log tool. (#12906)
DAOS-13763 control: Fix daos_metrics collection for support collect-log. (#12555)
DAOS-13936 support: Collect the specific logs and Time range log for support (#13325)

Signed-off-by: Chris Davis <[email protected]>

7 participants