
Performance: Node benchmarking utility #6198

Open
wants to merge 9 commits into
base: master

Conversation

urtho
Contributor

@urtho urtho commented Dec 16, 2024

go-algorand could use a standardized way to compare and benchmark the underlying hardware, ideally with a repeatable workload that closely matches a real scenario.

Users could compare their results online and make sure their hardware's performance is above the median so that network peak performance can grow with the number of new nodes.

The catchpointdump utility is the perfect first candidate for this:

  • It is already in the repo.
  • It can closely simulate a fast catchup procedure in a repeatable setting.

This patch adds a bench command to the utility by combining both network and file restore scenarios. Download, SQLite loading and Merkle tree build can be benchmarked all in one go.
It reuses some of the dependencies that are already in go.mod to get information about the hardware - at least on the Linux platform.

Results are optionally dumped to a JSON file and ready for submission to some central benchmark repository.

Examples

Simple Network, SSD and CPU test

Known catchpoint label, sourced from a random relay/archiver:

./catchpointdump bench -r 41600000 -n mainnet.algorand.network

# Benchmark report:
# >> stage:network duration_sec:89.1 duration_min:1.5 cpu_sec:101
# >> stage:database duration_sec:648.8 duration_min:10.8 cpu_sec:507
# >> stage:digest duration_sec:385.1 duration_min:6.4 cpu_sec:550
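
For anyone who wants to post-process the console output rather than the JSON file, the report lines above could be parsed roughly like this. This is only a sketch: the type and function names (stageResult, parseStageLine) are made up for illustration and are not part of the patch, and the key:value layout is assumed from the sample output shown here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// stageResult holds the fields of one ">> stage:..." report line.
// Hypothetical type, not taken from the PR.
type stageResult struct {
	Stage       string
	DurationSec float64
	CPUSec      int64
}

// parseStageLine parses a line such as
// ">> stage:network duration_sec:89.1 duration_min:1.5 cpu_sec:101".
// A leading "#" (as in the quoted output) is tolerated.
func parseStageLine(line string) (stageResult, error) {
	var res stageResult
	line = strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(line), "#"))
	if !strings.HasPrefix(line, ">>") {
		return res, fmt.Errorf("not a stage line: %q", line)
	}
	for _, field := range strings.Fields(strings.TrimPrefix(line, ">>")) {
		key, value, ok := strings.Cut(field, ":")
		if !ok {
			return res, fmt.Errorf("malformed field %q", field)
		}
		var err error
		switch key {
		case "stage":
			res.Stage = value
		case "duration_sec":
			res.DurationSec, err = strconv.ParseFloat(value, 64)
		case "cpu_sec":
			res.CPUSec, err = strconv.ParseInt(value, 10, 64)
		}
		if err != nil {
			return res, fmt.Errorf("field %q: %w", field, err)
		}
	}
	if res.Stage == "" {
		return res, fmt.Errorf("missing stage field in %q", line)
	}
	return res, nil
}

func main() {
	lines := []string{
		"# >> stage:network duration_sec:89.1 duration_min:1.5 cpu_sec:101",
		"# >> stage:database duration_sec:648.8 duration_min:10.8 cpu_sec:507",
		"# >> stage:digest duration_sec:385.1 duration_min:6.4 cpu_sec:550",
	}
	for _, l := range lines {
		r, err := parseStageLine(l)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		// CPU seconds per wall-clock second; above 1 means more than one core was busy.
		fmt.Printf("%s: cpu/wall = %.2f\n", r.Stage, float64(r.CPUSec)/r.DurationSec)
	}
}
```

The cpu/wall ratio is a quick way to see which stages are parallel (digest) and which are mostly waiting on disk (database).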

SSD and CPU test with local file

Benchmarking the disk and CPU part only using the already downloaded ledger snapshot:

./catchpointdump bench -n mainnet.algorand.network -t mainnet/snap/41600000.tar

Full benchmark with JSON report and hosted snapshot

A repeatable benchmark with a Cloudflare-hosted catchpoint and a report dump:

./catchpointdump bench -r 41600000 -n mainnet.algorand.network -p snap.nodely.io -j report.json

Report file

Sample report.json:

{
    "report": "a193cbc7-6e6a-732b-93cf-36f0c0589864",
    "stages": [
        {
            "stage": "network",
            "duration_sec": 39,
            "cpu_time_sec": 59
        },
        {
            "stage": "database",
            "duration_sec": 795,
            "cpu_time_sec": 629
        },
        {
            "stage": "digest",
            "duration_sec": 363,
            "cpu_time_sec": 482
        }
    ],
    "host": {
        "cores": 20,
        "log_cores": 20,
        "base_mhz": 2500,
        "max_mhz": 3500,
        "cpu_name": "13th Gen Intel(R) Core(TM) i5-13500",
        "cpu_vendor": "Intel",
        "mem_mb": 64105,
        "os": "linux",
        "uuid": "c3acdb4e-3937-a9a6-2266-d80ce615ef45"
    }
}
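
A consumer of the report file might decode it along these lines. This is a sketch only: the struct and function names are hypothetical, the field set is taken from the sample above, and the numeric types are assumptions (float64 is used for durations so that both integer and fractional values decode).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stage mirrors one entry of the "stages" array in the sample report.
type stage struct {
	Stage       string  `json:"stage"`
	DurationSec float64 `json:"duration_sec"`
	CPUTimeSec  float64 `json:"cpu_time_sec"`
}

// host mirrors the "host" object in the sample report.
type host struct {
	Cores     int    `json:"cores"`
	LogCores  int    `json:"log_cores"`
	BaseMHz   int    `json:"base_mhz"`
	MaxMHz    int    `json:"max_mhz"`
	CPUName   string `json:"cpu_name"`
	CPUVendor string `json:"cpu_vendor"`
	MemMB     uint64 `json:"mem_mb"`
	OS        string `json:"os"`
	UUID      string `json:"uuid"`
}

// report is the top-level document (hypothetical name).
type report struct {
	Report string  `json:"report"`
	Stages []stage `json:"stages"`
	Host   host    `json:"host"`
}

// decodeReport parses the JSON bytes of a report file.
func decodeReport(data []byte) (report, error) {
	var r report
	err := json.Unmarshal(data, &r)
	return r, err
}

// totalDuration sums the wall-clock time of all stages, in seconds.
func totalDuration(r report) float64 {
	total := 0.0
	for _, s := range r.Stages {
		total += s.DurationSec
	}
	return total
}

func main() {
	// Trimmed copy of the sample report from this PR description.
	sample := []byte(`{
		"report": "a193cbc7-6e6a-732b-93cf-36f0c0589864",
		"stages": [
			{"stage": "network", "duration_sec": 39, "cpu_time_sec": 59},
			{"stage": "database", "duration_sec": 795, "cpu_time_sec": 629},
			{"stage": "digest", "duration_sec": 363, "cpu_time_sec": 482}
		],
		"host": {"cores": 20, "log_cores": 20, "cpu_vendor": "Intel", "mem_mb": 64105, "os": "linux"}
	}`)
	r, err := decodeReport(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%d stages on %d cores, %.0f s total\n", len(r.Stages), r.Host.Cores, totalDuration(r))
}
```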

The file can be uploaded to a third-party benchmark site, for example:

curl -X POST https://benchmarks.nodely.io/api/report -d @report.json
#{"success":true,"goto":"https://benchmarks.nodely.io/edit/a193cbc7-6e6a-732b-93cf-36f0c0589864"}


codecov bot commented Dec 16, 2024

Codecov Report

Attention: Patch coverage is 0% with 163 lines in your changes missing coverage. Please review.

Project coverage is 51.68%. Comparing base (269945c) to head (9d83da6).

Files with missing lines              Patch %   Lines
cmd/catchpointdump/bench.go           0.00%     93 Missing ⚠️
cmd/catchpointdump/bench_report.go    0.00%     59 Missing ⚠️
util/util.go                          0.00%     10 Missing ⚠️
cmd/catchpointdump/commands.go        0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6198      +/-   ##
==========================================
- Coverage   51.78%   51.68%   -0.10%     
==========================================
  Files         644      646       +2     
  Lines       86697    86860     +163     
==========================================
+ Hits        44894    44895       +1     
- Misses      38933    39098     +165     
+ Partials     2870     2867       -3     


@urtho
Contributor Author

urtho commented Dec 18, 2024

Submitting a report might be fun:

image

Contributor

@algorandskiy algorandskiy left a comment


Good work! I left a few comments.
The PR will need an update after #6177 gets merged.

return fmt.Sprintf(">> stage:%s duration_sec:%.1f duration_min:%.1f cpu_sec:%d", bs.stage, bs.duration.Seconds(), bs.duration.Minutes(), bs.cpuTimeNS/1000000000)
}

func maybeGetTotalMemory() uint64 {
Contributor

consider moving to util/util.go

benchCmd.Flags().IntVarP(&round, "round", "r", 0, "Specify the round number ( i.e. 7700000 )")
benchCmd.Flags().StringVarP(&relayAddress, "relay", "p", "", "Relay address to use ( i.e. r-ru.algorand-mainnet.network:4160 )")
benchCmd.Flags().StringVarP(&catchpointFile, "tar", "t", "", "Specify the catchpoint file (either .tar or .tar.gz) to process")
benchCmd.Flags().StringVarP(&reportJsonPath, "report", "j", "", "Specify the file to save the Json formatted report to")
Contributor

Suggested change
- benchCmd.Flags().StringVarP(&reportJsonPath, "report", "j", "", "Specify the file to save the Json formatted report to")
+ benchCmd.Flags().StringVarP(&reportJsonPath, "report", "j", "", "Specify the file to save the JSON formatted report to")

}

func GetCPU() int64 {
usage := new(syscall.Rusage)
Contributor

same, move to util
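
For context, a helper like this typically reads the process CPU time via getrusage. The following is a minimal sketch of that technique, not the PR's actual implementation; the names processCPUTime and measure are hypothetical, and it assumes a Unix-like platform where syscall.Getrusage is available.

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

// processCPUTime returns the user plus system CPU time consumed so far
// by the current process, as reported by getrusage(RUSAGE_SELF).
func processCPUTime() (time.Duration, error) {
	var usage syscall.Rusage
	if err := syscall.Getrusage(syscall.RUSAGE_SELF, &usage); err != nil {
		return 0, err
	}
	return time.Duration(usage.Utime.Nano() + usage.Stime.Nano()), nil
}

// measure runs work and returns its wall-clock time and the CPU time
// the whole process consumed while it ran.
func measure(work func()) (time.Duration, time.Duration, error) {
	startCPU, err := processCPUTime()
	if err != nil {
		return 0, 0, err
	}
	start := time.Now()
	work()
	wall := time.Since(start)
	endCPU, err := processCPUTime()
	if err != nil {
		return 0, 0, err
	}
	return wall, endCPU - startCPU, nil
}

func main() {
	wall, cpu, err := measure(func() {
		// Small CPU-bound loop standing in for a benchmark stage.
		sum := 0
		for i := 0; i < 50_000_000; i++ {
			sum += i % 7
		}
		_ = sum
	})
	if err != nil {
		fmt.Println("getrusage failed:", err)
		return
	}
	fmt.Printf("wall=%s cpu=%s\n", wall, cpu)
}
```

Returning a time.Duration instead of raw nanoseconds would also let callers print seconds with cpu.Seconds() rather than dividing by 1000000000 by hand.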

addrs = []string{relayAddress}
} else {
//append relays
dnsaddrs, err := tools.ReadFromSRV(context.Background(), "algobootstrap", "tcp", networkName, "", false)
Contributor

"algobootstrap" probably should not be here, since those nodes are not obliged to keep catchpoints other than the few most recent ones.

@gmalouf
Contributor

gmalouf commented Jan 29, 2025

@urtho if you want to refresh this PR from master, now is a good time!

@CASABECI

This comment was marked as duplicate.

@gmalouf
Contributor

gmalouf commented Feb 5, 2025

Some issues need to be worked out before this can move forward:

# github.com/algorand/go-algorand/cmd/catchpointdump
cmd/catchpointdump/bench.go:178:35: assignment mismatch: 4 variables but catchupAccessor.GetVerifyData returns 6 values
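
This class of build error shows up when a function gains return values after the calling code was written. A self-contained illustration follows; verifyData is a hypothetical stand-in, since the real signature of catchupAccessor.GetVerifyData is not shown in this thread.

```go
package main

import "fmt"

// verifyData is a hypothetical function that returns six values,
// standing in for an API whose return list grew from four to six.
func verifyData() (string, string, int, int, string, string) {
	return "a", "b", 3, 4, "e", "f"
}

func main() {
	// The old call site would no longer compile:
	//   a, b, c, d := verifyData()
	//   // assignment mismatch: 4 variables but verifyData returns 6 values

	// Every returned value has to be received; the blank identifier
	// discards the ones the caller does not need.
	a, b, c, d, _, _ := verifyData()
	fmt.Println(a, b, c, d)
}
```

Whether the two extra values can simply be discarded in bench.go, or should be used by the benchmark, depends on what they are on current master.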

@algorandskiy
Contributor

@urtho could you remerge/fix the build and go through my comments?

@gmalouf
Contributor

gmalouf commented Feb 18, 2025

@urtho remaining failures can be tracked down/addressed by:

  • running check_license.sh locally, which will add the license to your new files (Codegen failure)
  • running make lint, which reveals a number of new items introduced by the changes in this PR (ReviewDog failure)

@gmalouf changed the title from "Node benchmarking utility" to "Performance: Node benchmarking utility" on Feb 18, 2025
Comment on lines +17 to +31
// Copyright (C) 2019-2024 Algorand, Inc.
// This file is part of go-algorand
//
// go-algorand is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as
// published by the Free Software Foundation, either version 3 of the
// License, or (at your option) any later version.
//
// go-algorand is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
//
// You should have received a copy of the GNU Affero General Public License
// along with go-algorand. If not, see <https://www.gnu.org/licenses/>.
Contributor

double copyright, remove


4 participants