Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

telemetry: report heartbeat metrics as JSON numbers #3802

Merged
merged 8 commits into from
Apr 12, 2022

Conversation

cce
Copy link
Contributor

@cce cce commented Mar 18, 2022

Summary

The current "Heartbeat" telemetry event uses JSON strings inside a Metrics object to encode counter values. This PR sends them as JSON numbers instead in a new m object. For backwards compatibility the StringGauge version, version-num, channel, branch, and commit-hash string metadata included in the old Metrics object are preserved.

This PR also fixes a bug where counters with multiple labels were not being reported correctly (only one of the labeled values was being reported, with each iteration over the labels overwriting the reported value for the the base metric name). Label names and values are now added to the reported metric names, sanitized with _ for non-alphanumeric unsupported characters.

Test Plan

Slightly updated old tests. Added a new test to assert the new message JSON serialization created by calling logTelemetry with this event type.

@cce cce self-assigned this Mar 18, 2022
@cce cce requested a review from winder March 19, 2022 15:40
winder
winder previously approved these changes Mar 21, 2022
Copy link
Contributor

@winder winder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me. You force telemetry to start locally with -t 1, that would create the index for devnet so you can verify that the mapping is correct.

Channel string `json:"channel"`
Branch string `json:"branch"`
CommitHash string `json:"commit-hash"`
} `json:"Metrics"` // backwards compatible name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

@codecov-commenter
Copy link

codecov-commenter commented Mar 23, 2022

Codecov Report

Merging #3802 (fb10f39) into master (5a9f94a) will increase coverage by 0.03%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master    #3802      +/-   ##
==========================================
+ Coverage   49.83%   49.86%   +0.03%     
==========================================
  Files         392      392              
  Lines       68719    68707      -12     
==========================================
+ Hits        34243    34260      +17     
+ Misses      30716    30691      -25     
+ Partials     3760     3756       -4     
Impacted Files Coverage Δ
cmd/algod/main.go 0.00% <0.00%> (ø)
util/metrics/counter.go 80.95% <0.00%> (+2.38%) ⬆️
util/metrics/gauge.go 70.66% <0.00%> (+2.66%) ⬆️
util/metrics/registry.go 83.33% <ø> (-16.67%) ⬇️
util/metrics/registryCommon.go 0.00% <0.00%> (ø)
util/metrics/tagcounter.go 50.76% <0.00%> (ø)
catchup/peerSelector.go 98.95% <0.00%> (-1.05%) ⬇️
cmd/tealdbg/debugger.go 71.42% <0.00%> (-0.99%) ⬇️
catchup/service.go 69.38% <0.00%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a9f94a...fb10f39. Read the comment docs.

@cce cce requested review from brianolson and algorandskiy March 25, 2022 15:06
Copy link
Contributor

@algorandskiy algorandskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one question

a.Equal(hb, data[0]["details"])

// assert JSON serialization is backwards compatible
js, err := json.Marshal(data[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just store an old serialized value and compare against new?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't figure out how to write a test that had access to the serialized value that the telemetry system makes ... there is a JSON conversion going on inside the logrus hook in a third party library, before being sent via the telemetry client. So I figured I would just simulate that call to json.Marshal here..

Copy link
Contributor

@brianolson brianolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of good. I think one more bit of comment+test will make it more durable.

@cce cce requested review from winder and brianolson March 28, 2022 14:36
@cce
Copy link
Contributor Author

cce commented Apr 4, 2022

@winder @brianolson @algorandskiy addressed CR feedback and ready for re-review

@cce cce merged commit 0cc8a66 into algorand:master Apr 12, 2022
@cce cce deleted the heartbeat-json-numbers branch April 12, 2022 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants