Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report watcher metrics related to the latest revision #1094

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

be-hase
Copy link
Member

@be-hase be-hase commented Feb 6, 2025

close #1092

Additional note

I can't think of a use case for customizing the meter name prefix, so I won't support it at this time.
(In the case of Armeria, there are occasional use cases when using both gRPC and Thrift simultaneously, though...)

Tasks

  • Get feedback
  • Write test

@CLAassistant
Copy link

CLAassistant commented Feb 6, 2025

CLA assistant check
All committers have signed the CLA.

Copy link

codecov bot commented Feb 6, 2025

Codecov Report

Attention: Patch coverage is 96.07843% with 2 lines in your changes missing coverage. Please review.

Project coverage is 70.12%. Comparing base (767eff3) to head (61684c1).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
...ogma/client/armeria/legacy/LegacyCentralDogma.java 33.33% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1094      +/-   ##
============================================
+ Coverage     70.07%   70.12%   +0.05%     
- Complexity     4486     4496      +10     
============================================
  Files           453      453              
  Lines         18161    18212      +51     
  Branches       2008     2015       +7     
============================================
+ Hits          12727    12772      +45     
- Misses         4345     4347       +2     
- Partials       1089     1093       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@be-hase be-hase changed the title (WIP) Report watcher metrics related to the latest revision Report watcher metrics related to the latest revision Feb 6, 2025
@be-hase be-hase marked this pull request as ready for review February 6, 2025 11:38
Comment on lines 61 to 62
private static final String LATEST_REVISION_METER_NAME = "centraldogma.watcher.latest.revision";
private static final String LATEST_COMMIT_TIME_METER_NAME = "centraldogma.watcher.latest.commit.time";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private static final String LATEST_REVISION_METER_NAME = "centraldogma.watcher.latest.revision";
private static final String LATEST_COMMIT_TIME_METER_NAME = "centraldogma.watcher.latest.commit.time";
private static final String LATEST_REVISION_METER_NAME = "centraldogma.client.watcher.latest.revision";
private static final String LATEST_COMMIT_TIME_METER_NAME = "centraldogma.client.watcher.latest.commit.time";

Some people may want to use a different prefix and add custom tags for their metrics.
What do you think of adding a customization point for that?

I realized that the centraldogma-client module has no dependency on Armeria. Actually, we have decided to add Armeria dependency to centraldogma-client but didn't put it in action.

To conclude, let's add Armeria dependency and use MeterIdPrefix, which I proposed in the issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wrote this in the README, but since this is a relatively unique metric, I couldn’t think of many use cases for customization at this point.

I considered using this, but copying and pasting it didn’t seem ideal, so I decided not to for now.
(That said, it's also not a good idea for centraldogma-client to depend on Armeria.)
MeterIdPrefix in Armeria

And if we want to rename or retag metrics, we can transform them using a meter filter.
Micrometer Meter Filters Documentation

...How about?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. How about just adding an infix .client? I want to distinguish it from the server metric. We may add additional metrics under centraldogma.client namespace.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I overlooked the client infix remark. You're absolutely right. I'll fix it!

@ikhoon ikhoon added this to the 0.74.0 milestone Feb 7, 2025
@be-hase
Copy link
Member Author

be-hase commented Feb 7, 2025

( Hmm... For some reason, the CI is failing on Windows... Why...?

@@ -124,6 +150,13 @@ void start() {
if (state.compareAndSet(State.INIT, State.STARTED)) {
scheduleWatch(0);
}
if (meterRegistry != null) {
// emit metrics once the values are ready
initialValueFuture.thenAccept(latest -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) What do you think of just watching this.latest.revision() while setting -1 if not set?

I think the current approach introduces the additional complexity of maintaining an additional state.

e.g. Assuming:

val watcher = dogma.watcher()...
watcher.start()
watcher.close()
// `initialValueFuture` is completed after the close and a gauge is registered

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also struggled with this.

Since I judged that it wouldn't become too complex, I wanted to provide as accurate metrics as possible, which is why I implemented it this way.

However, if the Central Dogma team determines that it would make maintenance more complex, I think it's fine to set it to -1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, Double.NaN could also be used

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you prefer precise metrics, should we use a lock to register metrics, close the watcher and remove metrics?

  • The lock will solve the problem that jrhee17 mentioned.
  • The metric registration could be placed in one method.
ReentrantLock lock = ...

if (initialValueFuture.complete(newLatest)) {
  updateWatchMetrics();
}
void updateWatchMetrics() {
   lock.lock();

   if (isStopped()) {
      return;
   }

   if (this.latest == null) {
     // register two metrics
   }
   // update the last received time

   lock.unlock();
}

void close() {
    lock.lock();

    ....
    lock.unlock();
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...Thinking about this topic again, the -1 seems fine.

if we don't care about -1, we can simply write ... >= 0 in PromQL.

(I prefer -1 because NaN has some quirks in handling.)

"pathPattern", pathPattern,
// There is a possibility of using the same watcher for the same project, repo, and pathPattern.
// Therefore, the watcher’s hash code should be included as a tag.
"watcher_hash", String.valueOf(System.identityHashCode(this))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional) What do you think of assigning an ID instead of a hashcode on the off-chance there is a conflict? It can also help users check how many watchers there are in a JVM easily.

e.g.

private static final AtomicLong WATCHER_ID = new AtomicLong();
...
"watcher_id", String.valueOf(WATCHER_ID.incrementAndGet())

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that sounds good too. If others prefer this approach, I’ll consider changing it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about allowing users to set the name of a Watcher? If it is absent, we may create a name automatically.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think adding a method that allows passing a name would be a good approach.
Reference

...Personally, I feel that cases where one explicitly wants to specify a name (i.e., creating multiple watchers with the same project, repo, and path) are rare.
So, I prefer implementing it with @jrhee17 nim approach first, and if there is demand, we can add support for explicitly specifying a name.

// noinspection ConstantValue
if (latestCommit.getAndSet(commit) == null) {
// emit metrics once the values are ready
meterRegistry.gauge(LATEST_COMMIT_TIME_METER_NAME, tags, latestCommit,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) I understand the usefulness of last commit time since it may be more convenient to check when a file was last committed as opposed to checking the revision numbers.

Having said this, this commit time may not accurate represent the current watched object since it is done in a separate IO.

What do you think of handling commit time separately by embedding commit time information in the watch response? This will involve server-side changes as well, so it may be better to handle in a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of handling commit time separately by embedding commit time information in the watch response? This will involve server-side changes as well, so it may be better to handle in a separate PR.

Yes, I actually wanted this as well. However, since the API interface on the Central Dogma server side will also change, there seem to be many things that need to be discussed.

Copy link
Contributor

@ikhoon ikhoon Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn’t it more important when the client receives a new revision for a watch?
If you think the commit time recorded in the server is meaningful, I prefer @jrhee17's approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh, now that you mention it, that makes sense.

It seems better to simply record the value at the time the watcher receives it, without calling the get history API.
I'll make that change. It should be much simpler.

e.g. centraldogma.client.watcher.latest.received.time

@be-hase
Copy link
Member Author

be-hase commented Feb 14, 2025

I have fixed it.

@be-hase
Copy link
Member Author

be-hase commented Feb 20, 2025

PTAL


if (latestReceivedTime.getAndSet(Instant.now().getEpochSecond()) == 0) {
// emit metrics once the values are ready
meterRegistry.gauge(LATEST_RECEIVED_TIME_METER_NAME, tags, latestReceivedTime, AtomicLong::get);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) Will this actually register latestReceivedTime subsequently after the first registration since the Meter.Id is already registered? Would it make more sense to register the gauge once and define the update function to fetch AbstractWatcher#latestReceivedTime instead?

@@ -124,6 +150,13 @@ void start() {
if (state.compareAndSet(State.INIT, State.STARTED)) {
scheduleWatch(0);
}
if (meterRegistry != null) {
// emit metrics once the values are ready
initialValueFuture.thenAccept(latest -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, Double.NaN could also be used

@@ -124,6 +150,13 @@ void start() {
if (state.compareAndSet(State.INIT, State.STARTED)) {
scheduleWatch(0);
}
if (meterRegistry != null) {
// emit metrics once the values are ready
initialValueFuture.thenAccept(latest -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you prefer precise metrics, should we use a lock to register metrics, close the watcher and remove metrics?

  • The lock will solve the problem that jrhee17 mentioned.
  • The metric registration could be placed in one method.
ReentrantLock lock = ...

if (initialValueFuture.complete(newLatest)) {
  updateWatchMetrics();
}
void updateWatchMetrics() {
   lock.lock();

   if (isStopped()) {
      return;
   }

   if (this.latest == null) {
     // register two metrics
   }
   // update the last received time

   lock.unlock();
}

void close() {
    lock.lock();

    ....
    lock.unlock();
}

Copy link
Contributor

@ikhoon ikhoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 🙏🚀

@@ -247,6 +296,7 @@ private void doWatch(int numAttemptsSoFar) {
logger.debug("watcher noticed updated file {}/{}{}: rev={}",
projectName, repositoryName, pathPattern, newLatest.revision());
notifyListeners(newLatest);
latestReceivedTimeSeconds.set(Instant.now().getEpochSecond());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should we use volatile long instead? For now, we don't get any advantage from AtomicLong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's right.

Comment on lines 162 to 164
watcher -> Optional.ofNullable(watcher.latest)
.map(it -> it.revision().major())
.orElse(-1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Micro optimization) Should we use if ( != null) instead because we prefer to minimize object allocations even though it is trivial?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, fixed

Copy link
Contributor

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. 👍 Left small suggestions. 😉

Comment on lines 162 to 165
if (watcher.latest == null) {
return -1;
} else {
return watcher.latest.revision().major();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
if (watcher.latest == null) {
return -1;
} else {
return watcher.latest.revision().major();
final Latest<T> latest = watcher.latest;
if (latest == null) {
return -1;
} else {
return latest.revision().major();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. In that case, it’s good since it also removes the warning in the IDE.

fixed: 1d7dd3b

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it also doen't access the volatile field twice. 😉

@@ -142,6 +188,12 @@ public void close() {
if (currentWatchFuture != null && !currentWatchFuture.isDone()) {
currentWatchFuture.cancel(false);
}

if (meterRegistry != null) {
meterRegistry.remove(new Id(LATEST_REVISION_METER_NAME, tags, null, null, Type.GAUGE));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's a good idea to remove this gauge since it takes long to remove it when there are a lot of meters in the map.
Anyway, how about creating the gauge in start() method and remove it here?

this.gauge = Gauge.builder(LATEST_REVISION_METER_NAME, this, watcher -> {
    final Latest<T> latest = watcher.latest;
    if (latest == null) {
        return -1;
    } else {
        return latest.revision().major();
    }
}).tags(tags).register(meterRegistry);
...
meterRegistry.remove(gauge);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this gauge since it takes long to remove

The performance regression has been resolved, so removal operations will no longer take a long time. micrometer-metrics/micrometer#5750 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance regression has been resolved, so removal operations will no longer take a long time. micrometer-metrics/micrometer#5750 (comment)

Thanks for the link. I didn't know they fixed it. 😉

it seems that using remove(Meter meter) results in the same behavior.

Yeah, it simply delegates the call to the remove method that takes a meter.
I left this comment while considering a scenario where watcher.close() is called without calling watcher.start().
In that case, it would attempt to remove unregistered meters.
I understand this is unlikely in most cases, but I wanted to clarify it since modifying the code wouldn’t be difficult. 😉

@minwoox minwoox modified the milestones: 0.74.0, 0.75.0 Mar 5, 2025
Copy link
Contributor

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for adding this nice metric, @be-hase! 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A request to expose the latest revision and latest committed at of the watcher as metrics
5 participants