[V0][Metrics] Deprecate some KV/prefix cache metrics #14136
Merged
`vllm:num_requests_swapped`, `vllm:cpu_cache_usage_perc`, and `vllm:cpu_prefix_cache_hit_rate` will no longer be relevant in V1 since we no longer implement KV cache offloading. So these metrics should be considered deprecated.

And as agreed in #12592, we have added `prefix_cache_queries` and `prefix_cache_hits` counters to replace the `prefix_cache_hit_rate` gauge, as this allows the interval over which the hit rate is calculated to be controlled in a Prometheus query.

In theory, we could ease the transition by implementing the old hit rate metric in V1 and the new queries/hits metrics in V0, but it's probably not worthwhile unless we learn the hit rate metric is heavily used by V0 users.
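To illustrate why a pair of counters gives more control than a single gauge, here is a minimal sketch of how a hit rate over a chosen interval can be derived from two scrapes of the counters. The `CounterSample` type and both helper functions are hypothetical and not part of vLLM. The metric names in the generated query assume a `vllm:` prefix on the counter names mentioned above; the names actually exported may differ (for example, an extra infix or a `_total` suffix), so check the `/metrics` endpoint before using the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One scrape of the two monotonically increasing counters (hypothetical type)."""
    queries: int  # cumulative prefix cache queries at scrape time
    hits: int     # cumulative prefix cache hits at scrape time


def interval_hit_rate(start: CounterSample, end: CounterSample) -> float:
    """Hit rate over the interval between two scrapes.

    Only the increase in each counter matters, so the caller picks the
    interval simply by choosing which two samples to compare.
    """
    delta_queries = end.queries - start.queries
    delta_hits = end.hits - start.hits
    if delta_queries <= 0:
        # No queries in the interval (or a counter reset): report 0.0
        # rather than dividing by zero.
        return 0.0
    return delta_hits / delta_queries


def hit_rate_promql(window: str = "5m", prefix: str = "vllm:") -> str:
    """Build a PromQL expression for the hit rate over `window`.

    The metric names are assumptions based on the counter names in this
    description, not confirmed exported names.
    """
    return (
        f"rate({prefix}prefix_cache_hits[{window}]) / "
        f"rate({prefix}prefix_cache_queries[{window}])"
    )


if __name__ == "__main__":
    earlier = CounterSample(queries=100, hits=40)
    later = CounterSample(queries=300, hits=190)
    print(interval_hit_rate(earlier, later))
    print(hit_rate_promql("1m"))
```

A gauge fixes the averaging window on the server side, whereas with counters each dashboard or alert can choose its own window by changing the range selector in the query.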