Commit
[V0][Metrics] Deprecate some KV/prefix cache metrics
vllm:cpu_cache_usage_perc and vllm:cpu_prefix_cache_hit_rate will no longer be relevant in V1 since we no longer implement KV cache offloading, so these metrics should be considered deprecated.

As agreed in #12592, we have added prefix_cache_queries and prefix_cache_hits counters to replace the prefix_cache_hit_rate gauge. Counters allow the interval over which the hit rate is calculated to be controlled in a Prometheus query like:

```
rate(prefix_cache_hits[5m]) / rate(prefix_cache_queries[5m])
```

In theory, we could ease the transition by implementing the old hit rate metric in V1 and the new queries/hits metrics in V0, but it's probably not worthwhile unless we learn the hit rate metric is heavily used by V0 users.

Signed-off-by: Mark McLoughlin <[email protected]>
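
To illustrate why a pair of counters lets the consumer choose the interval, here is a minimal sketch of the arithmetic behind dividing one rate() by another. It is not vLLM or Prometheus code: the function name and the sample values are hypothetical, and it ignores the extrapolation Prometheus applies at window edges.

```python
from typing import Optional, Tuple

# A sample is (prefix_cache_queries, prefix_cache_hits) read at one scrape.
# Both values are assumed to be monotonically increasing counters.
Sample = Tuple[int, int]


def interval_hit_rate(start: Sample, end: Sample) -> Optional[float]:
    """Hit rate over the window between two counter samples.

    Dividing the two per-second rates cancels the window length,
    so only the counter deltas matter.
    """
    delta_queries = end[0] - start[0]
    delta_hits = end[1] - start[1]
    if delta_queries <= 0:
        # No queries in the window, or a counter reset: rate is undefined.
        return None
    return delta_hits / delta_queries


# Hypothetical scrapes: process start, then two later points in time.
scrapes = [(0, 0), (100, 20), (200, 90)]

lifetime_rate = interval_hit_rate(scrapes[0], scrapes[2])  # whole lifetime
recent_rate = interval_hit_rate(scrapes[1], scrapes[2])    # last window only
```

With these made-up numbers the lifetime hit rate is 90/200 while the most recent window is 70/100, which is the kind of difference a single pre-computed gauge cannot expose.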