-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add error checking for the case where MetricFamily is deleted before … #254
Conversation
…deleting Metric. Update documentation.
@@ -1326,7 +1326,10 @@ Starting from 23.05, you can utlize Custom Metrics API to register and collect | |||
custom metrics in the `initialize`, `execute`, and `finalize` functions of your | |||
Python model. The Custom Metrics API is the Python equivalent of the | |||
[TRITON C API custom metrics](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/metrics.md#custom-metrics) | |||
support. | |||
support. You will need to take the ownership of the custom metrics created | |||
through the APIs and must manage their lifetime. Note that a `MetricFamily` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look too deeply at the code yet, but a high level if MetricFamily
is being maintained as a shared_ptr
or equivalent in PB, would it make sense to hold a reference to the family inside the Metric
object on creation, to basically remove the possibility of MetricFamily
being deleted before its metrics on the PB side of things? Let me know if that makes sense.
This is just a passing idea without thinking too much about it, so it might not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good idea, although users who wish to remove a custom metric from the server metrics will need to clean up not only the MetricFamily
but also all associated Metric
objects to remove all the references of MetricFamily
.(Right now users can just delete the MetricFamily
and the custom metric will be removed from the server metrics)
However, I think it would be beneficial to provide an easier way to manage the lifetime of these objects. We can consider documenting that in order to remove a custom metric from the server, users should delete all related objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although users who wish to remove a custom metric from the server metrics will need to clean up not only the MetricFamily but also all associated Metric objects to remove all the references of MetricFamily.
I think this is an acceptable tradeoff to improve the usability of it, i.e. not having to worry about returning both the metric_family and metric objects from a helper function if you only need the metric object around for your processing and don't explicitly reference the family object ever again afterwards - as it happened in this issue.
I think users that need the metric_family object for other things (create more metrics, delete the family explicitly, etc.) would inherently keep it around and not mind deleting the metric objects as well. However, I don't expect many users to even manually call del metric/family
- I moreso expect users to just let vars fall out of scope like in most python scripts. We don't have any requirement that users call del
on anything today, do we? I assume it is used only if they explicitly wanted to clean something up early.
Either way - I think it's fine to document the behavior for now and we can think on this a bit based on more feedback out in the wild. This PR is adding similar behavior to what's on core side ATM.
CC @Tabrizian if you have any thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I've filed a ticket for the improvement. We can address it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to hold a reference to the family inside the Metric object on creation
I like this idea and I think it is more intuitive. On the other hand, I wonder how the users would handle metrics that may be used across the models. It seems like in the current APIs, the metric would be deleted on unloading of the model so it wouldn't be possible to keep a metric alive beyond the scope of a given model lifecycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rmccorm4 please correct me if I'm wrong. I believe that if two models both have a reference to the same metrics, and one of the models is unloaded, it should not affect the reference of the other model. This behavior should be consistent for both Python custom metrics and the Triton Metrics C API."
…deleting Metric. Update documentation.
When the
MetricFamily
is deleted before the correspondingMetric
objects are deleted, although theTRITONSERVER_MetricFamily
andTRITONSERVER_Metric
will be cleaned up properly in the Python backend side, users will still be able to use theMetric
object for metric operation in the model. This PR added a check and throw an exception when having this kind of user error.Added testing: triton-inference-server/server#5915
Resolves: triton-inference-server/server#5901