[ML] Add extra debug logging to enable end-to-end profiling of jobs #29857
Comments
Original comment by @richcollier: thanks for writing this up @droberts195 - I came here to do it but found your entry first!
After discussing with @sophiec20 we decided that, rather than just making this information available via debug messages, it would be nice to incorporate it into the output of the get jobs stats and get datafeed stats APIs. In the first instance we will add
Pinging @elastic/ml-core (Team:ML)
Original comment by @droberts195:
When an ML job is running, time can be spent in the following areas:
@richcollier has found that it is extremely hard to pinpoint which of these processing phases is responsible for an ML job running slower than real-time at a customer site.
We calculate and store the end-of-bucket processing time in the C++ anomaly detection code, but time spent in other areas is not easy to determine (other than by using a profiler in a development environment).
Such troubleshooting would be greatly helped by the following instrumentation:
Item (3) is the hardest here, as the categorization and data gathering are both done in sequence per input record. Using a millisecond timer to time the categorization part is probably not accurate enough, and using our current nanosecond timer on some platforms (Windows) is quite slow.
But even if only items (1) and (2) are added, it will improve our ability to troubleshoot certain performance problems at customer sites.
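To illustrate the per-record timing concern raised for item (3), here is a minimal sketch of one way to accumulate time for a single phase across many input records. It is not taken from the ML C++ codebase: `PhaseAccumulator` and its members are hypothetical names, and `std::chrono::steady_clock` is only assumed to be an acceptable clock. Its resolution and the cost of calling it vary by platform, so it would need measuring before being relied on. The idea is to sum durations at the clock's native resolution and convert to milliseconds only when reporting, so that sub-millisecond per-record times are not rounded away individually.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of the ML codebase): accumulates the time
// spent in one processing phase over many short, per-record intervals.
class PhaseAccumulator {
public:
    using Clock = std::chrono::steady_clock;

    // RAII guard: times the enclosing scope and adds it to the owner's total.
    class Scope {
    public:
        explicit Scope(PhaseAccumulator& owner)
            : owner_(owner), start_(Clock::now()) {}
        ~Scope() { owner_.add(Clock::now() - start_); }
        Scope(const Scope&) = delete;
        Scope& operator=(const Scope&) = delete;

    private:
        PhaseAccumulator& owner_;
        Clock::time_point start_;
    };

    // Add one measured interval at the clock's native resolution.
    void add(Clock::duration elapsed) {
        total_ += elapsed;
        ++count_;
    }

    // Number of intervals recorded so far.
    std::uint64_t count() const { return count_; }

    // Accumulated time, expressed in nanoseconds.
    std::chrono::nanoseconds total() const {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(total_);
    }

    // Accumulated time as fractional milliseconds, for logging or stats output.
    double totalMs() const {
        return std::chrono::duration<double, std::milli>(total_).count();
    }

private:
    Clock::duration total_{Clock::duration::zero()};
    std::uint64_t count_{0};
};
```

In use, a `PhaseAccumulator::Scope` would be created around the categorization call for each record, and `totalMs()` logged once per bucket. If the clock calls themselves prove too expensive on a given platform, timing only every Nth record and scaling the result is one possible mitigation, at the cost of accuracy.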