Benchmarks feature requests #493
@AVHopp @fabianliebig can you chime in on how we can increase the runtime or possibly achieve the parallelization requested above? @Hrovatin
I don't think that's necessary. From what I understood, if we have all benchmarks implemented, we will have two results.
Those two will be compared in the dashboard; no code adjustment for the benchmarks is needed.
Some features add new arguments to the code, so the benchmark must be changed. E.g., the botorch kernel factory was not added as the default, but rather as an option one would use similarly to EDBO.
Well, this would result in a complicated way of comparing results. Why would you prefer that instead of just changing the default and triggering the benchmark action on the feature branch? Then we check the result and, depending on that, keep or revert the default. Also, even if the benchmark code changes, there is no reason to make copies and maintain the unchanged benchmarks in the same branch, as they always have their comparison in the reference branch. I think this makes the third point somewhat obsolete.
The issue arises, for example, when we add many small changes, which would mean that for each one we need to create a new branch and set it as the default (e.g. a StratifiedScaler that can optionally be used for the botorch MultiTaskGP). Then branch management gets really hard, as in the above example we would need to create two branches with the new MultiTaskGP feature, one with and one without the scaler, and I would constantly need to make sure they are synchronised.
It is not intended to check every small change. Once per PR / feature proposal is fine, e.g. once the potential prior change is fully implemented. I'm not entirely sure, but I think you can also compare them based on commits, so even if you wanted two snapshots from the same branch, that should be no problem.
Hi @Hrovatin, many thanks for those ideas. Sorry for my late reply. I have to confess (even though we talked already) that I'm not sure I understand the full scope of your requirements. My thoughts on your points are as follows:
Sorry for the long command. Please let me know if I missed something based on your requirements. We may also talk about your workflow at some point, as I have the impression that more local functionality for the benchmarking module would also help :)
I was curious and wanted to test what happens if a job exceeds 24 h. Well, the container just kept running, so I would guess it will work as long as the GITHUB_TOKEN is not used.
Hi everyone, as one of the requested changes from #493, this PR sets the GitHub-side runtime limit for the container to 24 hours. This refers to the GITHUB_TOKEN generated per job, which expires after that time period. I've also tested longer runtimes and found that we should also be safe to go up to 35 days if necessary. Further changes regarding parallelization may follow once discussed.
I've added a PR for basic logging of benchmark information to the INFO channel, including the runtime. Just for the record: please take the runtime with a grain of salt, as it may vary a lot.
…499) This PR adds two lines for logging which benchmark is started and how long it took, with regard to #493. I think the name and random seed are sufficient, but feel free to suggest more or less information. An exemplary log looks like this: @Hrovatin I kindly ask you to comment on whether that fits your needs. Thanks :)
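For illustration, here is a minimal sketch of what such logging could look like; the runner function, the `benchmark.name` attribute, and the seed handling are assumptions for this example, not the actual code of the PR:

```python
import logging
import time

_logger = logging.getLogger(__name__)


def run_benchmark(benchmark, random_seed: int):
    """Hypothetical runner showing the two added INFO log lines."""
    _logger.info(
        "Starting benchmark %s with random seed %d", benchmark.name, random_seed
    )
    start = time.perf_counter()
    result = benchmark()  # assumed to be a callable benchmark object
    _logger.info(
        "Finished benchmark %s after %.2f s", benchmark.name, time.perf_counter() - start
    )
    return result
```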
I think we can close this issue (at the very latest after #491 is merged).
Hi everyone, this PR adds benchmark parallelization as requested in #493 by using a matrix in the workflow. The container still needs to be deployed separately, so two matrices are added. To start only a subset, `__main__.py` was altered to take a command line argument so that existing domains don't have to be changed. The selection of different benchmarks was a bit tricky, so the separation into groups may be removed if it overcomplicates things. I also noticed that the average CPU utilization was at about 48% for the 15 h benchmarks which ran recently, so I reduced the requested compute from 16 vCPUs to 8.
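A rough sketch of what such a command line selection in `__main__.py` could look like; the `--benchmarks` flag and the `BENCHMARKS` registry are illustrative assumptions, not the implementation from the PR:

```python
# Hypothetical sketch: select a subset of benchmarks via a command line argument.
import argparse

# Assumed registry mapping benchmark names to callables; placeholder entries only.
BENCHMARKS = {
    "domain_a": lambda: print("running domain_a"),
    "domain_b": lambda: print("running domain_b"),
}


def main() -> None:
    parser = argparse.ArgumentParser(description="Run a subset of the benchmarks.")
    parser.add_argument(
        "--benchmarks",
        nargs="*",
        choices=sorted(BENCHMARKS),
        default=sorted(BENCHMARKS),
        help="Names of the benchmarks to run; defaults to all of them.",
    )
    args = parser.parse_args()
    for name in args.benchmarks:
        BENCHMARKS[name]()  # each matrix job would pass a different subset here
```

Each matrix job would then invoke the module with its own subset, e.g. something like `python -m benchmarks --benchmarks domain_a` (module name assumed).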
Created issue to keep track of things I would like to see for benchmarks. May add more topics in the future as needed.
E.g. a general benchmark defining the data domain for TL, which is then imported into a benchmark on a new feature branch.
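As a hedged sketch of that idea (all module, class, and task names below are made up for illustration), the TL data domain would be defined once and imported by the benchmark on the feature branch:

```python
# Hypothetical shared module, e.g. benchmarks/domains/tl_domain.py.
from dataclasses import dataclass, field


@dataclass
class TLDomain:
    """Container for the task setup shared by all TL benchmarks."""
    source_tasks: list[str] = field(default_factory=lambda: ["task_a", "task_b"])
    target_task: str = "task_c"


def make_tl_domain() -> TLDomain:
    """Single place where the TL data domain is defined."""
    return TLDomain()


# A benchmark on a feature branch would only import and reuse it, e.g.:
#   from benchmarks.domains.tl_domain import make_tl_domain
#   domain = make_tl_domain()
#   # ...apply the new feature (e.g. an optional scaler) on top of `domain`...
```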
@AdrianSosic @Scienfitz @AVHopp @fabianliebig