Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tests][dask] reduce number of collisions tests #4501

Merged
merged 4 commits into from
Aug 9, 2021

Conversation

jmoralez
Copy link
Collaborator

@jmoralez jmoralez commented Aug 4, 2021

The recent change of the search for open ports for the dask interface (#4498) removed the collisions and to make sure we never got any the test ran 1,000 times, which is too much for the QEMU CI job (#4498 (comment)).

This reduces the number of times the collisions are tested to 25 (as suggested in #4498 (comment)).

@jmoralez
Copy link
Collaborator Author

jmoralez commented Aug 4, 2021

Seems like it's still taking too long (53 minutes) logs

2021-08-04T18:30:40.3348177Z ../tests/python_package_test/test_dask.py .............................. [ 14%]
2021-08-04T18:39:07.3817934Z ........................................................................ [ 26%]
2021-08-04T18:54:31.2839865Z ........................................................................ [ 37%]
2021-08-04T19:11:17.5745812Z ......s...............s...............s...............s................. [ 49%]
2021-08-04T19:21:33.5172290Z ..................................s..................................... [ 61%]
2021-08-04T19:23:08.2142881Z s.......................s........                                        [ 66%]
2021-08-04T19:23:08.2245588Z ../tests/python_package_test/test_dual.py s                              [ 66%]

@jmoralez
Copy link
Collaborator Author

jmoralez commented Aug 4, 2021

@jameslamb would you support adding --durations=0 to the pytest call to get the running time of each test? It may be another one that's taking this long.

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Aug 4, 2021

... which is too much for the QEMU CI job

We can set number of runs depending on the architecture tests are run on.

from platform import machine

...

n_runs = 1000 if machine() == 'x86_64' else 25
for _ in range(n_runs):

...

@jmoralez
Copy link
Collaborator Author

jmoralez commented Aug 4, 2021

@StrikerRUS do you think it should still be run that many times in the other jobs? I think I may have got too excited with the "should never collide" idea.

@jameslamb
Copy link
Collaborator

I think I may have got too excited with the "should never collide" idea.

Haha yeah I personally would prefer to just run it, say, 25 times in each job (like you currently have in the PR) and not introduce a difference in different test environments. I think that's more than enough to catch issues, given the current level of activity in this repo and the number of concurrent CI jobs running on each commit.

@jameslamb would you support adding --durations=0 to the pytest call to get the running time of each test

I at least would support you pushing a commit to this PR right now so we can see those results in logs! I'd want to see how verbose that output is to make a decision of whether or not to merge such a change, but it would at least help us to understand where time is being spent.

@jmoralez
Copy link
Collaborator Author

jmoralez commented Aug 5, 2021

Hmm I thought changing

pytest $BUILD_DIRECTORY/tests || exit -1
affected the QEMU job, however it didn't print the times (I can see them in other jobs though).

Does that job run a different call to pytest?

@StrikerRUS
Copy link
Collaborator

OK, please forget my suggestion.

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Aug 6, 2021

@jmoralez

Does that job run a different call to pytest?

QEMU uses the following call of pytest

pytest $BUILD_DIRECTORY/tests || exit -1

We produce wheel artifacts from QEMU job and hence set bdist task.
TASK: bdist

@jmoralez
Copy link
Collaborator Author

jmoralez commented Aug 6, 2021

1.76s call tests/python_package_test/test_dask.py::test_assign_open_ports_to_workers 🤣
logs

@jameslamb
Copy link
Collaborator

Ha ok, thanks for looking into it!

I think that's strong evidence that #4498 was not the cause of timeouts on the QEMU builds. Could you please remove the changes to test.sh? Then I think we should still merge this (reducing number of checks from 1000 to 25) anyway, even if the impact on timing will be small.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the investigation!

I just observed another QEMU timeout on #4504 (https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=10669&view=logs&j=c2f9361f-3c13-57db-3206-cee89820d5e3).

If you have ideas from these timings on how we could reduce the runtime for those jobs without sacrificing too much test coverage, we'd welcome them!

@StrikerRUS StrikerRUS merged commit cfe8eb1 into microsoft:master Aug 9, 2021
@jmoralez jmoralez deleted the fix/collisions-test branch August 9, 2021 17:24
@jmoralez
Copy link
Collaborator Author

jmoralez commented Aug 9, 2021

Some jobs just seem to take longer in all the tests, for example here:

2021-08-06T18:36:11.2971161Z ../tests/python_package_test/test_basic.py ............................. [  5%]
2021-08-06T18:36:14.1860271Z ......................                                                   [  8%]
2021-08-06T18:38:54.7440546Z ../tests/python_package_test/test_consistency.py ......                  [  9%]
2021-08-06T18:42:39.7003057Z ../tests/python_package_test/test_dask.py .............................. [ 14%] 

2 minutes test_basic and 4 minutes test_consistency.
The last timeout:

2021-08-09T02:45:03.2870447Z ../tests/python_package_test/test_basic.py ............................. [  5%]
2021-08-09T02:45:07.6525079Z ......................                                                   [  8%]
2021-08-09T02:59:01.0798201Z ../tests/python_package_test/test_consistency.py ......                  [  9%]
2021-08-09T03:04:47.1295927Z ../tests/python_package_test/test_dask.py .............................. [ 14%]

14 minutes test_basic, 6 minutes test_consistency.

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants