Data provider are too slow #305
Comments
Thanks, @nosrio! I would consider this a |
I am wondering why the data provider statistics contain so many more invocations for these calls:
I guess the slowness comes from file operations: the test gets slow because it has to wait for that I/O (the slowest call is …). @Chemaclass, where in the bashunit codebase are the data providers implemented? |
@staabm data providers are implemented here: https://github.com/TypedDevs/bashunit/blob/main/src%2Frunner.sh#L68 I think the issue is that we load and read the test file to find the data provider function here: https://github.com/TypedDevs/bashunit/blob/main/src/helpers.sh#L124 Do you have an idea of what other implementation we could use to avoid or optimize this? 🤔 |
I am not entirely sure whether I am reading the implementation correctly, but it looks like we are running grep over the source file for every test, over and over again. Maybe we can collect all data providers in a single pass before the loop, instead of reading them one by one when executing each test? |
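A minimal sketch of the "collect everything in one pass" idea, assuming the annotation format used later in this thread (a `# data_provider <name>` comment directly above a `function test_...()` line). The helper names `collect_data_providers` and `lookup_data_provider` are hypothetical and are not part of bashunit; the file is read once and later lookups work on the collected text instead of calling grep again.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not bashunit code.
# Assumption: a "# data_provider <name>" comment sits on the line directly
# above the "function test_...()" definition it belongs to.

# Reads the test file once and prints "<test_name> <provider_name>" pairs.
function collect_data_providers() {
  local file=$1 line pending=""
  local annotation_re='^[[:space:]]*#[[:space:]]*data_provider[[:space:]]+([A-Za-z_][A-Za-z0-9_]*)'
  local function_re='^[[:space:]]*(function[[:space:]]+)?(test[A-Za-z0-9_]*)[[:space:]]*\(\)'

  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $annotation_re ]]; then
      # Remember the provider until the next line is inspected.
      pending=${BASH_REMATCH[1]}
    else
      if [[ -n $pending && $line =~ $function_re ]]; then
        printf '%s %s\n' "${BASH_REMATCH[2]}" "$pending"
      fi
      # Any non-annotation line ends the pending annotation.
      pending=""
    fi
  done < "$file"
}

# Looks up a provider in the collected pairs without touching the file again.
function lookup_data_provider() {
  local pairs=$1 test_name=$2 name provider
  while read -r name provider; do
    if [[ $name == "$test_name" ]]; then
      printf '%s\n' "$provider"
      return 0
    fi
  done <<< "$pairs"
  return 1
}
```

A runner could call `pairs=$(collect_data_providers "$test_file")` once before its loop and then `lookup_data_provider "$pairs" "$function_name"` per test. Whether this saves anything measurable depends on how often the file is actually scanned today.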
@staabm I thought that at the beginning, but while debugging you can see that I am assigning data_provider_function (and therefore running that grep over the script file) only once 🤔 This is what I am using in my test script:
#!/bin/bash
TOTAL_NUMBERS=10
# To debug the internal bash behaviour:
# ./bashunit local/example_test.sh --debug local/debug.sh
function test_for_each(){
local elements=$(seq 1 1 $TOTAL_NUMBERS)
local regex='^[[:digit:]]+$'
for element in ${elements[@]}; do
assert_matches "${regex}" "${element}"
done
}
# data_provider provide_elements
function test_for_each_slow(){
local regex='^[[:digit:]]+$'
local element=${1}
assert_matches "${regex}" "${element}"
}
function provide_elements(){
local elements=$(seq 1 1 $TOTAL_NUMBERS)
echo "${elements[@]}"
} |
One idea for speeding up the current implementation is to avoid the grep altogether (the one that looks up the function name). Instead, use the same test name by convention, but with a different prefix, as in this example:
function test_something() {
local arg1=$1
local arg2=$2
# ...
}
function provider_something() {
echo "first-1st-arg" "first-2nd-arg"
echo "second-1st-arg" "second-2nd-arg"
} |
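The convention above could be resolved without reading the test file at all. The following is only a sketch under assumptions: the `provider_` prefix is taken from the example above, while `provider_name_for` and `run_with_convention` are hypothetical names that do not exist in bashunit. It relies on the bash builtin `declare -F`, which succeeds only when a function with the given name is defined.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not bashunit code.
# Assumption: a provider is named like its test, with "provider_" in place of "test_".

# Prints the provider name for a test if such a function is defined.
function provider_name_for() {
  local test_name=$1
  local candidate="provider_${test_name#test_}"
  if declare -F "$candidate" > /dev/null; then
    printf '%s\n' "$candidate"
    return 0
  fi
  return 1
}

# Runs a test once per provider line, or once without arguments if there is no provider.
function run_with_convention() {
  local test_name=$1 provider line
  if ! provider=$(provider_name_for "$test_name"); then
    "$test_name"
    return
  fi
  while IFS= read -r line; do
    # Intentional word splitting: each word on the line becomes one argument.
    # shellcheck disable=SC2086
    "$test_name" $line
  done < <("$provider")
}
```

The lookup is a string substitution plus one builtin call, so no grep and no file I/O are involved. Arguments containing spaces would need a different encoding than plain word splitting.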
As a POC, you could just hardcode the data provider name (instead of using the grep) and run the benchmark to see how much time the grep alone takes and whether it's worth building an alternative for it |
I don't see much difference. It is not worth it (at least for now, although the change is relatively easy to implement). After much thought, @nosrio @staabm, I concluded that I don't know how to improve the data providers, and that they should be used with small data sets. If you need to handle more data (and performance is an issue), then you should consider looping inside a single test. This is because spawning each test costs around 20ms, and the data provider creates a new test for each provided line of data, rather than looping inside the same test. With this benchmark in mind, I don't see a solution for this problem. Do you have any other ideas? Otherwise, I would suggest documenting what we learned in the docs and closing the issue until we discover something more. |
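To get a feel for the per-item cost described above, here is a rough sketch that runs the same check either inside the current shell or in one subshell per item. It is not bashunit code: the function names are hypothetical, and using a subshell as a stand-in for "spawning a test" is an assumption, since the thread does not say how bashunit isolates each provided line.

```shell
#!/usr/bin/env bash
# Rough comparison sketch, not bashunit code.
# Assumption: a subshell per item loosely approximates one spawned test per data line.

function is_number() {
  [[ $1 =~ ^[[:digit:]]+$ ]]
}

# Checks every item inside the current shell, like a loop in a single test.
function check_in_process() {
  local total=$1 passed=0 i
  for ((i = 1; i <= total; i++)); do
    if is_number "$i"; then
      passed=$((passed + 1))
    fi
  done
  echo "$passed"
}

# Checks every item in its own subshell, loosely mimicking one test per line.
function check_in_subshells() {
  local total=$1 passed=0 i
  for ((i = 1; i <= total; i++)); do
    if ( is_number "$i" ); then
      passed=$((passed + 1))
    fi
  done
  echo "$passed"
}
```

Running `time check_in_process 1000` and `time check_in_subshells 1000` side by side gives a machine-specific estimate of the overhead. A real test runner does more work per test than a bare subshell, so the absolute numbers will differ from the figure quoted above.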
Summary
Using a data provider to iterate over a long list decreases test performance.
Current behavior
If I use this code:
The output is
However if I change the code and use data provider:
The output is:
This is 24x slower, and it gets worse as N grows.
How to reproduce
See code above.
I ran strace on both examples; maybe this will help:
Expected behavior
I think that using a data provider is a cleaner way to write tests; however, using one on a long list of elements is nearly unusable for unit testing.
I think that running tests in parallel could help, but it would not be the final solution.