Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data provider are too slow #305

Closed
nosrio opened this issue Aug 20, 2024 · 8 comments
Closed

Data provider are too slow #305

nosrio opened this issue Aug 20, 2024 · 8 comments
Assignees
Labels
enhancement New feature or request

Comments

@nosrio
Copy link

nosrio commented Aug 20, 2024

Q A
OS Ubuntu 20.04.4 LTS
Shell bash
bashunit version 0.14.0

Summary

Using data provider to interate on long list decrease performance of tests.

Current behavior

If I use this code:

function test_for_each(){
    local elements=$(seq 1 1 100)
    local regex='^[[:digit:]]+$'

    for element in ${elements[@]}
    do
        assert_matches "${regex}" "${element}"
    done
}

The output is

bashunit - 0.14.0
Running tests/parallel_test.sh
✓ Passed: For each

Tests:      1 passed, 1 total
Assertions: 100 passed, 100 total

 All tests passed 
Time taken: 149 ms

However if I change the code and use data provider:

function provider_elements(){
    local elements=$(seq 1 1 100)
    echo ${elements[@]}
}

# data_provider provider_elements
function test_for_each(){
    local regex='^[[:digit:]]+$'
    local element=${1}

    assert_matches "${regex}" "${element}"
}

The output is:

bashunit - 0.14.0
Running tests/parallel_test.sh
✓ Passed: For each (1)
✓ Passed: For each (2)
✓ Passed: For each (3)
.....
✓ Passed: For each (100)

Tests:      100 passed, 100 total
Assertions: 100 passed, 100 total

 All tests passed 
Time taken: 3706 ms

This is 24x time slower and it get worse when N is greater.

How to reproduce

See code above.
I run strace on both examples, maybe this would help:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 77,65    0,080395         176       455       216 wait4
  9,47    0,009803          41       239           clone
  2,11    0,002184           1      2110           rt_sigprocmask
  1,70    0,001757           2       771           read
  1,41    0,001458           1      1428        46 close
  1,06    0,001101           0      3152           rt_sigaction
  0,99    0,001021           1       921           mmap
  0,94    0,000973          24        40           execve
  0,91    0,000945           4       216           pipe
  0,74    0,000771           1       482       138 openat
  0,66    0,000688           0      1347      1048 stat
  0,49    0,000505           1       257           mprotect
  0,31    0,000319           1       216           rt_sigreturn
  0,24    0,000245           1       180           brk
  0,22    0,000223           0       377           fstat
  0,16    0,000163           3        48           munmap
  0,12    0,000120           0       288           pread64
  0,11    0,000118           0       369           write
  0,09    0,000098           0       129        24 lseek
  0,09    0,000093           0       190        49 access
  0,06    0,000059           0        99         3 fcntl
  0,05    0,000056           0       172           geteuid
  0,05    0,000050           0        64        51 ioctl
  0,05    0,000047           0       171           getgid
  0,04    0,000045           0       171           getuid
  0,04    0,000043           0        80        40 arch_prctl
  0,04    0,000039           0       171           getegid
  0,04    0,000038           2        17        16 statfs
  0,02    0,000024           1        24           set_tid_address
  0,02    0,000024           0        33           prlimit64
  0,02    0,000021           1        18           futex
  0,02    0,000020           0       270           dup2
  0,02    0,000016           0        24           set_robust_list
  0,01    0,000011           0       248           getpid
  0,01    0,000010           2         4           readlink
  0,01    0,000008           4         2           unlink
  0,01    0,000008           4         2           faccessat
  0,01    0,000007           3         2           uname
  0,01    0,000007           1         4           fchmod
  0,01    0,000007           1         5           sysinfo
  0,01    0,000006           1         5           getppid
  0,00    0,000005           0         9           sigaltstack
  0,00    0,000004           1         4           getpgrp
  0,00    0,000000           0         6           getgroups
  0,00    0,000000           0         2           sched_getaffinity
  0,00    0,000000           0         4         3 fadvise64
------ ----------- ----------- --------- --------- ----------------
100.00    0,103535                 14826      1634 total
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 81,75    1,801993         226      7969      3380 wait4
  5,64    0,124336          27      4589           clone
  1,74    0,038247          22      1720           execve
  1,37    0,030232           0     38956           mmap
  1,22    0,026845           1     18470           read
  1,15    0,025449           0     36517           rt_sigprocmask
  1,08    0,023814           0     71302           rt_sigaction
  1,05    0,023224           0     37209      2418 close
  1,03    0,022801           1     21321      6648 openat
  0,84    0,018625           0     58862     46607 stat
  0,49    0,010896           1     10829           mprotect
  0,48    0,010555           3      3478           pipe
  0,32    0,007109           0     15195           fstat
  0,26    0,005684           0      7396           brk
  0,18    0,003954           0     12342           pread64
  0,17    0,003751           2      1728           munmap
  0,17    0,003695           1      3380           rt_sigreturn
  0,16    0,003603           0      8391      2322 access
  0,14    0,003052           0      3935           write
  0,11    0,002387           0      4682       815 lseek
  0,09    0,002030           0      3965       102 fcntl
  0,08    0,001734           1      1203      1202 statfs
  0,06    0,001372           0      2638      2031 ioctl
  0,05    0,001132           0      7288           geteuid
  0,05    0,001122           0      3440      1720 arch_prctl
  0,05    0,001120           0      7287           getuid
  0,05    0,001116           0      7287           getgid
  0,05    0,001108           0      7287           getegid
  0,03    0,000572           0      5710           dup2
  0,02    0,000523           0       807           futex
  0,02    0,000476           0      1011           set_tid_address
  0,02    0,000466           0      1020           prlimit64
  0,02    0,000416           2       202           readlink
  0,02    0,000411           0      1011           set_robust_list
  0,00    0,000101           0       202           fchmod
  0,00    0,000089           0       101           unlink
  0,00    0,000069           0      4697           getpid
  0,00    0,000062           0       103         3 fadvise64
  0,00    0,000009           1         6           getgroups
  0,00    0,000007           1         7           sigaltstack
  0,00    0,000005           1         5           getppid
  0,00    0,000005           1         4           getpgrp
  0,00    0,000000           0         2           uname
  0,00    0,000000           0         5           sysinfo
  0,00    0,000000           0         2           sched_getaffinity
  0,00    0,000000           0         2           faccessat
------ ----------- ----------- --------- --------- ----------------
100.00    2,204197                423563     67248 total

Expected behavior

I think that using data provider is a cleaner way to code tests however using them on long elements list is nearly unusuable for unit testing.

I thinkt that running tests in parallel could help but it will not be the final solution.

@nosrio nosrio added the bug Something isn't working label Aug 20, 2024
@Chemaclass Chemaclass added enhancement New feature or request and removed bug Something isn't working labels Sep 1, 2024
@Chemaclass
Copy link
Member

Thanks, @nosrio! I would consider this a feature rather than a bug as it's not "broken", but slow. Feel free to try to suggest an improvement (PRs welcome!), otherwise we are going to work on it as we find some spare time 😄

@staabm
Copy link
Contributor

staabm commented Sep 21, 2024

I am wondering why the dataprovider statistics contain so much more invocations for this calls:

  1,05    0,023224           0     37209      2418 close
  1,03    0,022801           1     21321      6648 openat
  0,84    0,018625           0     58862     46607 stat

I guess the reason for beeing slow are some file operations happening and the test is getting slow because it needs to wait for such IO (the slowest call is wait4).

@Chemaclass where in the bashunit codebase are the dataproviders implemented?

@Chemaclass
Copy link
Member

@staabm data providers are implemented here: https://github.com/TypedDevs/bashunit/blob/main/src%2Frunner.sh#L68

I think the issue is that we are loading and reading the test file to load the provider data function here: https://github.com/TypedDevs/bashunit/blob/main/src/helpers.sh#L124

Do you have an idea of what other implementations we could use to avoid&optimize this? 🤔

@staabm
Copy link
Contributor

staabm commented Sep 21, 2024

I am not entirely sure whether I am reading the implementation correctly, but it looks like we are running grep over the source file for every test over and over again.

maybe we can collect all dataproviders in a single go before the loop, instead of reading it 1-by-1 when executing the test?

@Chemaclass Chemaclass mentioned this issue Sep 21, 2024
2 tasks
@Chemaclass
Copy link
Member

Chemaclass commented Sep 21, 2024

@staabm I thought that at the beginning, but debugging you can see that I am asining only data_provider_function(therefore using that grep to the script file) only once 🤔

Screenshot 2024-09-21 at 12 31 52

This is what I am using in my local/example_test.sh for the local testing:

#!/bin/bash

TOTAL_NUMBERS=10
# To debug the internal bash behaviour:
# ./bashunit local/example_test.sh --debug local/debug.sh

function test_for_each(){
    local elements=$(seq 1 1 $TOTAL_NUMBERS)
    local regex='^[[:digit:]]+$'

    for element in ${elements[@]}; do
        assert_matches "${regex}" "${element}"
    done
}

# data_provider provide_elements
function test_for_each_slow(){
    local regex='^[[:digit:]]+$'
    local element=${1}

    assert_matches "${regex}" "${element}"
}

function provide_elements(){
    local elements=$(seq 1 1 $TOTAL_NUMBERS)
    echo "${elements[@]}"
}

@Chemaclass
Copy link
Member

One idea worth speeding up the current implementation is to avoid doing the grep at all (to look for the function name). Instead, use the same test name by convention but with the prefix provider_* instead of test_*. Eg

function test_something() {
   local arg1=$1
   local arg2=$2
  # ...
}

function provider_something() {
  echo "first-1st-arg" "first-2nd-arg"
  echo "second-1st-arg" "second-2nd-arg"
}

@staabm
Copy link
Contributor

staabm commented Sep 21, 2024

You could as a POC just hardcode the dataprovider name (instead of the grep) and run the benchmark to compare how much time the grep alone is taking and whether its worh building a alternative for it

@Chemaclass Chemaclass self-assigned this Sep 21, 2024
@Chemaclass Chemaclass moved this to In Progress in bashunit Sep 21, 2024
@Chemaclass
Copy link
Member

I don't see much difference. It is not worth it (at least for now, although the change is relatively easy to implement).

After much thought, @nosrio @staabm, I concluded that I don't know how to improve the data providers, and they should be used with small data. If you need to handle more data (and performance is an issue), then you should consider doing a loop inside one unique test.

This is because spawning each test costs around ~20ms, and the data provider creates a new test for each data provided line—rather than a loop inside the same test.

Screenshot 2024-09-21 at 20 32 13

With this benchmark in mind, I don't see a solution for this problem. Do you have any other ideas? Otherwise, I would suggest documenting what we learned in the docs and closing the issue until we discover something more.

@github-project-automation github-project-automation bot moved this from In Progress to Done in bashunit Oct 4, 2024
@Chemaclass Chemaclass moved this from Done to Wont do in bashunit Oct 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Wont do
Development

No branches or pull requests

3 participants