Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test count and smart sharding count dont match #986

Closed
kozaxinan opened this issue Aug 12, 2020 · 19 comments · Fixed by #1059
Closed

Test count and smart sharding count dont match #986

kozaxinan opened this issue Aug 12, 2020 · 19 comments · Fixed by #1059
Assignees

Comments

@kozaxinan
Copy link
Contributor

Describe the bug

A clear and concise description of what the bug is.

Test result shows app module has 349 tests but smart sharding show and calculate with 232.

To Reproduce

Steps to reproduce the behavior:

It happens with out project. i dont have a repo to share.

Expected behavior

A clear and concise description of what you expected to happen.

Test result and smart sharding should have same count.

Details (please complete the following information):

Have you tested on the latest Flank snapshot?

Post the output of flank --version.

Flank 20.08.1
In the repo, there are Ignored, suppressed and parameterized tests.
With no orchectrator

@kozaxinan kozaxinan added the Bug label Aug 12, 2020
@bootstraponline
Copy link
Contributor

A shard can have multiple tests. The only time shard count matches test count is if every test is in exactly one shard.

@kozaxinan
Copy link
Contributor Author

if the printing for smart sharding is for total number of test that will be put in shard, I expect them to be same. In original post, all numbers are for test count not for shard bucket count.

gw runFlank -PdumpShards prints

Smart Flank cache hit: 99% (229 / 232)
  Shard times: 104s, 104s, 104s, 104s, 105s, 105s, 120s, 120s, 120s


 Smart Flank cache hit: 100% (1 / 1)
  Shard times: 1s


 Smart Flank cache hit: 100% (1 / 1)
  Shard times: 34s


 Smart Flank cache hit: 50% (1 / 2)
  Shard times: 13s, 120s


 Smart Flank cache hit: 100% (16 / 16)
  Shard times: 25s

Saved 5 shards to android_shards.json

At the end of flank run, results shows 349 tests for app, and 1, 1, 2 and 16 for additional apks.

Shouldn't smart flank use 349 for calculation of shard bucket?

@kozaxinan
Copy link
Contributor Author

And what is actually bizarre is we should have 542 tests 😄

@bootstraponline
Copy link
Contributor

interesting, yeah there's probably a bug somewhere 🤔 I'll let the team take a deeper look!

@kozaxinan
Copy link
Contributor Author

Here is obfuscated android_shards.json and flank.yml

flank.zip

@pawelpasterz pawelpasterz self-assigned this Aug 13, 2020
@pawelpasterz
Copy link
Contributor

pawelpasterz commented Aug 20, 2020

tl;dr

Flank does not support parameterized tests sharding. Every class with parameterized is considered as one test during shard calculation.


Flank is using DEX parser to decompile apks and gather info about all the tests inside. As for now, Flank is unable to determine how many times a test in parameterized class is invoked. Due to this fact scans apks for any class with an annotation that contains JUnitParamsRunner or Parameterized:

@RunWith(JUnitParamsRunner::class)
...
@RunWith(Parameterized::class)

and consider this class as a single 'test`, which means - 'put all test from this class into this shard'

Example:
@RunWith(JUnitRunner::class)
class A {
    fun test1() {...}
    fun test2() {...}
    fun test3() {...}
}

@RunWith(Parameterized::class)
class B(param1, param2, ...) {
    fun test1() {...}

    companion object {
        @JvmStatic
        @Parameterized.Parameters
        fun data() = listOf(
                arrayOf(...),
                arrayOf(...),
                arrayOf(...)
        )
    }
}

@RunWith(JUnitRunner::class)
class C {
    fun test1() {...}
    fun test2() {...}
    fun test3() {...}
}

@RunWith(JUnitParamsRunner::class)
class D {
    @Parameters(value = [a,b,c,d]
    fun test1() {...}
}

For apk with classes like above Flank will have a list of tests* to shard:
(*of course 'test' from flank's perspective)

  • class A => 3 tests (foo.A#test1, foo.A#test2, foo.A#test3)
  • class B (parameterized) => 1 test (foo.B)
  • class C => 3 test (foo.C#test1, foo.C#test2, foo.C#test3)
  • class D (parameterized) => 1 test (foo.D)

So it gives 3 + 1 + 3 + 1 = 8 and let's say we want to have 2 shards so flank will calculate shards like

  1. shard_1 = [foo.A#test1, foo.A#test2, foo.A#test3, foo.B]
  2. shard_2 = [foo.C#test1, foo.C#test2, foo.C#test3, foo.D]

and will print (same numbers are used to print cache hit)

8 tests / 2 shards

To get outcome Flank uses FTL API which provides results for all the tests, regardless if it's parameterized or not. So flank creates outcome for all the tests (A - 3 tests, B - 3 tests, C - 3 tests, D - 4 tests and that all sums up to 13)

┌─────────┬──────────────────────┬──────────────────────┐
│ OUTCOME │      MATRIX ID       │     TEST DETAILS     │
├─────────┼──────────────────────┼──────────────────────┤
│ success │ matrix-funnyid       │ 13 test cases passed │
└─────────┴──────────────────────┴──────────────────────┘

Conclusions:

  1. add sharding support for parameterized tests (It's a dragon! We could create an issue to make a research at least) And I think it's impossible to do (but I might be wrong)
  2. flank could explicitly inform a user that there are parameterized test detected and test per shard number may not be valid
  3. inform differently?

@bootstraponline @jan-gogo @Sloox @piotradamczyk5 @adamfilipow92
I'd like to get your opinion

@bootstraponline
Copy link
Contributor

  • add sharding support for parameterized tests (it's a dragon! we could create an issue to make a research at least)

I think we already researched this and determined that parameterized tests can only be run by class?

  • flank could explicitly inform a user that there are parameterized test detected and test per shard number may not be valid

It might be more accurate to say classes per shard when detecting parameterized tests?

If there are no parameterized tests then it's true to say we are doing per method sharding. If there are parameterized tests, then we do a hybrid of class sharding + per test sharding.

@pawelpasterz
Copy link
Contributor

pawelpasterz commented Aug 24, 2020

So basically we have 2 cases:

  1. 'normal' tests per shards (no parameterized)
27 tests / 9 shards
  1. with parameterized
12 classes / 9 shards

I am wondering if it won't confuse users. Sometimes there are tests, sometimes classes, I've already received one feedback and proposal if we could implement it simpler:

Smart Flank cache hit: 50% (5 / 10)
  Shard times: 112s, 120s*
...
...
*exact calculation was impossible due to parameterized tests

That would also require to add * in other places, so the output would look like

RunTests

 Smart Flank cache hit: 0% (0 / 9)
  Shard times: 45s, 45s, 45s*


 Smart Flank cache hit: 0% (0 / 9)
  Shard times: 45s, 45s*, 45s


 Smart Flank cache hit: 0% (0 / 9)
  Shard times: 45s*, 45s*, 45s

  Uploading ...

  27 tests* / 9 shards

*exact calculation was impossible due to parameterized tests

The above approach is definitely much easier to implement. It's hard for me to evaluate the value of * approach vs class per shard

I think it will be valuable to get some feedback from the community? Or maybe you have some input

@bootstraponline
Copy link
Contributor

*exact calculation was impossible due to parameterized tests

😂 let's not add this wording to Flank

I suggest we consult with our lead UX designer.

@bootstraponline
Copy link
Contributor

Flank knows exactly how many tests and classes are being sent to firebase test lab, maybe we could do something like this:

50 tests + 12 parameterized classes / 9 shards

@pawelpasterz
Copy link
Contributor

pawelpasterz commented Aug 24, 2020

*exact calculation was impossible due to parameterized tests

😂 let's not add this wording to Flank

I suggest we consult with our lead UX designer.

I am not UX ¯_(ツ)_/¯ that was first that came to my mind :)

@pawelpasterz
Copy link
Contributor

Flank knows exactly how many tests and classes are being sent to firebase test lab, maybe we could do something like this:

50 tests + 12 parameterized classes / 9 shards

I like it

@kozaxinan
Copy link
Contributor Author

In the time of smart flank calculation, if you know there are parameterized tests, they can be excluded in smark flank success.

Instead of 6 normal 1 parameterized test

Smart Flank cache hit: 85% (6 / 7)
  Shard times: 45s, 45s, 45s*

*exact calculation was impossible due to parameterized tests
Smart Flank cache hit: 100% (6 / 6) and 1 parameterized test class.
  Shard times: 45s, 45s, 45s*

*exact calculation was impossible due to parameterized tests

@bootstraponline
Copy link
Contributor

A shard may contain both individual test methods and/or test classes. We should be able to make smart flank aware of classes and cache the times.

@adamfilipow92 adamfilipow92 self-assigned this Aug 26, 2020
@pawelpasterz
Copy link
Contributor

pawelpasterz commented Aug 27, 2020

Screenshot 2020-08-27 at 06 51 04
Screenshot 2020-08-27 at 07 20 21

How about the case when either class count or test count equals 0? Do we want to print it that way:

0 tests + 12 parameterized classes / 9 shards
50 tests + 0 parameterized classes / 9 shards

or should we skip it? Or skip it only for the class count?

@pawelpasterz
Copy link
Contributor

pawelpasterz commented Aug 27, 2020

A shard may contain both individual test methods and/or test classes. We should be able to make smart flank aware of classes and cache the times.

👍 , I was wondering if w should add the default time for class. For a test default time is 120s. I assume that a parameterized class has at least two sets of parameters, therefore default time for class can be 2x test time = 240s
WDYT?

@bootstraponline
Copy link
Contributor

default time for class can be 2x test time = 240s

default time for classes makes sense to me, as I expect this might be different compared to individual test methods.

0 tests + 12 parameterized classes / 9 shards
50 tests + 0 parameterized classes / 9 shards

I think if there's a 0 we should skip it.

12 parameterized classes / 9 shards
50 tests / 9 shards

@adamfilipow92
Copy link
Contributor

adamfilipow92 commented Aug 27, 2020

We have a parameter --default-test-time so in a case where we have set this param I see three potential solutions

  1. Use value from --default-test-time
  2. Multiply value from --default-test-time by 2
  3. Add command to set default test time for parametrized test

WDYT?
@bootstraponline @pawelpasterz

@bootstraponline
Copy link
Contributor

We have a parameter --default-test-time so in a case where we have set this param I see three potential solutions

We probably want a --default-class-time, which by default can equal the default test time. Then users can independently set default times for parameterized classes vs individual test methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants