Test count and smart sharding count dont match #986

kozaxinan · 2020-08-12T10:59:48Z

Describe the bug

A clear and concise description of what the bug is.

Test result shows app module has 349 tests but smart sharding show and calculate with 232.

To Reproduce

Steps to reproduce the behavior:

It happens with out project. i dont have a repo to share.

Expected behavior

A clear and concise description of what you expected to happen.

Test result and smart sharding should have same count.

Details (please complete the following information):

Have you tested on the latest Flank snapshot?

Post the output of flank --version.

Flank 20.08.1
In the repo, there are Ignored, suppressed and parameterized tests.
With no orchectrator

The text was updated successfully, but these errors were encountered:

bootstraponline · 2020-08-12T14:11:47Z

A shard can have multiple tests. The only time shard count matches test count is if every test is in exactly one shard.

kozaxinan · 2020-08-12T14:39:39Z

if the printing for smart sharding is for total number of test that will be put in shard, I expect them to be same. In original post, all numbers are for test count not for shard bucket count.

gw runFlank -PdumpShards prints

Smart Flank cache hit: 99% (229 / 232)
  Shard times: 104s, 104s, 104s, 104s, 105s, 105s, 120s, 120s, 120s


 Smart Flank cache hit: 100% (1 / 1)
  Shard times: 1s


 Smart Flank cache hit: 100% (1 / 1)
  Shard times: 34s


 Smart Flank cache hit: 50% (1 / 2)
  Shard times: 13s, 120s


 Smart Flank cache hit: 100% (16 / 16)
  Shard times: 25s

Saved 5 shards to android_shards.json

At the end of flank run, results shows 349 tests for app, and 1, 1, 2 and 16 for additional apks.

Shouldn't smart flank use 349 for calculation of shard bucket?

kozaxinan · 2020-08-12T15:34:00Z

And what is actually bizarre is we should have 542 tests 😄

bootstraponline · 2020-08-12T15:59:17Z

interesting, yeah there's probably a bug somewhere 🤔 I'll let the team take a deeper look!

kozaxinan · 2020-08-13T07:53:55Z

Here is obfuscated android_shards.json and flank.yml

flank.zip

pawelpasterz · 2020-08-20T19:45:59Z

tl;dr

Flank does not support parameterized tests sharding. Every class with parameterized is considered as one test during shard calculation.

Flank is using DEX parser to decompile apks and gather info about all the tests inside. As for now, Flank is unable to determine how many times a test in parameterized class is invoked. Due to this fact scans apks for any class with an annotation that contains JUnitParamsRunner or Parameterized:

@RunWith(JUnitParamsRunner::class)
...
@RunWith(Parameterized::class)

and consider this class as a single 'test`, which means - 'put all test from this class into this shard'

Example:

@RunWith(JUnitRunner::class)
class A {
    fun test1() {...}
    fun test2() {...}
    fun test3() {...}
}

@RunWith(Parameterized::class)
class B(param1, param2, ...) {
    fun test1() {...}

    companion object {
        @JvmStatic
        @Parameterized.Parameters
        fun data() = listOf(
                arrayOf(...),
                arrayOf(...),
                arrayOf(...)
        )
    }
}

@RunWith(JUnitRunner::class)
class C {
    fun test1() {...}
    fun test2() {...}
    fun test3() {...}
}

@RunWith(JUnitParamsRunner::class)
class D {
    @Parameters(value = [a,b,c,d]
    fun test1() {...}
}

For apk with classes like above Flank will have a list of tests* to shard:
(*of course 'test' from flank's perspective)

class A => 3 tests (foo.A#test1, foo.A#test2, foo.A#test3)
class B (parameterized) => 1 test (foo.B)
class C => 3 test (foo.C#test1, foo.C#test2, foo.C#test3)
class D (parameterized) => 1 test (foo.D)

So it gives 3 + 1 + 3 + 1 = 8 and let's say we want to have 2 shards so flank will calculate shards like

shard_1 = [foo.A#test1, foo.A#test2, foo.A#test3, foo.B]
shard_2 = [foo.C#test1, foo.C#test2, foo.C#test3, foo.D]

and will print (same numbers are used to print cache hit)

8 tests / 2 shards

To get outcome Flank uses FTL API which provides results for all the tests, regardless if it's parameterized or not. So flank creates outcome for all the tests (A - 3 tests, B - 3 tests, C - 3 tests, D - 4 tests and that all sums up to 13)

┌─────────┬──────────────────────┬──────────────────────┐
│ OUTCOME │      MATRIX ID       │     TEST DETAILS     │
├─────────┼──────────────────────┼──────────────────────┤
│ success │ matrix-funnyid       │ 13 test cases passed │
└─────────┴──────────────────────┴──────────────────────┘

Conclusions:

add sharding support for parameterized tests (It's a dragon! We could create an issue to make a research at least) And I think it's impossible to do (but I might be wrong)
flank could explicitly inform a user that there are parameterized test detected and test per shard number may not be valid
inform differently?

@bootstraponline @jan-gogo @Sloox @piotradamczyk5 @adamfilipow92
I'd like to get your opinion

bootstraponline · 2020-08-20T20:40:03Z

add sharding support for parameterized tests (it's a dragon! we could create an issue to make a research at least)

I think we already researched this and determined that parameterized tests can only be run by class?

flank could explicitly inform a user that there are parameterized test detected and test per shard number may not be valid

It might be more accurate to say classes per shard when detecting parameterized tests?

If there are no parameterized tests then it's true to say we are doing per method sharding. If there are parameterized tests, then we do a hybrid of class sharding + per test sharding.

pawelpasterz · 2020-08-24T20:00:37Z

So basically we have 2 cases:

'normal' tests per shards (no parameterized)

27 tests / 9 shards

with parameterized

12 classes / 9 shards

I am wondering if it won't confuse users. Sometimes there are tests, sometimes classes, I've already received one feedback and proposal if we could implement it simpler:

Smart Flank cache hit: 50% (5 / 10)
  Shard times: 112s, 120s*
...
...
*exact calculation was impossible due to parameterized tests

That would also require to add * in other places, so the output would look like

RunTests

 Smart Flank cache hit: 0% (0 / 9)
  Shard times: 45s, 45s, 45s*


 Smart Flank cache hit: 0% (0 / 9)
  Shard times: 45s, 45s*, 45s


 Smart Flank cache hit: 0% (0 / 9)
  Shard times: 45s*, 45s*, 45s

  Uploading ...

  27 tests* / 9 shards

*exact calculation was impossible due to parameterized tests

The above approach is definitely much easier to implement. It's hard for me to evaluate the value of * approach vs class per shard

I think it will be valuable to get some feedback from the community? Or maybe you have some input

bootstraponline · 2020-08-24T20:39:22Z

*exact calculation was impossible due to parameterized tests

😂 let's not add this wording to Flank

I suggest we consult with our lead UX designer.

bootstraponline · 2020-08-24T20:42:36Z

Flank knows exactly how many tests and classes are being sent to firebase test lab, maybe we could do something like this:

50 tests + 12 parameterized classes / 9 shards

pawelpasterz · 2020-08-24T21:02:37Z

*exact calculation was impossible due to parameterized tests

😂 let's not add this wording to Flank

I suggest we consult with our lead UX designer.

I am not UX ¯_(ツ)_/¯ that was first that came to my mind :)

pawelpasterz · 2020-08-24T21:03:06Z

Flank knows exactly how many tests and classes are being sent to firebase test lab, maybe we could do something like this:

50 tests + 12 parameterized classes / 9 shards

I like it

kozaxinan · 2020-08-25T09:06:12Z

In the time of smart flank calculation, if you know there are parameterized tests, they can be excluded in smark flank success.

Instead of 6 normal 1 parameterized test

Smart Flank cache hit: 85% (6 / 7)
  Shard times: 45s, 45s, 45s*

*exact calculation was impossible due to parameterized tests

Smart Flank cache hit: 100% (6 / 6) and 1 parameterized test class.
  Shard times: 45s, 45s, 45s*

*exact calculation was impossible due to parameterized tests

bootstraponline · 2020-08-25T10:54:53Z

A shard may contain both individual test methods and/or test classes. We should be able to make smart flank aware of classes and cache the times.

pawelpasterz · 2020-08-27T04:55:34Z

How about the case when either class count or test count equals 0? Do we want to print it that way:

0 tests + 12 parameterized classes / 9 shards
50 tests + 0 parameterized classes / 9 shards

or should we skip it? Or skip it only for the class count?

pawelpasterz · 2020-08-27T05:02:40Z

A shard may contain both individual test methods and/or test classes. We should be able to make smart flank aware of classes and cache the times.

👍 , I was wondering if w should add the default time for class. For a test default time is 120s. I assume that a parameterized class has at least two sets of parameters, therefore default time for class can be 2x test time = 240s
WDYT?

bootstraponline · 2020-08-27T11:43:44Z

default time for class can be 2x test time = 240s

default time for classes makes sense to me, as I expect this might be different compared to individual test methods.

0 tests + 12 parameterized classes / 9 shards
50 tests + 0 parameterized classes / 9 shards

I think if there's a 0 we should skip it.

12 parameterized classes / 9 shards
50 tests / 9 shards

adamfilipow92 · 2020-08-27T14:39:12Z

We have a parameter --default-test-time so in a case where we have set this param I see three potential solutions

Use value from --default-test-time
Multiply value from --default-test-time by 2
Add command to set default test time for parametrized test

WDYT?
@bootstraponline @pawelpasterz

bootstraponline · 2020-09-04T12:50:56Z

We have a parameter --default-test-time so in a case where we have set this param I see three potential solutions

We probably want a --default-class-time, which by default can equal the default test time. Then users can independently set default times for parameterized classes vs individual test methods.

kozaxinan added the Bug label Aug 12, 2020

bootstraponline added the Tiger 🐯 label Aug 13, 2020

pawelpasterz self-assigned this Aug 13, 2020

adamfilipow92 self-assigned this Aug 26, 2020

pawelpasterz mentioned this issue Aug 28, 2020

fix: Shards and Tests count do not match #1059

Merged

2 tasks

pawelpasterz closed this as completed in #1059 Sep 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test count and smart sharding count dont match #986

Test count and smart sharding count dont match #986

kozaxinan commented Aug 12, 2020

bootstraponline commented Aug 12, 2020

kozaxinan commented Aug 12, 2020

kozaxinan commented Aug 12, 2020

bootstraponline commented Aug 12, 2020

kozaxinan commented Aug 13, 2020

pawelpasterz commented Aug 20, 2020 •

edited

Loading

bootstraponline commented Aug 20, 2020

pawelpasterz commented Aug 24, 2020 •

edited

Loading

bootstraponline commented Aug 24, 2020

bootstraponline commented Aug 24, 2020

pawelpasterz commented Aug 24, 2020 •

edited

Loading

pawelpasterz commented Aug 24, 2020

kozaxinan commented Aug 25, 2020

bootstraponline commented Aug 25, 2020

pawelpasterz commented Aug 27, 2020 •

edited

Loading

pawelpasterz commented Aug 27, 2020 •

edited

Loading

bootstraponline commented Aug 27, 2020

adamfilipow92 commented Aug 27, 2020 •

edited

Loading

bootstraponline commented Sep 4, 2020

Test count and smart sharding count dont match #986

Test count and smart sharding count dont match #986

Comments

kozaxinan commented Aug 12, 2020

bootstraponline commented Aug 12, 2020

kozaxinan commented Aug 12, 2020

kozaxinan commented Aug 12, 2020

bootstraponline commented Aug 12, 2020

kozaxinan commented Aug 13, 2020

pawelpasterz commented Aug 20, 2020 • edited Loading

tl;dr

Example:

bootstraponline commented Aug 20, 2020

pawelpasterz commented Aug 24, 2020 • edited Loading

bootstraponline commented Aug 24, 2020

bootstraponline commented Aug 24, 2020

pawelpasterz commented Aug 24, 2020 • edited Loading

pawelpasterz commented Aug 24, 2020

kozaxinan commented Aug 25, 2020

bootstraponline commented Aug 25, 2020

pawelpasterz commented Aug 27, 2020 • edited Loading

pawelpasterz commented Aug 27, 2020 • edited Loading

bootstraponline commented Aug 27, 2020

adamfilipow92 commented Aug 27, 2020 • edited Loading

bootstraponline commented Sep 4, 2020

pawelpasterz commented Aug 20, 2020 •

edited

Loading

pawelpasterz commented Aug 24, 2020 •

edited

Loading

pawelpasterz commented Aug 24, 2020 •

edited

Loading

pawelpasterz commented Aug 27, 2020 •

edited

Loading

pawelpasterz commented Aug 27, 2020 •

edited

Loading

adamfilipow92 commented Aug 27, 2020 •

edited

Loading