Improve how we test OSS vs Default distribution #46378

alpar-t · 2019-09-05T12:58:49Z

This issue replaces #30903.

Currently we have a tests.distribution that switches the distribution flavor for some tests. The default is OSS and we have CI jobs that test with default.
This approach doesn't make it clear what's being tested and needlessly re-runs all the tests that use the integ test distribution between the runs.
With how we currently have CI set up ( e.x. matrix bwc jobs ) there's no performance benefit either.

We should have a single job in CI and have specific sub-projects for testing OSS vs DEFAULT distribution for the projects for which it makes sense.
The suggestion to use projects instead of tasks is so that we can run them in parallel.

The text was updated successfully, but these errors were encountered:

elasticmachine · 2019-09-05T12:58:51Z

Pinging @elastic/es-core-infra

rjernst · 2019-09-11T17:54:05Z

We already have different tests for oss vs default distribution. The tests here are specifically to make sure the oss tests (those in qa/) also work with the default distribution. I'm hesitant to make the qa/ projects more complex to have additional subprojects for that purpose. Can you expand on the problem trying to be solved here? IIRC the CI job testing this only runs under qa/, so we don't rerun any integ-test-zip jobs.

alpar-t · 2019-09-12T08:30:49Z

I don't like the fact that we have a special case for this. like you said in some cases we have dedicated tests and in others we have a system property to change existing tests, especially for something that needs to run all the time.

We should, as a rule, always have dedicated tasks for tests unless the setting we are changing are applicable to all the tests ( like the runtime jdk, or additional jvm arguments ).

The complexity this adds to CI is worse than it would add to Gradle.
I think all it takes is adding subprojects to the qa projects and wrapping the existing Gradle file like:

['OSS', 'DEFAULT'].each {
   subproject(it) {
       ...
   }
}

We have plenty of precedent for something like this and it has the added benefit that one can get a better picture of what we are testing by looking just at the build files.

rjernst · 2019-09-12T18:34:43Z

Doing subprojects like that complicate every qa project. It means creating a jar of the tests, depending on that jar in the subprojects, and extracting the jar along with test setup to use the extracted dir for the test classpath. This is similar to the logic in sql qa tests, which was difficult to get right and very messy.

I agree with you in general about dedicated tasks, but in this case I think it is a special case. We have qa tests under x-pack that test xpack behavior. We have qa tests at the root testing oss behavior. We won't want the inverse (running xpack qa tests with oss distribution); we only want this limited set of tests to also be run with the default distribution as a safety precaution. It is very unlikely any of these tests would break from running with the default distribution, and thus running them outside of CI is not something I think is necessary. As long as the reproduction line includes the distribution flag, I dont' see a problem with this special case testing only existing in CI.

mark-vieira · 2019-09-13T21:53:15Z

This kind of thing is definitely a judgement call. To what degree to we introduce distinct tasks/projects for every variant of testing we perform. An example of this is FIPS, we obviously don't create a second version of every test task for every project simply enables FIPS, instead we use a flag passed to the build.

I think the main objection is that we don't want to use build flags when possible because they aren't well documented, and are mostly opaque. I think we can improve this though by making build parameters a first-class thing. This would mean declaring this in a static way such that if you pass an unknown one we can do things like "hey, you passed test.distribution did you mean tests.distribution, document them in a way you could run a task to list them all, or describe what they do, similar to gradle tasks, etc.

I think the number of test suites involved in the oss/default axis is significant enough to prefer some kind of build flag due to all the additional projects/tasks that would have to be created otherwise. So I'd lean towards sticking with, and improving the existing experience.

alpar-t · 2019-10-07T15:30:06Z

I have been in the situation where I had to test this locally and it's a very poor experience, especially since there's no way to run them all.
Granted this was a change in the build code for which the distribution probably mattered more.

We are also not quite consistent with the default for tests.distributions as qa default to oss and docs to default.

Also switching the distribution doesn't make sense for every project in qa some can use the integ test distribution just fine.

The additional jobs also create overhead for test triage and increase complexity of the CI setup which I would vert much like to avoid.
If we think we are starting to test too much in intake and PR checks and it's taking too long I would prefer a more structured approach, where these tests have dedicated tasks and we add something like extendedCheck which depends on them, but would very much like to avoid doing this as it's can very easily turn into a place where tests go to die.

@rjernst to your point on complexity of setting the projects up, I think that's something we have to address at some point, by coming up with a proper abstraction that makes it easy to set up since it's a pattern we already use. Security also has tests like that.

rjernst · 2019-10-07T20:45:28Z

I have been in the situation where I had to test this locally and it's a very poor experience, especially since there's no way to run them all.

Why would you need to run them all? This is a situation where the "default" (not default distribution, but the distribution we run with when no extra parameters are given) should be sufficient for almost any change. I've never even come across something that would fail. We only want to run with the different distribution as a secondary check to ensure something isn't horribly wrong (eg we somehow omitted an oss module in our default distribution). We have several other checks to ensure that kind of thing doesn't happen (eg packaging tests check which modules are installed), so I don't think we should overcomplicate all qa tests for this extremely unlikely scenario.

rjernst · 2019-10-07T20:47:26Z

We are also not quite consistent with the default for tests.distributions

I agree docs should really use the oss distro, although if x-pack docs ever move into a single docs directory that would change back.

alpar-t · 2019-10-08T05:15:11Z

I was working on the testclusters back-port, and only had back-ported qa/full-cluster-restart. The requirements to bring up say a 6.2.0 distribution are very different, so I had to run each test and make sure none o them break after making changes. I do agree that is fairly specific.

Perhaps a good compromise would be to keep the separate CI jobs, but have multiple tasks within the same project for the different tests.
That would make a cleaner separation of what all the additional tests are than what the current directory based one.

mark-vieira · 2020-12-18T20:59:09Z

I don't think we have much in the way of consensus on whether this is really an issue or not. Given that, I'm going to close this for now.

alpar-t added >enhancement :Delivery/Build Build or test infrastructure labels Sep 5, 2019

alpar-t mentioned this issue Sep 5, 2019

Provide a way for individual projects to specify which distributions they can run/test against #30903

Closed

alpar-t mentioned this issue Sep 12, 2019

Integration tests should sometimes cleanly shut nodes down #35585

Open

rjernst added the Team:Core/Infra Meta label for core/infra team label May 4, 2020

mark-vieira added Team:Delivery Meta label for Delivery team and removed Team:Core/Infra Meta label for core/infra team labels Nov 11, 2020

mark-vieira closed this as completed Dec 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve how we test OSS vs Default distribution #46378

Improve how we test OSS vs Default distribution #46378

alpar-t commented Sep 5, 2019

elasticmachine commented Sep 5, 2019

rjernst commented Sep 11, 2019

alpar-t commented Sep 12, 2019

rjernst commented Sep 12, 2019

mark-vieira commented Sep 13, 2019

alpar-t commented Oct 7, 2019

rjernst commented Oct 7, 2019

rjernst commented Oct 7, 2019

alpar-t commented Oct 8, 2019

mark-vieira commented Dec 18, 2020

Improve how we test OSS vs Default distribution #46378

Improve how we test OSS vs Default distribution #46378

Comments

alpar-t commented Sep 5, 2019

elasticmachine commented Sep 5, 2019

rjernst commented Sep 11, 2019

alpar-t commented Sep 12, 2019

rjernst commented Sep 12, 2019

mark-vieira commented Sep 13, 2019

alpar-t commented Oct 7, 2019

rjernst commented Oct 7, 2019

rjernst commented Oct 7, 2019

alpar-t commented Oct 8, 2019

mark-vieira commented Dec 18, 2020