Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

experiments: test_new_simple is flaky #4401

Closed
pmrowla opened this issue Aug 17, 2020 · 4 comments · Fixed by #4414 or #4432
Closed

experiments: test_new_simple is flaky #4401

pmrowla opened this issue Aug 17, 2020 · 4 comments · Fixed by #4414 or #4432
Assignees
Labels
A: experiments Related to dvc exp bug Did we break something?

Comments

@pmrowla
Copy link
Contributor

pmrowla commented Aug 17, 2020

Need to investigate flaky test, probably hitting some gitpython related race condition again

context:
https://travis-ci.com/github/iterative/dvc/jobs/372602915
https://travis-ci.com/github/iterative/dvc/jobs/372071149

@pmrowla pmrowla added bug Did we break something? A: experiments Related to dvc exp labels Aug 17, 2020
@pmrowla pmrowla self-assigned this Aug 17, 2020
This was referenced Aug 17, 2020
@skshetry skshetry mentioned this issue Aug 18, 2020
2 tasks
@skshetry
Copy link
Member

skshetry commented Aug 18, 2020

Another failure on a different test:

=================================== FAILURES ===================================
_____________________________ test_show_experiment _____________________________
[gw2] linux -- Python 3.7.1 /home/travis/virtualenv/python3.7.1/bin/python
tmp_dir = PosixTmpDir('/tmp/pytest-of-travis/pytest-0/popen-gw2/test_show_experiment0')
scm = Git: '/tmp/pytest-of-travis/pytest-0/popen-gw2/test_show_experiment0/.git'
dvc = Repo: '/tmp/pytest-of-travis/pytest-0/popen-gw2/test_show_experiment0'
    def test_show_experiment(tmp_dir, scm, dvc):
        tmp_dir.gen("copy.py", COPY_SCRIPT)
        tmp_dir.gen("params.yaml", "foo: 1")
        dvc.run(
            cmd="python copy.py params.yaml metrics.yaml",
            metrics_no_cache=["metrics.yaml"],
            params=["foo"],
            name="foo",
        )
        scm.add(["copy.py", "params.yaml", "metrics.yaml", "dvc.yaml", "dvc.lock"])
        scm.commit("baseline")
        baseline_rev = scm.get_rev()
        timestamp = datetime.fromtimestamp(
            scm.repo.rev_parse(baseline_rev).committed_date
        )
    
        dvc.reproduce(PIPELINE_FILE, experiment=True, params=["foo=2"])
        results = dvc.experiments.show()
    
        expected_baseline = {
            "metrics": {"metrics.yaml": {"foo": 1}},
            "params": {"params.yaml": {"foo": 1}},
            "queued": False,
            "timestamp": timestamp,
        }
        expected_params = {"foo": 2}
    
        assert set(results.keys()) == {"workspace", baseline_rev}
        experiments = results[baseline_rev]
        assert len(experiments) == 2
        for rev, exp in experiments.items():
            if rev == "baseline":
                assert exp == expected_baseline
            else:
                assert exp["metrics"]["metrics.yaml"] == expected_params
                assert exp["params"]["params.yaml"] == expected_params
>               assert exp["timestamp"] > timestamp
E               assert datetime.datetime(2020, 8, 18, 10, 1, 1) > datetime.datetime(2020, 8, 18, 10, 1, 1)

Could be a one-time failure though, but maybe this assumption about time should not even be here?

@skshetry
Copy link
Member

Happened again, but with a different message: https://travis-ci.com/github/iterative/dvc/jobs/375383416

self = BaselineMismatchError(<git.Commit "d31b10fcd1a8e1191e215399a2a978db245349d5">, '232a7cbd9b448ce97013eb3db486fc3e2a104b00')
rev = <git.Commit "d31b10fcd1a8e1191e215399a2a978db245349d5">
expected = '232a7cbd9b448ce97013eb3db486fc3e2a104b00'
    def __init__(self, rev, expected):
>       rev_str = f"{rev[:7]}" if rev is not None else "dangling commit"
E       TypeError: 'Commit' object is not subscriptable

@skshetry skshetry reopened this Aug 19, 2020
skshetry added a commit to skshetry/dvc that referenced this issue Aug 20, 2020
It is for short period of time till it gets resolved
See: iterative#4401
skshetry added a commit that referenced this issue Aug 20, 2020
It is for short period of time till it gets resolved
See: #4401
@pmrowla
Copy link
Contributor Author

pmrowla commented Aug 20, 2020

I've narrowed down the issue to gitpython Repo.is_dirty() misbehaving, but still not sure of the actual cause. In the flaky runs, for some reason is_dirty() is incorrectly returning False.

Inserting this code into experiments._commit

        if not self.scm.is_dirty():
            tree = self.exp_dvc.tree
            logger.debug("--- workspace params.yaml:")
            with tree.open("params.yaml") as fobj:
                data = fobj.read()
            logger.debug(f"{data}")

            rev = self.scm.get_rev()
            tree = self.scm.get_tree(rev)
            logger.debug("--- baseline params.yaml:")
            with tree.open(os.path.join(self.exp_dvc.root_dir, "params.yaml")) as fobj:
                data = fobj.read()
            logger.debug(f"{data}")

            logger.debug(f"non-dirty exp workspace @ '{rev}'")
            raise UnchangedExperimentError(self.scm.get_rev())

gives

2020-08-20 07:38:35,961 DEBUG: Commiting experiment '484f6348df10e5f532c32e264e5278730ff648f379791b54ecf134caf8582cac'
2020-08-20 07:38:35,996 DEBUG: --- workspace params.yaml:
2020-08-20 07:38:35,996 DEBUG: foo: 2
2020-08-20 07:38:35,998 DEBUG: --- baseline params.yaml:
2020-08-20 07:38:36,000 DEBUG: foo: 1
2020-08-20 07:38:36,001 DEBUG: non-dirty exp workspace @ '627722c61614e0af2aa6f0a4fdf7abd5b81961f5'
2020-08-20 07:38:36,003 DEBUG: Experiment 'f75f01e0ed462d7c07ffd8c16ae87632f19c6a49' identical to baseline '627722c61614e0af2aa6f0a4fdf7abd5b81961f5'
Reproduced experiment '627722c'.

so params.yaml should be dirty here

edit: the issue here was that exp_dvc.tree.open("params.yaml") opens cwd / params.yaml and not the experiments workspace params.yaml, is_dirty() was actually working properly

@pmrowla
Copy link
Contributor Author

pmrowla commented Aug 20, 2020

So the underlying issue is related to input collection when we populate the executor tmp dir. The gitpython index/tree doesn't always behave consistently for merge commits, so sometimes the uploaded tree did not contain the correct changes.

Abstracting the tmp dir input collection to use git stash apply and uploading directly from the dirty experiments workspace (the same way we do output collection now) rather than relying on merge commit git trees.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: experiments Related to dvc exp bug Did we break something?
Projects
None yet
2 participants