Enable dvc fetching of experiments cache data #4649

Closed
yogi81 opened this issue Sep 30, 2020 · 12 comments
Labels
A: experiments Related to dvc exp feature request Requesting a new feature

Comments


yogi81 commented Sep 30, 2020

dvc version 1.7.9

Currently something is not working here:

I have fetched all commits with git fetch in the main git repository.
I also added the remote machine's .dvc/experiments repository as a git remote of my local .dvc/experiments repository and ran git fetch there.

I have created all of the branches from the remote .dvc/experiments repository in the local .dvc/experiments repository.

I also ran dvc fetch --run-cache.

dvc exp show
shows the table with experiments but metric columns are empty. This is not the case for the remote repository.

Running dvc exp show -v shows the problem

The appropriate .dvc/cache entries are not copied over when doing dvc fetch.

2020-09-30 18:21:33,771 DEBUG: Assuming '/home/roman/projects/m/.dvc/cache/be/6cd7ff3b43a7e068b23c0cd7e218a1' is unchanged since it is read-only
2020-09-30 18:21:35,286 DEBUG: cache '/home/roman/projects/m/.dvc/cache/36/7d98915736bfb3478a4bcd4cc608eb' expected 'HashInfo(name='md5', value='367d98915736bfb3478a4bcd4cc608eb', dir_info=None)' actual 'None'
2020-09-30 18:21:35,286 DEBUG: failed to read 'results.yaml' on 'fb6bcf5-773408db2a77a913661dcf3f78f16158b655c4a2bea9bb6e457f503de2897234'

Traceback (most recent call last):
  File "/home/roman/.local/lib/python3.6/site-packages/dvc/repo/metrics/show.py", line 75, in _read_metrics
    val = load_yaml(metric, tree=tree)
  File "/home/roman/.local/lib/python3.6/site-packages/dvc/utils/serialize/_yaml.py", line 18, in load_yaml
    return _load_data(path, parser=parse_yaml, tree=tree)
  File "/home/roman/.local/lib/python3.6/site-packages/dvc/utils/serialize/_common.py", line 19, in _load_data
    with open_fn(path, encoding="utf-8") as fd:
  File "/home/roman/.local/lib/python3.6/site-packages/dvc/tree/repo.py", line 159, in open
    return dvc_tree.open(path, mode=mode, encoding=encoding, **kwargs)
  File "/home/roman/.local/lib/python3.6/site-packages/dvc/tree/dvc.py", line 77, in open
    raise FileNotFoundError
FileNotFoundError
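
To check which cache entries are actually missing, a small helper can be sketched from the log above. It assumes the layout visible in that log (the first two hex characters of the md5 form a directory, the remainder is the file name); other DVC versions may lay out the cache differently. The function names are hypothetical and not part of DVC.

```python
from pathlib import Path


def cache_entry_path(cache_dir, md5):
    """Return the path the log above suggests for an object with this md5."""
    # First two hex characters are the directory, the rest is the file name.
    return Path(cache_dir) / md5[:2] / md5[2:]


def missing_entries(cache_dir, md5_values):
    """Return the md5 values that have no file under cache_dir."""
    return [m for m in md5_values if not cache_entry_path(cache_dir, m).is_file()]
```

Calling missing_entries(".dvc/cache", ["367d98915736bfb3478a4bcd4cc608eb"]) would return that hash when the entry from the log is absent.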

Author

yogi81 commented Sep 30, 2020

@pmrowla

Contributor

pmrowla commented Oct 1, 2020

This is not a supported use case, so this is expected behavior. Experiments are considered local-only right now, and the associated cache data can't be pushed/pulled between machines. What you would need to do here is run dvc exp checkout ... and promote/commit an experiment into your main repository on the remote machine. Then it can be fetched/pulled as usual in the main repository on your local machine.

Author

yogi81 commented Oct 1, 2020

I understand that it is not supported right now. But it needs to be supported in the future.

I already have more than 20 experiments and their results on a remote machine, and it is not feasible to check out and transfer each of them one by one.

Why is this not a supported use case?

Contributor

pared commented Oct 1, 2020

@yogi81
Experiments are not intended to be shared outside the local machine. The assumed workflow is that once an interesting change or set of params (whatever the experiments were used to play with) is found, the user commits the change into git and publishes only the most relevant change.

What is your use case? Why do you need to share the experiments? It is possible that we are missing some use cases, so describing them is highly useful, especially in the case of new features.

There is also an issue to preserve more than 1 experiment, which might be relevant to your use case:
#4448

Author

yogi81 commented Oct 1, 2020

Here is my use case (as a dvc user ;) )
Some experiments cannot be run on the local machine because of resource constraints or huge datasets, which is why they are executed on other machines (for example, a DGX2 ;)

When you have multiple computers you need to be able to download/push the experiment results for local/remote examination.

Also when you work in a team of machine learning researchers it is good to be able to give the team members access to the experiment results you have done for further examination/use. It is good to be able to get all of the experiments you have done and not only the single selected ones :)

Author

yogi81 commented Oct 1, 2020

By the way, I can achieve the feature I want with the following steps:

  1. rsync of .dvc/cache folder
  2. git fetch of remote .dvc/experiments folder to local .dvc/experiments
  3. cp .dvc/experiments/.git/refs/remotes/<remote_name>/heads/* .dvc/experiments/.git/refs//heads/*

After this all experiments can be accessed also locally.

It would be much more efficient if this feature were implemented in dvc, because then I would no longer need to rsync the whole .dvc/cache folder; only the cache entries relevant to the experiments would be copied.
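
The ref-copying step of the workaround above could be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the source and destination directories are passed in explicitly (for example the refs/remotes/<remote_name>/heads and local heads directories from step 3), and only loose ref files are handled. Refs that git has packed into its packed-refs file are not visible as individual files and would be skipped.

```python
import shutil
from pathlib import Path


def copy_loose_refs(src_dir, dst_dir):
    """Copy loose ref files from src_dir into dst_dir; return the names copied."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    copied = []
    for src in sorted(src_dir.rglob("*")):
        # Skip directories and the symbolic HEAD ref, which is not a branch.
        if not src.is_file() or src.name == "HEAD":
            continue
        rel = src.relative_to(src_dir)
        dst = dst_dir / rel
        if dst.exists():
            continue  # never overwrite an existing local branch
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied
```

Unlike a plain cp, this sketch leaves existing local branches untouched, so a second run after another git fetch only adds the new experiment branches.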

Author

yogi81 commented Oct 1, 2020

And by the way, #4448 goes in my direction, but the proposed solution is much more complicated because it has to solve the problem of saving multiple experiments in one git commit. That does not need to be done at all if you just copy the experiment branches from the remote/.dvc/experiments folder to the local/.dvc/experiments folder and also retrieve the experiments' dvc cache entries.

Contributor

pared commented Oct 1, 2020

When you have multiple computers you need to be able to download/push the experiment results for local/remote examination.

That makes perfect sense.

So, what does "choosing" the way to develop the project further usually look like in your case?
Do multiple researchers take a look at a few experiments and discuss which one will become a baseline?
Would you like to save your experiments for the future?

@pared pared added the feature request Requesting a new feature label Oct 1, 2020
Author

yogi81 commented Oct 1, 2020

It is always a good idea to be able to access previous results. So yes, they need to be saved somewhere.

The process is as follows:
The results are presented to the team, and team members access the experiments and also run their own additional experiments/evaluations. A particular model is selected and provided to our stakeholders. Usually, when there are additional requests from the stakeholders, it is important to look at the experiments again and provide a different model.

@efiop efiop added the A: experiments Related to dvc exp label Oct 1, 2020
Contributor

efiop commented Oct 1, 2020

@yogi81 Also just wanted to clarify that experiments are under active development and are not intended for the general public yet, so you are using them at your own risk. This is not even a beta release yet: experiments are not documented in the official docs at dvc.org, they are hidden behind a special config option, and even the CLI help is hidden. So there might be features missing and some stuff might even break. Stay tuned. We really appreciate your feedback as an early adopter! 🙏

Author

yogi81 commented Oct 1, 2020

@efiop
Thank you very much for developing this great feature! I am aware that it is in a pre-alpha state :) I am happy to contribute where I can.

@dberenbaum
Contributor

While sharing experiments is still evolving, exp push and exp pull have been supported for a while now, which I think should address this issue. Closing, but feel free to reopen if I missed something.
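
As a rough illustration of the commands mentioned above, the sketch below builds and runs dvc exp push / dvc exp pull from Python. It assumes a DVC release that ships these commands and that they take a git remote name followed by an experiment name; the helper names, the remote name "origin" and the experiment name are placeholders, not taken from this thread.

```python
import subprocess


def exp_transfer_command(action, git_remote, experiment):
    """Build the argv for `dvc exp push` or `dvc exp pull`."""
    if action not in ("push", "pull"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["dvc", "exp", action, git_remote, experiment]


def run_exp_transfer(action, git_remote, experiment):
    """Run the command in the current DVC repository; raises if dvc fails."""
    subprocess.run(exp_transfer_command(action, git_remote, experiment), check=True)
```

On the machine that ran the experiment one might call run_exp_transfer("push", "origin", "my-experiment"), and on the other machine run_exp_transfer("pull", "origin", "my-experiment").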


5 participants