New Release #40

Merged: 47 commits merged into main from etl on Aug 17, 2022
Conversation

@nicolas-kuechler (Owner) commented on May 6, 2022

Potentially Breaking Changes (Migration Guide)

If you want to migrate an old project to the most recent version, there are a few breaking changes that you have to consider.
In the future, I would like to avoid breaking changes as much as possible, but this PR changes the overall structure significantly, and I did not want to introduce a lot of overhead to also support the previous version.

  • A change in the design extension (separation from Ansible) means that some filters are no longer available. If you used the json_query filter to find the DNS name of another machine in a multi-instance experiment, check example04-multi.yml for the correct usage of quotation marks.

  • The folders does_config and does_results were renamed to doe-suite-config and doe-suite-results.

  • The config folder structure changed -> the folder should be generated using the new process, and then you can migrate your old roles.

  • The Loader's get_output_dir(self, etl_info) function changed its signature (see the sketch below the list).
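As a minimal sketch of what a custom Loader with the new signature could look like: the base class, the "suite_dir" key, and the output layout are assumptions, so check the built-in loaders in the doe-suite for the authoritative interface.

import os

class MyPlotLoader:  # in a real project this would subclass the doe-suite Loader base class
    def get_output_dir(self, etl_info):
        # etl_info is assumed to carry metadata about the suite run (the "suite_dir" key is an assumption)
        out = os.path.join(etl_info["suite_dir"], "plots")
        os.makedirs(out, exist_ok=True)
        return out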

General Workflow (Usability)

We renamed the folder does_config to doe-suite-config to match the repo's name, and likewise does_results to doe-suite-results.

Makefile

A big usability feature is the introduction of a Makefile.
All interaction with the doe-suite should now go through make.
Previously, the commands were becoming more and more complex to remember, and you most likely relied on your bash history to start experiments.
Now all the important commands are available as make targets, and you can simply run make or make help in the root of the doe-suite repo to see an overview of all commands:

Running Experiments
  make run suite=<SUITE> id=new                       - run the experiments in the suite
  make run suite=<SUITE> id=<ID>                      - continue with the experiments in the suite with <ID> (often id=last)
  make run suite=<SUITE> id=<ID> cloud=<CLOUD>        - run suite on non-default cloud ([aws], euler)
  make run suite=<SUITE> id=<ID> expfilter=<REGEX>    - run only subset of experiments in suite where name matches the <REGEX>
Clean
  make clean                                          - terminate running cloud instances belonging to the project and local cleanup
  make clean-result                                   - delete all results in doe-suite-results except for the last (complete) suite run per suite
Running ETL Locally
  make etl suite=<SUITE> id=<ID>                      - run the etl pipeline of the suite (locally) to process results (often id=last)
  make etl-design suite=<SUITE> id=<ID>               - same as `make etl ...` but uses the pipeline from the suite design instead of results
  make etl-all                                        - run etl pipelines of all results
  make etl-super config=<CONFIG> out=<PATH>           - run the super etl to combine results of multiple suites  (for <CONFIG> e.g., demo_plots)
Clean ETL
  make etl-clean suite=<SUITE> id=<ID>                - delete etl results from specific suite (can be regenerated with make etl ...)
  make etl-clean-all                                  - delete etl results from all suites (can be regenerated with make etl-all)
Gather Information
  make info                                           - list available suite designs
  make status suite=<SUITE> id=<ID>                   - show the status of a specific suite run (often id=last)
Design of Experiment Suites
  make design suite=<SUITE>                           - list all the run commands defined by the suite
  make design-validate suite=<SUITE>                  - validate suite design and show with default values
Setting up a Suite
  make new                                            - initialize doe-suite-config from a template
Running Tests
  make test                                           - running all suites (seq) and comparing results to expected (on aws)
  make euler-test cloud=euler                         - running all single instance suites on euler and compare results to expected
  make etl-test-all                                   - re-run all etl pipelines and compare results to current state (useful after update of etl step)

Multi-User Setup

We improved the usability for multiple people working on the same project.
Previously, prj_id, ssh_key_name, and the Euler username were variables set in group_vars/all. As a result, when multiple people wanted to work on the same project, they had to maintain different versions of the group_vars file.

We extracted these variables into environment variables.
Everything that needs to differ between two people working on the same project should now be an environment variable.
For example:

export DOES_PROJECT_DIR=/home/kuenico/dev/doe-suite/demo_project
export DOES_SSH_KEY_NAME=id_rsa_zeph
export DOES_EULER_USER=kunicola
export DOES_PROJECT_ID_SUFFIX=nku

Extracting these user-specific configs into environment variables also made it possible to commit the group_vars from the demo_project and, overall, simplifies the "Getting Started" process.

Getting Started Process

We removed the repotemplate.py functionality and instead rely on cookiecutter for initializing a new doe-suite project (does_config folder).

The template of the does_config folder can be found in cookiecutter-does_config.

The cookiecutter template process can be started with make new; it considers the DOES_PROJECT_DIR environment variable to check whether a does_config folder already exists.

Among other things, cookiecutter provides hooks that allow executing arbitrary Python code before and after the files are created. We use this to replace the previous feature of setting up-to-date EC2 images in the config.
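As a sketch of the idea (not necessarily how the actual hook in cookiecutter-does_config is implemented), a post-generation hook could look up a current AMI like this; hooks/post_gen_project.py is the standard cookiecutter hook location, and the region and image name filter are placeholders.

# hooks/post_gen_project.py
import boto3

def latest_amazon_linux_ami(region="eu-central-1"):
    # query AWS for Amazon Linux 2 images and pick the most recently created one
    ec2 = boto3.client("ec2", region_name=region)
    images = ec2.describe_images(
        Owners=["amazon"],
        Filters=[{"Name": "name", "Values": ["amzn2-ami-hvm-*-x86_64-gp2"]}],
    )["Images"]
    return max(images, key=lambda img: img["CreationDate"])["ImageId"]

if __name__ == "__main__":
    print(latest_amazon_linux_ami())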

GitHub Repo

Instead of a single repo, it is now possible to define a list of repositories.
For each repo, you can also pin a specific branch or commit to check out.

Testing

After changes to the doe-suite, it was very tedious to check whether the core functionality still works.
Basically, we had to run all examples and then manually check that the results and etl_results match our expectations.

We replace this manual workflow with a simple command:

Running Tests
  make test                                           - running all suites sequentially and comparing results to expected

For each example experiment, we now keep a results folder demo_project/does_results/example01-minimal_$expected in the repository that shows the results we expect from this example.
When we run the simple command above, all experiments run sequentially, and after completion the produced results are compared with the expected ones.
If the two result trees differ (except for the suite id), an error is raised.
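Conceptually, the comparison works roughly like the following Python sketch (not the actual test implementation): walk the expected result tree and report any file that is missing or differs in the freshly produced tree. Comparing relative paths keeps the check independent of the suite id in the top-level folder name.

import filecmp
from pathlib import Path

def compare_results(expected_dir, actual_dir):
    """Return the relative paths of all files that differ between the two result trees."""
    expected, actual = Path(expected_dir), Path(actual_dir)
    mismatches = []
    for exp_file in expected.rglob("*"):
        if exp_file.is_dir():
            continue
        rel = exp_file.relative_to(expected)  # relative paths do not contain the suite id
        act_file = actual / rel
        if not act_file.exists() or not filecmp.cmp(exp_file, act_file, shallow=False):
            mismatches.append(str(rel))
    return mismatches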

This is the first step toward providing CI functionality for the doe-suite.

Design Enhancements

Filter Experiments

It's now possible to run only a subset of the experiments defined in a suite.
You can filter experiment names with a regex provided when running a new suite:
make run suite=<SUITE> id=new expfilter=<REGEX>

Developing Designs

The process of developing designs has become easier.
Two make targets take a design and convert it into the list of jobs that it defines:

Design of Experiment Suites
  make design suite=<SUITE>                           - list all the run commands defined by the suite
  make design-validate suite=<SUITE>                  - validate suite design and show with default values

For example, running make design suite=example01-minimal results in the following list of commands on stdout:

Experiment=minimal
  run=000 host=small-0: echo "hello world. "
  run=001 host=small-0: echo "hello world! "
  run=002 host=small-0: echo "hello universe. "
  run=003 host=small-0: echo "hello universe! "

Self-Referencing Variables

In the design, it should be possible to use arbitrarily nested, self-referencing variables.
For example: a refers to b, and b refers to c -> after resolution, a uses the value of c.
This allows writing repetitive parts of a design more concisely.
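The resolution semantics can be illustrated with a small Python sketch (for illustration only, not the doe-suite implementation): every value is rendered against the variable dictionary repeatedly until a fixpoint is reached, so a reference to a reference eventually resolves to the final value.

from jinja2 import Template

def resolve(variables, max_passes=10):
    # repeatedly render each value against the whole dict until nothing changes anymore
    resolved = dict(variables)
    for _ in range(max_passes):
        rendered = {k: Template(str(v)).render(**resolved) for k, v in resolved.items()}
        if rendered == resolved:
            return rendered
        resolved = rendered
    raise ValueError("could not resolve variables (circular reference?)")

print(resolve({"a": "{{ b }}", "b": "{{ c }}", "c": "hello"}))
# -> {'a': 'hello', 'b': 'hello', 'c': 'hello'}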

Various

We introduced the possibility of a custom range syntax but removed it again because we noticed that the default jinja2 syntax already covers this: {{ range(10) | list }} produces [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].

ETL Enhancements

Running ETL Locally

The Makefile includes targets for running the ETL pipeline on existing results, independent of the doe-suite.

Running ETL
  make etl suite=<SUITE> id=<ID>                      - run the etl pipeline of the suite (locally) to process results (often id=last)

For convenience, we also provide the id=last feature known from continuing experiment suite runs.

Package for your own Steps

In the does_config folder, there is now a Python package called does_config. In this package, you can define custom ETL steps, which are then available to the designs of this project.
The advantage over the previous solution for providing custom ETL steps is that, since you now have your own poetry package, you can also introduce custom dependencies that are not present in the doe-suite.
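For illustration, a project-specific step could look roughly like the sketch below. The module path, the base class, and the column layout of the extracted results are assumptions; in a real project the step would subclass the doe-suite ETL base classes and be referenced by name in the design.

# does_config/does_config/steps.py  (hypothetical module path)
import pandas as pd

class RepetitionAggTransformer:  # would subclass the doe-suite Transformer base class
    def transform(self, df: pd.DataFrame, options: dict) -> pd.DataFrame:
        # aggregate repetitions: mean per run configuration
        # ("exp_name" and "run" as grouping columns are an assumption about the result layout)
        group_cols = options.get("groupby_columns", ["exp_name", "run"])
        return df.groupby(group_cols).mean(numeric_only=True).reset_index()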

Error Handling

A failure in the ETL pipeline does not stop an experiment.
However, we should still notify the user that an error occurred and provide information on the error.

  • Option to investigate: write a module that allows raising Ansible warnings; each "output progress info" should carry a flag indicating whether the ETL step failed or succeeded.

We will not implement complex error handling in this PR. The current error handling is not elegant, but it is sufficient for debugging.

Include ETL Pipelines and ETL Stages

Sometimes we want to reuse complete ETL Pipelines or at least ETL Stages (e.g., the extract stage is always the same) in different suites or for different experiments.

Before, we had to duplicate the definition of each ETL pipeline. With this feature, we can now INCLUDE a pipeline from another design or from a folder dedicated to ETL templates: does_config/designs/etl_templates.

"All" Experiments for ETL Pipeline

Previously, all experiments had to be listed by name to indicate that their results should be used in an ETL pipeline.
You can still do this, but for pipelines that should simply use all experiments, you can now use * instead of the list of experiment names.

ETL and Super ETL Examples

In the current designs, there are not that many examples of ETL pipelines and no example of a super ETL that combines results from different suites.

The super ETL example should show how this can be used for a paper.

Various

  • When running ETL locally, we should be able to specify that the ETL config of the current design is used instead of the old one saved when the experiment was initially run.
    TODO: not present in the Makefile

  • For existing ETL pipelines, provide a way to re-run all of them on the results folder. The idea is to be able to check that everything still works after changing, e.g., a transformer or a loader.

  • support id=last in manual etl call

  • by default do not require GPU for running the design examples

nicolas-kuechler changed the title from [WIP] ETL Extension to ETL Extension on Jul 22, 2022
nicolas-kuechler changed the title from ETL Extension to New Release on Jul 22, 2022
nicolas-kuechler merged commit f2a06e1 into main on Aug 17, 2022
nicolas-kuechler deleted the etl branch on Mar 13, 2023