Major release: proper clustermq support and reduced overhead in make()

@wlandau released this 29 Sep 22:17

Breaking changes

For the sake of reproducibility and speed, drake version 6.0.0 is more discerning in how it detects dependencies. It now detects only:

  1. Targets in the plan.
  2. Functions and objects in the environment.
  3. Objects and functions from packages that are explicitly namespaced with :: and :::.

In other words, there is a clearer line between what drake detects and what it does not, and drake no longer dives into packages or parent environments by default. The old approach:

  1. Made workflows more brittle (likely to fall out of date).
  2. Was categorically inferior to packrat in terms of package reproducibility.

Unfortunately, the change also puts old workflows out of date. Sorry for the inconvenience.
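As a rough illustration of the three detection rules, here is a minimal sketch. The function add_one() and the targets x, y, and z are hypothetical. Run it in a scratch directory, because make() creates a .drake/ cache in the working directory.

```r
library(drake)

# Rule 2: functions and objects in the environment are detected.
add_one <- function(x) x + 1

plan <- drake_plan(
  x = add_one(1),        # depends on add_one() (rule 2)
  y = stats::median(x),  # depends on x (rule 1) and stats::median() (rule 3)
  z = median(x)          # depends on x; median() is not namespaced, so it is not tracked
)

make(plan)
readd(y)
```

Editing add_one() would put x (and therefore y and z) out of date, while an update to the stats package is only noticed through the explicitly namespaced call in y.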

Other breaking changes that put old projects out of date:

  • Avoid serialization in digest() wherever possible. This puts old drake projects out of date, but it improves speed.
  • Require R version >= 3.3.0 rather than >= 3.2.0. Tests and checks still run fine on 3.3.0, but the required version of the stringi package no longer compiles on 3.2.0.

Bug fixes

  • In the call to unlink() in clean(), set recursive and force to FALSE. This should prevent the accidental deletion of whole directories.
  • Previously, clean() deleted input-only files if no targets from the plan were cached. A patch and a unit test are included in this release.
  • loadd(not_a_target) no longer loads every target in the cache.
  • Exclude each target from its own dependency metadata in the "deps" igraph vertex attribute (fixes #503).
  • Detect inline code dependencies in knitr_in() file code chunks.
  • Remove more calls to sort(NULL) that caused warnings in R 3.3.3.
  • Fix a bug on R 3.3.3 where analyze_loadd() was sometimes quitting with "Error: attempt to set an attribute on NULL".
  • Do not call digest::digest(file = TRUE) on directories. Instead, set hashes of directories to NA. Users should still not use directories as file dependencies.
  • If files are declared as dependencies of custom triggers ("condition" and "change"), include them in vis_drake_graph(). Previously, these files were missing from the visualization, but actual workflows worked just fine. Ref: https://stackoverflow.com/questions/52121537/trigger-notification-from-report-generation-in-r-drake-package
  • Work around mysterious codetools failures in R 3.3 (add a tryCatch() statement in find_globals()).

New features

  • Add a proper clustermq-based parallel backend: make(parallelism = "clustermq").
  • evaluate_plan(trace = TRUE) now adds a *_from column to show the origins of the evaluated targets. Try evaluate_plan(drake_plan(x = rnorm(n__), y = rexp(n__)), wildcard = "n__", values = 1:2, trace = TRUE).
  • Add functions gather_by() and reduce_by(), which gather on custom columns in the plan (or columns generated by evaluate_plan(trace = TRUE)) and append the new targets to the previous plan.
  • Expose the template argument of clustermq functions (e.g. Q() and workers()) as an argument of make() and drake_config().
  • Add a new code_to_plan() function to turn R scripts and R Markdown reports into workflow plan data frames.
  • Add a new drake_plan_source() function, which generates lines of code for a drake_plan() call. This drake_plan() call produces the plan passed to drake_plan_source(). The main purpose is visual inspection (we even have syntax highlighting via prettycode), but users may also save the output to a script file for the sake of reproducibility or simple reference.
  • Deprecate deps_targets() in favor of a new deps_target() function (singular) that behaves more like deps_code().
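A minimal sketch of the new clustermq backend, assuming the clustermq package is installed and using its "multicore" scheduler so the example runs on a single Unix-like machine (the targets x and y are hypothetical). On a real cluster, you would instead point the clustermq.scheduler option at your scheduler and pass any template values through the template argument of make().

```r
library(drake)

# Assumption: local forked workers via clustermq's "multicore" scheduler
# (not available on Windows).
options(clustermq.scheduler = "multicore")

plan <- drake_plan(
  x = sqrt(4),
  y = x + 1
)

# Build the targets on 2 clustermq workers.
make(plan, parallelism = "clustermq", jobs = 2)
readd(y)
```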

Enhancements

  • Smooth the edges in vis_drake_graph() and render_drake_graph().
  • Make hover text slightly more readable in vis_drake_graph() and render_drake_graph().
  • Align hover text properly in vis_drake_graph() using the "title" node column.
  • Optionally collapse nodes into clusters with vis_drake_graph(collapse = TRUE).
  • Improve dependency_profile() to show major trigger hashes side-by-side, telling the user whether the command, a dependency, an input file, or an output file changed since the last make().
  • Choose more appropriate places to check that the txtq package is installed.
  • Improve the help files of loadd() and readd(), giving specific usage guidance in prose.
  • Memoize all the steps of build_drake_graph() and print to the console the ones that execute.
  • Skip some tests if txtq is not installed.