Major release: proper clustermq support and reduced overhead in make()
Breaking changes
For the sake of reproducibility and speed, drake version 6.0.0 is more discerning in how it detects dependencies:
- Targets in the plan.
- Functions and objects in the environment.
- Objects and functions from packages that are explicitly namespaced with :: and :::.
In other words, there is a clearer line between what drake detects and what it does not, and it no longer dives into packages or parent environments by default. The old approach:
- Made workflows more brittle (likely to fall out of date).
- Was categorically inferior to packrat in terms of package reproducibility.
Unfortunately, the change also puts old workflows out of date. Sorry for the inconvenience.
Other breaking changes that put old projects out of date:
- Avoid serialization in digest() wherever possible. This puts old drake projects out of date, but it improves speed.
- Require R version >= 3.3.0 rather than >= 3.2.0. Tests and checks still run fine on 3.3.0, but the required version of the stringi package no longer compiles on 3.2.0.
Bug fixes
- In the call to unlink() in clean(), set recursive and force to FALSE. This should prevent the accidental deletion of whole directories.
- Previously, clean() deleted input-only files if no targets from the plan were cached. A patch and a unit test are included in this release.
- loadd(not_a_target) no longer loads every target in the cache.
- Exclude each target from its own dependency metadata in the "deps" igraph vertex attribute (fixes #503).
- Detect inline code dependencies in knitr_in() file code chunks.
- Remove more calls to sort(NULL) that caused warnings in R 3.3.3.
- Fix a bug on R 3.3.3 where analyze_loadd() was sometimes quitting with "Error: attempt to set an attribute on NULL".
- Do not call digest::digest(file = TRUE) on directories. Instead, set hashes of directories to NA. Users should still not declare directories as file dependencies.
- If files are declared as dependencies of custom triggers ("condition" and "change"), include them in vis_drake_graph(). Previously, these files were missing from the visualization, but actual workflows worked just fine. Ref: https://stackoverflow.com/questions/52121537/trigger-notification-from-report-generation-in-r-drake-package
- Work around mysterious codetools failures in R 3.3 (add a tryCatch() statement in find_globals()).
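To illustrate why the unlink() change in clean() guards against deleting whole directories, here is a minimal base R sketch. It does not call drake at all. The scratch directory and file names are hypothetical and exist only for the demonstration; the point is simply how unlink() behaves when recursive and force are both FALSE.

```r
# Create a scratch directory that contains one file (hypothetical names).
scratch <- tempfile("scratch_dir_")
dir.create(scratch)
inner <- file.path(scratch, "data.txt")
writeLines("keep me", inner)

# With recursive = FALSE, unlink() does not remove directories,
# so pointing it at a directory by mistake leaves the contents intact.
unlink(scratch, recursive = FALSE, force = FALSE)
dir_survived <- dir.exists(scratch) && file.exists(inner)

# Regular files are still removed as usual.
unlink(inner, recursive = FALSE, force = FALSE)
file_removed <- !file.exists(inner)

# Clean up the scratch directory explicitly.
unlink(scratch, recursive = TRUE)
```

Both dir_survived and file_removed should end up TRUE: individual files can still be removed, while a directory passed in by accident is left alone.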
New features
- Add a proper clustermq-based parallel backend: make(parallelism = "clustermq").
- evaluate_plan(trace = TRUE) now adds a *_from column to show the origins of the evaluated targets. Try evaluate_plan(drake_plan(x = rnorm(n__), y = rexp(n__)), wildcard = "n__", values = 1:2, trace = TRUE).
- Add functions gather_by() and reduce_by(), which gather on custom columns in the plan (or columns generated by evaluate_plan(trace = TRUE)) and append the new targets to the previous plan.
- Expose the template argument of clustermq functions (e.g. Q() and workers()) as an argument of make() and drake_config().
- Add a new code_to_plan() function to turn R scripts and R Markdown reports into workflow plan data frames.
- Add a new drake_plan_source() function, which generates lines of code for a drake_plan() call. This drake_plan() call produces the plan passed to drake_plan_source(). The main purpose is visual inspection (we even have syntax highlighting via prettycode), but users may also save the output to a script file for the sake of reproducibility or simple reference.
- Deprecate deps_targets() in favor of a new deps_target() function (singular) that behaves more like deps_code().
Enhancements
- Smooth the edges in vis_drake_graph() and render_drake_graph().
- Make hover text slightly more readable in vis_drake_graph() and render_drake_graph().
- Align hover text properly in vis_drake_graph() using the "title" node column.
- Optionally collapse nodes into clusters with vis_drake_graph(collapse = TRUE).
- Improve dependency_profile() to show major trigger hashes side-by-side to tell the user if the command, a dependency, an input file, or an output file changed since the last make().
- Choose more appropriate places to check that the txtq package is installed.
- Improve the help files of loadd() and readd(), giving specific usage guidance in prose.
- Memoize all the steps of build_drake_graph() and print to the console the ones that execute.
- Skip some tests if txtq is not installed.