Weird errors and warnings when creating drake plan using a function. #1237

Closed
2 of 3 tasks
januz opened this issue Apr 9, 2020 · 3 comments

januz commented Apr 9, 2020

Prework

  • Read and abide by drake's code of conduct.
  • Search for duplicates among the existing issues, both open and closed.
  • If you think your issue has a quick and definite solution, consider posting to Stack Overflow under the drake-r-package tag. (If you anticipate extended follow-up and discussion, you are already in the right place!)

Description

I want to package my analysis workflow using drake as a research compendium / R package like in this example. This approach (at least so far) involves defining the drake plan in a function. While this worked well with earlier versions of drake, I now run into the following errors/problems:

  1. When using literal strings in knitr_in() and file_out() inside a call to rmarkdown::render(), I get the error:
file_out() files in imported functions are illegal. Detected files: report.md
  2. When using variables instead, I get these warnings:
1: Detected knitr_in(!!file_rmd). File paths in file_in(), file_out(), and knitr_in() must be literal strings, not variables. For example, file_in("file1.csv", "file2.csv") is legal, but file_in(paste0(filename_variable, ".csv")) is not. Details: https://books.ropensci.org/drake/plans.html#static-files 
2: Detected file_out(!!file_md). File paths in file_in(), file_out(), and knitr_in() must be literal strings, not variables. For example, file_in("file1.csv", "file2.csv") is legal, but file_in(paste0(filename_variable, ".csv")) is not. Details: https://books.ropensci.org/drake/plans.html#static-files 

Reproducible example

Based on drake's mtcars example:

  1. using strings
random_rows <- function(data, n) {
  data[sample.int(n = nrow(data), size = n, replace = TRUE), ]
}

simulate <- function(n) {
  data <- random_rows(data = datasets::mtcars, n = n)
  data.frame(
    x = data$wt,
    y = data$mpg
  )
}

reg1 <- function(d) {
  lm(y ~ + x, data = d)
}

reg2 <- function(d) {
  d$x2 <- d$x ^ 2
  lm(y ~ x2, data = d)
}

get_plan <- function() {
  drake::drake_plan(
    report = rmarkdown::render(
      input = knitr_in("report.Rmd"),
      output_file = file_out("report.md"),
      output_dir = ".",
      quiet = TRUE
    ),
    small = simulate(48),
    large = simulate(64),
    regression1_small = reg1(small),
    regression1_large = reg1(large),
    regression2_small = reg2(small),
    regression2_large = reg2(large),
    summ_regression1_small =
      suppressWarnings(summary(regression1_small$residuals)),
    summ_regression1_large =
      suppressWarnings(summary(regression1_large$residuals)),
    summ_regression2_small =
      suppressWarnings(summary(regression2_small$residuals)),
    summ_regression2_large =
      suppressWarnings(summary(regression2_large$residuals)),
    coef_regression1_small =
      suppressWarnings(summary(regression1_small))$coefficients,
    coef_regression1_large =
      suppressWarnings(summary(regression1_large))$coefficients,
    coef_regression2_small =
      suppressWarnings(summary(regression2_small))$coefficients,
    coef_regression2_large =
      suppressWarnings(summary(regression2_large))$coefficients
  )
}

my_plan <- get_plan()
drake::make(my_plan)

  2. using variables
random_rows <- function(data, n) {
  data[sample.int(n = nrow(data), size = n, replace = TRUE), ]
}

simulate <- function(n) {
  data <- random_rows(data = datasets::mtcars, n = n)
  data.frame(
    x = data$wt,
    y = data$mpg
  )
}

reg1 <- function(d) {
  lm(y ~ + x, data = d)
}

reg2 <- function(d) {
  d$x2 <- d$x ^ 2
  lm(y ~ x2, data = d)
}

get_plan <- function() {
  file_rmd <- "report.Rmd"
  file_md <- "report.md"

  drake::drake_plan(
    report = rmarkdown::render(
      input = knitr_in(!!file_rmd),
      output_file = file_out(!!file_md),
      output_dir = ".",
      quiet = TRUE
    ),
    small = simulate(48),
    large = simulate(64),
    regression1_small = reg1(small),
    regression1_large = reg1(large),
    regression2_small = reg2(small),
    regression2_large = reg2(large),
    summ_regression1_small =
      suppressWarnings(summary(regression1_small$residuals)),
    summ_regression1_large =
      suppressWarnings(summary(regression1_large$residuals)),
    summ_regression2_small =
      suppressWarnings(summary(regression2_small$residuals)),
    summ_regression2_large =
      suppressWarnings(summary(regression2_large$residuals)),
    coef_regression1_small =
      suppressWarnings(summary(regression1_small))$coefficients,
    coef_regression1_large =
      suppressWarnings(summary(regression1_large))$coefficients,
    coef_regression2_small =
      suppressWarnings(summary(regression2_small))$coefficients,
    coef_regression2_large =
      suppressWarnings(summary(regression2_large))$coefficients
  )
}

my_plan <- get_plan()
drake::make(my_plan)

Desired result

When I define my_plan directly instead of through a function call, I get no error or warning. I expected the same behavior when the plan is built by a function.

Session info

R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] drake_7.12.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4        txtq_0.2.0        prettyunits_1.1.1 crayon_1.3.4      digest_0.6.25     R6_2.4.1         
 [7] backports_1.1.6   storr_1.2.1       magrittr_1.5      evaluate_0.14     stringi_1.4.6     rlang_0.4.5      
[13] progress_1.2.2    renv_0.6.0-125    filelock_1.0.2    vctrs_0.2.4       rmarkdown_2.1     tools_3.6.0      
[19] stringr_1.4.0     igraph_1.2.5      hms_0.5.3         yaml_2.2.1        xfun_0.12         parallel_3.6.0   
[25] compiler_3.6.0    pkgconfig_2.0.3   base64url_1.4     htmltools_0.4.0   knitr_1.28       

Member

wlandau commented Apr 9, 2020

When drake analyzes get_plan() for dependencies, it sees file_out() and knitr_in(), but it does not look for drake_plan() specifically. So it thinks get_plan() is just an ordinary function with file_out() and knitr_in() inside. In the general case, file_out() and knitr_in() are not supposed to appear inside imported functions because they cause cycles in the dependency graph.

As a workaround, you can wrap the drake_plan() call inside no_deps() to prevent drake from analyzing it.

library(drake)
  
get_plan <- function() {
  drake_plan(
    report = rmarkdown::render(
      input = knitr_in("report.Rmd"),
      output_file = file_out("report.md"),
      output_dir = ".",
      quiet = TRUE
    )
  )
}

deps_code(get_plan)
#> Warning: Could not open report.Rmd to detect dependencies.
#> Error: file_out() files in imported functions are illegal. Detected files:
#>   report.md

get_plan <- function() {
  no_deps(
    drake_plan(
      report = rmarkdown::render(
        input = knitr_in("report.Rmd"),
        output_file = file_out("report.md"),
        output_dir = ".",
        quiet = TRUE
      )
    )
  )
}

deps_code(get_plan)
#> # A tibble: 0 x 2
#> # … with 2 variables: name <chr>, type <chr>

Created on 2020-04-09 by the reprex package (v0.3.0)

Author

januz commented Apr 9, 2020

Thank you for the explanations, @wlandau-lilly! And thank you for including this as a fix in the latest GitHub version. Just so I understand a little better: including no_deps() does not alter the within-workflow dependency detection, and it is only needed to prevent drake from analyzing get_plan()? So the plan resulting from the function is identical to the plan defined outside of a function without no_deps()? Thank you!

Member

wlandau commented Apr 10, 2020

Yes, you will get the same plan and the same dependency structure in your case (unless you call get_plan() inside one of the commands of your targets, which I do not recommend). But as always, I recommend checking vis_drake_graph() so you are fully confident in the dependency graph.
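A minimal sketch of one way to check the "same plan" claim, assuming drake 7.x is installed. It builds the plan once through the no_deps()-wrapped get_plan() and once directly, then compares target names and deparsed commands. Only the report target is kept for brevity. The names plan_direct, plan_fn and commands_as_text are hypothetical, chosen for illustration and not part of drake. A textual match of commands does not by itself prove the graphs match, so vis_drake_graph() is still worth a look (left commented out because it needs the visNetwork package and report.Rmd on disk).

```r
library(drake)

# Plan built inside a function; no_deps() keeps drake from scanning
# get_plan() itself for file_out()/knitr_in() when it is imported.
get_plan <- function() {
  no_deps(
    drake_plan(
      report = rmarkdown::render(
        input = knitr_in("report.Rmd"),
        output_file = file_out("report.md"),
        output_dir = ".",
        quiet = TRUE
      )
    )
  )
}

# The same plan written directly at the top level.
plan_direct <- drake_plan(
  report = rmarkdown::render(
    input = knitr_in("report.Rmd"),
    output_file = file_out("report.md"),
    output_dir = ".",
    quiet = TRUE
  )
)

plan_fn <- get_plan()

# Hypothetical helper: render each command as a single string so the
# two plans can be compared textually.
commands_as_text <- function(plan) {
  vapply(
    plan$command,
    function(cmd) paste(deparse(cmd), collapse = " "),
    character(1)
  )
}

# Same target names and same commands?
stopifnot(identical(plan_fn$target, plan_direct$target))
stopifnot(identical(commands_as_text(plan_fn), commands_as_text(plan_direct)))

# Visual check of the dependency graph (needs visNetwork and report.Rmd):
# vis_drake_graph(plan_fn)
```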
