If a dynamic target fails, can I avoid remaking those subtargets that succeeded? #1209

Closed
psadil opened this issue Mar 9, 2020 · 42 comments

psadil commented Mar 9, 2020

Prework

  • Read and abide by drake's code of conduct.
  • Search for duplicates among the existing issues, both open and closed.
  • If you think your question has a quick and definite answer, consider posting to Stack Overflow under the drake-r-package tag. (If you anticipate extended follow-up and discussion, you are already in the right place!)

Question

I'm working on a plan in which a dynamic target sometimes doesn't finish (e.g., because of an error in a subtarget). This dynamic target has many subtargets, and drake often makes many of those subtargets successfully. When rerunning the plan, is it possible to avoid remaking the already made subtargets?

The example below was generated with reprex, but since I didn't figure out how to automatically stop the second subtarget, getting the outputs requires manually running the code and then stopping it while it is processing the second subtarget. Please let me know if that's confusing.

library(drake)

foo <- function(num) {
  print("I'm running...")
  out <- num
  if(num > 1) {
    print("if first run, user should cancel!")
    Sys.sleep(5)
  } 
  return(out)
} 

plan <- drake_plan(
  numbers = seq_len(2),
  result = target(
    foo(numbers), 
    dynamic = map(numbers))
)

make(plan, seed = 123)
#> ▶ target numbers
#> ▶ dynamic result
#> > subtarget result_0b3474bd
#> [1] "I'm running..."
#> > subtarget result_b2a5c9b8
#> [1] "I'm running..."
#> [1] "if first run, user should cancel!"

# but now cancel (e.g., ctrl+c, stop button in Rstudio, restart computer)

# the first subtarget is in the cache
cached()
#> [1] "numbers"         "result_0b3474bd"
# and can be read back with readd()
readd(cached()[2], character_only = TRUE)
#> [1] 1 

# but it looks like it's remade when trying again
make(plan, seed = 123)
#> ▶ target numbers
#> ▶ dynamic result
#> > subtarget result_0b3474bd
#> [1] "I'm running..."
#> > subtarget result_b2a5c9b8
#> [1] "I'm running..."
#> [1] "if first run, user should cancel!"
#> ■ finalize result

Created on 2020-03-09 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Ubuntu 18.04.4 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2020-03-09                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version     date       lib source                         
#>  assertthat    0.2.1       2019-03-21 [1] CRAN (R 3.6.0)                 
#>  backports     1.1.5       2019-10-02 [1] CRAN (R 3.6.1)                 
#>  base64url     1.4         2018-05-14 [1] CRAN (R 3.6.0)                 
#>  callr         3.4.2       2020-02-12 [1] CRAN (R 3.6.1)                 
#>  cli           2.0.2       2020-02-28 [1] CRAN (R 3.6.1)                 
#>  crayon        1.3.4       2017-09-16 [1] CRAN (R 3.6.0)                 
#>  desc          1.2.0       2018-05-01 [1] CRAN (R 3.6.0)                 
#>  devtools      2.2.2       2020-02-17 [1] CRAN (R 3.6.1)                 
#>  digest        0.6.25      2020-02-23 [1] CRAN (R 3.6.1)                 
#>  drake       * 7.11.0.9000 2020-03-05 [1] Github (ropensci/drake@a628f6c)
#>  ellipsis      0.3.0       2019-09-20 [1] CRAN (R 3.6.1)                 
#>  evaluate      0.14        2019-05-28 [1] CRAN (R 3.6.0)                 
#>  fansi         0.4.1       2020-01-08 [1] CRAN (R 3.6.1)                 
#>  filelock      1.0.2       2018-10-05 [1] CRAN (R 3.6.1)                 
#>  fs            1.3.2       2020-03-05 [1] CRAN (R 3.6.1)                 
#>  glue          1.3.1       2019-03-12 [1] CRAN (R 3.6.0)                 
#>  highr         0.8         2019-03-20 [1] CRAN (R 3.6.0)                 
#>  hms           0.5.3       2020-01-08 [1] CRAN (R 3.6.1)                 
#>  htmltools     0.4.0       2019-10-04 [1] CRAN (R 3.6.1)                 
#>  igraph        1.2.4.2     2019-11-27 [1] CRAN (R 3.6.1)                 
#>  knitr         1.28        2020-02-06 [1] CRAN (R 3.6.1)                 
#>  magrittr      1.5         2014-11-22 [1] CRAN (R 3.6.0)                 
#>  memoise       1.1.0       2017-04-21 [1] CRAN (R 3.6.0)                 
#>  pillar        1.4.3       2019-12-20 [1] CRAN (R 3.6.1)                 
#>  pkgbuild      1.0.6       2019-10-09 [1] standard (@1.0.6)              
#>  pkgconfig     2.0.3       2019-09-22 [1] standard (@2.0.3)              
#>  pkgload       1.0.2       2018-10-29 [1] CRAN (R 3.6.0)                 
#>  prettyunits   1.1.1       2020-01-24 [1] CRAN (R 3.6.1)                 
#>  processx      3.4.2       2020-02-09 [1] CRAN (R 3.6.1)                 
#>  progress      1.2.2       2019-05-16 [1] CRAN (R 3.6.0)                 
#>  ps            1.3.2       2020-02-13 [1] CRAN (R 3.6.1)                 
#>  R6            2.4.1       2019-11-12 [1] CRAN (R 3.6.1)                 
#>  Rcpp          1.0.3       2019-11-08 [1] standard (@1.0.3)              
#>  remotes       2.1.1       2020-02-15 [1] CRAN (R 3.6.1)                 
#>  rlang         0.4.5       2020-03-01 [1] CRAN (R 3.6.1)                 
#>  rmarkdown     2.1         2020-01-20 [1] CRAN (R 3.6.1)                 
#>  rprojroot     1.3-2       2018-01-03 [1] CRAN (R 3.6.0)                 
#>  sessioninfo   1.1.1       2018-11-05 [1] CRAN (R 3.6.0)                 
#>  storr         1.2.1       2018-10-18 [1] CRAN (R 3.6.0)                 
#>  stringi       1.4.6       2020-02-17 [1] CRAN (R 3.6.1)                 
#>  stringr       1.4.0       2019-02-10 [1] CRAN (R 3.6.0)                 
#>  testthat      2.3.2       2020-03-02 [1] CRAN (R 3.6.1)                 
#>  tibble        2.1.3       2019-06-06 [1] CRAN (R 3.6.0)                 
#>  txtq          0.2.0       2019-10-15 [1] CRAN (R 3.6.1)                 
#>  usethis       1.5.1       2019-07-04 [1] CRAN (R 3.6.0)                 
#>  vctrs         0.2.3       2020-02-20 [1] CRAN (R 3.6.1)                 
#>  withr         2.1.2       2018-03-15 [1] CRAN (R 3.6.0)                 
#>  xfun          0.12        2020-01-13 [1] CRAN (R 3.6.1)                 
#>  yaml          2.2.1       2020-02-01 [1] CRAN (R 3.6.1)                 
#> 
#> [1] /home/psadil/R/x86_64-pc-linux-gnu-library/3.6
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library

Thanks!

wlandau (Member) commented Mar 9, 2020

This is happening because drake takes shortcuts to check sub-targets. For the sake of speed, drake checks triggers for the whole dynamic target rather than each sub-target individually. That means in order to skip sub-targets, the metadata for result must already exist in the cache, which unfortunately does not happen until all the sub-targets run at least once. I will need to think more on what we can do about this. (Maybe we can store the metadata early?)

For now, you can set keep_going = TRUE so the dynamic target gets finalized and future make()s can skip sub-targets. (See below.)

Your issues are extremely helpful! They are identifying huge problems in drake I did not even know existed. Please continue to post them as they arise.

library(drake)
  
foo <- function(num) {
  if(num > 1) {
    print("num is too high!")
    stop("num is too high!")
    Sys.sleep(2)
  } else {
    print("num is low enough.")
  }
  num
} 

plan <- drake_plan(
  numbers = seq_len(2),
  result = target(
    foo(numbers), 
    dynamic = map(numbers)
  )
)

make(plan, keep_going = TRUE)
#> ▶ target numbers
#> ▶ dynamic result
#> > subtarget result_0b3474bd
#> [1] "num is low enough."
#> > subtarget result_b2a5c9b8
#> [1] "num is too high!"
#> x fail result_b2a5c9b8
#> ■ finalize result

make(plan, keep_going = TRUE)
#> ▶ dynamic result
#> > subtarget result_b2a5c9b8
#> [1] "num is too high!"
#> x fail result_b2a5c9b8
#> ■ finalize result

Created on 2020-03-09 by the reprex package (v0.3.0)

psadil (Author) commented Mar 10, 2020

That sounds good. Thanks for the clarification (and, again, for the package and support!). In the meantime, the keep_going = TRUE tip will be pretty helpful.

wlandau (Member) commented Mar 10, 2020

Implementation strategy

It is not enough to simply store the dynamic target's metadata ahead of time. On its own, that would falsely validate sub-targets that already exist. We also need to encode a representation of the dynamic target's trigger state into the names of the sub-targets (similar to the recovery key). With all that in place, we should be able to make the correct build decisions without the computational burden of checking every sub-target's metadata list.
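
A hypothetical illustration of that idea (not drake's actual internals): the sub-target name could embed a digest of the parent's trigger state, so a change in any trigger maps to a fresh set of sub-target names. The helper name and hash algorithm below are made up for the sketch.

library(digest)

subtarget_name <- function(parent, index, trigger_state) {
  # trigger_state: any serializable summary of the command, dependency hashes,
  # seed, format, etc. that the parent's triggers actually watch.
  paste0(parent, "_", digest(trigger_state, algo = "xxhash32"), "_", index)
}

subtarget_name("result", 1L, list(command = "foo(numbers)", seed = 123L))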

wlandau (Member) commented Mar 10, 2020

Unfortunately, #1209 (comment) will invalidate everyone's sub-targets. Still, the long-run time savings are worth it.

wlandau (Member) commented Mar 11, 2020

Come to think of it, #1209 (comment) has an even bigger problem: it throws out the user's choice of custom triggers. So that's out of the question now.

I do not like it, but I think we will have to go with some kind of ad hoc tracking mechanism to make this happen.

Dynamic branching, dynamic files, triggers, data recovery, high-performance computing: together, these features have turned out to seriously exacerbate drake's conceptual complexity. It will take a lot of refactoring to streamline things again, plus some careful planning and testing to fix this issue.

wlandau (Member) commented Mar 11, 2020

For this issue, we might be able to leverage data recovery somehow. make(recover = TRUE) does not currently recover sub-targets in this case, but maybe it could. After that, we could think about making recover = TRUE the default.
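
For reference, data recovery is already opt-in at the whole-target level; the hedged idea here is only that the same mechanism might someday cover unfinished sub-targets, which it does not do yet.

library(drake)

# Current behavior: recover whole targets whose command, dependencies, and seed
# match a previously stored state.
make(plan, recover = TRUE)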

wlandau (Member) commented Mar 11, 2020

The downside is that we would be repurposing a system that was not designed for this task. Also, it requires checking sub-target metadata, which could be slow.

wlandau (Member) commented Mar 11, 2020

So we need a new tracking mechanism. Again, I do not like the extra conceptual complexity, but it seems like the only way to keep drake running fast without interfering with other parts of the design. Proposal (a rough sketch in code follows the list):

  1. When a sub-target completes successfully, make a note of it in an ad hoc storr namespace (a "dynamic progress namespace"). The namespace name should begin with the prefix "dyn-" and contain the name and recovery key (a special key that only considers the triggers you are actually using) of the parent target. For speed, we should only store keys, not data. (We already do this for set_progress().)
  2. If the parent target gets the chance to finalize, delete the whole namespace, as well as any other namespaces that start with "dyn-<target_name>-". Note the hyphens here. Hyphens are illegal in target names, and that prevents us from accidentally removing another target's dynamic progress namespace.
  3. Suppose we call make() again and the parent did not finalize last time. Then there should be keys in the dynamic progress namespace. If the static dependencies remain unchanged (and if the condition trigger is not activated), we should be able to use those keys to avoid registering sub-targets that succeeded before.
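
A rough sketch of steps 1-3 using only the public storr API; the helper name and the example keys are hypothetical, and drake's real implementation may differ.

library(storr)

cache <- storr_rds(tempfile())

# Hypothetical helper: one ad hoc namespace per parent target and recovery key.
dynamic_progress_ns <- function(target, recovery_key) {
  paste0("dyn-", target, "-", recovery_key)
}
ns <- dynamic_progress_ns("result", "f4c3c1c6")

# Step 1: record a successful sub-target (key only, no data).
cache$set("result_0b3474bd", TRUE, namespace = ns)

# Step 3: on the next make(), skip sub-targets already recorded here.
already_done <- cache$list(namespace = ns)

# Step 2: once the parent finalizes, drop every "dyn-result-" namespace.
stale <- grep("^dyn-result-", cache$list_namespaces(), value = TRUE)
for (n in stale) {
  for (k in cache$list(namespace = n)) {
    cache$del(k, namespace = n)
  }
}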

wlandau (Member) commented Mar 11, 2020

Important to note: the proposal above does not invalidate everyone's dynamic targets!

wlandau (Member) commented Mar 12, 2020

Another advantage of #1209 (comment) over #1209 (comment) is that if we went with the latter, the number of storr keys would explode. As it is, dynamic branching leaves a lot of unused keys behind, which users should regularly clean out with clean(list = cached_unplanned(plan)). (I need to make that more obvious.)
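
To make that cleanup step concrete (a sketch; it assumes the default cache in the working directory and a plan object in the session):

library(drake)

# Sub-target keys from superseded dynamic targets linger in the cache.
# cached_unplanned() lists cached targets the current plan no longer declares,
# and clean(list = ...) removes only those.
leftovers <- cached_unplanned(plan)
clean(list = leftovers)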

kendonB (Contributor) commented Mar 12, 2020

e72a3f7 seems to work! Nice

wlandau (Member) commented Mar 12, 2020

Glad to hear it. I actually just realized we need the full recovery key after all, so please hold off on using this for serious work until I submit and merge a PR.

wlandau (Member) commented Mar 12, 2020

On second thought, let's revert c60ce32 to keep everyone's targets valid. That bug is annoying but does no tangible harm.

wlandau (Member) commented Mar 23, 2020

Rethinking the implementation here. Because of richfitz/storr#121, the ad hoc storr namespaces do not really go away. Yes, we clear them, but the folders are still there, and they could add up.

drake_cache()$list_namespaces()
[1] "dyn-y-926684c7" "dyn-y-d8df0a05" "dyn-y-f4c3c1c6" "memoize"        "meta"          
[6] "objects"        "progress"       "recover"        "session"  

What we need is a single namespace and more descriptive keys.
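
Roughly what that would look like with plain storr, where one fixed namespace holds keys prefixed by target name and recovery key (the key format is invented for illustration; hyphens work as separators because they are illegal in target names):

library(storr)

cache <- storr_rds(tempfile())
ns <- "dyn"  # one namespace, so only one folder on disk

# Hypothetical key format: <target>-<recovery key>-<sub-target>
key <- paste("result", "f4c3c1c6", "result_0b3474bd", sep = "-")
cache$set(key, TRUE, namespace = ns)

# Pruning one target's keys is now a grep over a single namespace.
grep("^result-", cache$list(namespace = ns), value = TRUE)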

wlandau reopened this Mar 23, 2020

wlandau-lilly (Collaborator) commented:
On reflection, a single namespace would decrease performance. Let's just remove the folders in the special case of RDS storrs. The problem will resolve itself after richfitz/storr#122 is merged.
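
A sketch of that special case, assuming the keys/<namespace>/ folder layout and the "driver_rds" class of current RDS storrs; poking at those internals is exactly the fragility acknowledged in the next comment.

remove_dyn_namespace_folders <- function(cache, target) {
  # Only RDS storrs leave namespace folders behind on disk.
  if (!inherits(cache$driver, "driver_rds")) {
    return(invisible(NULL))
  }
  ns <- grep(
    paste0("^dyn-", target, "-"),
    cache$list_namespaces(),
    value = TRUE
  )
  unlink(file.path(cache$driver$path, "keys", ns), recursive = TRUE)
  invisible(NULL)
}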

wlandau-lilly (Collaborator) commented Mar 23, 2020

Need to revert caf6084 after richfitz/storr#122 is merged and the new storr is on CRAN. caf6084 is the most performant approach, but it assumes knowledge of the internal file structure of RDS storrs, which is not ideal if storr's internals change. (But I do trust Rich to preserve backward compatibility.)

ercbk commented Apr 29, 2020

I'm having this problem. Are you still waiting for a storr release to reach CRAN before merging the fix? I tried drake 7.12.0.9000, but I'm still having to rebuild sub-targets.

wlandau (Member) commented Apr 30, 2020

That's odd, and it should not be happening in 7.12.0.9000. I tested the patch pretty aggressively, and tests are still passing on my end. Would you post a reprex so I can take a look?

test_with_dir("parent not finalized, sub-targets stay up to date (#1209)", {
  skip_on_cran()
  plan <- drake_plan(
    numbers = seq(0L, 2L),
    result = target(stopifnot(numbers <= 1L), dynamic = map(numbers))
  )
  expect_error(make(plan))
  config <- drake_config(plan)
  jb <- justbuilt(config)
  expect_equal(length(jb), 3L)
  namespace <- drake_meta_("result", config)$dynamic_progress_namespace
  jb2 <- config$cache$list(namespace = namespace)
  expect_equal(sort(setdiff(jb, "numbers")), sort(jb2))
  expect_error(make(plan))
  config <- drake_config(plan)
  expect_equal(length(justbuilt(config)), 0L)
  jb2 <- config$cache$list(namespace = namespace)
  expect_equal(sort(setdiff(jb, "numbers")), sort(jb2))
  plan <- drake_plan(
    numbers = seq(0L, 2L),
    result = target(stopifnot(numbers <= 999L), dynamic = map(numbers))
  )
  make(plan)
  expect_equal(length(justbuilt(config)), 4L)
  expect_true(all(jb2 %in% justbuilt(config)))
})

test_with_dir("un-finalized sub-targets and cmd trigger (#1209)", {
  skip_on_cran()
  plan <- drake_plan(
    numbers = seq(0L, 2L),
    result = target(stopifnot(numbers <= 1L), dynamic = map(numbers))
  )
  config <- drake_config(plan)
  expect_error(make(plan))
  expect_equal(length(justbuilt(config)), 3L)
  # trigger activation
  plan <- drake_plan(
    numbers = seq(0L, 2L),
    result = target(stopifnot(numbers <= 1.1), dynamic = map(numbers))
  )
  expect_error(make(plan))
  expect_equal(length(justbuilt(config)), 2L)
  # change of trigger
  expect_error(make(plan, trigger = trigger(command = FALSE)))
  expect_equal(length(justbuilt(config)), 2L)
  expect_error(make(plan, trigger = trigger(command = FALSE)))
  expect_equal(length(justbuilt(config)), 0L)
  # trigger suppression
  plan <- drake_plan(
    numbers = seq(0L, 2L),
    result = target(stopifnot(numbers <= 999L), dynamic = map(numbers))
  )
  make(plan, trigger = trigger(command = FALSE))
  expect_equal(length(justbuilt(config)), 2L)
  expect_true("result" %in% justbuilt(config))
})

test_with_dir("un-finalized sub-targets, seed trigger (#1209)", {
  skip_on_cran()
  # trigger suppression
  plan <- drake_plan(
    numbers = seq(0L, 2L),
    result = target(stopifnot(numbers <= 1L), dynamic = map(numbers))
  )
  config <- drake_config(plan)
  expect_error(make(plan))
  expect_error(make(plan, trigger = trigger(seed = FALSE)))
  expect_equal(length(justbuilt(config)), 2L)
  expect_error(make(plan))
  # trigger activation
  plan <- drake_plan(
    numbers = seq(0L, 2L),
    result = target(
      stopifnot(numbers <= 1L),
      dynamic = map(numbers),
      seed = -9999
    )
  )
  expect_error(make(plan))
  expect_equal(length(justbuilt(config)), 2L)
})

test_with_dir("un-finalized sub-targets, format trigger (#1209)", {
  skip_on_cran()
  skip_if_not_installed("qs")
  # trigger suppression
  plan <- drake_plan(
    numbers = seq(0L, 2L),
    result = target(stopifnot(numbers <= 1L), dynamic = map(numbers))
  )
  config <- drake_config(plan)
  expect_error(make(plan))
  expect_error(make(plan, trigger = trigger(seed = FALSE)))
  expect_equal(length(justbuilt(config)), 2L)
  # trigger activation
  expect_error(make(plan))
  plan <- drake_plan(
    numbers = seq(0L, 2L),
    result = target(
      stopifnot(numbers <= 1L),
      dynamic = map(numbers),
      format = "qs"
    )
  )
  expect_error(make(plan))
  expect_equal(length(justbuilt(config)), 2L)
})

test_with_dir("dynamic_progress_prekey() default (#1209)", {
  skip_on_cran()
  z <- 1
  nums <- seq(0L, 2L)
  plan <- drake_plan(
    result = target(
      stopifnot(nums + z <= 1L),
      dynamic = map(nums)
    )
  )
  config <- drake_config(plan)
  meta <- drake_meta_("result", config)
  x <- dynamic_progress_prekey("result", meta, config)
  expect_true(is.na(x$change_hash))
  x$change_hash <- "blank"
  expect_false(any(is.na(x)))
  chr <- nchar(as.character(x))
  expect_equal(sum(chr < 1L), 2L)
})

test_with_dir("dynamic_progress_prekey() suppressed (#1209)", {
  skip_on_cran()
  z <- 1
  nums <- seq(0L, 2L)
  plan <- drake_plan(
    result = target(
      stopifnot(nums + z <= 1L),
      dynamic = map(nums),
      trigger = trigger(
        command = FALSE,
        depend = FALSE,
        file = FALSE,
        seed = FALSE,
        format = FALSE
      )
    )
  )
  config <- drake_config(plan)
  meta <- drake_meta_("result", config)
  x <- dynamic_progress_prekey("result", meta, config)
  ns <- setdiff(names(x), c("mode", "condition"))
  for (n in ns) {
    expect_true(is.na(x[[n]]))
  }
})

test_with_dir("dynamic_progress_prekey() special (#1209)", {
  skip_on_cran()
  skip_if_not_installed("fst")
  numbers <- seq(0L, 2L)
  y2 <- 123
  z <- 1
  file.create("x")
  file.create("y")
  plan <- drake_plan(
    result = target({
        file_in("x")
        stopifnot(numbers + z <= 1L)
      },
      trigger = trigger(
        condition = x + 1,
        mode = "blacklist",
        change = y2
      ),
      format = "fst",
      dynamic = map(numbers)
    )
  )
  config <- drake_config(plan)
  meta <- drake_meta_("result", config)
  x <- dynamic_progress_prekey("result", meta, config)
  expect_false(any(is.na(x)))
  chr <- nchar(as.character(x))
  expect_equal(sum(chr < 1L), 1L)
})

test_with_dir("ad hoc RDS storr namespace folders are removed (#1209)", {
  skip_on_cran()
  plan <- drake_plan(x = 1:2, y = target(stopifnot(x < 1.5), dynamic = map(x)))
  cache <- storr::storr_rds(tempfile())
  expect_error(make(plan, cache = cache))
  config <- drake_config(plan, cache = cache)
  expect_equal(length(justbuilt(config)), 2)
  expect_true(any(grepl("dyn-y-", cache$list_namespaces())))
  ns <- grep("dyn-y-", cache$list_namespaces(), value = TRUE)
  keys <- cache$list(ns)
  expect_equal(length(keys), 1)
  plan <- drake_plan(x = 1:2, y = target(x, dynamic = map(x)))
  make(plan, cache = cache)
  expect_false(any(grepl("dyn-y-", cache$list_namespaces())))
  keys <- cache$list(ns)
  expect_equal(length(keys), 0)
})

test_with_dir("ad hoc namespaces and non-RDS storrs (#1209)", {
  skip_on_cran()
  plan <- drake_plan(x = 1:2, y = target(stopifnot(x < 1.5), dynamic = map(x)))
  cache <- storr::storr_environment()
  expect_error(make(plan, cache = cache))
  config <- drake_config(plan, cache = cache)
  expect_equal(length(justbuilt(config)), 2)
  expect_true(any(grepl("dyn-y-", cache$list_namespaces())))
  ns <- grep("dyn-y-", cache$list_namespaces(), value = TRUE)
  keys <- cache$list(ns)
  expect_equal(length(keys), 1)
  plan <- drake_plan(x = 1:2, y = target(x, dynamic = map(x)))
  make(plan, cache = cache)
  keys <- cache$list(ns)
  expect_equal(length(keys), 0)
})

ercbk commented Apr 30, 2020

I'm not sure how to simulate my target failure. I'm actually planning on creating an issue about the failure later on. I wanted to see if the target would finish, though; it might narrow down the possible causes.
I can point you to the repo, and I can push the .drake folder. I tried the example above, and that works as expected. Any suggestions on how to go about this?

wlandau (Member) commented Apr 30, 2020

It is best if we narrow down the exact conditions/behaviors that lead to incorrectly rebuilding sub-targets. Knowing the plan is a good start, but we also need to know when you are make()ing it and what you are changing in between make()s. The cause might not be #1209. Also, how small and fast can you make your plan and still reproduce the issue?
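
Assuming history tracking was left on during make() (history = TRUE, the default), drake_history() is one way to reconstruct what was built when and with which command. A sketch, filtering to one target name as an example:

library(drake)

# Each row is one recorded build, with its command and seed, so you can see
# when a target's command changed between make() calls.
history <- drake_history(analyze = TRUE)
subset(history, target == "ncv_results_2000")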

wlandau (Member) commented Apr 30, 2020

And for what it's worth, r_make() resolves much of the brittleness of make(). I recommend having a look at https://books.ropensci.org/drake/projects.html#safer-interactivity.

ercbk commented Apr 30, 2020

Regarding "when", are you talking about the actual times?
My sequence of changes has gone something like this:

  1. I had this portion of my project in a local, unversioned directory while I got drake working. The primary targets of the drake_plan get repeated for each sample size, so eventually the targets will be repeated 4 times, for n = 100, 800, 2000, and 5000. The plan ran fine for n = 100 and n = 800, so I moved it to the versioned directory.
  2. The versioned directory is a separate renv environment. I ran make and got the "progress_bar" error. I thought the error might have something to do with a couple of the future_map .progress arguments I had in the code (even though I had them set to FALSE), so I removed them. Of course, that had nothing to do with it. I read the issue, installed the progress package (and maybe some other package I forget), and it ran, but unfortunately that triggered a rebuild of targets.
  3. The n = 800 targets were built, but it failed on the transition to the n = 2000 target, ncv_results_2000. Even though the build fails, the instances still run the code, so I terminate the instances and start new ones. I re-run make, and the first 2 n = 2000 sub-targets get built. The 3rd sub-target fails.
  4. I've since attempted to start make with some new arguments to try to deal with the failure, but I'm pretty sure I haven't changed any of the other code in the functions or plan (update: I've checked commits, local and remote, and I haven't changed anything else). Now, when I try to restart, it tries to rebuild the first sub-target. I think I also tried running it with the previous make arguments, and it still tried to rebuild the first target.
  5. I also added the timeout argument to makeClusterPSOCK, added another setting in a saved PuTTY session, and the IPs have changed, of course, but I wouldn't think any of that would affect what's going on with the sub-targets.

Here's the new make with the added arguments:

make(
      plan,
      verbose = 1,
      session_info = FALSE,
      retries = 2,
      lock_envir = FALSE,
      history = FALSE,
      log_progress = FALSE,
      jobs_preprocess = 7
)

And a screenshot of that last failure:
https://www.dropbox.com/s/ryx92zaq3vlqgxx/Screenshot%20%2853%29.png?dl=0

Need to build a couple charts for another project, but I'll look into r_make either later tonight or tomorrow morning.

current session info
- Session info ------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.6.2 (2019-12-12)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       America/New_York            
 date     2020-04-30                  

- Packages ----------------------------------------------------------------------------------------------
 package     * version     date       lib source                         
 assertthat    0.2.1       2019-03-21 [1] CRAN (R 3.6.2)                 
 backports     1.1.6       2020-04-05 [1] CRAN (R 3.6.3)                 
 base64url     1.4         2018-05-14 [1] CRAN (R 3.6.3)                 
 cli           2.0.1       2020-01-08 [1] CRAN (R 3.6.2)                 
 clipr         0.7.0       2019-07-23 [1] CRAN (R 3.6.2)                 
 codetools     0.2-16      2018-12-24 [1] CRAN (R 3.6.2)                 
 crayon        1.3.4       2017-09-16 [1] CRAN (R 3.6.2)                 
 data.table  * 1.12.8      2019-12-09 [1] CRAN (R 3.6.2)                 
 desc          1.2.0       2018-05-01 [1] CRAN (R 3.6.1)                 
 details     * 0.2.1       2020-01-12 [1] CRAN (R 3.6.3)                 
 digest        0.6.25      2020-02-23 [1] CRAN (R 3.6.2)                 
 dplyr       * 0.8.4       2020-01-31 [1] CRAN (R 3.6.2)                 
 drake       * 7.12.0.9000 2020-04-29 [1] Github (ropensci/drake@935f95a)
 dtplyr      * 1.0.1       2020-01-23 [1] CRAN (R 3.6.2)                 
 fansi         0.4.1       2020-01-08 [1] CRAN (R 3.6.2)                 
 filelock      1.0.2       2018-10-05 [1] CRAN (R 3.6.3)                 
 furrr       * 0.1.0       2018-05-16 [1] CRAN (R 3.6.1)                 
 future      * 1.16.0      2020-01-16 [1] CRAN (R 3.6.2)                 
 globals       0.12.5      2019-12-07 [1] CRAN (R 3.6.1)                 
 glue          1.4.0       2020-04-03 [1] CRAN (R 3.6.2)                 
 hms           0.5.3       2020-01-08 [1] CRAN (R 3.6.2)                 
 httr          1.4.1       2019-08-05 [1] CRAN (R 3.6.1)                 
 igraph        1.2.5       2020-03-19 [1] CRAN (R 3.6.3)                 
 jsonlite      1.6.1       2020-02-02 [1] CRAN (R 3.6.2)                 
 knitr         1.28        2020-02-06 [1] CRAN (R 3.6.2)                 
 listenv       0.8.0       2019-12-05 [1] CRAN (R 3.6.2)                 
 magrittr      1.5         2014-11-22 [1] CRAN (R 3.6.2)                 
 packrat       0.5.0       2018-11-14 [1] CRAN (R 3.6.1)                 
 pacman        0.5.1       2019-03-11 [1] CRAN (R 3.6.2)                 
 pillar        1.4.3       2019-12-20 [1] CRAN (R 3.6.2)                 
 pkgconfig     2.0.3       2019-09-22 [1] CRAN (R 3.6.2)                 
 png           0.1-7       2013-12-03 [1] CRAN (R 3.6.0)                 
 prettyunits   1.1.1       2020-01-24 [1] CRAN (R 3.6.2)                 
 progress      1.2.2       2019-05-16 [1] CRAN (R 3.6.1)                 
 purrr         0.3.3       2019-10-18 [1] CRAN (R 3.6.2)                 
 R6            2.4.1       2019-11-12 [1] CRAN (R 3.6.2)                 
 Rcpp          1.0.3       2019-11-08 [1] CRAN (R 3.6.2)                 
 renv          0.9.3-30    2020-02-22 [1] Github (rstudio/renv@916923a)  
 reticulate    1.14        2019-12-17 [1] CRAN (R 3.6.2)                 
 rlang         0.4.5       2020-03-01 [1] CRAN (R 3.6.3)                 
 rprojroot     1.3-2       2018-01-03 [1] CRAN (R 3.6.1)                 
 rstudioapi    0.11        2020-02-07 [1] CRAN (R 3.6.2)                 
 sessioninfo   1.1.1       2018-11-05 [1] CRAN (R 3.6.2)                 
 storr         1.2.1       2018-10-18 [1] CRAN (R 3.6.3)                 
 tibble        2.1.3       2019-06-06 [1] CRAN (R 3.6.2)                 
 tidyselect    1.0.0       2020-01-27 [1] CRAN (R 3.6.2)                 
 txtq          0.2.0       2019-10-15 [1] CRAN (R 3.6.3)                 
 vctrs         0.2.4       2020-03-10 [1] CRAN (R 3.6.3)                 
 withr         2.1.2       2018-03-15 [1] CRAN (R 3.6.1)                 
 xfun          0.12        2020-01-13 [1] CRAN (R 3.6.2)                 
 xml2          1.2.2       2019-08-09 [1] CRAN (R 3.6.1)                 

[1] C:/Users/tbats/Documents/R/Projects/nested-cross-validation-comparison/renv/library/R-3.6/x86_64-w64-mingw32
[2] C:/Users/tbats/AppData/Local/Temp/RtmpGMIymj/renv-system-library

wlandau (Member) commented May 1, 2020

Kudos on persevering to this point. Changes to make() arguments should not invalidate targets except for seed and format. Unfortunately, I cannot replicate how you interacted with your project in 1-4, which is why reprexes on downsized examples are helpful.

You can use the deps_profile() function to see what drake thinks about the state of dependencies. If it thinks at least one upstream function or target changed since last time, it will tell you. Here is a reprex to demonstrate.

library(drake)
  
f <- function(x) {
  x + 1
}

plan <- drake_plan(x = 1, y = f(x))

make(plan)
#> ▶ target x
#> ▶ target y

deps_profile(y, plan)
#> # A tibble: 5 x 4
#>   name     changed old                new               
#>   <chr>    <lgl>   <chr>              <chr>             
#> 1 command  FALSE   "7974fa383d985540" "7974fa383d985540"
#> 2 depend   FALSE   "4ce8d6b05dc7e1f9" "4ce8d6b05dc7e1f9"
#> 3 file_in  FALSE   ""                 ""                
#> 4 file_out FALSE   ""                 ""                
#> 5 seed     FALSE   "965938315"        "965938315"

# Change a function.
f <- function(x) {
  x + 2
}

# Register the change in the cache.
make(plan, skip_targets = TRUE)

# The `change` column is now `TRUE` in the `depend` row. 
deps_profile(y, plan)
#> # A tibble: 5 x 4
#>   name     changed old                new               
#>   <chr>    <lgl>   <chr>              <chr>             
#> 1 command  FALSE   "7974fa383d985540" "7974fa383d985540"
#> 2 depend   TRUE    "4ce8d6b05dc7e1f9" "cd6bb9a6ec1eea7a"
#> 3 file_in  FALSE   ""                 ""                
#> 4 file_out FALSE   ""                 ""                
#> 5 seed     FALSE   "965938315"        "965938315"

Created on 2020-05-01 by the reprex package (v0.3.0)

By the way, I see you are using custom PSOCK clusters for parallel computing. drake has built-in high-performance computing, which may be more convenient and could give you more parallel efficiency in some cases: https://books.ropensci.org/drake/hpc.html. Since you are running Linux, it is straightforward to use clustermq's multicore backend. And if you have a computing cluster with a resource manager, even better. Sketch:

library(drake)
# install.packages("clustermq")
options(clustermq.scheduler = "multicore")
make(plan, parallelism = "clustermq", jobs = 4)

ercbk commented May 2, 2020

Reprex doesn't find anything: files or functions (unless I use package::function). I've tried inside the project environment and outside of it with the working directory set to the project directory. I don't understand it; it's worked fine before. It doesn't find the plan-kj.R file with source either.

library(drake)

error_FUN <- function(y_obs, y_hat){
      y_obs <- unlist(y_obs)
      y_hat <- unlist(y_hat)
      Metrics::mae(y_obs, y_hat)
}

method <- "kj"
algorithms <- list("glmnet", "rf")
repeats <- seq(1:5)
grid_size <- 100

plan <- drake_plan(
      # model functions for each algorithm
      mod_FUN_list = create_models(algorithms),
      # data used to estimate out-of-sample error
      # noise_sd, seed settings are the defaults
      large_dat = mlbench_data(n = 10^5,
                               noise_sd = 1,
                               seed = 2019),
      # sample size = 100
      sim_dat_100 = mlbench_data(100),
      # hyperparameter grids for each algorithm
      # This probably doesn't need to be a "dynamic" target since mtry is only concerned about the number of columns in data (see script), but I'll do it anyways
      params_list_100 = create_grids(sim_dat_100,
                                     algorithms,
                                     size = grid_size),
      # create a separate ncv data object for each repeat value
      ncv_dat_100 = create_ncv_objects(sim_dat_100,
                                       repeats,
                                       method),
      # runs nested-cv and compares ncv error with out-of-sample error
      # outputs: ncv error, oos error, delta error, chosen algorithm, chosen hyperparameters 
      ncv_results_100 = target(
            run_ncv(ncv_dat_100,
                    sim_dat_100,
                    large_dat,
                    mod_FUN_list,
                    params_list_100,
                    error_FUN,
                    method),
            dynamic = map(ncv_dat_100)
      ),
      # add index columns to identify the results according to sample size and number of repeats
      perf_results_100 = tibble(n = 100, repeats = repeats) %>%
            bind_cols(ncv_results_100),
      
      # repeat for the rest of the sample sizes
      # sample size = 800
      sim_dat_800 = mlbench_data(800),
      params_list_800 = create_grids(sim_dat_800,
                                     algorithms,
                                     size = grid_size),
      ncv_dat_800 = create_ncv_objects(sim_dat_800,
                                       repeats,
                                       method),
      ncv_results_800 = target(
            run_ncv(ncv_dat_800,
                    sim_dat_800,
                    large_dat,
                    mod_FUN_list,
                    params_list_800,
                    error_FUN,
                    method),
            dynamic = map(ncv_dat_800)
      ),
      perf_results_800 = tibble(n = 800, repeats = repeats) %>%
            bind_cols(ncv_results_800),
      
      # sample size = 2000
      sim_dat_2000 = mlbench_data(2000),
      params_list_2000 = create_grids(sim_dat_2000,
                                      algorithms,
                                      size = grid_size),
      ncv_dat_2000 = create_ncv_objects(sim_dat_2000,
                                        repeats,
                                        method),
      ncv_results_2000 = target(
            run_ncv(ncv_dat_2000,
                    sim_dat_2000,
                    large_dat,
                    mod_FUN_list,
                    params_list_2000,
                    error_FUN,
                    method),
            dynamic = map(ncv_dat_2000)
      ),
      perf_results_2000 = tibble(n = 2000, repeats = repeats) %>%
            bind_cols(ncv_results_2000)
      
)


drake::deps_profile(ncv_results_2000, plan)
#> Error in deps_profile_impl(target = ncv_results_2000, config = config): no recorded metadata for target ncv_results_2000.
rlang::last_error()
#> Error: Can't show last error because no error was recorded yet
rlang::last_trace()
#> Error: Can't show last error because no error was recorded yet
deps_profile(ncv_results_2000_7d80d14d, plan)
#> Error in deps_profile_impl(target = ncv_results_2000_7d80d14d, config = config): no recorded metadata for target ncv_results_2000_7d80d14d.
rlang::last_error()
#> Error: Can't show last error because no error was recorded yet
rlang::last_trace()
#> Error: Can't show last error because no error was recorded yet
drake::loadd(ncv_results_2000_7d80d14d)
#> Error in loadd_handle_empty_targets(targets = targets, cache = cache, : object 'ncv_results_2000_7d80d14d' not found

Created on 2020-05-02 by the reprex package (v0.3.0)

Here's what happens when I execute the script myself (starting after the plan). Apologies for the readability.

drake::deps_profile(ncv_results_2000, plan)
# Error: Tibble columns must have compatible sizes.
# * Size 5: Existing data.
# * Size 2: Column `old`.
# i Only values of size one are recycled.
# Run `rlang::last_error()` to see where the error occurred.
# In addition: Warning message:
#       In old_values != new_values :
#       longer object length is not a multiple of shorter object length


rlang::last_error()
# <error/tibble_error_incompatible_size>
#       Tibble columns must have compatible sizes.
# * Size 5: Existing data.
# * Size 2: Column `old`.
# i Only values of size one are recycled.
# Backtrace:
#       1. drake::deps_profile(ncv_results_2000, plan)
# 4. drake::deps_profile_impl(target = ncv_results_2000, config = config)
# 5. drake:::weak_tibble(...)
# 6. tibble::tibble(...)
# 7. tibble:::tibble_quos(xs[!is_null], .rows, .name_repair)
# 8. tibble:::vectbl_recycle_rows(res, first_size, j, given_col_names[[j]])
# Run `rlang::last_trace()` to see the full context.


rlang::last_trace()
# <error/tibble_error_incompatible_size>
#       Tibble columns must have compatible sizes.
# * Size 5: Existing data.
# * Size 2: Column `old`.
# i Only values of size one are recycled.
# Backtrace:
#       x
# 1. \-drake::deps_profile(ncv_results_2000, plan)
# 2.   +-base::eval(call)
# 3.   | \-base::eval(call)
# 4.   \-drake::deps_profile_impl(target = ncv_results_2000, config = config)
# 5.     \-drake:::weak_tibble(...)
# 6.       \-tibble::tibble(...)
# 7.         \-tibble:::tibble_quos(xs[!is_null], .rows, .name_repair)
# 8.           \-tibble:::vectbl_recycle_rows(res, first_size, j, given_col_names[[j]])


deps_profile(ncv_results_2000_7d80d14d, plan)
# Error: Tibble columns must have compatible sizes.
# * Size 5: Existing data.
# * Size 2: Column `old`.
# i Only values of size one are recycled.
# Run `rlang::last_error()` to see where the error occurred.
# In addition: Warning message:
#       In old_values != new_values :
#       longer object length is not a multiple of shorter object length


rlang::last_error()
# <error/tibble_error_incompatible_size>
#       Tibble columns must have compatible sizes.
# * Size 5: Existing data.
# * Size 2: Column `old`.
# i Only values of size one are recycled.
# Backtrace:
#       1. drake::deps_profile(ncv_results_2000_7d80d14d, plan)
# 4. drake::deps_profile_impl(...)
# 5. drake:::weak_tibble(...)
# 6. tibble::tibble(...)
# 7. tibble:::tibble_quos(xs[!is_null], .rows, .name_repair)
# 8. tibble:::vectbl_recycle_rows(res, first_size, j, given_col_names[[j]])
# Run `rlang::last_trace()` to see the full context.


rlang::last_trace()
# <error/tibble_error_incompatible_size>
#       Tibble columns must have compatible sizes.
# * Size 5: Existing data.
# * Size 2: Column `old`.
# i Only values of size one are recycled.
# Backtrace:
#       x
# 1. \-drake::deps_profile(ncv_results_2000_7d80d14d, plan)
# 2.   +-base::eval(call)
# 3.   | \-base::eval(call)
# 4.   \-drake::deps_profile_impl(...)
# 5.     \-drake:::weak_tibble(...)
# 6.       \-tibble::tibble(...)
# 7.         \-tibble:::tibble_quos(xs[!is_null], .rows, .name_repair)
# 8.           \-tibble:::vectbl_recycle_rows(res, first_size, j, given_col_names[[j]])


drake::loadd(ncv_results_2000_7d80d14d)
# ncv_results_2000_7d80d14d
# # A tibble: 1 x 7
# method oos_error ncv_error delta_error chosen_algorithm  mtry trees
# <chr>      <dbl>     <dbl>       <dbl> <chr>            <int> <int>
#       1 kj          1.39      1.36      0.0214 rf                   5  1325

drake::loadd(ncv_results_2000)
# Error in loadd_handle_empty_targets(targets = targets, cache = cache,  : 
# object 'ncv_results_2000' not found

ercbk commented May 3, 2020

Just FYI, my .drake file is a couple of layers above my make and plan scripts. I wonder if that's what's messing with the reprex function.

wlandau (Member) commented May 4, 2020

Reprex doesn't find anything: files or functions (unless I use package::function). I've tried inside the project environment and outside of it with the working directory set to the project directory. I don't understand it; it's worked fine before. It doesn't find the plan-kj.R file with source either.

It may be frustrating, yes, but it would really help to identify and fix the problem if you can whittle down your project into something that fits into reprex and is easier to understand and run.

Just FYI, my .drake file is a couple of layers above my make and plan scripts. I wonder if that's what's messing with the reprex function.

Part of what I am requesting is a script that creates an entirely new .drake cache that recreates the problem. Once we can reproduce it from scratch using automated code, we have a much better shot of figuring out what is going on. Otherwise, it is difficult to speculate on the more human-related ad hoc steps you may have taken to set up your project.
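
A minimal skeleton of that kind of script, using a throwaway cache so the real .drake folder stays untouched; foo() here is just a stand-in for whatever makes a sub-target fail in the real project.

library(drake)

cache <- new_cache(tempfile())  # fresh cache, separate from the project's .drake

foo <- function(x) {
  if (x > 1) {
    stop("simulated sub-target failure")
  }
  x
}

plan <- drake_plan(
  numbers = seq_len(3),
  result = target(foo(numbers), dynamic = map(numbers))
)

make(plan, cache = cache, keep_going = TRUE)  # first run: one sub-target fails
make(plan, cache = cache, keep_going = TRUE)  # second run: do finished sub-targets rebuild?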

ercbk commented May 4, 2020

Yeah, I agree. Unfortunately, it's a complicated project that takes days to run on my desktop, even for the small datasets. I guess I'll try to run the whole thing with very, very small datasets to get the runtime down to something manageable and see if I can recreate the issue once I get reprex to work. In terms of reducing the complexity of the code, I'm not sure how that can be done here. I'll talk to my duck, though, and see if we can figure out something. :)
Any ideas about what's happening with deps_profile or are the error messages not much help without the reprex?

wlandau (Member) commented May 4, 2020

Any ideas about what's happening with deps_profile or are the error messages not much help without the reprex?

Again, based on what we have to go on right now, I am not sure. But I do have a suspicion, and I just pushed a patch to try to deal with it: 50eb9f5. You might have better luck if you install the update (though I cannot make guarantees).
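
For completeness, installing the development version is a standard remotes call (pin a specific commit with the ref argument if you want to be precise):

# install.packages("remotes")
remotes::install_github("ropensci/drake")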

ercbk commented May 6, 2020

The good news is that I've figured out how reprex determines its working directory: you have to specify the outfile argument, or else it works out of a temp directory. I guess I've only used it for simple situations in the past, because I didn't remember that that's how it worked. The bad news is that, if I'm reading this correctly, drake thinks that every target needs to be rebuilt. I'm still working on a simplified version of the project. I think I interrupted make twice in a row once I saw it was trying to rebuild targets. I wonder if that made things worse.

library(drake)

source("performance-experiment/Kuhn-Johnson/plan-kj.R")

deps_profile(ncv_results_2000, plan)
#> # A tibble: 5 x 4
#>   name     changed old              new               
#>   <chr>    <lgl>   <chr>            <chr>             
#> 1 command  TRUE    4f18907a711e6c41 "d958fb47b0b8f88d"
#> 2 depend   NA      <NA>             "aef603d261217ef0"
#> 3 file_in  NA      <NA>             ""                
#> 4 file_out NA      <NA>             ""                
#> 5 seed     NA      <NA>             "540153646"

deps_profile(ncv_results_2000_7d80d14d, plan)
#> # A tibble: 5 x 4
#>   name     changed old              new               
#>   <chr>    <lgl>   <chr>            <chr>             
#> 1 command  TRUE    4f18907a711e6c41 "ef46db3751d8e999"
#> 2 depend   NA      <NA>             ""                
#> 3 file_in  NA      <NA>             ""                
#> 4 file_out NA      <NA>             ""                
#> 5 seed     FALSE   2136092035       "2136092035"

loadd(ncv_results_2000_7d80d14d)
ncv_results_2000_7d80d14d
#> # A tibble: 1 x 7
#>   method oos_error ncv_error delta_error chosen_algorithm  mtry trees
#>   <chr>      <dbl>     <dbl>       <dbl> <chr>            <int> <int>
#> 1 kj          1.39      1.36      0.0214 rf                   5  1325

outdated(plan)
#>  [1] "large_dat"         "mod_FUN_list"      "ncv_dat_100"      
#>  [4] "ncv_dat_2000"      "ncv_dat_800"       "ncv_results_100"  
#>  [7] "ncv_results_2000"  "ncv_results_800"   "params_list_100"  
#> [10] "params_list_2000"  "params_list_800"   "perf_results_100" 
#> [13] "perf_results_2000" "perf_results_800"  "sim_dat_100"      
#> [16] "sim_dat_2000"      "sim_dat_800"

deps_profile(ncv_results_800, plan)
#> # A tibble: 5 x 4
#>   name     changed old                new               
#>   <chr>    <lgl>   <chr>              <chr>             
#> 1 command  FALSE   "55016b093d194572" "55016b093d194572"
#> 2 depend   TRUE    "26874d8b848c3d63" "4b6794fb258a5773"
#> 3 file_in  FALSE   ""                 ""                
#> 4 file_out FALSE   ""                 ""                
#> 5 seed     FALSE   "1942602105"       "1942602105"
deps_profile(ncv_results_100, plan)
#> # A tibble: 5 x 4
#>   name     changed old              new               
#>   <chr>    <lgl>   <chr>            <chr>             
#> 1 command  TRUE    4f18907a711e6c41 "ab37a0ecea2b1f23"
#> 2 depend   NA      <NA>             "5bd1d20abf0b48f9"
#> 3 file_in  NA      <NA>             ""                
#> 4 file_out NA      <NA>             ""                
#> 5 seed     NA      <NA>             "395977714"

drake_history()
#> Error in parse(text = command): <text>:2:24: unexpected symbol
#> 1: run_ncv(ncv_dat_100, sim_dat_100, large_dat, mod_FUN_list, params_list, 
#> 2:     error_FUN, method) map
#>                           ^

Created on 2020-05-05 by the reprex package (v0.3.0)

wlandau (Member) commented May 6, 2020

For everything you showed except ncv_results_800, it looks like you changed the command in the plan. For ncv_results_800, drake thinks at least one of your dependency targets/functions/globals changed at some point, which is trickier to identify. Both these things trigger updates to all the sub-targets.
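
One way to narrow that down (a sketch; the target names come from the plan posted above): list everything ncv_results_800 depends on, then run deps_profile() on each upstream target in that list. A changed function such as run_ncv() would also move the depend hash even though no target changed.

library(drake)

deps_target(ncv_results_800, plan)  # functions, globals, and targets it uses
deps_profile(ncv_dat_800, plan)     # check each upstream target in turn
deps_profile(sim_dat_800, plan)
deps_profile(params_list_800, plan)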

ercbk commented May 10, 2020

These are the first 4 targets in my plan, and the profiles say their dependencies changed. It doesn't make any sense. The first 3 targets' dependencies are constants that wouldn't have changed. The commands for n = 100 and n = 2000 didn't change either. There's too much change here for me not to have remembered it, and it doesn't show up in the commits. So is there anything else to look at or do before I try to recreate this in miniature?
Btw, I don't see any way to reduce the complexity of the code, but the plan is to create a public repo with very slimmed-down data objects that you can clone. That way it shouldn't take long to run, and you'll be able to create the .drake file. This is assuming I can recreate the failure, which I'm not confident will happen, since I think the size of the objects might be the cause. Will that work for you? Just to double-check: you looking at my current .drake file will not help, right?

method <- "kj"
algorithms <- list("glmnet", "rf")
repeats <- seq(1:5)
grid_size <- 100

plan <- drake_plan(
   # model functions for each algorithm
   mod_FUN_list = create_models(algorithms),
   # data used to estimate out-of-sample error
   # noise_sd, seed settings are the defaults
   large_dat = mlbench_data(n = 10^5,
                            noise_sd = 1,
                            seed = 2019),
   # sample size = 100
   sim_dat_100 = mlbench_data(100),
library(drake)

source("performance-experiment/Kuhn-Johnson/plan-kj.R")

vis_drake_graph(plan)

deps_profile(mod_FUN_list, plan)
#> # A tibble: 5 x 4
#>   name     changed old                new               
#>   <chr>    <lgl>   <chr>              <chr>             
#> 1 command  FALSE   "896777bfc4467875" "896777bfc4467875"
#> 2 depend   TRUE    "19a0f5400146eab4" "26cafb820b726b9a"
#> 3 file_in  FALSE   ""                 ""                
#> 4 file_out FALSE   ""                 ""                
#> 5 seed     FALSE   "787681411"        "787681411"

deps_profile(large_dat, plan)
#> # A tibble: 5 x 4
#>   name     changed old                new               
#>   <chr>    <lgl>   <chr>              <chr>             
#> 1 command  FALSE   "f7b4ac0ab068d769" "f7b4ac0ab068d769"
#> 2 depend   TRUE    "1a4e85c7d355a240" ""                
#> 3 file_in  FALSE   ""                 ""                
#> 4 file_out FALSE   ""                 ""                
#> 5 seed     FALSE   "1889768483"       "1889768483"

deps_profile(sim_dat_100, plan)
#> # A tibble: 5 x 4
#>   name     changed old                new               
#>   <chr>    <lgl>   <chr>              <chr>             
#> 1 command  FALSE   "31129d94b2c9b515" "31129d94b2c9b515"
#> 2 depend   TRUE    "1a4e85c7d355a240" ""                
#> 3 file_in  FALSE   ""                 ""                
#> 4 file_out FALSE   ""                 ""                
#> 5 seed     FALSE   "90873426"         "90873426"

deps_profile(params_list_100, plan)
#> # A tibble: 5 x 4
#>   name     changed old                new               
#>   <chr>    <lgl>   <chr>              <chr>             
#> 1 command  FALSE   "7cc8ecbd68b25f69" "7cc8ecbd68b25f69"
#> 2 depend   TRUE    "85798dc734adfacd" "1c4e453d1ffbc8d4"
#> 3 file_in  FALSE   ""                 ""                
#> 4 file_out FALSE   ""                 ""                
#> 5 seed     FALSE   "1602630628"       "1602630628"

Created on 2020-05-10 by the reprex package (v0.3.0)

wlandau (Member) commented May 11, 2020

So is there anything else to look at or do before I try to recreate this in miniature?

When was the last time you built the project from scratch? And how often do you restart your R session? It is best to start make() with a fresh global environment (especially if you're not using the custom envir argument). r_make() guarantees that the global environment is completely fresh and is created in the same way every time you run the pipeline. This is critically important for a project as large and complicated as yours. So a low-effort thing to try is to put these lines and plan-kj.R into a _drake.R file, run a fresh copy of your project with r_make(), and see if things stay up to date.

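A sketch of what that _drake.R could contain, reusing the source() path and make() arguments already shown in this thread:

# _drake.R in the project root
library(drake)
source("performance-experiment/Kuhn-Johnson/plan-kj.R")
drake_config(plan, verbose = 1, lock_envir = FALSE)

Then run r_make() from a fresh R session started in the project root.
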
Certain hidden circularities can also crop up. For example, the following plan is self-invalidating. I tried to catch most of this with lock_envir = FALSE in drake version 7, but weird stuff could theoretically crop up with environment variables etc.

library(drake)

a <- 1
      
plan <- drake_plan(
  x = a,
  y = assign("a", x + 1, envir = globalenv())
)

make(plan, lock_envir = FALSE)
#> > target x
#> > target y

make(plan, lock_envir = FALSE)
#> > target x
#> > target y

Created on 2020-05-11 by the reprex package (v0.3.0)

The existing repo has heavy package requirements and the targets still have heavy runtimes. Specifically, ncv_results_2000 is very slow for an example used for debugging purposes. The targets before ncv_results_2000, however, do stay up to date when I try to run things.

Also, I notice you are using custom future multicore processing, which I skipped for the sake of convenience. You might have a look at drake's built-in high-performance computing, which has multicore and cluster-powered capabilities: https://books.ropensci.org/drake/hpc.html.

ercbk commented May 19, 2020

  • I haven't rebuilt the project from scratch since moving the project to the version controlled directory.
  • The ncv_results_2000 target failed at night and I didn't rerun it until the next morning. I'm pretty sure I closed R and started with a fresh session the next morning.
  • Along with the lines you highlighted for _drake.R, if I want verbose = 1, lock_envir = FALSE, then I need to put those args into a drake_config(), correct? And do I run r_make in the console with _drake.R in the project root directory?
  • I wouldn't mind trying clustermq, but I'm not sure all customization, template file stuff and SLURM schedulers, etc., applies to my project. As is, I'm only parallelizing within targets. If I knew my targets wouldn't fail, then I'd consider starting 8 instances to parallelize building the targets. Or did I miss something? Is there something that would make the within-target loops run more efficiently?

wlandau (Member) commented May 19, 2020

Along with the lines you highlighted for _drake.R, if I want verbose = 1, lock_envir = FALSE, then I need to put those args into a drake_config(), correct? And do I run r_make in the console with _drake.R in the project root directory?

Yes to both.

I wouldn't mind trying clustermq, but I'm not sure all customization, template file stuff and SLURM schedulers, etc., applies to my project. As is, I'm only parallelizing within targets. If I knew my targets wouldn't fail, then I'd consider starting 8 instances to parallelize building the targets. Or did I miss something? Is there something that would make the within-target loops run more efficiently?

clustermq does have a multicore backend for multicore parallelism on non-Windows machines. (For Windows machines, drake also has its own future backend.) Other than that, there's always a tradeoff between parallelism within targets and parallelism among targets. If you know not very many targets are going to run at the same time but there is a lot to do within each target, then within-target parallelism seems reasonable. But if a lot of targets are conditionally independent and you want some targets to start while others are still running, drake's built-in parallelism can help. This was probably already intuitive to you, but it is the main thing I think about when writing a new pipeline that needs HPC.
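
On the Windows side specifically, the future backend needs only a future plan before make() (a sketch; jobs is however many persistent workers you want):

library(drake)
library(future)

future::plan(future::multisession)
make(plan, parallelism = "future", jobs = 4)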

ercbk commented May 22, 2020

I reran the slimmed-down version of the project in a separate repo with the same environment, and it pretty much worked like a dream. I did have one small hiccup in the beginning, though: I got a config error on my first run, when there wasn't a .drake directory yet (see here and here). I restarted R, re-ran r_make, and everything ran fine. Is there an initialization step that I missed?

I had a connection error after some sub-target builds. I re-ran r_make and it picked up where it left off, with no rebuilding of already-successfully-built sub-targets. I added an n = 3000 section to the plan, and again, no rebuilds were triggered and it finished smoothly.

Next step, I guess, will be to delete the .drake directory in the real repo, create a _drake.R and try again.

wlandau (Member) commented May 22, 2020

I could not reproduce those errors running https://github.com/ercbk/temp-nested-cv. I did get connection errors, but those disappeared when I commented out the custom future configuration you had. Looks like you're setting up drake properly.

ercbk commented May 22, 2020

What custom future configuration are you talking about? The saved PuTTY session settings?

ercbk commented May 22, 2020

Okay, I assume there's a way to do the same thing using the methods you describe in the HPC chapter of your book, but I'm not grasping it. Could you show me the code you're using to connect to the instances?

wlandau (Member) commented May 22, 2020

I was just using the repo you linked: https://github.com/ercbk/temp-nested-cv. I could not connect to your instances, so I commented out https://github.com/ercbk/temp-nested-cv/blob/153e428f35a99ab8a70ac95fbc23d9bb4721c2ef/_drake.R#L33-L63 to try to run locally. The pipeline successfully started from a fresh cache (no "argument 'config' missing with no default" errors on my end).

wlandau (Member) commented May 22, 2020

To parallelize among targets using the existing setup you have, you could keep https://github.com/ercbk/temp-nested-cv/blob/153e428f35a99ab8a70ac95fbc23d9bb4721c2ef/_drake.R#L33-L63, disable within-target parallelism, and use the following drake_config():

drake_config(
  plan,
  parallelism = "future",
  jobs = 2, # or however many workers you want
  verbose = 1,
  lock_envir = FALSE,
  jobs_preprocess = 7
)

ercbk commented May 22, 2020

Oh, when you said you got rid of the connection errors, I thought you had some better way to stabilize my connection for computing this remotely. I see.
