Failed Target reload in memory #1253
Comments
Sounds like strange behavior for sure. Could be similar to #882. If you post a small example that reproduces the issue, I can take a look.
Is there a way to report what the environment contains while `drake::make()` is executing?
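One way to inspect an environment from inside a running function is to list its objects and their sizes with base R. The helper name `report_env_sizes` below is hypothetical and not part of drake; this is only a minimal sketch of something that could be called from within a command or function while `drake::make()` runs.

```r
# Hypothetical helper (not part of drake): report the size in bytes of
# every object bound in an environment, largest first.
report_env_sizes <- function(env = parent.frame()) {
  nms <- ls(env, all.names = TRUE)
  sizes <- vapply(
    nms,
    function(nm) as.numeric(object.size(get(nm, envir = env))),
    numeric(1)
  )
  sort(sizes, decreasing = TRUE)
}

# Usage: call it from inside a function to see what that call frame holds.
demo <- function() {
  small <- 1:10
  large <- numeric(1e5)
  report_env_sizes(environment())
}

sizes <- demo()
print(sizes)
```

Printing or logging this from inside a suspect function shows which local objects dominate the frame at that point.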
OK, I have a simple example where it is storing the whole environment. Very small example:

```r
library(magrittr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
b1<-rbind(dplyr::storms,dplyr::storms)
b2<-rbind(b1, b1)
b3<-rbind(b2, b2)
b4<-rbind(b3, b3)
b5<-rbind(b4, b4)
b6<-rbind(b5, b5)
saveRDS(b6, "b6.rds")
saveRDS(b1, "b1.rds")
testread <- function(x) {
  withCallingHandlers(
    expr = {
      testFILE <- readRDS("b6.rds")
      assign("TT", testFILE)
      b <- readRDS(x)
      return(b)
    },
    # Error Functions -------------------------------------------------------------------
    error = function(e) {
      stop("Error")
    },
    # Warning Functions -----------------------------------------------------------------
    warning = function(w) {
      invokeRestart("muffleWarning")
    }
  )
}
TESTPlan <- drake::drake_plan(
  y = {
    t1 <- testread("b6.rds")
    t2 <- testread(drake::file_in("b1.rds"))
    rbind(t1, t2)
  }
)
drake::make(TESTPlan, log_make = "test1.log", verbose = 2)
#> ℹ Install the progress package to see a progress bar when verbose = 2.
unlink("b1.rds")
drake::make(TESTPlan, log_make = "test1.log",
verbose = 2)
#> ℹ Install the progress package to see a progress bar when verbose = 2.
#> Warning: missing file_in() files:
#> b1.rds
#> Error: target y failed.
#> diagnose(y)error$message:
#> Error
#> diagnose(y)error$calls:
#> 1. ├─global::testread(drake::file_in("b1.rds"))
#> 2. │ ├─base::withCallingHandlers(...)
#> 3. │ └─base::readRDS(x)
#> 4. │ └─base::gzfile(file, "rb")
#> 5. └─base::.handleSimpleError(...)
#> 6. └─h(simpleError(msg, call))
#> 7. └─base::stop("Error")
df <- file.info(list.files(".drake/data/", full.names = TRUE))
mostRecentFile <- rownames(df)[which.max(df$mtime)]
objData <- readRDS(mostRecentFile)
objDataError <- objData$error$calls
objDataErrorEnv <- environment(objDataError[["calls"]][[5]][[2]])
check_obj <- ls(objDataErrorEnv)
for (i in seq_along(check_obj)) {
  print(object.size(mget(check_obj[i], envir = objDataErrorEnv)), units = "MB")
  print(paste0(check_obj[i], " - ", object.size(mget(check_obj[i], envir = objDataErrorEnv))))
}
#> 53.8 Mb
#> [1] "testFILE - 56391616"
#> 53.8 Mb
#> [1] "TT - 56391608"
#> 0 Mb
#> [1] "x - 392"
unlink("b6.rds")
unlink("test1.log")
```
<sup>Created on 2020-05-13 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
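The object sizes reported in the example are consistent with a general property of R serialization: when a saved object contains a closure, the closure's enclosing environment is saved along with it. The following base R sketch (no drake involved; `make_handler` is a made-up name used only for illustration) shows how a tiny handler function can carry a large local object into a serialized payload.

```r
# A function created inside another function keeps a reference to the
# environment where it was defined, including large local objects.
make_handler <- function() {
  big <- rnorm(1e6)              # about 8 MB of doubles
  function(e) stop("Error")      # closure enclosed by this call frame
}

h <- make_handler()

# Serialized size with the original enclosing environment attached.
size_with_env <- length(serialize(h, NULL))

# Re-point the closure at the global environment, which is serialized
# by reference rather than by value.
environment(h) <- globalenv()
size_without_env <- length(serialize(h, NULL))

print(c(with_env = size_with_env, without_env = size_without_env))
```

The same function body serializes to very different sizes depending only on which environment it encloses.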
Thanks, that example helps a lot. Now fixed. For onlookers having data issues in general, check out https://books.ropensci.org/drake/plans.html#special-data-formats-for-targets.
@wlandau
Glad to hear it. In most cases, you can avoid richfitz/remake#178 by choosing a more efficient storage format for your target. Over the last few years, I have been updating
Thank you for all your hard work on this package.
The "Repacking large object" is for
I think the issue is that if the target fails, it is not stored in the other format. See below:

```r
startTime <- proc.time()
library(magrittr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
b1 <- rbind(dplyr::storms, dplyr::storms)
b2 <- rbind(b1, b1)
b3 <- rbind(b2, b2)
b4 <- rbind(b3, b3)
b5 <- rbind(b4, b4)
b6 <- rbind(b5, b5)
b7 <- rbind(b6, b6)
b8 <- rbind(b7, b7)
saveRDS(b8, "b8.rds")
saveRDS(b7, "b7.rds")
testread <- function(x) {
  withCallingHandlers(
    expr = {
      testFILE <- readRDS("b8.rds")
      testFILE2 <- readRDS("b8.rds")
      testFILE3 <- readRDS("b8.rds")
      testFILE4 <- readRDS("b8.rds")
      testFILE5 <- readRDS("b8.rds")
      testFILE6 <- readRDS("b8.rds")
      testFILE7 <- readRDS("b8.rds")
      assign("TT", testFILE)
      b <- readRDS(x)
      return(b)
    },
    # Error Functions -------------------------------------------------------------------
    error = function(e) {
      stop("Error")
    },
    # Warning Functions -----------------------------------------------------------------
    warning = function(w) {
      invokeRestart("muffleWarning")
    }
  )
}
TESTPlan <- drake::drake_plan(
  y = drake::target(
    {
      t1 <- readRDS("b8.rds")
      t2 <- testread(drake::file_in("b7.rds"))
      rbind(t1, t2)
    },
    format = "fst"
  )
)
drake::make(TESTPlan, log_make = "test1.log", verbose = 2)
#> ℹ Install the progress package to see a progress bar when verbose = 2.
#> Warning: You selected fst format for target y, so drake will convert it from
#> class c("tbl_df", "tbl", "data.frame") to a plain data frame.
unlink("b7.rds")
drake::make(TESTPlan, log_make = "test1.log", verbose = 2)
#> ℹ Install the progress package to see a progress bar when verbose = 2.
#> Warning: missing file_in() files:
#> b7.rds
#> Repacking large object
#> Error: target y failed.
#> diagnose(y)error$message:
#> Error
#> diagnose(y)error$calls:
#> 1. ├─global::testread(drake::file_in("b7.rds"))
#> 2. │ ├─base::withCallingHandlers(...)
#> 3. │ └─base::readRDS(x)
#> 4. │ └─base::gzfile(file, "rb")
#> 5. └─base::.handleSimpleError(...)
#> 6. └─h(simpleError(msg, call))
#> 7. └─base::stop("Error")
endTime <- proc.time()
endTime - startTime
#> user system elapsed
#> 133.114 7.890 140.670
unlink("b8.rds")
unlink("test1.log")
```
<sup>Created on 2020-05-14 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
Do you still get "repacking large object" even after 07d652c? When I run #1253 (comment), I do not see "repacking large object".
I do not think I understand. Would you elaborate? When a target fails, there is no return value to store, so
OK, sorry, let me try to clarify. In my last example I used the previous version to show you the issue. I only want to point out that others might have hit the same issue: when a target failed (even with the format changed to "fst" or another format), drake would still have written the stack trace, which could trigger "Repacking large object".
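To illustrate the point about stack traces, here is a base R sketch of why storing recorded calls as text can be far smaller than storing the call objects themselves. It is only an illustration under assumptions: `make_call` is a hypothetical stand-in for a recorded call such as `h(simpleError(msg, call))`, and this is not necessarily how drake or commit 07d652c handles traces.

```r
# Build a call object whose function slot is a closure. The closure's
# enclosing environment holds a large local object.
make_call <- function() {
  big <- numeric(1e6)                    # about 8 MB
  handler <- function(e) stop("Error")
  as.call(list(handler, quote(cond)))
}

recorded <- make_call()

# Saving the call object itself drags the closure's environment along.
size_as_call <- length(serialize(recorded, NULL))

# Saving only the deparsed text keeps the readable trace entry and
# drops the environment.
as_text <- paste(deparse(recorded), collapse = "\n")
size_as_text <- length(serialize(as_text, NULL))

print(c(as_call = size_as_call, as_text = size_as_text))
```

The trade-off is that the text form can no longer be evaluated or inspected as a live object, which is usually acceptable for diagnostics.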
That's great to hear, thanks.
I see now, thanks for clarifying. Yes, formats like
Tonight I finally had time to seriously revisit the thread. (With all the super high-intensity COVID-19 work my team is doing, this whole week has been brutal.) Sorry I was slow on the uptake. Your observations are completely consistent with @klarchristian's last point in richfitz/remake#178 (comment). And now we solved it! So thanks for your insight and persistence.
Question
I am running some additional tests. When a target fails, does drake reload the failed target (its environment and other information) before trying to make the target again?
I have a controller function that returns my target, a small tibble (1 row x 1 column). This controller function calls about 20 other functions.
When a target fails, drake seems to create a large RDS object in the cache, around 50 to 100 MB.
When I rerun `drake::make()`, the amount of memory used almost doubles.
If I get a successful run, the next run uses the normal amount.
I will see if I can create a good example. I was just curious whether something is being reloaded from the failure and, if so, how I can disable that.
Thanks
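To check which cache entries are large, the files in a cache's data directory can be listed by size with base R. This is a minimal sketch: `largest_files` is a hypothetical helper, and the `.drake/data` path is assumed from the default cache location used elsewhere in this thread.

```r
# Hypothetical helper: list the n largest files in a directory.
largest_files <- function(dir, n = 5) {
  files <- list.files(dir, full.names = TRUE)
  info <- file.info(files)
  info <- info[order(info$size, decreasing = TRUE), , drop = FALSE]
  out <- data.frame(
    file = basename(rownames(info)),
    size_mb = round(info$size / 1024^2, 2),
    modified = info$mtime,
    stringsAsFactors = FALSE
  )
  head(out, n)
}

# Demonstration on a temporary directory with one large and one small file.
demo_dir <- file.path(tempdir(), "cache_demo")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(numeric(1e6), file.path(demo_dir, "large.rds"), compress = FALSE)
saveRDS(1:10, file.path(demo_dir, "small.rds"))

top <- largest_files(demo_dir)
print(top)

# For a drake project (assumed default cache path):
# largest_files(".drake/data")

unlink(demo_dir, recursive = TRUE)
```

Running it before and after a failed `drake::make()` would show whether a new large file appears in the cache.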