Failed Target reload in memory #1253

Closed
2 of 3 tasks
sclewis23 opened this issue May 11, 2020 · 12 comments

@sclewis23

sclewis23 commented May 11, 2020

Prework

  • Read and abide by drake's code of conduct.
  • Search for duplicates among the existing issues, both open and closed.
  • If you think your question has a quick and definite answer, consider posting to Stack Overflow under the drake-r-package tag. (If you anticipate extended follow-up and discussion, you are already in the right place!)

Question

What would you like to know?
I am running some additional tests. When a target fails, does drake reload the failed target (its environment and other information) before trying to make the target again?
I have a controller function that outputs my target, a small tibble (1 row x 1 column). This controller function calls about 20 other functions.
When I get a failure, it seems to create a large RDS object in the cache, around 50 to 100 MB.

When I rerun the drake make, the amount of memory used is almost doubled.
If I get a successful run, then the next run uses the normal amount.

I will see if I can create a good example. I was just curious whether something is being reloaded from the failure, and if so, how can I disable that?
Thanks

@wlandau
Member

wlandau commented May 12, 2020

Sounds like strange behavior for sure. Could be similar to #882. If you post a small example that reproduces the issue, I can take a look.

@sclewis23
Author

sclewis23 commented May 13, 2020

Is there a way to report what the environment is while the drake make is executing?
I'm also seeing this on failures: "Repacking large object", which takes a long time. Then when I rerun the plan, it is slow to load.
When I look in .drake/data/, a large RDS file has been created.
The object is a large list with messages, error, and seed.
The error is a list (a simpleError), but it is 1.5 GB.
Looking inside the simpleError, I found an environment object containing all the objects used in the sub-function that failed.
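
For readers following along, here is a minimal base R sketch of the mechanism described above. It is not drake's internals, and `make_handler` and `big` are hypothetical names. A function created inside another function keeps a reference to the frame it was created in, so serializing that function also serializes every object in the frame. That would be consistent with a stored call stack that contains a handler function growing to the size of the data the failing function had loaded.

```r
# Hypothetical helper: builds an error handler inside a frame that also
# holds a large object, similar to a handler defined inside a big function.
make_handler <- function() {
  big <- numeric(1e6)            # roughly 8 MB of doubles living in this frame
  function(e) stop("Error")      # closure: its environment is this frame
}

h <- make_handler()

# Serializing the closure drags its enclosing environment (and `big`) along.
size_with_env <- length(serialize(h, NULL))

# Re-pointing a copy of the closure at the global environment drops the
# reference to the frame, so only the function itself is serialized.
h_light <- h
environment(h_light) <- globalenv()
size_without_env <- length(serialize(h_light, NULL))

cat("with captured frame:   ", size_with_env, "bytes\n")
cat("without captured frame:", size_without_env, "bytes\n")
```

The same check can be applied to any object pulled out of a cache file: `ls(environment(f))` lists what a stored function `f` is carrying with it.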

@sclewis23
Author

OK, I have a simple example where it stores the whole environment. Very small example:

``` r
library(magrittr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
b1<-rbind(dplyr::storms,dplyr::storms)
b2<-rbind(b1, b1)
b3<-rbind(b2, b2)
b4<-rbind(b3, b3)
b5<-rbind(b4, b4)
b6<-rbind(b5, b5)
saveRDS(b6, "b6.rds")
saveRDS(b1, "b1.rds")

testread <- function(x){
  withCallingHandlers(
    expr = {
      testFILE <- readRDS("b6.rds")
      assign("TT", testFILE)
      b <- readRDS(x)
      return(b)
      
    },
    
    # Error Functions -------------------------------------------------------------------
    error = function(e){
      stop("Error")
    },
    # Warning Functions -----------------------------------------------------------------
    warning = function(w){
      invokeRestart("muffleWarning")
    }
  )
}



TESTPlan <- drake::drake_plan(y = {
  t1 = testread("b6.rds");
  t2 = testread(drake::file_in("b1.rds"));
  rbind(t1, t2)
}
)
                                
                                


drake::make(TESTPlan, log_make = "test1.log",
            verbose = 2)
#> ℹ Install the progress package to see a progress bar when verbose = 2.

unlink("b1.rds")



drake::make(TESTPlan, log_make = "test1.log",
            verbose = 2)
#> ℹ Install the progress package to see a progress bar when verbose = 2.
#> Warning: missing file_in() files:
#>   b1.rds
#> Error: target y failed.
#> diagnose(y)error$message:
#>   Error
#> diagnose(y)error$calls:
#>   1. ├─global::testread(drake::file_in("b1.rds"))
#>   2. │ ├─base::withCallingHandlers(...)
#>   3. │ └─base::readRDS(x)
#>   4. │   └─base::gzfile(file, "rb")
#>   5. └─base::.handleSimpleError(...)
#>   6.   └─h(simpleError(msg, call))
#>   7.     └─base::stop("Error")

df <- file.info(list.files(".drake/data/", full.names = TRUE))
mostRecentFile <- rownames(df)[which.max(df$mtime)]
objData <- readRDS(mostRecentFile)
objDataError <- objData$error$calls
objDataErrorEnv <- environment(objDataError[["calls"]][[5]][[2]])
check_obj <- ls(objDataErrorEnv)
for (i in seq_along(check_obj)){
  print(object.size(mget(check_obj[i], envir = objDataErrorEnv)), units = "MB")
  print(paste0(check_obj[i], " - ", object.size(mget(check_obj[i], envir = objDataErrorEnv))))
}
#> 53.8 Mb
#> [1] "testFILE - 56391616"
#> 53.8 Mb
#> [1] "TT - 56391608"
#> 0 Mb
#> [1] "x - 392"

unlink("b6.rds")
unlink("test1.log")
```

<sup>Created on 2020-05-13 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>

@wlandau
Member

wlandau commented May 13, 2020

Thanks, that example helps a lot. Now fixed.

For onlookers having data issues in general, check out https://books.ropensci.org/drake/plans.html#special-data-formats-for-targets.

@sclewis23
Author

@wlandau
This fixed the issue.
While looking for other solutions, I saw an old issue on richfitz/remake:
https://github.com/richfitz/remake/issues/178
I think klarchristian is running into the same thing I was.
Just wanted to let you know.
I also wanted to ask when you will be updating the CRAN version. Do you have a release cycle?
Thanks again for fixing this so quickly.

@wlandau
Member

wlandau commented May 14, 2020

Glad to hear it.

In most cases, you can avoid richfitz/remake#178 by choosing a more efficient storage format for your target.

Over the last few years, I have been updating drake on CRAN about once a month. It's been almost 2 months since the last release because COVID-19 work has taken most of my time recently. That kind of work has been so unpredictable that I am not exactly sure when the next release will go out, but within the next couple of weeks is the goal.

@sclewis23
Author

Thank you for all your hard work on this package.
In remake issue 178 (https://github.com/richfitz/remake/issues/178#issuecomment-550030590), he also indicates that "Repacking large object" appears when a failure happens, and that it doesn't matter which storage format you choose.

@wlandau
Member

wlandau commented May 14, 2020

The "Repacking large object" is for storr-specific operations. The drake formats at https://books.ropensci.org/drake/plans.html#special-data-formats-for-targets are designed to bypass storr and its storage inefficiencies entirely.
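
As a rough sketch of what choosing a format looks like in a plan: the format is set per target with `target()`, as in the linked chapter. This assumes the drake package is installed, and `big_df` and `small_df` are hypothetical target names used only for illustration.

```r
# Sketch only: requires the drake package. Target names are made up.
if (requireNamespace("drake", quietly = TRUE)) {
  plan <- drake::drake_plan(
    # Ask drake to store this data frame with the "fst" format.
    big_df = target(
      data.frame(x = seq_len(10)),
      format = "fst"
    ),
    # No format given: this target uses the default storage.
    small_df = head(big_df)
  )
  print(plan)
}
```

Building the plan does not write anything. The format only matters once make() stores a target's value, and running make() with this plan would also need the fst package.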

@sclewis23
Author

I think the issue is that if the target fails, it isn't stored in another format. See below:

``` r
startTime <- proc.time()
library(magrittr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
b1 <- rbind(dplyr::storms,dplyr::storms)
b2 <- rbind(b1, b1)
b3 <- rbind(b2, b2)
b4 <- rbind(b3, b3)
b5 <- rbind(b4, b4)
b6 <- rbind(b5, b5)
b7 <- rbind(b6, b6)
b8 <- rbind(b7, b7)

saveRDS(b8, "b8.rds")
saveRDS(b7, "b7.rds")

testread <- function(x){
  withCallingHandlers(
    expr = {
      testFILE <- readRDS("b8.rds")
      testFILE2 <- readRDS("b8.rds")
      testFILE3 <- readRDS("b8.rds")
      testFILE4 <- readRDS("b8.rds")
      testFILE5 <- readRDS("b8.rds")
      testFILE6 <- readRDS("b8.rds")
      testFILE7 <- readRDS("b8.rds")
      assign("TT", testFILE)
      b <- readRDS(x)
      return(b)
      
    },
    
    # Error Functions -------------------------------------------------------------------
    error = function(e){
      stop("Error")
    },
    # Warning Functions -----------------------------------------------------------------
    warning = function(w){
      invokeRestart("muffleWarning")
    }
  )
}



TESTPlan <- drake::drake_plan(y = drake::target({
  t1 = readRDS("b8.rds");
  t2 = testread(drake::file_in("b7.rds"));
  rbind(t1, t2)
},format = "fst")
)

drake::make(TESTPlan, log_make = "test1.log",
            verbose = 2)
#> ℹ Install the progress package to see a progress bar when verbose = 2.
#> Warning: You selected fst format for target y, so drake will convert it from
#> class c("tbl_df", "tbl", "data.frame") to a plain data frame.


unlink("b7.rds")

drake::make(TESTPlan, log_make = "test1.log",
            verbose = 2)
#> ℹ Install the progress package to see a progress bar when verbose = 2.
#> Warning: missing file_in() files:
#>   b7.rds
#> Repacking large object
#> Error: target y failed.
#> diagnose(y)error$message:
#>   Error
#> diagnose(y)error$calls:
#>   1. ├─global::testread(drake::file_in("b7.rds"))
#>   2. │ ├─base::withCallingHandlers(...)
#>   3. │ └─base::readRDS(x)
#>   4. │   └─base::gzfile(file, "rb")
#>   5. └─base::.handleSimpleError(...)
#>   6.   └─h(simpleError(msg, call))
#>   7.     └─base::stop("Error")

endTime <- proc.time()
endTime - startTime
#>    user  system elapsed 
#> 133.114   7.890 140.670

unlink("b8.rds")
unlink("test1.log")
```

Created on 2020-05-14 by the reprex package (v0.3.0)

@wlandau
Member

wlandau commented May 15, 2020

Do you still get "repacking large object" even after 07d652c? When I run #1253 (comment), I do not see "repacking large object".

> I think the issue is that if the target fails, it isn't stored in another format.

I do not think I understand. Would you elaborate?

When a target fails, there is no return value to store, so drake only stores the metadata, which is now small and should not require a special format. The old target value, if it exists, is still in the cache.
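
For anyone who wants to check this on their own project, the stored metadata of a failed target can be inspected with `diagnose()`, the same function named in the error output earlier in this thread. The following is only a sketch: it assumes drake is installed, and the one-target plan and the temporary cache are throwaway examples.

```r
# Sketch only: requires the drake package. Plan and cache are throwaway examples.
if (requireNamespace("drake", quietly = TRUE)) {
  plan <- drake::drake_plan(y = stop("Error"))    # a target that always fails
  cache <- drake::new_cache(path = tempfile())    # temporary cache, not ./.drake

  # make() signals an error when a target fails, so wrap it in try().
  try(drake::make(plan, cache = cache), silent = TRUE)

  # Metadata that drake stored for the failed target.
  meta <- drake::diagnose(y, cache = cache)
  print(meta$error$message)
  print(object.size(meta), units = "Kb")
}
```

If the printed size is large, the metadata is probably carrying more than the error message and call stack, which is the symptom reported in this issue.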

@sclewis23
Author

OK, sorry, let me try to clarify.
I'm not having any issues after your fix.
The update in your latest works great.

In my last example I used the previous version to show you the issue.

I'm only trying to let you know that others might have had the same issue: when a target failed (even if you changed the format to "fst" or another), it would still have written the stack trace and could trigger "repacking large object".
Having read through klarchristian's issue (richfitz/remake#178, near the end of the thread), I saw that he got "repacking large object" when a target failed.
Once the failed target is saved (in the previous version), it also slows down the next drake make because that data gets reloaded.

@wlandau
Member

wlandau commented May 15, 2020

> I'm not having any issues after your fix.
> The update in your latest works great.

That's great to hear, thanks.

> I'm only trying to let you know that others might have had the same issue: when a target failed (even if you changed the format to "fst" or another), it would still have written the stack trace and could trigger "repacking large object".

I see now, thanks for clarifying. Yes, formats like "fst" only apply to values of targets that succeed. Regardless of format, the metadata always gets stored in the default format in the storr. That's when we run into that problem with large data you reported and see "repacking large object".

> Having read through klarchristian's issue (richfitz/remake#178, near the end of the thread), I saw that he got "repacking large object" when a target failed.
> Once the failed target is saved (in the previous version), it also slows down the next drake make because that data gets reloaded.

Tonight I finally had time to seriously revisit the thread. (With all the super high-intensity COVID-19 work my team is doing, this whole week has been brutal.) Sorry I was slow on the uptake. Your observations are completely consistent with @klarchristian's last point in richfitz/remake#178 (comment). And now we solved it! So thanks for your insight and persistence.
