Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.noexport argument is ignored #56

Closed
koenniem opened this issue Nov 18, 2020 · 5 comments
Closed

.noexport argument is ignored #56

koenniem opened this issue Nov 18, 2020 · 5 comments
Labels
Milestone

Comments

@koenniem
Copy link

koenniem commented Nov 18, 2020

As far as I can see, the .noexport argument of foreach::foreach is ignored by doFuture.

This issue was brought to my attention when it automatically tried to export a variable called data in my global environment to a function which uses a variable called data (but is not the same). While you could technically let it export it without serious consequences, this variable is 700Mb and is thus inefficient. Using the .noexport = "data" did not have any effect when using the doFuture adapter, but does work with doParallel and doSNOW.

I have no clue how to fix it, but in the code of doFuture and globals I can see .noexport is taken as an argument but never used elsewhere.

@HenrikBengtsson
Copy link
Collaborator

Thanks for reporting. Can you please provide a minimal reproducible example for this that I can work with?

@koenniem
Copy link
Author

Sure thing.

Consider we have two functions:

  • foreach_fun which implements foreach and does 10 loops over a list of 10 letters.
  • collapse takes a list of letters from foreach_fun and uses paste to collapse it. In reality this function can do anything, the error persists.
foreach_fun <- function() {

	foreach::foreach(i = 1:10, .noexport = "data") %dopar% {
		
		data <- letters[1:10]

		collapse(data)
	}
}

collapse <- function(data) {
	data <- paste0(data, collapse = ",")
	data
}

registerDoFuture()
plan(future::multisession)
foreach_fun()
plan(sequential)

And this works fine. Now, suppose we have a variable in our global environment called data for an entirely unrelated reason.

# Make a very large data frame, 763 MB in size
# Not used for anything
data <- as.data.frame(matrix(nrow = 1e6, ncol = 200))

If you run foreach_fun again, you get the following error

Error in getGlobalsAndPackages(expr, envir = globals_envir, globals = TRUE) : The total size of the 3 globals that need to be exported for the future expression (‘{; doFuture::registerDoFuture(); lapply(seq_along(...future.x_ii), FUN = function(jj) {; ...future.x_jj <- ...future.x_ii[[jj]]; i <- NULL; ...future.env <- environment(); ...; }, error = identity); }); }’) is 762.95 MiB. This exceeds the maximum allowed size of 500.00 MiB (option 'future.globals.maxSize'). There are three globals: ‘data’ (762.95 MiB of class ‘list’), ‘collapse’ (5.81 KiB of class ‘function’) and ‘...future.x_ii’ (0 bytes of class ‘NULL’).

For reasons that are beyond me, doFuture tries to export data as well because it sees that it is used by collapse, even though this is not the same variable of course. In the process, the .noexport argument is ignored even though it should prevent data from being exported. Moreover, export collapse explicitly using .export does not prevent this either.

doParallel and doSNOW

Two adapters, doParallel and doSNOW don't have this problem, probably because they look for variables to export automatically like doFuture does.

cl <- parallel::makeCluster(parallel::detectCores())
doParallel::registerDoParallel(cl)
# doSNOW::registerDoSNOW(cl) # also works

foreach_fun <- function() {
	
	foreach::foreach(i = 1:10, .export = "collapse") %dopar% {
		
		data <- letters[1:10]
		
		collapse(data)
	}
}

foreach_fun()

doParallel::stopImplicitCluster()
registerDoSEQ()
parallel::stopCluster(cl)

Why?

My intuition is that .noexport is ignored by doFuture. For example, in the function getGlobalsAndPackages_doFuture in both globals.R and doFuture.R, the .noexport function is taken as an argument but then nothing is done with it.

@HenrikBengtsson
Copy link
Collaborator

HenrikBengtsson commented Nov 19, 2020

Thanks again for reporting on this. Indeed, the doFuture adapter completely ignored the .noexport argument. This has been fixed in the develop branch, cf. 974e61f. Until next release is on CRAN, you can grab the develop branch version using:

remotes::install_github("HenrikBengtsson/doFuture", ref="develop")

Now, it turns out there's another problem too unrelated to the doFuture package. The data object is incorrectly identified as a global variable, which it shouldn't. This needs to be fixed in the globals package - tracking it in futureverse/globals#69. When fixed there, you should no longer have to turn to .noexport as a workaround. The bug there is that despite argument data, the globals package still thinks data in the function body is a global variable;

collapse <- function(data) {
	data <- paste0(data, collapse = ",")
	data
}

If you switch to, say,

collapse <- function(data) {
	data2 <- paste0(data, collapse = ",")
	data2
}

then data is no longer picked up as a global variable and you don't need that .noexport = "data". There have been a few other globals reports related to this problem in the past.

@HenrikBengtsson
Copy link
Collaborator

FYI, globals 0.14.0 on CRAN. Fixes the problem where data is incorrectly picked up as a global and exported. So, with the new globals version there's no need for .noexport.

@HenrikBengtsson
Copy link
Collaborator

doFuture 0.11.0, which acknowledges argument .noexport, is now on CRAN.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants