
Unable to allocate TMP for items in parallel batch counting #5169

Closed
matthewgson opened this issue Sep 20, 2021 · 4 comments
@matthewgson

I encountered an issue similar to #4295 but it seems slightly different.

I'm working with a data.table of 890 million rows and 114 columns.

When I group by hour and minute variables:

intraday <- dt[, .(
      Nobs = .N,
      col1 = mean(col1, na.rm = TRUE),
      col2 = mean(col2, na.rm = TRUE)
    ), keyby = .(hour(datetime), minute(datetime))]

the following error occurs:

Detected that j uses these columns: qtys_all,vols_all,qtys_f,vols_f,qtys_bd,vols_bd,qtys_mm,vols_mm,qtys_cu,vols_cu,qtys_pc,vols_pc
Finding groups using forderv ... forder.c received 890185979 rows and 2 columns
Error in forderv(byval, sort = keyby, retGrp = TRUE) :
  Unable to allocate TMP for my_n=890185979 items in parallel batch counting

I have successfully run this operation before; the only thing I added this time was the `Nobs = .N` part.
It worked once I removed that part:

intraday <- dt[, .(
      col1 = mean(col1, na.rm = TRUE),
      col2 = mean(col2, na.rm = TRUE)
    ), keyby = .(hour(datetime), minute(datetime))]
Detected that j uses these columns: qtys_all,vols_all,qtys_f,vols_f,qtys_bd,vols_bd,qtys_mm,vols_mm,qtys_cu,vols_cu,qtys_pc,vols_pc
Finding groups using forderv ... forder.c received 890185979 rows and 2 columns
3.230s elapsed (22.3s cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
lapply optimization is on, j unchanged as 'list(mean(qtys_all, na.rm = T), se_mean(qtys_all), mean(vols_all, na.rm = T), se_mean(vols_all), mean(qtys_f, na.rm = T), se_mean(qtys_f), mean(vols_f, na.rm = T), se_mean(vols_f), mean(qtys_bd, na.rm = T), '
GForce is on, left j unchanged
Old mean optimization changed j from 'list(mean(qtys_all, na.rm = T), se_mean(qtys_all), mean(vols_all,     na.rm = T), se_mean(vols_all), mean(qtys_f, na.rm = T), se_mean(qtys_f),     mean(vols_f, na.rm = T), se_mean(vols_f), mean(qtys_bd, na.rm = T),     se_mean(qtys_bd), mean(vols_bd, na.rm = T), se_mean(vols_bd),     mean(qtys_mm, na.rm = T), se_mean(qtys_mm), mean(vols_mm,         na.rm = T), se_mean(vols_mm), mean(qtys_cu, na.rm = T),     se_mean(qtys_cu), mean(vols_cu, na.rm = T), se_mean(vols_cu),     mean(qtys_pc, na.rm = T), se_mean(qtys_pc), mean(vols_pc,         na.rm = T), se_mean(vols_pc))' to 'list(.External(Cfastmean, qtys_all, T), se_mean(qtys_all), .External(Cfastmean, vols_all, T), se_mean(vols_all), .External(Cfastmean, qtys_f, T), se_mean(qtys_f), .External(Cfastmean, vols_f, T), se_mean(vols_f), '
Making each group and running j (GForce FALSE) ...

  collecting discontiguous groups took 1571.924s for 48 groups
  eval(j) took 279.962s for 48 calls
00:04:59 elapsed (00:20:27 cpu)

My server has 1 TB of memory, so I don't believe memory is the limiting factor here, though I haven't checked thoroughly.
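As a rough sanity check on the memory claim (a sketch assuming a Linux host, consistent with the Ubuntu sessionInfo below; field names come from /proc/meminfo, not from data.table):

```r
# Report total and currently available system memory on Linux
# by reading /proc/meminfo. Purely illustrative; not a data.table API.
mem <- readLines("/proc/meminfo")
grep("^(MemTotal|MemAvailable)", mem, value = TRUE)

# R's own view of allocations so far:
gc()
```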

Here's the sessionInfo() output:

> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] matrixStats_0.59.0 tictoc_1.0.1       forcats_0.5.1      stringr_1.4.0
 [5] dplyr_1.0.7        purrr_0.3.4        readr_2.0.1        tidyr_1.1.3
 [9] tibble_3.1.4       ggplot2_3.3.5      tidyverse_1.3.1    fst_0.9.4
[13] data.table_1.14.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.1 haven_2.4.3      colorspace_2.0-2 vctrs_0.3.8
 [5] generics_0.1.0   utf8_1.2.2       rlang_0.4.11     pillar_1.6.2
 [9] glue_1.4.2       withr_2.4.2      DBI_1.1.1        dbplyr_2.1.1
[13] modelr_0.1.8     readxl_1.3.1     lifecycle_1.0.0  munsell_0.5.0
[17] gtable_0.3.0     cellranger_1.1.0 rvest_1.0.1      tzdb_0.1.2
[21] parallel_4.0.5   fansi_0.5.0      broom_0.7.9      Rcpp_1.0.7
[25] scales_1.1.1     backports_1.2.1  jsonlite_1.7.2   fs_1.5.0
[29] hms_1.1.0        stringi_1.7.4    grid_4.0.5       cli_3.0.1
[33] tools_4.0.5      magrittr_2.0.1   crayon_1.4.1     pkgconfig_2.0.3
[37] ellipsis_0.3.2   xml2_1.3.2       reprex_2.0.1     lubridate_1.7.10
[41] assertthat_0.2.1 httr_1.4.2       rstudioapi_0.13  R6_2.5.1
[45] compiler_4.0.5
@matthewgson matthewgson changed the title Error Performing Aggregation on Large data.table Unable to allocate TMP for items in parallel batch counting Sep 20, 2021
@MichaelChirico
Member

Can you check whether it's related to #5077? If so, updating to the latest dev version should solve the issue.
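For reference, one way to get the development build and confirm the version (a sketch using the dev repository URL documented on the data.table wiki; verify against the project's current installation instructions):

```r
# Install the development version of data.table from the
# Rdatatable repository, then confirm the installed version.
install.packages("data.table",
                 repos = "https://Rdatatable.gitlab.io/data.table")
packageVersion("data.table")
```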

@matthewgson
Author

matthewgson commented Sep 21, 2021

Definitely; I'll update and run the code again to see whether 1.14.1 fixes this issue.

@matthewgson
Author

@MichaelChirico You're right, it works on version 1.14.1. Thanks!

@MichaelChirico
Member

Awesome, glad to hear it!
