Failed to allocate counts or TMP when assigning g in gforce #4295
I would imagine [...]. Do you get the same issue if you try and sort the data by [...]? As a workaround, do [...].
Will changing the [...]?
Disabling gforce or using [...]. My server memory is 1TB and should be enough for the processing.
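For anyone needing the workaround spelled out: GForce belongs to data.table's query-optimization level 2, so lowering `datatable.optimize` below 2 bypasses it. A minimal sketch, assuming a data.table version where these documented options behave as described:

```r
library(data.table)
# GForce is applied at optimization level 2; level 1 keeps other
# optimizations but evaluates j per group the ordinary way, which
# avoids the gforce allocation path that fails here (slower, though).
options(datatable.optimize = 1L)
# With verbose on, data.table prints whether GForce was used for a query.
options(datatable.verbose = TRUE)
```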
How about [...]?
Well, the error is thrown from lines 114 to 115 in b1b1832. However, I also see line 6 in b1b1832: [...]. It looks like this will lead to an overflow for 1423657324 * 2...
@renkun-ken Would you mind running a test? Just cut the row count of your data to 1073741823 and then to 1073741824. I'm expecting the first case (1073741823L rows) to work but the second to fail.
I believe that's the cause. The overflow leads to an impossibly large memory allocation request...

```r
Rcpp::cppFunction("size_t test(int x) {
  return x*2*sizeof(int);
}")
test(1073741823L)
#> [1] 8589934584
test(1073741824L)
#> [1] 1.844674e+19
test(1423657324L)
#> [1] 1.844674e+19
```

Created on 2020-03-10 by the reprex package (v0.3.0)
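For contrast, here is a sketch of what the eventual fix amounts to: promote the operand to `size_t` *before* multiplying, so the arithmetic happens in 64-bit unsigned rather than overflowing 32-bit signed `int`. The function name `test_fixed` and the shown outputs are mine, not copied from PR #4297:

```r
# Same reprex as above, but the cast happens before the multiplication,
# so the whole expression is evaluated as size_t and cannot wrap around.
Rcpp::cppFunction("size_t test_fixed(int x) {
  return (size_t)x * 2 * sizeof(int);
}")
test_fixed(1073741824L)
#> [1] 8589934592
test_fixed(1423657324L)
#> [1] 11389258592
```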
Yes, [...].
@renkun-ken It would be great if you could verify that PR #4297 does fix this issue, when you have time, of course.
@shrektan Thanks! I'll verify it soon.
@shrektan Sadly, it's too hard to fetch the repo. I've tried many times but no luck.
The network issue I encountered before... so I packaged the source and sent it to your email (I assume you are able to access your personal email).
Thanks for the nice packaging! My git fetch worked magically while I was sleeping last night. I retried and can confirm that the bug is fixed with the PR. Good work! BTW, do you suspect that there are other places with similar overflow-prone patterns?
Here are the other 60 or so places with a similar pattern in the code (I haven't combed through them): [...]
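A rough sketch of how such a scan could be reproduced (assumptions: it is run from the root of a data.table source checkout, and the regex is only a heuristic that flags candidates, it does not prove an overflow):

```r
# Scan the C sources for allocation calls whose size argument contains a
# multiplication; every hit still needs a manual check of operand types.
src_files <- list.files("src", pattern = "\\.c$", full.names = TRUE)
hits <- unlist(lapply(src_files, function(f) {
  lines <- readLines(f, warn = FALSE)
  idx <- grep("alloc\\s*\\([^)]*\\*", lines)
  if (length(idx)) paste0(basename(f), ":", idx, ": ", trimws(lines[idx]))
}))
writeLines(as.character(hits))
```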
Also BTW, for bugs like this that require very large data to reproduce, do we need to build test cases for them? For this bug, a minimal reproducible example is:

```r
library(data.table)
n <- 1500000000
ngrp <- 4000
dt <- data.table(group = sample.int(ngrp, n, replace = TRUE), x = runif(n))
res <- dt[, .(
  xmedian = median(x, na.rm = TRUE)
), keyby = group]
```

but it requires a very large amount of memory and time to run.
You can get a faster example by getting rid of the random functions... just use `rep()`. The difficulty is the requirement of a very large amount of memory...
I tried

```r
dt <- data.table(group = rep(seq_len(ngrp), each = n / ngrp), x = numeric(n))
```

and

```r
dt <- data.table(group = rep(seq_len(ngrp), each = n / ngrp), x = rep(rnorm(1000), each = n / 1000))
```

Both are much faster but do not reproduce the error. 👀
Instead of [...]
I have a similar problem. When running my R script on a Windows SQL Server (R version 3.5.2 and data.table 1.12.0), the following error occurs: [...]. The script joins two data tables (X and Y) with an [...]. Thank you in advance.
I don't have experience with MS SQL Server and R, but I don't think it will solve your problem. Is it expensive to give it a try?
I gave it a try and it did not work out. Do you know what might help here? Could it have something to do with the dependencies between data.table and bit64, since dependencies are sometimes distorted in MSSQL?
In my opinion, it should have nothing to do with bit64, because bit64 hasn't been updated for three years. I have some (limited) suggestions for you: [...]
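Judging from the reply below, one of the suggestions was to limit data.table's thread usage; a sketch of that check using data.table's documented thread controls:

```r
library(data.table)
# Report the thread count data.table would use, then force single-threaded
# execution to rule out parallelism-related memory spikes.
getDTthreads(verbose = TRUE)
setDTthreads(1L)
```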
Turning off multiple threads did not work. Sometimes the error changes to "invalid BXL stream", which can be attributed to not having enough memory. Would you know if the error [...]?
@scharlatan0139 Yes, it does look exactly like an error caused by lack of RAM.
@shrektan, is the right way to install your fix `remotes::install_github("Rdatatable/data.table#fix4295")`? I tried this but got an invalid repo error.
@Debasis5 It should be `remotes::install_github("rdatatable/data.table#4297")` or `remotes::install_github("Rdatatable/data.table@fix4295")`.
Original report: I'm working with a data.table of 1423657324 rows and 14 columns. There's an integer group (`grp`) with 3790 unique values. When I do the following [...], this error occurs:

```
Failed to allocate counts or TMP when assigning g in gforce
```

This does not occur when the same aggregation is run on a shorter version of the data (1/3 of the size).
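The exact query is not shown above, but judging from the minimal reproducible example earlier in the thread, it was presumably a GForce-optimized grouped median of roughly this shape (a hypothetical reconstruction, not the reporter's confirmed code; `grp` comes from the report, the column name `x` and the small row count are assumptions for illustration):

```r
library(data.table)
# Hypothetical reconstruction at a small scale; the real table had
# 1423657324 rows and 3790 groups, which is what triggered the overflow.
dt <- data.table(grp = sample.int(3790L, 1e6L, replace = TRUE), x = runif(1e6L))
res <- dt[, .(xmedian = median(x, na.rm = TRUE)), keyby = grp]
```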