
Failed to allocate counts or TMP when assigning g in gforce #4295

Closed
renkun-ken opened this issue Mar 10, 2020 · 27 comments · Fixed by #4297
Labels
bug GForce issues relating to optimized grouping calculations (GForce)
Milestone

Comments

@renkun-ken
Member

renkun-ken commented Mar 10, 2020

I'm working with a data.table of 1423657324 rows and 14 columns.

There's an integer group column (grp) with 3790 unique values.

When I run the following:

stats <- ft[, .(
      col1 = median(col1, na.rm = TRUE),
      col2 = median(col2, na.rm = TRUE),
      col3 = median(col3, na.rm = TRUE)
    ), keyby = grp]

The following error occurs:

Error in gforce(thisEnv, jsub, o__, f__, len__, irows) :
  Internal error: Failed to allocate counts or TMP when assigning g in gforce

This does not occur when the same aggregation is run on a shorter version of the data (1/3 the size).

@MichaelChirico
Member

I would imagine median could be very memory hungry; this is why you usually see approx_quantile in "big data" SQL while median itself is dropped entirely.

Do you get the same issue if you sort the data by col1 first? If not, it may be that gmedian is trying to do all three columns at once.

As a workaround, do ft[ , .N, keyby = .(grp, col1)] and get the median from the frequency table, since you said there are far fewer unique values.
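A sketch of that frequency-table route (col1_med is a made-up name, and for brevity this takes the lower weighted median rather than interpolating between the two middle values when the group's total count is even):

freq <- ft[, .N, keyby = .(grp, col1)]   # counts per (group, value); sorted by col1 within grp
med <- freq[, {
  cum <- cumsum(N)
  .(col1_med = col1[which(cum * 2 >= sum(N))[1]])   # first value whose cumulative count reaches half the total
}, keyby = grp]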

@shrektan
Member

Will changing median to stats::median help? It should prevent data.table from optimizing it with GForce.
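For reference, a sketch of two ways to keep GForce out of the picture (both per ?datatable.optimize; treat the exact option level as an assumption):

options(datatable.optimize = 1L)   # optimization level below 2 disables GForce globally
# or qualify the call so the gmedian substitution is not applied:
stats <- ft[, .(col1 = stats::median(col1, na.rm = TRUE)), keyby = grp]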

@renkun-ken
Member Author

Disabling GForce or using stats::median does not trigger this error, but I'm still curious why this occurs.

My server has 1 TB of memory, which should be enough for this.

@MichaelChirico
Member

How about stats::median(c(col1, col2, col3))? (Just to check whether tripling the memory footprint has an impact.)

@renkun-ken
Member Author

stats::median(c(col1, col2, col3)) works normally; nothing unusual happens.

@shrektan
Member

Well, the error is thrown from

data.table/src/gsumm.c

Lines 114 to 115 in b1b1832

int *counts = calloc(nBatch*highSize, sizeof(int)); // TODO: cache-line align and make highSize a multiple of 64
int *TMP = malloc(nrow*2*sizeof(int));

However, I also see that nrow is declared as an int:

static int nrow = 0; // length of underlying x; same as length(ghigh) and length(glow)

It looks like nrow*2 will overflow: 1423657324 * 2 = 2847314648 exceeds INT_MAX (2147483647), so the int product wraps before it is widened for the malloc...

@shrektan
Member

@renkun-ken Would you mind running a test? Just cut the row count of your data to 1073741823L and 1073741824L respectively and try your original code.

I'm expecting the first case (1073741823L rows) to work but the second to fail, since 1073741823 * 2 = 2147483646 still fits in an int, while 1073741824 * 2 = 2^31 does not.

@shrektan
Member

I believe that's the cause. The signed overflow wraps negative, and the conversion to size_t then turns it into an absurdly large allocation request...

Rcpp::cppFunction("size_t test(int x) {
                    return x*2*sizeof(int);
                  }")
test(1073741823L)
#> [1] 8589934584
test(1073741824L)
#> [1] 1.844674e+19
test(1423657324L)
#> [1] 1.844674e+19

Created on 2020-03-10 by the reprex package (v0.3.0)
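For contrast, widening to size_t before the multiplication avoids the wrap. A minimal sketch of the shape of the fix (assuming, per PR #4297, that the count is cast before multiplying):

Rcpp::cppFunction("size_t test_fixed(int x) {
                    // cast first so the arithmetic happens in 64-bit size_t
                    return (size_t)x*2*sizeof(int);
                  }")
test_fixed(1073741824L)
#> [1] 8589934592
test_fixed(1423657324L)
#> [1] 11389258592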

@renkun-ken
Member Author

Yes, the 1073741823L case works perfectly while the 1073741824L case fails, just as you expected.

@shrektan
Member

@renkun-ken It would be great if you could verify that PR #4297 does fix this issue, when you have time, of course.

@renkun-ken
Member Author

@shrektan Thanks! I'll verify it soon.

@renkun-ken
Member Author

@shrektan Sadly it's too hard to fetch the repo:

remote: Enumerating objects: 1431, done.
remote: Counting objects: 100% (1431/1431), done.
remote: Compressing objects: 100% (181/181), done.
Timeout, server github.com not responding. KiB | 2.00 KiB/s  
fatal: the remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

I tried many times but no luck.

@shrektan
Member

It's the network issue I encountered before... so I packaged the source and sent it to your email (I assume you are able to access your personal email).

@renkun-ken
Member Author

Thanks for your nice packaging! My git fetch worked magically while I was sleeping last night.

I retried and can confirm that the bug is fixed by the PR. Good work!

BTW, do you suspect there are other similar int overflow problems like this?

@MichaelChirico
Member

Here are the other 60 or so places with a similar pattern in the code (I haven't combed through them):

grep -Enr "[^.][0-9]+\s*[*]" src --include=*.c | grep -Ev "^src/[a-z]+[.]c:[0-9]+:\s*[/][/]"
src/forder.c:260:    memset(thiscounts, 0, 256*sizeof(int));
src/forder.c:380:  dmask = dround ? 1 << (8*dround-1) : 0;
src/forder.c:425:  memset(stat,   0, 257*sizeof(uint64_t));
src/forder.c:728:  if (!TMP || !UGRP /*|| TMP%64 || UGRP%64*/) STOP(_("Failed to allocate TMP or UGRP or they weren't cache line aligned: nth=%d"), nth);
src/forder.c:1012:          memcpy(my_starts, my_starts_copy, 256*sizeof(uint16_t));  // restore starting offsets
src/forder.c:1051:  uint8_t  *ugrps =  malloc(nBatch*256*sizeof(uint8_t));
src/frank.c:93:          dans[xorder[j]-1] = (2*xstart[i]+xlen[i]-1)/2.0;
src/frank.c:126:        int offset = 2*xstart[i]+xlen[i]-2;
src/gsumm.c:115:    int *TMP   = malloc(nrow*2*sizeof(int));
src/gsumm.c:132:      int *restrict my_tmp = TMP + b*2*batchSize;
src/gsumm.c:135:        int *p = my_tmp + 2*my_counts[w]++;
src/gsumm.c:146:        const int *restrict p = TMP + b*2*batchSize + start*2;
src/fsort.c:125:  if (batchSize < 1024) batchSize = 1024; // simple attempt to work reasonably for short vector. 1024*8 = 2 4kb pages
src/fsort.c:178:                       (int)(nBatch*MSBsize*sizeof(R_xlen_t)/(1024*1024)),
src/fsort.c:179:                       (int)(nBatch*MSBsize*sizeof(R_xlen_t)/(4*1024*nBatch)),
src/assign.c:21:  SETLENGTH(x,50+n*2*sizeof(void *)/4);  // 1*n for the names, 1*n for the VECSXP itself (both are over allocated).
src/assign.c:575:        char *s5 = (char*) malloc(strlen(tc2) + 5); //4 * '_' + \0
src/bmerge.c:496:                 ival.d-xval.d == rollabs /*#1007*/))
src/bmerge.c:510:                   xval.d-ival.d == rollabs /*#1007*/))
src/fread.c:194:  char *ptr = buf + 501 * flip;
src/fread.c:282:  const char *mostConsumed = start; // tests 1550* includes both 'na' and 'nan' in nastrings. Don't stop after 'na' if 'nan' can be consumed too.
src/fread.c:375:    ans = (double) tp.tv_sec + 1e-9 * (double) tp.tv_nsec;
src/fread.c:379:    ans = (double) tv.tv_sec + 1e-6 * (double) tv.tv_usec;
src/fread.c:434:  mmp_copy = (char *)malloc((size_t)fileSize + 1/* extra \0 */);
src/fread.c:596:    acc = 10*acc + digit;
src/fread.c:628:    acc = 10*acc + digit;
src/fread.c:693:    acc = 10*acc + digit;
src/fread.c:727:      acc = 10*acc + digit;
src/fread.c:914:      E = 10*E + digit;
src/fread.c:1256:      int nbit = 8*sizeof(char *); // #nocov
src/fread.c:1655:    if (jump0size*100*2 < sz) nJumps=100;  // 100 jumps * 100 lines = 10,000 line sample
src/fread.c:1656:    else if (jump0size*10*2 < sz) nJumps=10;
src/fread.c:1663:    else DTPRINT(_("(%"PRIu64" bytes from row 1 to eof) / (2 * %"PRIu64" jump0size) == %"PRIu64"\n"),
src/fread.c:1664:                 (uint64_t)sz, (uint64_t)jump0size, (uint64_t)(sz/(2*jump0size)));
src/fread.c:1687:    if (ch<lastRowEnd) ch=lastRowEnd;  // Overlap when apx 1,200 lines (just over 11*100) with short lines at the beginning and longer lines near the end, #2157
src/fread.c:1823:    allocnrow = clamp_szt((size_t)(bytesRead / fmax(meanLineLen - 2*sd, minLen)),
src/fread.c:1824:                          (size_t)(1.1*estnrow), 2*estnrow);
src/fread.c:1833:      DTPRINT(_("  Initial alloc = %"PRIu64" rows (%"PRIu64" + %d%%) using bytes/max(mean-2*sd,min) clamped between [1.1*estn, 2.0*estn]\n"),
src/fread.c:1973:  size_t chunkBytes = umax((size_t)(1000*meanLineLen), 1ULL/*MB*/ *1024*1024);
src/fread.c:2030:      .buff8 = malloc(rowSize8 * myBuffRows + 8),
src/fread.c:2031:      .buff4 = malloc(rowSize4 * myBuffRows + 4),
src/fread.c:2032:      .buff1 = malloc(rowSize1 * myBuffRows + 1),
src/fread.c:2102:          ctx.buff8 = realloc(ctx.buff8, rowSize8 * myBuffRows + 8);
src/fread.c:2103:          ctx.buff4 = realloc(ctx.buff4, rowSize4 * myBuffRows + 4);
src/fread.c:2104:          ctx.buff1 = realloc(ctx.buff1, rowSize1 * myBuffRows + 1);
src/fread.c:2453:    DTPRINT(_("%8.3fs (%3.0f%%) Memory map %.3fGB file\n"), tMap-t0, 100.0*(tMap-t0)/tTot, 1.0*fileSize/(1024*1024*1024));
src/fread.c:2460:      tAlloc-tColType, 100.0*(tAlloc-tColType)/tTot, (uint64_t)allocnrow, ncol, DTbytes/(1024.0*1024*1024), (uint64_t)DTi, 100.0*DTi/allocnrow);
src/fread.c:2464:            tReread-tAlloc, 100.0*(tReread-tAlloc)/tTot, nJumps, nSwept, (double)chunkBytes/(1024*1024), (int)(DTi/nJumps), nth);
src/fifelse.c:167:    REPROTECT(cons = eval(SEXPPTR_RO(args)[2*i], rho), Icons);
src/fifelse.c:168:    REPROTECT(outs = eval(SEXPPTR_RO(args)[2*i+1], rho), Iouts);
src/fifelse.c:173:      error("Argument #%d must be logical.", 2*i+1);
src/fwrite.c:376:    ch += 7 + 2*!squashDateTime;
src/fwrite.c:389:    ch += 8 + 2*!squashDateTime;
src/fwrite.c:614:  size_t buffSize = (size_t)1024*1024*args.buffMB;
src/fwrite.c:645:  size_t maxLineLen = eolLen + args.ncol*(2*(doQuote!=0) + 1/*sep*/);
src/fwrite.c:648:    maxLineLen += 2*(doQuote!=0/*NA('auto') or true*/) + 1/*sep*/;
src/fwrite.c:782:  if (maxLineLen*2>buffSize) { buffSize=2*maxLineLen; rowsPerBatch=2; }
src/fwrite.c:910:          int used = 100*((double)(ch-myBuff))/buffSize;  // percentage of original buffMB

@renkun-ken
Member Author

Also, BTW: for bugs like this that require very large data to reproduce, do we need to build test cases for them?

For this bug, a minimal reproducible example is:

library(data.table)

n <- 1500000000
ngrp <- 4000
dt <- data.table(group = sample.int(ngrp, n, replace = TRUE), x = runif(n))
res <- dt[, .(
  xmedian = median(x, na.rm = TRUE)
), keyby = group]

but dt is 18 GB (1.5e9 rows times 12 bytes: a 4-byte integer plus an 8-byte double per row) and creating it takes 5-10 minutes.

@shrektan
Member

You can get a faster example by getting rid of the random functions... just use rep() and c(). I actually did try to test this.

The difficulty is the requirement for a very large amount of memory...

@renkun-ken
Member Author

I tried

dt <- data.table(group = rep(seq_len(ngrp), each = n / ngrp), x = numeric(n))

and

dt <- data.table(group = rep(seq_len(ngrp), each = n / ngrp), x = rep(rnorm(1000), each = n / 1000))

Both are much faster but neither produces the error. 👀

@ColeMiller1
Contributor

Instead of rep(..., each =), remove the each; see the sketch below. When the groupings are sorted, the part of the code that uses TMP isn't used.
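A sketch of what that suggests for the reproducer (untested; just dropping each = so the groups arrive unsorted and the batched counting path that allocates TMP is exercised):

dt <- data.table(group = rep(seq_len(ngrp), times = n / ngrp), x = numeric(n))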

@steffen-windmueller

I'm having a similar problem. When running my R script on a Windows SQL Server (R version 3.5.2 and data.table 1.12.0), the following error occurs:
Error in forderv(ans, cols, sort = TRUE, retGrp = FALSE, order = if (decreasing) -order else order, : Failed to allocate TMP or UGRP or they weren't cache line aligned: nth=8 Call: source ... [ -> [.data.table -> eval -> eval -> forder -> forderv
which is thrown by the following line of forder.c:
src/forder.c:728: if (!TMP || !UGRP /*|| TMP%64 || UGRP%64*/) STOP(_("Failed to allocate TMP or UGRP or they weren't cache line aligned: nth=%d"), nth);

The script joins two data tables (X and Y) with an on=.(Id, date>=start_date, date<=end_date) statement and uses by=.EACHI for the operation. When running the same script in my local RStudio, there is no error. Do you think setting "Id" and "date" (resp. "start_date" and "end_date") as keys in X and Y would prevent the integer overflow? Alternatively, would changing by=.EACHI to keyby=.EACHI do the trick?

Thank you already in advance.

@shrektan
Member

shrektan commented May 5, 2020

Do you think setting "Id" and "date" resp. "start_date" and "end_date" as keys in X and Y will prevent the integer overload? Alternatively, would changing by=.EACHI to keyby=.EACHI do the thing?

I don't have experience with MS SQL Server and R, but I don't think it will solve your problem. Is it expensive to give it a try?

@steffen-windmueller

I gave it a try and it did not work out. Do you know what might help here? Could it have something to do with the dependencies between data.table and bit64, since dependencies are sometimes distorted in MSSQL?

@shrektan
Member

shrektan commented May 5, 2020

In my opinion, it should have nothing to do with bit64, because bit64 hasn't been updated in 3 years. I have some (limited) suggestions for you:

  • Turn off multithreading, i.e., data.table::setDTthreads(1L)
  • Use the dev version of data.table (albeit I doubt it will help)
  • Make a local reproducible example and report it here
  • Call for support from Microsoft if you have a business license

@steffen-windmueller

Turning off multithreading did not work. Sometimes the error changes to "invalid BXL stream", which can be attributed to not having enough memory.

Would you know if the error src/forder.c:728: if (!TMP || !UGRP /*|| TMP%64 || UGRP%64*/) STOP(_("Failed to allocate TMP or UGRP or they weren't cache line aligned: nth=%d"), nth); can be caused by a lack of RAM?

@jangorecki
Member

@scharlatan0139 yes, it does look exactly like an error caused by lack of RAM.

@jangorecki jangorecki added the GForce issues relating to optimized grouping calculations (GForce) label Dec 4, 2020
@Debasis5

Debasis5 commented Dec 9, 2020

@shrektan, is the right way to install your fix remotes::install_github("Rdatatable/data.table#fix4295")?

I tried this but got an invalid repo error.

@shrektan
Member

shrektan commented Dec 9, 2020

@Debasis5 It should be

remotes::install_github("rdatatable/data.table#4297")

or

remotes::install_github("Rdatatable/data.table@fix4295")

(In remotes, #4297 refers to the pull request number, while @fix4295 refers to the branch name.)
