Documentation on how OpenMP is used is lacking #2687

Open
malcook opened this issue Mar 18, 2018 · 6 comments

Comments

@malcook

malcook commented Mar 18, 2018

The only docs I could find on how OpenMP is actually used are in the "old news" file, under Changes in v1.9.8 (on CRAN 25 Nov 2016), which reads:

Added setDTthreads() and getDTthreads() to control the threads used in data.table functions that are now parallelized with OpenMP on all architectures including Windows (fwrite(), fsort() and subsetting).
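For reference, those two functions are exported by data.table and can be tried directly. A minimal sketch (the thread counts below are just illustrative):

library(data.table)

getDTthreads(verbose = TRUE)  # report how many threads data.table will use, with details on how the count was chosen
setDTthreads(1)               # restrict data.table to a single thread
setDTthreads(4)               # example: allow up to 4 threads
setDTthreads(0)               # 0 requests all logical CPUs (the default policy may differ by version)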

It would be great if this could be in the FAQ, with a little info on how much this helps and for what sorts of operations.

Or am I missing something?

Edit: I just found a bit of the detail I'm seeking: reading/writing biggish data, revisited

@tdhock
Member

tdhock commented Dec 22, 2022

I read ?setDTthreads expecting to find some indication of how OpenMP is used for parallelization, so that I could get some idea of what kinds of data would see speedups from using more than one thread. The closest thing to an answer I found was:

     Internally parallelized code is used in the following places:
        • ‘between.c’ - between()
        • ‘cj.c’ - CJ()
        • ‘coalesce.c’ - fcoalesce()
        • ‘fifelse.c’ - fifelse()
        • ‘fread.c’ - fread()
        • ‘forder.c’, ‘fsort.c’, and ‘reorder.c’ - forder() and related
        • ‘froll.c’, ‘frolladaptive.c’, and ‘frollR.c’ - froll() and family
        • ‘fwrite.c’ - fwrite()
        • ‘gsumm.c’ - GForce in various places, see GForce
        • ‘nafill.c’ - nafill()
        • ‘subset.c’ - Used in ‘[.data.table’ subsetting
        • ‘types.c’ - Internal testing usage

There are links to these functions, so I expected to find more details in the linked man pages, but I did not find a sufficiently detailed description of how OpenMP is used, or of what kinds of data/operations would see speedups when using multiple threads.
For example, on ?fread the only mention of threads is:

 nThread: The number of threads to use. Experiment to see what works
          best for your data on your hardware.

Can someone please add details about how multi-threading is used, and when speedups should be expected?
For example, something like the following:

 nThread: The number of threads to use in the for loop over columns. 
          (THIS IS JUST AN EXAMPLE, I DO NOT KNOW IF THIS IS TRUE)
          Speedups should be expected when there are a large number of columns. 
          Experiment to see what works best for your data on your hardware.

The linked blog post has some benchmarking of fread and fwrite (computation times for some particular numbers of rows and columns), but it would be useful to have some description like this on the man pages.
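To illustrate the kind of experiment behind this question, here is a minimal sketch that times one of the internally parallelized operations listed above, fcoalesce(), at several thread counts; the data size and thread counts are arbitrary, and no particular speedup is being claimed:

library(data.table)

# Two long numeric vectors; half of x is missing, so fcoalesce() has real work to do.
n <- 1e7
x <- replace(rnorm(n), sample(n, n/2), NA_real_)
y <- rnorm(n)

for (nthr in c(1L, 2L, 4L)) {
  setDTthreads(nthr)
  cat(nthr, "thread(s):", system.time(fcoalesce(x, y))[["elapsed"]], "seconds\n")
}
setDTthreads(0)  # restore: 0 requests all logical CPUs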

@tdhock
Member

tdhock commented Mar 27, 2023

I wrote a blog post that compares time and memory usage of CSV read/write functions, https://tdhock.github.io/blog/2023/compare-read-write/
I did not observe any big speed differences when using 1 thread vs multiple threads; is this expected?

@jangorecki
Member

Reading character columns needs to populate the global string cache. That is always single-threaded. Try another CSV.
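A numeric-only test along those lines might look like the sketch below (file path, sizes, and thread counts are arbitrary); with no character columns there is no global string cache to populate, so any multi-threading benefit in fread() should be easier to see:

library(data.table)

# Write a CSV containing only random real numbers.
csv <- tempfile(fileext = ".csv")
fwrite(as.data.table(matrix(rnorm(1e7), ncol = 10)), csv)

# Time fread() on the same file with different thread counts.
for (nthr in c(1L, 2L, 4L)) {
  cat(nthr, "thread(s):",
      system.time(fread(csv, nThread = nthr))[["elapsed"]], "seconds\n")
}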

@tdhock
Member

tdhock commented Mar 28, 2023

Hey @jangorecki, thanks for the feedback. So you think that we should expect speedups with multiple threads when using fread, as long as there is no bottleneck from the global string cache?
My examples seem to contradict that expectation.

[figure: macbook-read-char-vary-rows]
In the example above, I don't think the global string cache is an issue, because the same couple of strings repeat on every row and column. And in fact we see a small speedup when using two threads instead of one in this case. This code was used to generate the data:

# Build a character matrix of quoted strings; the two values in data.vec
# are recycled to fill the whole N.rows x N.cols matrix.
chr_mat <- function(N.rows, N.cols){
  data.vec <- paste0("'quoted", c(" ", "_"), "data'")
  matrix(data.vec, N.rows, N.cols)
}

The figure below shows the same benchmark run on a different machine with up to 64 threads; there is very little difference between single and multiple threads.
[figure: cluster-read-char-vary-rows]

I also did try another CSV, with random real numbers.

# Build a numeric matrix of random real numbers, seeded for reproducibility.
random_real <- function(N.rows, N.cols){
  set.seed(1)
  matrix(rnorm(N.rows*N.cols), N.rows, N.cols)
}

For real numbers I observed qualitatively similar results, shown in the figure below (no big speedups when using multiple threads).
[figure: cluster-read-real-vary-rows]

Overall I have not observed any big speedups when using multiple threads (with fread or any other data.table function), so I wonder if you know of any examples that I could run to observe that?

@jangorecki
Member

Which version are you trying out?

@tdhock
Member

tdhock commented Mar 30, 2023

I'm not sure which version of data.table was used for those previous figures, but I just re-ran some of them, using a max of 4 CPUs instead of 64 and data.table 1.14.8, and I observe the results below.
[figure: character data]
[figure: real number data]
Some speedups are apparent on large real-number data sets (4 threads faster than 2, which are in turn faster than 1). Is that the extent of the speedups you would expect for fread?
