data.table doesn't seem to be using multiple cores on Slurm cluster. How to troubleshoot? #5573
Comments
data.table uses 50% of the available logical cores by default. You can raise this limit, e.g., by calling setDTthreads().
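A minimal sketch of checking and raising that limit; `getDTthreads()` and the `percent` argument of `setDTthreads()` are part of the real data.table API, and 100 here is just an example value:

```r
library(data.table)

# How many threads is data.table currently allowed to use, and why?
getDTthreads(verbose = TRUE)

# Allow up to 100% of logical cores instead of the default 50%
setDTthreads(percent = 100)
```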
There may be multiple steps in fread that are not parallelized. For example, if your file has character columns, a lot of time will be spent single-threaded. I suggest trying to run forder (or frollmean with algo="exact") in a loop and then observing CPU utilization.
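A hedged sketch of that experiment: forder() and frollmean(..., algo = "exact") have well-parallelized hot loops, so spinning them repeatedly while watching per-core usage on the node (e.g. with htop) should show whether multiple cores are ever engaged. The data size below is arbitrary:

```r
library(data.table)

set.seed(1)
x <- rnorm(1e8)  # large numeric vector; adjust to your node's memory

# Run parallel data.table internals in a loop and watch
# per-core CPU usage on the node while this spins
for (i in 1:20) {
  forder(x)                             # parallel radix ordering
  frollmean(x, n = 1000, algo = "exact")  # parallel rolling mean
}
```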
From the output it looks like data.table is using 3 threads out of 6 on that cluster node, so I'm not sure this is a problem with data.table, and you may consider closing the issue.

```r
data.table::setDTthreads(as.integer(Sys.getenv("SLURM_JOB_CPUS_PER_NODE", "1")))
```
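A small sketch of verifying what data.table actually detects inside a Slurm allocation; SLURM_JOB_CPUS_PER_NODE and SLURM_CPUS_PER_TASK are standard Slurm environment variables, and getDTthreads(verbose = TRUE) prints the OpenMP detection details:

```r
library(data.table)

# What the Slurm allocation granted this job
Sys.getenv(c("SLURM_JOB_CPUS_PER_NODE", "SLURM_CPUS_PER_TASK"))

# What data.table/OpenMP detected on this node, including
# omp_get_max_threads() and any R_DATATABLE_* overrides
getDTthreads(verbose = TRUE)
```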
When using 3 threads, you would get at best a 3x speedup relative to a single thread, and only in an ideal case. Related to #2687, we should add some docs to clarify how exactly OpenMP is used, so people can have realistic expectations of when speedups should happen.
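For intuition on "at best 3x", a quick Amdahl's law calculation (general background, not anything specific to data.table) shows how the serial fraction of a job caps the speedup:

```r
# Amdahl's law: speedup with n threads when a fraction p of the
# work is parallelizable and the rest stays serial
amdahl <- function(p, n) 1 / ((1 - p) + p / n)

# If, say, only 60% of an fread call parallelizes (character columns
# etc. staying single-threaded), 3 threads give well under 3x:
amdahl(p = 0.6, n = 3)  # ~1.67x
amdahl(p = 1.0, n = 3)  # 3x, the ideal case
```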
In fread.c the only instance of `pragma omp for` I see is

```c
#pragma omp for ordered schedule(dynamic) reduction(+:thRead,thPush)
for (int jump = jump0; jump < nJumps; jump++) {
```

but I am not an expert on fread, so I am not sure what exactly happens in this for loop, and whether using several threads in it should result in big speedups.
I'm using data.table on a SLURM cluster and for some reason it's having trouble using multiple cores on something as simple as fread, even though it's detecting them when loading the library. The file is a 46GB tab-delimited file in 4-column long format.
When I ssh into the node, it's not even using all of the CPUs.
I can verify that when I use it on our personal workstations it is using multiple threads. How should I go about troubleshooting this? My guess is that SLURM/R/data.table are having some kind of weird interaction that is not provisioning the CPUs properly.
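A hedged first pass at troubleshooting, under the assumption that the problem is thread-count detection inside the Slurm allocation; the file path, column count, and nrows value below are placeholders:

```r
library(data.table)

# 1. Confirm what the Slurm allocation and OpenMP each report
Sys.getenv(c("SLURM_JOB_CPUS_PER_NODE", "SLURM_CPUS_PER_TASK"))
getDTthreads(verbose = TRUE)

# 2. Pin the thread count explicitly and time the read at 1 vs 4 threads;
#    "big_file.tsv" stands in for the 46 GB tab-delimited file, and nrows
#    limits the test to a sample rather than the full file
for (nth in c(1L, 4L)) {
  setDTthreads(nth)
  print(system.time(DT <- fread("big_file.tsv", nrows = 5e6)))
}
```

If the timings barely differ between 1 and 4 threads, that points at single-threaded steps (e.g. character columns) or at the allocation itself rather than at data.table's thread settings.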
Output of sessionInfo()