-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
'by' operations much slower when verbose=TRUE #6286
Comments
Investigating: I replaced all calls to # with wallclock()
system.time(copy(dt)[, 1, by = a, verbose=TRUE])
# Detected that j uses these columns: <none>
# Finding groups using forderv ... forder.c received 1000000 rows and 1 columns
# 0.008s elapsed (0.007s cpu)
# Finding group sizes from the positions (can be avoided to save RAM) ... 0.005s elapsed (0.041s cpu)
# Optimization is on but left j unchanged (single plain symbol): '1'
# Making each group and running j (GForce FALSE) ...
# memcpy contiguous groups took 0.031s for 1000000 groups
# eval(j) took 0.028s for 1000000 calls
# 0.274s elapsed (0.273s cpu)
# user system elapsed
# 0.322 0.020 0.289
# with clock() ie current
system.time(copy(dt)[, 1, by = a, verbose=TRUE])
# Detected that j uses these columns: <none>
# Finding groups using forderv ... forder.c received 1000000 rows and 1 columns
# 0.008s elapsed (0.024s cpu)
# Finding group sizes from the positions (can be avoided to save RAM) ... 0.005s elapsed (0.043s cpu)
# Optimization is on but left j unchanged (single plain symbol): '1'
# Making each group and running j (GForce FALSE) ...
# memcpy contiguous groups took 0.453s for 1000000 groups
# eval(j) took 0.438s for 1000000 calls
# 1.955s elapsed (0.509s cpu)
# user system elapsed
# 0.577 1.446 1.969 Can anyone comment on the difference between using |
Can reproduce. Adding a zero adds to a more noticeable difference: library(data.table) # 1.15.99
dt = data.table(a = 1:10000000)
system.time(copy(dt)[, 1, by = a])
# user system elapsed
# 1.924 0.039 1.849
system.time(copy(dt)[, 1, by = a, verbose=TRUE])
# Detected that j uses these columns: <none>
# Finding groups using forderv ... forder.c received 10000000 rows and 1 columns
# 0.069s elapsed (0.155s cpu)
# Finding group sizes from the positions (can be avoided to save RAM) ... 0.034s elapsed (0.032s cpu)
# Optimization is on but left j unchanged (single plain symbol): '1'
# Making each group and running j (GForce FALSE) ...
# memcpy contiguous groups took 4.140s for 10000000 groups
# eval(j) took 4.099s for 10000000 calls
# 18.0s elapsed (6.423s cpu)
# user system elapsed
# 6.611 11.598 18.096 |
Interesting, this doesn't happen with 1.15.4 library(data.table) # 1.15.4
dt = data.table(a = 1:10000000)
system.time(copy(dt)[, 1, by = a, verbose=TRUE])
# Detected that j uses these columns: <none>
# Finding groups using forderv ... forder.c received 10000000 rows and 1 columns
# 0.370s elapsed (0.020s cpu)
# Finding group sizes from the positions (can be avoided to save RAM) ... 0.050s elapsed # (0.050s cpu)
# Optimization is on but left j unchanged (single plain symbol): '1'
# Making each group and running j (GForce FALSE) ...
# memcpy contiguous groups took 0.258s for 10000000 groups
# eval(j) took 0.234s for 10000000 calls
# 2.230s elapsed (2.150s cpu)
# user system elapsed
# 2.23 0.06 2.67 |
I reckon
Good find! Seems like a potential performance regression sneaked in. |
NVM, it seems that this only affects non-Windows. AFAIK there are slight implementation differences between library(data.table)
# data.table 1.15.99 IN DEVELOPMENT built 2024-07-16 23:30:59 UTC; joshu using 8 threads (see ?getDTthreads). Latest news: r-datatable.com
dt = data.table(a = 1:1000000)
system.time(copy(dt)[, 1, by = a])
# user system elapsed
# 0.17 0.00 0.17
system.time(copy(dt)[, 1, by = a, verbose=TRUE])
# Detected that j uses these columns: <none>
# Finding groups using forderv ... forder.c received 1000000 rows and 1 columns
# 0.020s elapsed (0.000s cpu)
# Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
# Optimization is on but left j unchanged (single plain symbol): '1'
# Making each group and running j (GForce FALSE) ...
# memcpy contiguous groups took 0.029s for 1000000 groups
# eval(j) took 0.022s for 1000000 calls
# 0.230s elapsed (0.220s cpu)
# user system elapsed
# 0.22 0.03 0.27 Found this SO thread |
I don't think anybody cares about performance when verbose=TRUE, since it is for debugging |
Hm, I can understand that but switching to |
ok if you want to make the change, then make sure to add an atime performance test |
…ent)) is invalid or OS was the source of difference rather than the version
I can imagine that for verbose=T we could sometimes add extra forder calls (or potentially other heavy calls) just to report more information to the console. I would say this is acceptable. For performance verbose needs to be F. |
Edit: let's keep a distinction between CPU time spent executing instructions by the current process (returned by clock on non-Windows, getrusage on POSIX, possibly clock_gettime( Since the values of Empirically, on |
Thanks for pointing out
0b3cd65#diff-4052380a202e1373dcfdfa2fff9850d61b26758fc0475260b2871e70f0d153a1R129
r-devel/r-svn@8cf6136 I would suggest starting this as a discussion on r-devel first; at a minimum, it's out of scope here & gets its own issue. |
Seen when benchmarking #6228 , seems that by operations run much slower with
verbose=TRUE
My initial guess for the culprits are the numerous calls to
clock()
whenverbose=TRUE
#
Output of sessionInfo()
The text was updated successfully, but these errors were encountered: