R使用微基准进行基准测试
示例
您可以使用该microbenchmark程序包执行“亚毫秒级的准确表达表达计时”。
在此示例中,我们将data.table基于特定条件比较六个等效表达式的速度,以更新组中的元素。
进一步来说:
一个data.table有3列:id,time和status。对于每个id,我想查找具有最大时间的记录-然后,如果该记录的状态为true,则如果时间> 7,则要将其设置为false
library(microbenchmark)library(data.table)
set.seed(20160723)
dt <- data.table(id = c(rep(seq(1:10000), each = 10)),
time = c(rep(seq(1:10000), 10)),
status = c(sample(c(TRUE, FALSE), 10000*10, replace = TRUE)))
setkey(dt, id, time) ## create copies of the data so the 'updates-by-reference' don't affect other expressions
dt1 <- copy(dt)
dt2 <- copy(dt)
dt3 <- copy(dt)
dt4 <- copy(dt)
dt5 <- copy(dt)
dt6 <- copy(dt)
microbenchmark(
expression_1 = {
dt1[ dt1[order(time), .I[.N], by = id]$V1, status := status * time < 7 ]
},
expression_2 = {
dt2[,status := c(.SD[-.N, status], .SD[.N, status * time > 7]), by = id]
},
expression_3 = {
dt3[dt3[,.N, by = id][,cumsum(N)], status := status * time > 7]
},
expression_4 = {
y <- dt4[,.SD[.N],by=id]
dt4[y, status := status & time > 7]
},
expression_5 = {
y <- dt5[, .SD[.N, .(time, status)], by = id][time > 7 & status]
dt5[y, status := FALSE]
},
expression_6 = {
dt6[ dt6[, .I == .I[which.max(time)], by = id]$V1 & time > 7, status := FALSE]
},
times = 10L ## specify the number of times each expression is evaluated
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# expression_1 11.646149 13.201670 16.808399 15.643384 18.78640 26.321346 10
# expression_2 8051.898126 8777.016935 9238.323459 8979.553856 9281.93377 12610.869058 10
# expression_3 3.208773 3.385841 4.207903 4.089515 4.70146 5.654702 10
# expression_4 15.758441 16.247833 20.677038 19.028982 21.04170 36.373153 10
# expression_5 7552.970295 8051.080753 8702.064620 8861.608629 9308.62842 9722.234921 10
# expression_6 18.403105 18.812785 22.427984 21.966764 24.66930 28.607064 10
输出显示此测试expression_3中最快。
参考文献
data.table-添加和修改列
data.table-data.table中的特殊分组符号
以上是 R使用微基准进行基准测试 的全部内容, 来源链接: utcz.com/z/330699.html