Huge speed increase in R for scalar intensive computation - Statistics版

本页内容为未名空间相应帖子的节选和存档，一周内的贴子最多显示50字，超过一周显示500字访问原贴

Statistics版 - Huge speed increase in R for scalar intensive computation

相关主题
● 为什么biglm能处理data sets larger than memory?	● overall mean in sas for several variables
● 怎样来选这些dyads	● 两个有关于R的小问题？
● 如何在R里面对一整列数据进行操作？	● R里怎么取出一个数组的1/3?
● R doesnt have pass by reference mechanism?	● 请教一个概率题的思路
● 请教sas code问题	● R question
● 请教一sas programmm	● Urgent R Question
● 突然对直线拟合的R不明白起来了	● 关于R的Simplex的错误信息
● 请教 SAS macro function 的问题	● R 画图问题求教

相关话题的讨论汇总
话题: tree话题: bw2话题: diff1话题: bw话题: points

进入Statistics版参与讨论

(共1页)

a*****n
发帖数: 230

I have been waiting for this for 10 years. But this noon, I saw this line:
The byte code compiler and interpreter now include new instructions that
allow many scalar subsetting and assignment and scalar arithmetic operations
to be handled more efficiently. This can result in significant performance
improvements in scalar numerical code.
I immediately downloaded the latest R build and benchmark against a KD-tree
code I wrote.
R Build from 20 days ago: 8.45 minutes
R build of today: 5.01 minutes
A huge 40% reduction of computing time. Thank you, Prof. Luke Tierney. My
understanding is more improvement in R byte code compiler is coming.
###### code for you to try out yourself ###################
library(compiler)
enableJIT(3)
#library(inline)
#points are the d by n matrix of the source points
## create an empty node
newtree = function(){ list(ll=NULL, ul=NULL, L=NULL, R=NULL) } ############
########################################################################
##ll: box lower limit, ul: box upper limit, simpler name saves memory
##L: left node, R: Right node
## add a node to the kdtree
addNode = function(tree, points)
{
if(dim(points)[2]<=50) tree$ll = points #no more splitting for 50 points
, the upper.limit is null in this case
else
{
tree$ll = apply(points, 1, min)
tree$ul = apply(points, 1, max)
#preparing for the left and right tree
diff1 = tree$ul - tree$ll
split.dim = which.max(diff1)

split.point = median(points[split.dim, ]) #median can behave badly
when there are ties, so the next line
if(split.point==tree$ll[split.dim] || split.point==tree$ul[split.dim]
)
{
p = ncol(points)
index1 = rep(c(TRUE, FALSE), times = p) #half left and half right
index1 = index1[1:p]
}
else index1 = (points[split.dim,] < split.point)

tree$L = addNode(newtree(), points[,index1,drop=F])
tree$R = addNode(newtree(), points[,!index1,drop=F])
}
tree
}
# x is two dimensional matrix, y is a vector
Epanech.kernel.R <- function(d, n, bw2, y, x)
{
sum1 = 0; loop2 = 1:d
for(j in 1:n)
{
value1 = 1
for(i in loop2)
{
diff1 = (y[i] - x[i,j])^2L
if(diff1>bw2) {value1 = 0; break}
else value1 = value1*(bw2 - diff1)
}
sum1 = sum1 + value1
}
sum1
}
##########
Epanech.kernel.R2 <- function(d, n, bw2, y, x)
{
#y: vector of length d, x: matrix of d*n, bw2: scalar
diff1 = bw2 - (y - x)^2L
diff1[diff1<0] = 0

temp1 = diff1[1,]
for(i in 2:d) temp1 = temp1*diff1[i,]

sum(temp1)
}

evaluate.element = function(target.element, bw, bw2, d)
{
lower.boundry = target.element - bw
upper.boundry = target.element + bw
func1 = function(tree)
{
#if(is.null(tree$ul)) Epanech.kernel.fortran(d, dim(tree$ll)[2], bw2,
target.element, tree$ll, sum1<-0)$sum1
if(is.null(tree$ul)) Epanech.kernel.R(d, dim(tree$ll)[2], bw2, target.
element, tree$ll)

else if(any(tree$ll > upper.boundry)) 0
else if(any(tree$ul < lower.boundry)) 0

else func1(tree$L) + func1(tree$R)
}
func1
}
kernel.estimate.kdTree = function(target, bw, tree) #tree is a kd-tree built
from source
{
d = nrow(target)
n = ncol(target)
bw2 = bw*bw
estimate = numeric(n)
for(j in 1:n)
{
func1 = evaluate.element(target[,j], bw, bw2, d)
estimate[j] = func1(tree)
}

const = (1/n)*(3/4/bw)^d
const = const/bw^(2*d) #additional term missing from function above
estimate*const
}
main1 = function(n, d)
{
bw = 1.4794953 - 1/(d+2)*log(n)
x = rnorm(n*d);
dim(x) = c(d, n)
tree = addNode(newtree(), x)
print( system.time( result <- kernel.estimate.kdTree(x, exp(bw), tree) )
)
hist(result)
}
n = 4000
main1(n, 4)

l******n
发帖数: 9344

很多公司都在做类似的东西，有些直接重写interpreter，效率更高

operations
performance
tree

【在 a*****n 的大作中提到】

: I have been waiting for this for 10 years. But this noon, I saw this line:
: The byte code compiler and interpreter now include new instructions that
: allow many scalar subsetting and assignment and scalar arithmetic operations
: to be handled more efficiently. This can result in significant performance
: improvements in scalar numerical code.
: I immediately downloaded the latest R build and benchmark against a KD-tree
: code I wrote.
: R Build from 20 days ago: 8.45 minutes
: R build of today: 5.01 minutes
: A huge 40% reduction of computing time. Thank you, Prof. Luke Tierney. My

(共1页)

进入Statistics版参与讨论

相关主题
● R 画图问题求教	● 请教sas code问题
● 【【泪奔求助】】R高手帮我看看	● 请教一sas programmm
● 再问R的问题 - 关于matrix 的operation	● 突然对直线拟合的R不明白起来了
● a R loop question	● 请教 SAS macro function 的问题
● 为什么biglm能处理data sets larger than memory?	● overall mean in sas for several variables
● 怎样来选这些dyads	● 两个有关于R的小问题？
● 如何在R里面对一整列数据进行操作？	● R里怎么取出一个数组的1/3?
● R doesnt have pass by reference mechanism?	● 请教一个概率题的思路

相关话题的讨论汇总
话题: tree话题: bw2话题: diff1话题: bw话题: points

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

boards

未名新帖统计// 7月16日

历史上的今天