由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Statistics版 - Huge speed increase in R for scalar intensive computation
相关主题
为什么biglm能处理data sets larger than memory?overall mean in sas for several variables
怎样来选这些dyads两个有关于R的小问题?
如何在R里面对一整列数据进行操作?R里怎么取出一个数组的1/3?
R doesnt have pass by reference mechanism?请教一个概率题的思路
请教sas code问题R question
请教一sas programmmUrgent R Question
突然对直线拟合的R不明白起来了关于R的Simplex的错误信息
请教 SAS macro function 的问题R 画图问题求教
相关话题的讨论汇总
话题: tree话题: bw2话题: diff1话题: bw话题: points
进入Statistics版参与讨论
1 (共1页)
a*****n
发帖数: 230
1
I have been waiting for this for 10 years. But this noon, I saw this line:
The byte code compiler and interpreter now include new instructions that
allow many scalar subsetting and assignment and scalar arithmetic operations
to be handled more efficiently. This can result in significant performance
improvements in scalar numerical code.
I immediately downloaded the latest R build and benchmark against a KD-tree
code I wrote.
R Build from 20 days ago: 8.45 minutes
R build of today: 5.01 minutes
A huge 40% reduction of computing time. Thank you, Prof. Luke Tierney. My
understanding is more improvement in R byte code compiler is coming.
###### code for you to try out yourself ###################
library(compiler)
enableJIT(3)
#library(inline)
#points are the d by n matrix of the source points
## create an empty node
newtree = function(){ list(ll=NULL, ul=NULL, L=NULL, R=NULL) } ############
########################################################################
##ll: box lower limit, ul: box upper limit, simpler name saves memory
##L: left node, R: Right node
## add a node to the kdtree
addNode = function(tree, points)
{
if(dim(points)[2]<=50) tree$ll = points #no more splitting for 50 points
, the upper.limit is null in this case
else
{
tree$ll = apply(points, 1, min)
tree$ul = apply(points, 1, max)
#preparing for the left and right tree
diff1 = tree$ul - tree$ll
split.dim = which.max(diff1)

split.point = median(points[split.dim, ]) #median can behave badly
when there are ties, so the next line
if(split.point==tree$ll[split.dim] || split.point==tree$ul[split.dim]
)
{
p = ncol(points)
index1 = rep(c(TRUE, FALSE), times = p) #half left and half right
index1 = index1[1:p]
}
else index1 = (points[split.dim,] < split.point)

tree$L = addNode(newtree(), points[,index1,drop=F])
tree$R = addNode(newtree(), points[,!index1,drop=F])
}
tree
}
# x is two dimensional matrix, y is a vector
Epanech.kernel.R <- function(d, n, bw2, y, x)
{
sum1 = 0; loop2 = 1:d
for(j in 1:n)
{
value1 = 1
for(i in loop2)
{
diff1 = (y[i] - x[i,j])^2L
if(diff1>bw2) {value1 = 0; break}
else value1 = value1*(bw2 - diff1)
}
sum1 = sum1 + value1
}
sum1
}
##########
Epanech.kernel.R2 <- function(d, n, bw2, y, x)
{
#y: vector of length d, x: matrix of d*n, bw2: scalar
diff1 = bw2 - (y - x)^2L
diff1[diff1<0] = 0

temp1 = diff1[1,]
for(i in 2:d) temp1 = temp1*diff1[i,]

sum(temp1)
}

evaluate.element = function(target.element, bw, bw2, d)
{
lower.boundry = target.element - bw
upper.boundry = target.element + bw
func1 = function(tree)
{
#if(is.null(tree$ul)) Epanech.kernel.fortran(d, dim(tree$ll)[2], bw2,
target.element, tree$ll, sum1<-0)$sum1
if(is.null(tree$ul)) Epanech.kernel.R(d, dim(tree$ll)[2], bw2, target.
element, tree$ll)

else if(any(tree$ll > upper.boundry)) 0
else if(any(tree$ul < lower.boundry)) 0

else func1(tree$L) + func1(tree$R)
}
func1
}
kernel.estimate.kdTree = function(target, bw, tree) #tree is a kd-tree built
from source
{
d = nrow(target)
n = ncol(target)
bw2 = bw*bw
estimate = numeric(n)
for(j in 1:n)
{
func1 = evaluate.element(target[,j], bw, bw2, d)
estimate[j] = func1(tree)
}

const = (1/n)*(3/4/bw)^d
const = const/bw^(2*d) #additional term missing from function above
estimate*const
}
main1 = function(n, d)
{
bw = 1.4794953 - 1/(d+2)*log(n)
x = rnorm(n*d);
dim(x) = c(d, n)
tree = addNode(newtree(), x)
print( system.time( result <- kernel.estimate.kdTree(x, exp(bw), tree) )
)
hist(result)
}
n = 4000
main1(n, 4)
l******n
发帖数: 9344
2
很多公司都在做类似的东西,有些直接重写interpreter,效率更高

operations
performance
tree

【在 a*****n 的大作中提到】
: I have been waiting for this for 10 years. But this noon, I saw this line:
: The byte code compiler and interpreter now include new instructions that
: allow many scalar subsetting and assignment and scalar arithmetic operations
: to be handled more efficiently. This can result in significant performance
: improvements in scalar numerical code.
: I immediately downloaded the latest R build and benchmark against a KD-tree
: code I wrote.
: R Build from 20 days ago: 8.45 minutes
: R build of today: 5.01 minutes
: A huge 40% reduction of computing time. Thank you, Prof. Luke Tierney. My

1 (共1页)
进入Statistics版参与讨论
相关主题
R 画图问题求教请教sas code问题
【【泪奔求助】】R高手帮我看看请教一sas programmm
再问R的问题 - 关于matrix 的operation突然对直线拟合的R不明白起来了
a R loop question请教 SAS macro function 的问题
为什么biglm能处理data sets larger than memory?overall mean in sas for several variables
怎样来选这些dyads两个有关于R的小问题?
如何在R里面对一整列数据进行操作?R里怎么取出一个数组的1/3?
R doesnt have pass by reference mechanism?请教一个概率题的思路
相关话题的讨论汇总
话题: tree话题: bw2话题: diff1话题: bw话题: points