s*****n 发帖数: 2174 | 1 data <- read.csv("yourfile.txt", header = F)
unlist(strsplit(as.character(data$V1), split="")) |
|
q**j 发帖数: 10612 | 2 I tried again today and got a little closer to success.
y = unlist(lapply(Data$Date, seq,length=12))
z = matrix(y,nrow=8640,ncol=12,byrow=T)
This generates the kind of matrix I want.
But there are a few new problems.
1. lapply will not let me pass an argument like by="1 month", so I end up with 12 consecutive days rather than months.
2. The resulting z is numeric rather than dates. I checked: the numbers are the correct day counts starting from 1970-01-01. How do I convert such a numeric matrix into a matrix of dates?
One last small question: what is the difference between as.date and as.Date? I read through the manual and still have no idea. |
|
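For the two questions above, a minimal sketch, with two made-up start dates standing in for Data$Date and its 8640 rows: seq.Date does accept by = "1 month" once the inputs are Date objects, and a numeric matrix of day counts can be turned back into dates with as.Date(..., origin = "1970-01-01"). On the last question, as.Date (capital D) is the base R function; as.date comes from the add-on "date" package and is not the same function.
dates <- as.Date(c("2010-01-15", "2010-02-15"))   # stand-ins for Data$Date
# seq() dispatches to seq.Date here, so by = "1 month" works
y <- unlist(lapply(dates, function(d) seq(d, by = "1 month", length.out = 12)))
z <- matrix(y, nrow = length(dates), ncol = 12, byrow = TRUE)
z[1, 1:3]                                # numeric: days since 1970-01-01
as.Date(z[1, ], origin = "1970-01-01")   # convert one row back into Date values
matrix() strips the Date class, so a common pattern is to keep the numeric matrix and convert rows or columns back on use.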
y****2 发帖数: 34 | 3 I am sorry; in the "select" term it is better to use -unlist(list2). Good luck! |
|
s*****n 发帖数: 2174 | 4 Of course tapply() is not the only way to do it. Read the help file of
tapply(). If it is still unclear to you, you probably should write a loop to achieve your goal.
I can give you code, but I do not suggest you use it if you do not understand it:
## Assume your data frame is called data
temp <- tapply(
data$Revenue,
paste(data$Sales_person,
as.Date(data$DATE_time),
sep = " "),
sum)
result <- data.frame(
matrix(unlist(strsplit(names(temp), split = " ")),
|
|
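The code above is cut off by the forum. A hedged completion of the same tapply() idea, assuming the truncated call was rebuilding a two-column person/date layout from the names of temp; the column names Revenue, Sales_person and DATE_time come from the post, everything after matrix() is guessed:
## Assume your data frame is called data
temp <- tapply(data$Revenue,
               paste(data$Sales_person, as.Date(data$DATE_time), sep = " "),
               sum)
## Split the "person date" names back apart and attach the summed revenue
result <- data.frame(matrix(unlist(strsplit(names(temp), split = " ")),
                            ncol = 2, byrow = TRUE),
                     revenue = as.vector(temp))
names(result)[1:2] <- c("sales_person", "date")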
q**j 发帖数: 10612 | 5 Good. Thanks. I had used array(unlist()), which feels rather clumsy. |
|
b*****n 发帖数: 685 | 6 unlist should be enough? |
|
|
g********r 发帖数: 8017 | 8 matrix(unlist(),nrow=4)[2,]
4X100 |
|
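A tiny illustration of the one-liner above, with a made-up list in place of the 4 x 100 data mentioned in the thread:
x <- list(c(1, 2, 3, 4), c(5, 6, 7, 8), c(9, 10, 11, 12))  # stand-in data
matrix(unlist(x), nrow = 4)[2, ]   # second element of every list entry: 2 6 10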
h******e 发帖数: 6 | 9 unlist(lapply(split, function(x) {x[2]})) |
|
d*******1 发帖数: 854 | 10 You need a transpose for that to work:
split<- strsplit(test$Experiment, '_')
msplit<- t(matrix(unlist(split),nrow=4))
test$time<- msplit[,3] |
|
s*****n 发帖数: 2174 | 11 The overhead of these apply functions is fairly high, so for loops with a simple body they are not necessarily faster than a for loop, and are often slower. For example:
> system.time(for (i in 1:100000) {1+1})
[1] 0.11 0.00 0.11 NA NA
> system.time(lapply(1:100000, function(i) {1+1}))
[1] 0.18 0.00 0.19 NA NA
If you only use apply to replace a loop, there may not be much point. Most of the time apply is used for some kind of direct computation, where it is very convenient.
Among the apply functions, lapply is the most basic one; sapply, tapply and apply are essentially wrappers around lapply. lapply is usually slightly faster, but the others often look cleaner. For example:
> data <- data.frame(
+ id = rep(1:1000, each = 1000),
+ value = rnorm(1000 * 1000)
+ )
>
> system.time(unlist(lapply(spl |
|
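The comparison above is cut off. A sketch of what the two timed calls presumably were, computing group means per id with split()/lapply() versus the tapply() wrapper:
data <- data.frame(id = rep(1:1000, each = 1000),
                   value = rnorm(1000 * 1000))
system.time(unlist(lapply(split(data$value, data$id), mean)))  # lapply on a pre-split list
system.time(tapply(data$value, data$id, mean))                 # same result, usually more readable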
f***a 发帖数: 329 | 12 a <- rep(1,15)
b <- rep(1:3,each=5)
tt <- c( "1,1,1,1,1,1,1,1,1,1,1,1,1,1,1"
,"1,1,1,1,1,2,2,2,2,2,3,3,3,3,3")
ind.a <- which(apply(data.frame(tt),1,function(t)
sum(abs(as.numeric(unlist(strsplit(t,",",fixed=T)))-a))  # abs() so positive and negative differences cannot cancel out
)==0)
tt is the vector you want to check and ind.a is the result (the indices of the elements that can be replaced by a). Likewise, to check against b, just substitute b for a in the checking function.
Hope I understood what you meant, hehe~
This assumes every element has length 15; if not, adding a length check makes it more robust and efficient, as in the sketch below. |
|
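A hedged variant of the check above that adds the length test mentioned at the end and compares element by element with all() instead of summing differences:
matches.a <- which(sapply(tt, function(t) {
    v <- as.numeric(unlist(strsplit(t, ",", fixed = TRUE)))
    length(v) == length(a) && all(v == a)   # check the length first, then exact equality
}))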
f***a 发帖数: 329 | 13 xx <- as.numeric(unlist(strsplit(x,"_")))
xx[!is.na(xx)] |
|
f***a 发帖数: 329 | 14 I'm back, I'm back.
I thought about it again: this is really just a bunch of iid variables with a constraint added on top. Sampling from it does not look hard.
Take the simplest case n=2, m=1 as an example:
Without the constraint, the outcome space is {(0,0),(0,1),(1,0),(1,1)}.
The corresponding probability space is {(1-p1)*(1-p2), ..., p1*p2}.
With the constraint, the outcome space is O={(0,1),(1,0)}.
The corresponding probability space is P={(1-p1)*(p2), p1*(1-p2)}.
Under the constraint, standardize the probability space into
P.std={P1/(P1+P2),P2/(P1+P2)}.
Then under the constraint, outcome (0,1) has probability P1/(P1+P2) to be
sam... [post truncated] |
|
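A minimal sketch of the n=2, m=1 case described above, with assumed values for the success probabilities p1 and p2; the general n, m case would enumerate the constrained outcome space or sample it some other way:
p1 <- 0.3; p2 <- 0.6                      # assumed Bernoulli success probabilities
P1 <- (1 - p1) * p2                       # unconditional probability of (0,1)
P2 <- p1 * (1 - p2)                       # unconditional probability of (1,0)
P.std <- c(P1, P2) / (P1 + P2)            # standardized probabilities under the constraint
O <- list(c(0, 1), c(1, 0))               # constrained outcome space
draw <- O[[sample(2, 1, prob = P.std)]]   # one draw satisfying sum(draw) == 1
draw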
s*****n 发帖数: 2174 | 16 I just give you a hint; of course you need to modify it to fit what you need. For example
> data
V1
1 ABCDE
2 ABCDE
3 ABCDE
> t(sapply(1:dim(data)[1], function(i) unlist(strsplit(as.character(data$V1[i]), split = ""))))
[,1] [,2] [,3] [,4] [,5]
[1,] "A" "B" "C" "D" "E"
[2,] "A" "B" "C" "D" "E"
[3,] "A" "B" "C" "D" "E" |
|
s*****n 发帖数: 2174 | 17 Just use seq.Date; why would it not work?
At most you need to write a small wrapper around it:
date.increment <- function(date.list, by){
    num.days <- unlist(lapply(1:length(date.list),
        function(i) seq.Date(from = as.Date(date.list[i]), by = by, length = 2)[2]))
    return(as.Date(num.days, origin = "1970-01-01"))
}
date.increment(c("2010-10-01", "2010-10-10"), by = "1 month") |
|
a********s 发帖数: 188 | 18 Use "unlist" and then "matrix". |
|
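A small illustration of that advice, using a made-up list of equal-length vectors since the actual data is not shown in the thread:
lst <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))        # hypothetical input
matrix(unlist(lst), nrow = length(lst), byrow = TRUE)  # one row per list element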
l*********s 发帖数: 5409 | 19 say, d <- "12/10/2001",
datastruct <- as.numeric( unlist( strsplit(d, "/")) )
datastruct is a tuple of (month, day, year). You should be able to figure out
the rest on your own now.
baozi plz. |
|
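For completeness, a hedged alternative that skips the manual split: base R's as.Date can parse this layout directly with a format string, assuming the string really is month/day/year:
d <- "12/10/2001"
as.Date(d, format = "%m/%d/%Y")   # a proper Date object, printed as "2001-12-10"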
n*********e 发帖数: 318 | 20 I am trying to achieve this:
- for each customer, how many unique products that customer has ordered?
Here is data -
#----------------------
customer_id, product_id, date
11111,634578,11/12/2011
11111,987654,11/12/2011
11111,678978,11/12/2011
11111,678978,12/22/2011
22222,456789,12/24/2011
33333,678978,01/10/2012
33333,678978,01/15/2012
44444,987365,03/30/2012
Here is my R code -
#-------------------------------------------------------------------
t<-read.table('C:\user_item_dt.txt',sep=',',head... [post truncated] |
|
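The question's own code is cut off above. A hedged sketch of one way to get the count of distinct products per customer, assuming the data frame is called t and has the columns shown; the file location is illustrative:
t <- read.csv("user_item_dt.txt")   # assumed path; use forward slashes on Windows
# distinct products per customer, as a named vector
tapply(t$product_id, t$customer_id, function(x) length(unique(x)))
# or as a data frame
aggregate(product_id ~ customer_id, data = t, FUN = function(x) length(unique(x)))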
c***z 发帖数: 6348 | 21 Could you explain that in more detail?
I did find one approach myself:
first list the files and subdirectories in the directory, download the files directly, and call this function recursively on the subdirectories.
But the downloaded files come out with the wrong size. Could someone help take a look?
library("RCurl")
# ==============================================================================
# Function that downloads files from URL
# ==============================================================================
fdownload <- function(sourcelink) {
# sourcelink <- ftp.root # test, root level
# sourcelink <- dirs[1] # test, second level
targetlink <- paste(dropbox.root, substr(sourcelink, nchar(ftp.root)+... [post truncated] |
|
c***z 发帖数: 6348 | 22 I got a working version now:
#=====================================================================
# Function that downloads files from URL
#=====================================================================
fdownload <- function(sourcelink) {
# sourcelink <- ftp.root # test, root level
# sourcelink <- dirs[1] # test, second level
targetlink <- paste(dropbox.root, substr(sourcelink, nchar(ftp.root)+1,
nchar(sourcelink)), sep = '')
# list of contents
filenames <- getURL(sourceli... [post truncated] |
|
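A hedged sketch of the core of such a download helper, assuming the FTP server returns a plain name listing; it uses RCurl's getURL() with dirlistonly = TRUE plus base download.file(), and leaves out the recursion and the Dropbox paths from the posts above. mode = "wb" is worth noting, since text-mode downloads on Windows can alter file sizes, which may relate to the size mismatch described earlier.
library(RCurl)
# Download every file listed directly under an FTP directory (no recursion).
# ftp.dir and target.dir are hypothetical example values.
download_dir <- function(ftp.dir, target.dir) {
    listing <- getURL(ftp.dir, dirlistonly = TRUE)   # file names only, one per line
    files   <- unlist(strsplit(listing, "\r?\n"))
    files   <- files[files != ""]
    for (f in files) {
        download.file(paste0(ftp.dir, f),
                      destfile = file.path(target.dir, f),
                      mode = "wb")                   # binary mode, so sizes are preserved
    }
}
# download_dir("ftp://ftp.example.com/pub/", "~/Downloads/pub")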
c*********t 发帖数: 340 | 23 I cannot think of a better way; I am not very familiar with RCurl.
But here is a crude workaround for reference:
since the listing is fixed length, just locate the positions of the column you want :)
> grep("M",unlist(strsplit(files[1],"")))
47
> substr(files,47,47+11)
[1] "Mar 26 16:16" "Mar 26 17:02" "Mar 28 10:05" "Mar 28 10:05" "Mar 28 10:05" "Mar 28 10:05" "Mar 28 10:05" "Mar 28 10:05" "Mar 28 10:05"
[10] "Mar 28 10:05" |
|
c*****m 发帖数: 4817 | 24 What do you want to get, a vector? With your example, ranking y within each x value just gives you y back:
unlist(tapply(y, x, rank)) |
|
s*********e 发帖数: 1051 | 25 matrix(unlist(your_result), ncol = 3, dimnames = list(1:5, c('mean', 'sd', 'max'))) |
|
w*******9 发帖数: 1433 | 26 unlist() will do the job |
|
p*****n 发帖数: 265 | 27 Thanks. The output is
[1] "data.frame"
Is that the problem? Anyway, it worked once I used unlist(), heh. |
|
t*****w 发帖数: 254 | 28 When I had job interviews, they always tested my SAS skills. However, I use R
all the time. To help with your preparation, read my R code and see how much of it
you can understand.
%in%
?keyword
a<-matrix(0,nrow=3,ncol=3,byrow=T)
a1 <- a1/(t(a1)%*%spooled%*%a1)^.5 #standardization in discrim
a1<- a>=2; a[a1]
abline(h = -1:5, v = -2:3, col = "lightgray", lty=3)
abline(h=0, v=0, col = "gray60")
abs(r2[i])>r0
aggregate(iris[,1:4], list(iris$Species), mean)
AND: &; OR: |; NOT: !
anova(lm(data1[,3]~data1[,1... [post truncated] |
|
d*******7 发帖数: 118 | 29 s<-"123ABC45"
m<-unlist(strsplit(s,""))
paste(m[grep("[A-Z]",m)[1]:nchar(s)],collapse="")
[1] "ABC45" |
|
f***8 发帖数: 571 | 30 "Take any record and find the 10 records most 'similar' to it" is easy; apply will do it, for example:
sort(apply(mtcars[-1, ], 1, function(x) cor(x, unlist(mtcars[1, ]))),
decreasing=TRUE)[1:10]
As for how to handle categorical data, you need to define a cor function of your own. |
|
p****r 发帖数: 46 | 31 # create matrix from applist, then transpose it
# so the matrix is N rows * 10 columns
app <- t(data.frame(applist))
# Same for scorelist
score<- t(data.frame(scorelist))
# generate column sequence (1,11,2,12...10,20) so as to reorder them after cbind
cols <- rep(1:10,each=2)+rep(c(0,10),10)
# or you can do cols <- unlist(sapply(1:10,function(x) list(x,x+10)))
data <- cbind(app,score)
# reorder columns
data <- data[,cols]
# generate col_names: "applist1", "scorelist1", "applist2","scorelist2"... [post truncated] |
|
Y****a 发帖数: 243 | 32 my guess is:
tmpTestIn@data@data@Dim[2] = 50
topn = 1000
Q1: length(v2) = 50, length(v2[[1]]) = 1000
v2 <- unlist(v2)
Q2: assume now your create df returns a data.frame of dim 50,000 x 7
dfall <- do.call('rbind',lapply(range of i, function(i) create df)
Q3:
dfall$v8 <- F(v4, v5, v6)
dfall$v9 <- 0
for (u in userIDs) {
    for (i in itemIDs) {
        idx <- which((dfall$userID == u) & (dfall$itemID == i))
        dfall$v9[idx] <- order(dfall$v8[idx])
    }
} |
|
|
w*******y 发帖数: 60932 | 34 Ebay is running a triple-points offer for today, 12/13, only. Additionally,
there is a double-points offer for gift cards through the gift card mall. For some reason
they are doing an unlisted combo on the gift cards, giving you 4X Ebay points
(though that is not 6X points, which would be 12%). It turns out to be 8% in
Ebay bucks.
Link:
http://stores.ebay.com/giftcardmall
I received $25 gift cards to Rock bottom with $2 Ebay Bucks for each one.
Combine with Discover Card 2% cashback for internet purchases through the
e... [post truncated] |
|
D**p 发帖数: 293 | 35 nothing wrong with being a frog.
As you correctly pointed out, it should be 19 points rather than 18. There
are also some other unlisted points. But I believe those 18 points are the
most important ones. Without this 3.5% thing, there still might be a second
recession, but the chance is smaller. It might be carefully managed into an
L-shaped recovery. However, I do not see a real significant difference
between an L-shaped recovery and a small double-dip. In some people's minds,
an L-shaped recovery |
|
l*********y 发帖数: 3447 | 36 sorry, change to unlisted |
|