Page 2 - Summary of discussions about cbind - 话题女王

n*********e
Posts: 318

From topic: Statistics board - Is there anything wrong with this R logistic regression code?

I am doing an R logistic regression exercise.
My question: should I first remove the dependent variable from the validation set and then run the
prediction?
Thanks.
--------------------
library(MASS)
attach(birthwt) #The famous 'low birth weight' data for logistic regression
index <- 1:dim(birthwt)[1]
test<- sample(index, trunc(length(index)/3))
train<-birthwt[-test,]
validation <- birthwt[test,]
logit.1<-glm(low~., data=train, family=binomial(link='logit'))
logit.1
#------------------------------
# Here, do I first need to remove from the validation set the dep...
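
On the question itself: predict() builds its design matrix from the predictor terms only, so a response column left in newdata is simply ignored. A minimal sketch to check this, assuming a fixed seed and dropping bwt from the formula (low is defined from bwt, so keeping it would make the fit degenerate); the names fit, p_with and p_without are made up for illustration:

```r
library(MASS)  # provides the birthwt data

set.seed(1)  # assumed seed, only so the split is reproducible
test_idx <- sample(nrow(birthwt), nrow(birthwt) %/% 3)
train <- birthwt[-test_idx, ]
validation <- birthwt[test_idx, ]

# bwt is excluded because low is just an indicator of bwt < 2.5 kg
fit <- glm(low ~ . - bwt, data = train, family = binomial(link = "logit"))

# Predict once with the response column present and once without it
p_with <- predict(fit, newdata = validation, type = "response")
p_without <- predict(fit,
                     newdata = validation[, names(validation) != "low"],
                     type = "response")

# TRUE when both calls give the same predicted probabilities
same_predictions <- isTRUE(all.equal(p_with, p_without))
```

If same_predictions is TRUE, there is no need to delete the dependent variable before predicting; keeping it is convenient for computing validation error afterwards.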

t******g
Posts: 372

From topic: Statistics board - How to do 'look up' in R?

datanew<-cbind(data, as.character(ref[match(data[, 'prod_id'], ref[,'prod_id']),'prod_name']))
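
The one-liner above works because match() returns, for each prod_id in data, the row position of the same id in ref. A small sketch with made-up tables (the column names follow the post, the values are hypothetical):

```r
# Hypothetical data: a fact table and a reference (lookup) table
data <- data.frame(prod_id = c(3, 1, 2, 3))
ref <- data.frame(prod_id = 1:3,
                  prod_name = c("apple", "banana", "cherry"),
                  stringsAsFactors = FALSE)

# match() gives the row of ref that corresponds to each row of data
pos <- match(data$prod_id, ref$prod_id)

# Assigning a named column avoids the awkward auto-generated name from cbind()
data$prod_name <- ref$prod_name[pos]
```

Ids missing from ref come back as NA. merge(data, ref, by = "prod_id") is an alternative, but it may reorder the rows.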

c*****l
Posts: 1493

From topic: Statistics board - any one did EM to find MLE of mixed effects model in R

Simulating a very basic model: Y|b ~ N(X*\beta + Z*b, \sigma^2 * diag(ni));
b ~ N(0,\psi) #bivariate normal
where b is the latent variable, Z and X are ni*2 design matrices, \sigma^2 is
the error variance,
Y are longitudinal data, i.e. there are ni measurements for object i.
Parameters are \beta, \sigma, \psi; call them \theta.
I wrote an EM; the M step maximizes log(f(Y,b;\theta)) in the
regular way,
the E step involves the evaluation of E step, using Gau...
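
For the linear Gaussian model as written, b given Y is itself normal, so the moments needed in the E step have a closed form. This may or may not be what the truncated post goes on to do. A sketch for a single subject, where every numeric value and every variable name is an assumption made for illustration:

```r
set.seed(42)
ni <- 5                                   # measurements for one subject
X <- cbind(1, seq_len(ni))                # ni x 2 fixed-effects design
Z <- X                                    # ni x 2 random-effects design
beta <- c(1, 0.5)
sigma2 <- 0.25                            # error variance
psi <- matrix(c(1, 0.3, 0.3, 0.5), 2, 2)  # covariance of b

# Simulate one subject from the model
b <- drop(t(chol(psi)) %*% rnorm(2))
y <- drop(X %*% beta + Z %*% b + rnorm(ni, sd = sqrt(sigma2)))

# Marginal covariance of y and the residual from the fixed part
V <- Z %*% psi %*% t(Z) + sigma2 * diag(ni)
r <- y - drop(X %*% beta)

# E step moments: b | y ~ N(b_mean, b_cov)
b_mean <- drop(psi %*% t(Z) %*% solve(V, r))
b_cov <- psi - psi %*% t(Z) %*% solve(V, Z %*% psi)

# Second moment E[b b' | y], which the M step update for psi uses
bb_mean <- b_cov + tcrossprod(b_mean)
```

In a full EM loop these quantities would be recomputed per subject at the current parameter values and then summed in the M step.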

O*****y
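Posts: 222

From topic: Statistics board - A question about handling the intersection and subsets of two CSV files

a <- read.csv("A.csv", header=TRUE)
b <- read.csv("B.csv", header=TRUE)
# drop=FALSE keeps a data frame (and the column name) even if only one column differs
new1 <- cbind(a, b[, setdiff(colnames(b), colnames(a)), drop=FALSE])
new2 <- b[, setdiff(colnames(b), colnames(a)), drop=FALSE]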
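
A small self-contained version of the same idea, using made-up data frames in place of A.csv and B.csv (the column names id, x and y are hypothetical):

```r
# Stand-ins for the two CSV files
a <- data.frame(id = 1:3, x = c(10, 20, 30))
b <- data.frame(id = 1:3, y = c("u", "v", "w"), stringsAsFactors = FALSE)

# Columns that exist in b but not in a
only_in_b <- setdiff(colnames(b), colnames(a))

# drop = FALSE keeps a one-column result as a data frame with its name
new2 <- b[, only_in_b, drop = FALSE]
new1 <- cbind(a, new2)
```

Note that cbind() pairs rows purely by position. If the two files are not in the same row order, merging on a key column is the safer choice.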

f******9
Posts: 267

From topic: Statistics board - R read many files

A question: in R, how do I read in 1000 files at once and assign a different name to
each file (dat1, dat2, ... , dat1000)?
The files all have the same number of rows but different numbers of columns. How do I cbind these 1000 files into a
single file?
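
One common approach is to keep the files in a named list instead of 1000 separate variables, then bind the list in one call. A sketch that first writes three small CSV files to a temporary directory so it can run anywhere; the file names and contents are made up:

```r
# Create a few example files (stand-ins for the 1000 real ones)
dir <- tempfile("many_files_")
dir.create(dir)
for (k in 1:3) {
  example <- as.data.frame(matrix(k, nrow = 4, ncol = k))  # 4 rows, k columns
  write.csv(example, file.path(dir, sprintf("file%d.csv", k)), row.names = FALSE)
}

# Read every CSV in the directory into a list
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
dat <- lapply(files, read.csv)
names(dat) <- paste0("dat", seq_along(dat))  # dat1, dat2, ...

# Bind all data frames column-wise in a single call
combined <- do.call(cbind, dat)
```

Individual files remain available as dat$dat1, dat$dat2 and so on. list.files() sorts names alphabetically, so with real data it is worth checking that the order is the intended one.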

a****y
Posts: 91

From topic: Statistics board - Question for Stratify sampling.

I am trying to understand the sampling from the following description. Does
anyone know how they get the sample stratum sizes: 10,5,10,4,6. Thanks a lot!
Generates artificial data (a 235X3 matrix with 3 columns: state, region,
income).
# The variable "state" has 2 categories (nc and sc).
# The variable "region" has 3 categories (1, 2 and 3).
# The sampling frame is stratified by region within state.
data=rbind(matrix(rep("nc",165),165,1,byrow=TRUE),matrix(rep("sc",70),70,1,
byrow=TRUE))
data=cbi...

a**j
Posts: 60

From topic: Statistics board - How to do correlation clustering with SPSS, Stata, or R

Using R:
#dendrogram for unsplit efron2004 data set
install.packages("care")
library("care")
install.packages("rpart")
library("rpart")
install.packages("partykit")
library("partykit")
data(efron2004)
attach(efron2004)
efron2004
# minsplit is assumed here: rpart.control() has no numsplit argument
efron2004_rpart<-rpart(y~x[,1]+x[,2]+x[,3]+x[,4]+x[,5]+x[,6]+x[,7]+x[,8]+x[,9]+x[,10],
data=efron2004, control=rpart.control(minsplit=10))
plot(as.party(efron2004_rpart),main = "Dendrogram of Y~ Xis for Efron data sets", font.main = 4)
#dendrograms for split efron20...

t*****w
Posts: 254

From topic: Statistics board - How to select these dyads

The answer is as follows;
your final result is:
student teacher senior
2732 3465 1
3347 3837 1
1179 1693 1
3875 1711 1
3875 2059 1
2032 1784 1
2848 3921 1
2148 1416 1
3038 1434 1
3530 2037 1
2585 3811 1
1481 3954 1
...

c***z
Posts: 6348

From topic: Statistics board - Which handsome guy will help me with a statistics problem?

## build data frame
work <- c(12, 14, 4, 16, 12, 20, 25, 8, 24, 28, 4, 15)
edu <- c(6,3,8,8,4,4,1,3,12,9,11,4)
income <- c(34.7, 17.9, 22.7, 63.1, 33.0, 41.4, 20.7, 14.6, 97.3, 72.1, 49.1
, 52.0)
studay.df <- data.frame(cbind(work, edu, income))
## linear model
model_3 <- lm(income ~ ., data = studay.df) # OLS
summary_table <- data.frame(summary(model_3)$coefficients)
colnames(summary_table) <- c("coef", "std.error", "t_value", "p_value")
summary_table$regressor <- row.names(summary_table)
s...

f*******m
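Posts: 94

From topic: Statistics board - A simple SAS question!

My question is this: I need to add a column to data in long form. In R it is done like this:
xx <- rep(1:20, each=20)
yy <- rep(xx, each=100)
Then cbind is used to merge yy into the original data as a column. How can I do the same thing in SAS?
The question is very simple and I am a bit embarrassed to ask, but I really do not know how to do it. Many thanks to anyone
passing by who can help!
I hope everyone who is job hunting finds a job and everyone who already has one finds work going smoothly. Thanks!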
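
Because yy depends only on the row position, it can also be written as a formula of a row counter, which is the form that carries over most directly to a SAS DATA step (where the automatic variable _N_ counts iterations). The sketch below stays in R and only checks that the formula reproduces the rep() construction; the 40000-row length follows from the two rep() calls:

```r
# Construction from the post
xx <- rep(1:20, each = 20)    # length 400
yy <- rep(xx, each = 100)     # length 40000

# Same column from the row number alone: each value covers 20 * 100 = 2000 rows
row_id <- seq_along(yy)
yy_from_row <- (row_id - 1L) %/% 2000L + 1L

same_column <- identical(yy, yy_from_row)
```

In SAS terms this corresponds to an integer-division expression on the row counter inside the DATA step that reads the long data set.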

w*****1
Posts: 473

From topic: Statistics board - How to set the X-Y axis proportions when plotting in R (reposted)

I set the width and height of the figure before plotting.
Here is my code:
mhtdata=read.table('mht-bsbp.txt',head=T)
> png("bsbp.png",width=4,height=3)
> data <- with(mhtdata,cbind(chr,pos,PVALUE))
> par(las=2, xpd=TRUE, cex.axis=1.4, cex=1.2)
> color <- rep(c("black","red"),11)
> ops <- mht.control(colors=color,yline=1.5,xline=3,srt=0)
> mhtplot(data,ops,pch=19)
Loading required package: grid
The result is this message:
Error in plot.new() : figure margins too large
How should I set the parameters? Thanks

s***y
Posts: 1130

From topic: Statistics board - How to set the X-Y axis proportions when plotting in R (reposted)

png("bsbp.png",width=4,height=3,units="in")
The default unit is pixels..
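
In other words, width=4 and height=3 without units asks for a 4 x 3 pixel canvas, which cannot hold the margins. A minimal sketch of the fix, writing to a temporary file; the resolution of 150 ppi and the simple plot are assumptions made for illustration:

```r
out <- tempfile(fileext = ".png")
res_ppi <- 150  # assumed resolution in pixels per inch

# With units = "in" the device is 4 x 3 inches instead of 4 x 3 pixels
png(out, width = 4, height = 3, units = "in", res = res_ppi)
plot(1:10, main = "4 x 3 inch canvas")
invisible(dev.off())

# Resulting bitmap size in pixels: inches times resolution
size_px <- c(width = 4, height = 3) * res_ppi
```

The same effect can be had by staying in pixels and passing larger numbers, for example width = 600 and height = 450.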


v*******e
Posts: 133

From topic: Statistics board - Looking for a simple piece of R code

The code below works, but I think it is still too complicated
Product=c("A","A","A","B","B","C")
Color=c("red","yellow","black","yellow","white","black")
df1=data.frame(Product,Color)
b=aggregate(Color~Product, data = df1, FUN=paste, collapse = " ")
c <- strsplit((b$Color), " ")
maxLen <- max(sapply(c, length))
d<- as.data.frame(t(sapply(c, function(x) c(x, rep(" ", maxLen - length(x))))))
colnames(d) <- paste("Color", 1:maxLen, sep="")
# use the aggregated products: df1[,-c(2)] has one entry per original row (6), not one per product (3)
df2=cbind(b["Product"], d)
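
A possibly simpler route skips the paste/strsplit round trip: split the colours by product and pad each group by indexing past its end. This is only a sketch of one alternative, and it pads with NA rather than with " " as the code above does:

```r
Product <- c("A", "A", "A", "B", "B", "C")
Color <- c("red", "yellow", "black", "yellow", "white", "black")
df1 <- data.frame(Product, Color, stringsAsFactors = FALSE)

# One character vector of colours per product
colors_by_product <- split(df1$Color, df1$Product)
max_len <- max(lengths(colors_by_product))

# Indexing beyond the end of a vector yields NA, which pads short groups
wide <- t(sapply(colors_by_product, function(x) x[seq_len(max_len)]))
colnames(wide) <- paste0("Color", seq_len(max_len))

df2 <- data.frame(Product = rownames(wide), wide,
                  row.names = NULL, stringsAsFactors = FALSE)
```

If every product had exactly one colour, sapply() would return a plain vector instead of a matrix, so that edge case would need separate handling.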

m*****n
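Posts: 3575

From topic: Statistics board - A beginner's confusion about learning R.

R is a very annoying language; its syntax is quite irregular and a lot of it comes down to experience.
There is no such thing as having learned it in any absolute sense. Mastering all of R in a Nutshell counts as getting started; mastering all of R in Action counts as
intermediate. But that is still not enough: many things you only truly learn when you run into them at work and find out they are pitfalls.
For example, using c, cbind, or rbind inside a loop is a painful pitfall to be avoided. Which R textbook ever mentions that?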

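Posts: 1

From topic: Statistics board - A beginner's confusion about learning R.

Using loops in R is a pitfall to begin with; experienced users avoid for loops as much as they can. As for c, cbind, and
rbind, do not use them if you can help it. Unless there is really no alternative, declare (preallocate) everything in advance.
Much of the hard-won experience with R is buried in forums, and there really is no single book that covers it. You need to find someone experienced to look at
your code and give you some pointers.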
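
To make the pitfall concrete: each cbind() inside a loop copies everything accumulated so far, so the total work grows quadratically with the number of iterations. A small sketch comparing the growing pattern with preallocation and with a vectorised call; the toy computation (multiples of 1:5) is made up for illustration:

```r
n <- 200

# Anti-pattern: the matrix is copied on every iteration
grown <- NULL
for (k in seq_len(n)) {
  grown <- cbind(grown, (1:5) * k)
}

# Preallocate the full result once, then fill it in place
prealloc <- matrix(0, nrow = 5, ncol = n)
for (k in seq_len(n)) {
  prealloc[, k] <- (1:5) * k
}

# Vectorised: no explicit loop at all
vectorised <- outer(1:5, seq_len(n))
```

All three produce the same 5 x 200 matrix; the difference only shows up in run time and memory churn as n grows.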

c***z
Posts: 6348

From topic: DataSciences board - generating percentile-percentage charts

The boss has a new request: this time he wants cumulative percentages
patient_percentiles_cum <- patient_percentiles_fin[, c(1,102)]
colnames(patient_percentiles_cum)[2] <- "top.0"
for (k in 1:100) {
  # k <- 1

  temp <- patient_percentiles_fin[, c(102:(102-k))]

  top <- apply(temp, 1, FUN = sum)
  top <- data.frame(top)

  patient_percentiles_cum <- cbind(patient_percentiles_cum, top)

  colnames(patient_percentiles_cum)[2+k] <- paste("top",
  ...
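
The loop above adds one cumulative column per pass with cbind. The same kind of result, row-wise cumulative sums taken from the last column backwards, can be computed in one step. A sketch on a small made-up matrix (pct stands in for the percentile columns of patient_percentiles_fin, whose real layout is not shown in the post):

```r
set.seed(7)
pct <- matrix(runif(12), nrow = 3)  # hypothetical: 3 patients, 4 percentile columns

# Reverse the column order so the last column comes first
rev_cols <- pct[, ncol(pct):1, drop = FALSE]

# cumsum along each row; apply() returns results as columns, hence t()
cum <- t(apply(rev_cols, 1, cumsum))
colnames(cum) <- paste0("top.", seq_len(ncol(cum)) - 1)
```

Here top.0 is the last column on its own and top.k is the sum of the last k + 1 columns, matching the naming used in the loop.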

k*******a
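Posts: 772

From topic: DataSciences board - Lots of R experts on this board; offering a baozi reward for a way to export R data to CSV

If the vectors in the list all have the same length, they can be converted to a data frame.
For example:
cbind(df$itemsetID, as.data.frame(applist), as.data.frame(scorelist))
Then just adjust the column order.
After that, merge and you are done.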

n*****3
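Posts: 1584

From topic: DataSciences board - Lots of R experts on this board; offering a baozi reward for a way to export R data to CSV

If the lengths are the same, all N:
use as.data.frame to convert them to data frames,
then cbind,
then write.csv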
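
The steps in the two replies can be sketched end to end as follows. The object names df, applist and scorelist come from the thread, but their contents here are invented, since the original data structure is not shown:

```r
# Hypothetical inputs: lists whose vectors all have the same length
df <- data.frame(itemsetID = 1:3)
applist <- list(app1 = c("a", "b", "c"), app2 = c("d", "e", "f"))
scorelist <- list(score1 = c(0.1, 0.2, 0.3), score2 = c(0.4, 0.5, 0.6))

# Convert each list to a data frame and bind them column-wise
out <- cbind(itemsetID = df$itemsetID,
             as.data.frame(applist, stringsAsFactors = FALSE),
             as.data.frame(scorelist))

# Write the combined table to CSV (a temporary file in this sketch)
csv_path <- tempfile(fileext = ".csv")
write.csv(out, csv_path, row.names = FALSE)
```

Naming the first argument (itemsetID = ...) gives the id column a readable header instead of an auto-generated one.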

t*****e
Posts: 364

From topic: DataSciences board - Is it possible in random forest to guarantee that certain variables are always selected?

sorry I mean using the package tree, rpart seems to be only for regression
tree
below is some sample code; it was written in a hurry so bugs are inevitable, but you can get a rough idea of the logic
require(tree)
A_train = matrix(rnorm(20000),ncol=20)
Label = rnorm(nrow(A_train))
df_train = data.frame(A_train,Label)
A_test = matrix(rnorm(20000),ncol=20)
Label_test = rnorm(nrow(A_test))
df_test = data.frame(A_test,Label=Label_test) # was Label: the test set should carry its own labels
Prob_all = NULL
for (k in 1:100) {
index = sample(1:length(Label),length(Label),replace = T)
indF = sample(NonSelected,floor(length(NonSele...
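
The idea the truncated code seems to be heading towards is hand-rolled bagging: each tree sees a bootstrap sample of rows and a feature set made of the forced variables plus a random draw from the rest. The sketch below is a guess at that logic, not the poster's code. It uses rpart instead of the tree package, and the data, the forced variables V1 and V2, the subset size 3 and the tree count 25 are all assumptions:

```r
library(rpart)

set.seed(123)
n <- 200
p <- 10
X <- as.data.frame(matrix(rnorm(n * p), ncol = p))  # columns V1..V10
y <- X$V1 + 0.5 * X$V2 + rnorm(n, sd = 0.1)
train <- cbind(X, y = y)

forced <- c("V1", "V2")             # variables offered to every tree
others <- setdiff(names(X), forced)
n_trees <- 25

preds <- matrix(NA_real_, nrow = n, ncol = n_trees)  # preallocated
for (k in seq_len(n_trees)) {
  rows <- sample(n, n, replace = TRUE)        # bootstrap sample of rows
  vars <- c(forced, sample(others, 3))        # forced + random extras
  fit <- rpart(reformulate(vars, response = "y"), data = train[rows, ])
  preds[, k] <- predict(fit, newdata = train)
}

bagged <- rowMeans(preds)  # average the trees' predictions
```

This guarantees only that the forced variables are available to every tree; whether a given tree actually splits on them still depends on the data.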


k*******a
Posts: 772

From topic: DataSciences board - An R question

m <- matrix(0, nrow = max(i), ncol = max(j))
m[cbind(i, j)] <- count
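
The trick here is that indexing a matrix with a two-column matrix treats each row of the index as one (row, column) pair. A tiny worked example with made-up triplets:

```r
# Hypothetical (row, column, value) triplets
i <- c(1, 2, 3, 3)
j <- c(2, 1, 1, 3)
count <- c(5, 7, 2, 9)

m <- matrix(0, nrow = max(i), ncol = max(j))

# cbind(i, j) is a 4 x 2 index matrix: one cell per row of the index
m[cbind(i, j)] <- count
```

By contrast, m[i, j] would select the whole sub-block formed by rows i and columns j, which is not what is wanted here.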

s****h
Posts: 3979

From topic: DataSciences board - An R question

Thanks, everyone, for the replies.
A thumbs-up for m[cbind(i, j)] <- count
How did I not think of that?
