n*********e posts: 318 | 1 I am doing an R logistic regression exercise.
My question: do I need to remove the dependent variable from the validation
set before running prediction?
Thanks.
--------------------
library(MASS) # provides birthwt, the famous 'low birth weight' data for logistic regression
index <- seq_len(nrow(birthwt))
test <- sample(index, trunc(length(index)/3)) # hold out one third of the rows
train <- birthwt[-test,]
validation <- birthwt[test,]
# bwt is excluded because low is derived from it (low = 1 when bwt < 2.5 kg)
logit.1 <- glm(low ~ . - bwt, data=train, family=binomial(link='logit'))
logit.1
#------------------------------
# Here, do I first need to delete from the validation set the dep... |
|
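On the question in post 1: a minimal sketch, assuming the same birthwt split as in the post. The fixed seed and the object names `fit`, `p_with` and `p_without` are illustrative assumptions, not the poster's code. It suggests that `predict()` only looks up the predictor columns in `newdata`, so the dependent variable does not have to be deleted first.

```r
library(MASS)  # ships with R; provides the birthwt data used in the post

set.seed(1)    # assumption: fixed seed only so the sketch is reproducible
test  <- sample(nrow(birthwt), trunc(nrow(birthwt) / 3))
train <- birthwt[-test, ]
validation <- birthwt[test, ]

# bwt is left out because low is derived from it (low = 1 when bwt < 2.5 kg)
fit <- glm(low ~ . - bwt, data = train, family = binomial(link = "logit"))

# Predict with the response column still present ...
p_with <- predict(fit, newdata = validation, type = "response")

# ... and with the response column removed
p_without <- predict(fit,
                     newdata = validation[, setdiff(names(validation), "low")],
                     type = "response")

# Compare the two sets of predicted probabilities
all.equal(p_with, p_without)

# Keeping `low` in the validation set is what lets you score the predictions
table(observed = validation$low, predicted = as.integer(p_with > 0.5))
```

Keeping the response in the validation set is usually the more convenient choice, since it is needed afterwards to build the confusion table.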
t******g posts: 372 | 2 datanew<-cbind(data, as.character(ref[match(data[, 'prod_id'], ref[,'prod_id']),'prod_name'])) |
|
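The one-liner in post 2 is a lookup via `match()`. A small sketch with made-up data (the contents of `data` and `ref` below are hypothetical; only the column names `prod_id` and `prod_name` come from the post):

```r
# Hypothetical data standing in for the poster's `data` and `ref`
data <- data.frame(prod_id = c(3, 1, 2, 1), qty = c(10, 5, 7, 2))
ref  <- data.frame(prod_id = 1:3, prod_name = c("apple", "pear", "plum"))

# match() gives, for each row of data, the row of ref with the same prod_id
idx <- match(data$prod_id, ref$prod_id)

# Pull the matching names and bind them on as a new column
datanew <- cbind(data, prod_name = as.character(ref$prod_name[idx]))
datanew
```

Ids that have no entry in `ref` come back as `NA`, which is often easier to spot than a silently dropped row.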
c*****l posts: 1493 | 3 simulating a very basic model: Y|b ~ N(X*\beta + Z*b, \sigma^2 * diag(ni));
b ~ N(0,\psi) #bivariate normal
where b is the latent variable, Z and X are ni*2 design matrices, \sigma^2 is
the error variance,
Y are longitudinal data, i.e. there are ni measurements for object i.
Parameters are \beta, \sigma, \psi; call them \theta.
I wrote an EM; the M step maximizes log(f(Y,b;\theta)) in the
regular way,
the E step involves the evaluation of E step, using Gau... |
|
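For the model in post 3, data could be simulated roughly as follows. This is only a sketch: every parameter value, the number of subjects, and the choice Z = X (random intercept and slope) are assumptions, none of them taken from the post.

```r
set.seed(42)  # assumption: seed chosen only for reproducibility

# Hypothetical parameter values; none of these come from the post
beta  <- c(1, 0.5)                         # fixed effects
sigma <- 0.7                               # error standard deviation
psi   <- matrix(c(1, 0.3, 0.3, 0.5), 2)    # covariance of the random effects b
n_subj <- 50
ni <- sample(3:6, n_subj, replace = TRUE)  # number of measurements per subject

R <- chol(psi)  # psi = t(R) %*% R, so rnorm(2) %*% R has covariance psi

sim <- do.call(rbind, lapply(seq_len(n_subj), function(i) {
  t_ij <- seq_len(ni[i]) - 1
  X <- cbind(1, t_ij)          # ni x 2 design matrix (intercept, time)
  Z <- X                       # assumption: random intercept and slope
  b <- drop(rnorm(2) %*% R)    # b ~ N(0, psi)
  y <- drop(X %*% beta + Z %*% b) + rnorm(ni[i], sd = sigma)
  data.frame(id = i, time = t_ij, y = y)
}))

head(sim)
```

The result is in long form, one row per measurement, which is the layout most mixed-model fitters expect if you want to compare against your EM estimates.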
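O*****y posts: 222 | 4 a <- read.csv("A.csv", header=TRUE)
b <- read.csv("B.csv", header=TRUE)
# columns of b that are not already in a; drop=FALSE keeps a data frame even for a single column
new1 <- cbind(a, b[, setdiff(colnames(b), colnames(a)), drop=FALSE])
new2 <- b[, setdiff(colnames(b), colnames(a)), drop=FALSE] |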
|
f******9 posts: 267 | 5 A question: in R, how do I read in 1000 files at once and assign a different name to
each file: dat1, dat2, ... , dat1000?
These files have the same number of rows but different numbers of columns. How do I cbind these 1000 files into a
single file? |
|
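One possible approach to post 5, sketched with three small generated files instead of 1000. The directory, the file names and the CSV format are assumptions made so the example is self-contained.

```r
# Write three small hypothetical files so the sketch is self-contained
dir <- file.path(tempdir(), "dat_files")
dir.create(dir, showWarnings = FALSE)
for (k in 1:3) {
  m <- matrix(k, nrow = 4, ncol = k,
              dimnames = list(NULL, paste0("f", k, "_v", seq_len(k))))
  write.csv(m, file.path(dir, paste0("file", k, ".csv")), row.names = FALSE)
}

# Read every file into one list instead of 1000 separate variables
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
dats <- lapply(files, read.csv)
names(dats) <- paste0("dat", seq_along(dats))   # dat1, dat2, ...

# Same number of rows everywhere, so the pieces can be bound column-wise
combined <- do.call(cbind, unname(dats))
dim(combined)
```

A named list (`dats$dat1`, `dats$dat2`, ...) is usually easier to work with than `assign()`-ing separate objects. Note that `list.files()` sorts alphabetically, so with real data `file10` would come before `file2`.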
a****y posts: 91 | 6 I am trying to understand the sampling from the following description. Does
anyone know how they get the sample stratum sizes: 10,5,10,4,6? Thanks a lot!
# Generates artificial data (a 235x3 matrix with 3 columns: state, region,
# income).
# The variable "state" has 2 categories (nc and sc).
# The variable "region" has 3 categories (1, 2 and 3).
# The sampling frame is stratified by region within state.
data=rbind(matrix(rep("nc",165),165,1,byrow=TRUE),matrix(rep("sc",70),70,1,
byrow=TRUE))
data=cbi... |
|
a**j posts: 60 | 7
Using R:
#dendrogram for unsplit efron2004 data set
install.packages("care")
library("care")
install.packages("rpart")
library("rpart")
install.packages("partykit")
library("partykit")
data(efron2004)
attach(efron2004)
efron2004
efron2004_rpart<-rpart(y~x[,1]+x[,2]+x[,3]+x[,4]+x[,5]+x[,6]+x[,7]+x[,8]+x[,9]+x[,10],
data=efron2004, control=rpart.control(minsplit=10)) # minsplit: rpart.control() has no numsplit argument
plot(as.party(efron2004_rpart),main = "Dendrogram of Y~ Xis for Efron data sets", font.main = 4)
#dendrograms for split efron20... |
|
t*****w posts: 254 | 8 The answer is as follows;
your final result is:
student teacher senior
2732 3465 1
3347 3837 1
1179 1693 1
3875 1711 1
3875 2059 1
2032 1784 1
2848 3921 1
2148 1416 1
3038 1434 1
3530 2037 1
2585 3811 1
1481 3954 1
... |
|
c***z posts: 6348 | 9 ## build data frame
work <- c(12, 14, 4, 16, 12, 20, 25, 8, 24, 28, 4, 15)
edu <- c(6,3,8,8,4,4,1,3,12,9,11,4)
income <- c(34.7, 17.9, 22.7, 63.1, 33.0, 41.4, 20.7, 14.6, 97.3, 72.1, 49.1, 52.0)
studay.df <- data.frame(work, edu, income) # no cbind needed for vectors of equal length
## linear model
model_3 <- lm(income ~ ., data = studay.df) # OLS
summary_table <- data.frame(summary(model_3)$coefficients)
colnames(summary_table) <- c("coef", "std.error", "t_value", "p_value")
summary_table$regressor <- row.names(summary_table)
s... |
|
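f*******m posts: 94 | 10 My problem: I need to add a column to data in long form. In R it is done like this:
xx <- rep(1:20, each=20)
yy <- rep(xx, each=100)
and then cbind is used to merge yy into the original data as a column. How can I do the same in SAS?
The question is very simple and I am a bit embarrassed to ask, but I really do not know how to do it. Many thanks to anyone passing by
who can help!
I wish everyone who is job hunting finds a job, and everyone who already has one smooth sailing at work. Thanks! |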
|
w*****1 posts: 473 | 11 I set the width and height of the figure before plotting.
Here is my code:
mhtdata=read.table('mht-bsbp.txt',head=T)
> png("bsbp.png",width=4,height=3)
> data <- with(mhtdata,cbind(chr,pos,PVALUE))
> par(las=2, xpd=TRUE, cex.axis=1.4, cex=1.2)
> color <- rep(c("black","red"),11)
> ops <- mht.control(colors=color,yline=1.5,xline=3,srt=0)
> mhtplot(data,ops,pch=19)
Loading required package: grid
The result is this message:
Error in plot.new() : figure margins too large
How should I set the parameters? Thanks |
|
s***y posts: 1130 | 12 png("bsbp.png",width=4,height=3,units="in",res=300)
The default unit is pixels, so width=4,height=3 asks for a 4x3 pixel image. With units="in", res must also be given. |
|
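A minimal sketch of the fix from post 12, assuming a PNG device is available on the machine. The output path and the simple `plot()` call are stand-ins for the poster's file and `mhtplot()` call.

```r
out <- file.path(tempdir(), "demo.png")   # hypothetical output path

# 4 x 3 inches at 300 ppi gives a 1200 x 900 pixel canvas,
# which leaves room for the default figure margins
png(out, width = 4, height = 3, units = "in", res = 300)
plot(1:10, pch = 19)   # stand-in for the mhtplot() call in the post
dev.off()

file.exists(out)
```

Without `units`, the same `width = 4, height = 3` describes a canvas of 4 by 3 pixels, which is smaller than the margins alone, hence "figure margins too large".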
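v*******e posts: 133 | 13 The code below works, but I think it is still too complicated
Product=c("A","A","A","B","B","C")
Color=c("red","yellow","black","yellow","white","black")
df1=data.frame(Product,Color)
b=aggregate(Color~Product, data = df1, FUN=paste, collapse = " ")
parts <- strsplit(as.character(b$Color), " ") # renamed from c to avoid masking c()
maxLen <- max(sapply(parts, length))
# pad each product's colors to maxLen so that all rows have the same length
d<- as.data.frame(t(sapply(parts, function(x) c(x, rep(" ", maxLen - length(x))))))
colnames(d) <- paste("Color", 1:maxLen, sep="")
df2=cbind(Product=b$Product, d) # one row per product; df1[,-c(2)] has one entry per original row instead |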
|
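A shorter alternative to the code in post 13 might look like this. It is a sketch, not the poster's method: it pads with `NA` instead of blank strings, and the names `cols`, `n` and `wide` are illustrative.

```r
Product <- c("A", "A", "A", "B", "B", "C")
Color   <- c("red", "yellow", "black", "yellow", "white", "black")
df1 <- data.frame(Product, Color)

# One character vector of colors per product
cols <- split(df1$Color, df1$Product)
n <- max(lengths(cols))

# `length<-` pads each vector with NA up to length n
wide <- t(sapply(cols, `length<-`, n))
colnames(wide) <- paste0("Color", seq_len(n))

df2 <- data.frame(Product = rownames(wide), wide, row.names = NULL)
df2
```

Using `NA` for the missing cells keeps them distinguishable from real values; replace them with `" "` afterwards if the blank padding is required.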
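m*****n posts: 3575 | 14 R is an annoying language; its syntax is irregular, and much of it comes down to accumulated experience.
There is no such thing as having fully learned it. Knowing all of R in a Nutshell counts as getting started, and knowing all of R in Action counts as
intermediate. But that is still not enough; many things you only learn when you run into them at work and find out they are pitfalls.
For example, using c, cbind, rbind inside a loop is a painful pitfall; which R textbook has ever covered it? |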
|
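posts: 1 | 15 Using loops in R is a pitfall to begin with; experienced users avoid for loops as much as they can. As for c, cbind,
rbind, avoid them whenever possible; unless there is no other way, preallocate everything that can be preallocated.
Many of R's hard-won lessons are buried in forums, and there really is no book that collects them. You need to find someone experienced to look at
your code and give you some suggestions. |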
|
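The pitfall discussed in posts 14 and 15 can be illustrated with a small sketch. The loop body (`k` and `k^2`) is made up purely for demonstration.

```r
n <- 1000

# Growing with rbind(): every iteration copies everything accumulated so far
grow <- NULL
for (k in seq_len(n)) grow <- rbind(grow, c(k, k^2))

# Preallocating: the matrix is created once and filled in place
pre <- matrix(NA_real_, nrow = n, ncol = 2)
for (k in seq_len(n)) pre[k, ] <- c(k, k^2)

# Vectorised: no explicit loop at all
vec <- cbind(seq_len(n), seq_len(n)^2)

identical(grow, pre)
```

All three produce the same matrix; the difference is that the growing version does work proportional to n^2 because of the repeated copying, which becomes noticeable as n gets large.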
c***z posts: 6348 | 16 The boss has come up with something new again; this time it is cumulative percentages
patient_percentiles_cum <- patient_percentiles_fin[, c(1,102)]
colnames(patient_percentiles_cum)[2] <- "top.0"
for (k in 1:100) {
# k <- 1
temp <- patient_percentiles_fin[, c(102:(102-k))]
top <- apply(temp,
1,
FUN = sum)
top <- data.frame(top)
patient_percentiles_cum <- cbind(patient_percentiles_cum,
top)
colnames(patient_percentiles_cum)[2+k] <- paste("top",
... |
|
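The loop in post 16 grows a data frame with cbind on every pass. A possible vectorised variant is sketched below on a small made-up table. The layout (an id in column 1, percentile columns after it) is inferred from the post's indices, and the column names `top.k` are an assumption, since the original `paste()` call is cut off.

```r
set.seed(1)
# Hypothetical stand-in: an id column followed by 5 percentile columns
fin <- data.frame(id = 1:4, matrix(runif(20), nrow = 4))
last <- ncol(fin)

# Loop version, in the style of the post: top.k sums columns last, ..., last - k
loop <- fin[, c(1, last)]
colnames(loop)[2] <- "top.0"
for (k in 1:(last - 2)) {
  loop <- cbind(loop, rowSums(fin[, last:(last - k)]))
  colnames(loop)[2 + k] <- paste0("top.", k)
}

# Vectorised version: cumulative sums across the reversed percentile columns
cum <- t(apply(as.matrix(fin[, last:2]), 1, cumsum))
colnames(cum) <- paste0("top.", 0:(last - 2))
vect <- data.frame(id = fin$id, cum)

all.equal(unname(as.matrix(loop[, -1])), unname(as.matrix(vect[, -1])))
```

The vectorised form computes every cumulative column in one pass and avoids the repeated cbind inside the loop.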
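k*******a posts: 772 | 17 If the vectors in the list all have the same length, the list can be converted to a data frame,
for example
cbind(df$itemsetID, as.data.frame(applist), as.data.frame(scorelist))
Then adjust the column order as needed,
and after that the merge is straightforward |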
|
n*****3 posts: 1584 | 18 If the lengths are the same, all N:
use as.data.frame to convert them to data frames,
then cbind,
then write.csv |
|
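The steps from posts 17 and 18 can be sketched as follows. The object names `df`, `applist` and `scorelist` follow post 17, but their contents and the output path are made up for illustration.

```r
# Hypothetical inputs; the names follow the posts, the contents are made up
df <- data.frame(itemsetID = c(101, 102, 103))
applist   <- list(app1 = c("a", "b", "c"), app2 = c("d", "e", "f"))
scorelist <- list(score1 = c(0.1, 0.2, 0.3), score2 = c(0.4, 0.5, 0.6))

# Each list element has length N = nrow(df), so it becomes one column
out <- cbind(itemsetID = df$itemsetID,
             as.data.frame(applist),
             as.data.frame(scorelist))

# Write the combined table to a temporary CSV file
csv <- file.path(tempdir(), "itemsets.csv")
write.csv(out, csv, row.names = FALSE)
out
```

Naming the first argument (`itemsetID = ...`) gives the id column a readable header in the CSV instead of an auto-generated one.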
t*****e posts: 364 | 19 sorry I mean using the package tree, rpart seems to be only for regression
tree
below is some sample code, written in a hurry so it is bound to have bugs, but you can get the general logic from it
require(tree)
A_train = matrix(rnorm(20000),ncol=20)
Label = rnorm(nrow(A_train))
df_train = data.frame(A_train,Label)
A_test = matrix(rnorm(20000),ncol=20)
Label_test = rnorm(nrow(A_test))
df_test = data.frame(A_test,Label=Label_test) # use the test labels, not the training ones
Prob_all = NULL
for (k in 1:100) {
index = sample(1:length(Label),length(Label),replace = TRUE) # bootstrap sample of row indices
indF = sample(NonSelected,floor(length(NonSele... |
|
k*******a posts: 772 | 21 From thread: DataSciences board - R question
m <- matrix(0, nrow = max(i), ncol = max(j))
m[cbind(i, j)] <- count |
|
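A small worked example of the matrix-indexing trick in post 21. The values of `i`, `j` and `count` are hypothetical.

```r
# Hypothetical triplets: row index, column index, and the value to store
i <- c(1, 3, 2, 3)
j <- c(2, 1, 4, 4)
count <- c(5, 7, 2, 9)

m <- matrix(0, nrow = max(i), ncol = max(j))

# A two-column matrix used as an index addresses one (row, column) cell per row
m[cbind(i, j)] <- count
m
```

This fills all cells in a single vectorised assignment, with no loop over the triplets. If the same (i, j) pair appears more than once, the last value wins rather than the values being summed.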
s****h posts: 3979 | 22 From thread: DataSciences board - R question
Thanks, everyone, for the replies.
A thumbs-up for m[cbind(i, j)] <- count
Why didn't I think of that? |
|