Collected discussions about newdata - 话题女王

f*******n
Posts: 2665

Here is the program and the error message. Many thanks!
> modelscore<- function(model, newdata) {
+ pr <- predict(rpart, newdata=newdata[ , c(input, target)])[,2]
+ # ignore missing target values.
+ no.miss <- na.omit(newdata[ , c(input, target)]$goodbad)
+ miss.list <- attr(no.miss, "na.action")
+ attributes(no.miss) <- NULL
+ if (length(miss.list)) {
+ pred <- prediction(pr[-miss.list], no.miss)
+ } else {
+ pred <- prediction(pr, no.miss)
+ }
+
+ auc<-performance(pred,'auc')@y.values[[1]]
+
+ newlist<-list(pred,auc)
+ return (newlist...

w******a
Posts: 25

From thread: Statistics board - imputation question?thanks

Here is an R example that imputes one or two missing values in each record.
The data will look like this:
col1 col2 col3
x
x x x
x x
x x
x x x
x
x x x
...
library(Rlab)
alp = 1
K_delta = 2
len_Y1 = 200
#Sample setting:
#Measurement N_patient Percent
# 1 12 0.18
# 1 2 4 0.05...

x******a
Posts: 6336

From thread: JobHunting board - Why doesn't this program work? (doubly linked list) (repost

// main()
#include
#include
#include "linkedlist.h"
int main()
{
LinkedList myList(5, "luke");
Iterator myIterator=myList.begin();
myList.insert(myIterator, "leia");
myIterator.forward();
myList.erase(myIterator);
myList.insert(myIterator, "chewbca"); // <====== the problem occurs here
myList[2]="han";
for (int i=0; i std::cout< }
myList.insert(myIterator, "chewbca"); <====== the problem occurs here.
When this statement runs, ta...

x******a
Posts: 6336

From thread: JobHunting board - Why doesn't this program work? (doubly linked list) (repost

Thanks a lot! It worked.
One more question: I want to traverse a linked list built by front insertion. Why can't I see the first inserted element? Thanks.
#include
#include "singlylinkedlist.h"
int main(int argc, const char * argv[])
{
SingleLinkedList aList;
int i=0;
while(i<10){
aList.firstInsert(i);
++i;
}
aList.traverse();
}
The output is
9 8 7 6 5 4 3 2 1
with no 0.
template< typename T> class ListElement;
template< typename T> class SingleLinkedList;
template< typename T>
class ListElement{
public:
ListElement(cons...

o****o
Posts: 8077

From thread: Statistics board - A question about filtering a SAS dataset

afternoon time...
data original;
input x1 $ x2 $ x3;
datalines;
a b 1
b a 3
a c 2
;
run;
data newview/view=newview;
set original;
array _c{*} x1 - x2;
call sortc(of x1-x2);
_newid=cats(of x1-x2);
run;
proc sort data=newview out=newdata;
by _newid x3;
run;
/* keep max*/
data newdata1;
set newdata; by _newid;
if last._newid;
drop _newid;
run;
/* keep first one*/
data newdata2;
set newdata; by _newid;
if first._new

S********a
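Posts: 359

From thread: Statistics board - [Baozi] Question about generating a raw SAS dataset

I still haven't made myself clear. When I run my program it generates a new dataset, say a.newdata, and I want to turn this newdata into a newdata.sas7bdat file, not an Excel or txt file.
Any advice would be appreciated...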

f*******n
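Posts: 2665

From thread: Statistics board - A question about R

Many thanks!!! Below is the function I wrote. The output is a list containing two items: one is the AUC and the other is pred, the result of prediction(). The function and its output both work fine. But I need to call plot(performance(pred, ...)), and pred is a list, so the type does not match. How can I solve this? Please advise!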
modelscore<- function(model) {
pr <- predict(rpart, newdata=newdata[ , c(input, target)])[,2]
# ignore missing target values.
no.miss <- na.omit(newdata[ , c(input, target)]$goodbad)
miss.list <- attr(no.miss, "na.action")
attributes(no.miss) <- NULL
if (length(miss.list)) {
pred <- prediction(pr[-miss.list], no.miss)
} else ...

t*i
Posts: 72

From thread: Programming board - A C question about passing a variable

I have a struct as follows:
struct{
char *data;
.....
} Connection
In a static library I defined the following functions:
create_connection(){
Connection conn;
conn->data=olddata;
}
func1(Connection *conn, char * msg){
}
func2(Connection *conn, char *msg){
if (conn->data =newdata) {
.......
}
}
In the main program I call func1 as follows:
func1(Connection *conn, char *msg){
create_connection();
sprintf(conn->data, "%s", "newdata");
func2();
}
I find that even though I changed conn->data in the main program, by the time func2 runs conn->data still seems to be olddata. What is going on here?

h*********o
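Posts: 151

From thread: Statistics board - A problem using predict() in R

predict.gam(object, newdata)
I fitted object with 1000 observations, and I want to use this call to predict the values at 120 new points. Why do I keep getting the message: 'newdata' had 120 rows but variable(s) found have 1000 rows ???
But the number of prediction points was never supposed to equal the number of points used to fit the model!
How do I fix this?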

s*****n
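Posts: 2174

From thread: Statistics board - A problem using predict() in R

Are you sure the variable names in your newdata are the same as the ones used in object?
Ninety percent of the time, the variable names (column names) in newdata differ from those in object.
If they differ, R treats the prediction variable as missing and falls back to the variables from the original object, so what you get back is the 1000 fitted values from object.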
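
To make the fallback described above concrete, here is a toy sketch in Python. It is not how R implements predict(); the predict function, the coefficient dictionary and the column names are all made up for illustration. It only mimics the lookup order the reply describes: a variable is taken from newdata when the name matches, otherwise from the data the model was fitted on.

```python
def predict(coefs, training, newdata):
    """Toy linear predictor (hypothetical, not R's predict()).

    Each variable is looked up in newdata first; if the name is not
    found there, the training column is used instead.
    """
    resolved = {}
    for name in coefs:
        if name == "(Intercept)":
            continue
        resolved[name] = newdata[name] if name in newdata else training[name]
    n = len(next(iter(resolved.values())))
    intercept = coefs.get("(Intercept)", 0.0)
    return [intercept + sum(coefs[k] * resolved[k][i] for k in resolved)
            for i in range(n)]

training = {"x": [float(i) for i in range(1000)]}   # 1000 fitted observations
coefs = {"(Intercept)": 1.0, "x": 2.0}

good = predict(coefs, training, {"x": [0.5] * 120})  # column name matches
bad = predict(coefs, training, {"X": [0.5] * 120})   # wrong column name
print(len(good), len(bad))  # 120 1000
```

With the matching name the result has 120 values; with the mismatched name the toy silently returns 1000, which is the symptom in the question.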

f***a
Posts: 329

From thread: Statistics board - How to fit a smoothed line in R?

### adjust parameter to meet your needs, hehe :)
#data
y <- c(100,150,160,200,300,290,350,400,450,399,500,1500)
x <- 1:12
nx <- seq(1,12,by=0.1)
plot(x,y,type="b")
#poly
tt <- lm(y~poly(x,10))
lines(nx,predict(tt,newdata=data.frame(x=nx)),col=2)
#loess
lsm<-loess.smooth(x,y,span=1/2)
lines(lsm,col=2)
lsm1<-loess(y~x,span=1/4)
pred<-predict(lsm1,newdata=nx,se=TRUE)
lines(nx,pred$fit,col=4)
lines(nx,pred$fit-1.96*pred$se.fit,lty=2)
#supsmu
lsmsup<-supsmu(x,y)
lines(lsmsup,col=2)
plot(x, y)
for(i i

p***r
Posts: 920

From thread: Statistics board - How to locate a variable's position in R

data<-data.frame(matrix(101:106,2,3))
> data
X1 X2 X3
1 101 103 105
2 102 104 106
> newdata<-
data.frame(id=rep(rownames(data),each=ncol(data)),varname=rep(colnames(data
),times=nrow(data)),value=c(t(as.matrix(data))))
> newdata
id varname value
1 1 X1 101
2 1 X2 103
3 1 X3 105
4 2 X1 102
5 2 X2 104
6 2 X3 106

f****r
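Posts: 72

From thread: Statistics board - A SAS question

Your description seems a bit unclear. If Yes is determined solely by P(i), then Yes is just the value of P(i), right?
I'm a beginner. You could try the program below and see whether it is right.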
proc transpose data = yourdata out = newdata;
by name Good;
var P1 P2 P3 P4 P5 P6 P7 P8 P9 P10;
run;
data one (keep = Good P Yes);
set newdata;
P = _name_;
Yes = COL1;
run;
proc freq data = one;
tables Good*P*Yes / out = two (keep = Good P Yes Count);
run;

l****x
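Posts: 58

From thread: Statistics board - A question about R

A question about R
I select the records that satisfy a condition from a data frame, for example:
newdata=olddata[olddata$A=="test",]
The resulting newdata contains many NA rows, where the entire row is NA, like this: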
aa OrigID ID sourceID TRT name length hei
wid
NA NA NA NA
NA
NA.1 NA NA NA
NA
NA.2 NA NA NA
NA
NA.3 NA NA NA
N

h******e
Posts: 1791

From thread: Statistics board - [Baozi] Question about generating a raw SAS dataset

Very simple. Suppose your new libname is a and the new dataset is named newdata:
data a.newdata;
set b;
run;

s******r
Posts: 1524

From thread: Statistics board - [Baozi] Question about generating a raw SAS dataset

I did not get it.
b and a.newdata are both sas7bdat files. The only difference is where each one is saved: b is saved under your work directory and newdata is saved under libname a. You should check your lib settings.

w*******9
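Posts: 1433

From thread: Statistics board - [Baozi] Help with a survival problem; I'm a beginner, please bear with me

If you want the whole predicted survival curve for each individual, just include the covariates in newdata and drop individual=TRUE. For example, pred<-survfit(model, newdata); plot(pred). To get the 10-year survival prob., you have to extract it from pred$surv yourself.
Also, you have 3000 features; how did you reduce the dimensionality?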

Y******Y
Posts: 8753

From thread: Statistics board - [Baozi] Help with a survival problem; I'm a beginner, please bear with me

Read the documentation carefully:
individual
This argument has been superseded by the id argument and is present only for
backwards compatability. A logical value indicating whether each row of
newdata represents a distinct individual (FALSE, the default), or if each
row of the data frame represents different time epochs for only one
individual (TRUE). In the former case the result will have one curve for
each row in newdata, in the latter only a single curve will be produced.
That is, if one row per subject, then individu...

l*g
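Posts: 46

From thread: Statistics board - A question about building a multinomial probit model in R

This is for a categorical outcome, so I first used a logit link with the multinom function, and then needed to predict probabilities with newdata. But when I tried a probit link, things went wrong. First, multinom has no probit option. Then I tried mlogit, which insists on transforming the data to long format, but the predictions were wrong, and I found that newdata has to contain the outcome variable (y). If y is already known, what is there to predict? I don't understand... Then I tried mnp; probably because it uses random sampling, the probabilities come out different every time. I'm not sure whether my understanding is right. Please advise!!! Thanks, everyone.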

o*****p
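Posts: 2977

From thread: Military board - The state of China's industries and the gaps [repost]

https://zhuanlan.zhihu.com/p/29859928
Larger volume than others, yet lower output value, profit and even revenue: this is what "big but not strong" means. As citizens of the world's largest industrial nation, we should understand these words correctly: China has a complete industrial chain, and "big but not strong" is a comparison against the global leader in each industry. The author finds that the 13 categories of industrial goods with the highest import value are all undergoing rapid domestic substitution; the pace simply differs because some categories have a high import share and others a low one.
---------------------------------------
"Made in China is big but not strong" is half right. It is true that some Chinese manufacturing has very large output and very high revenue but very low output value and profit; that is a fact.
In the first quarter of 2017, Chinese-brand phones accounted for 48% of the world market; that is, of every two phones people bought, one was a Chinese brand. But measured by output value and profit, Chinese phones fall well short of 48%.
Apple sold 215 million iPhones in 2016, 14.6% of the world total; China's top three, Huawei+OPPO+VIVO, sold 315.7 million phones in 2016, 21.6% of the world total; Samsung sold 311 million phones in 2016, 21.2% of the world total.
Yet comparing revenue, Apple's 2016 revenue from ip...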

f*******t
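Posts: 7549

From thread: JobHunting board - Did I download a pirated copy of CareerCup 150??

There's nothing wrong with it. If it really bothers you, just write a static method when it comes up in an interview; it's no big deal.
public static void appendToTail(Node node, int newData);
The effect is exactly the same: as long as you hold a ref to any node, you can insert a new node at the end.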
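
A minimal sketch of that idea in Python, in case it helps. The Node class and the function name append_to_tail are my own stand-ins for the book's Java code, not taken from it. The point is only that the walk to the tail can start from any node.

```python
class Node:
    """Singly linked list node (hypothetical minimal definition)."""
    def __init__(self, data):
        self.data = data
        self.next = None

def append_to_tail(node, new_data):
    """Walk from the given node to the last node and attach a new node."""
    while node.next is not None:
        node = node.next
    node.next = Node(new_data)

def to_list(head):
    """Collect the values from head to tail, for inspection."""
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out

head = Node(1)
append_to_tail(head, 2)
append_to_tail(head.next, 3)  # starting from a middle node works too
print(to_list(head))  # [1, 2, 3]
```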

f*****Q
Posts: 1912

From thread: Programming board - A C question about passing a variable

I don't quite follow what you're asking, but if( conn->data =newdata) makes me a little nervous.

c***z
Posts: 6348

From thread: Programming board - Starting today, learn C++!

c******k
Posts: 1140

From thread: Computation board - Solving this single-variable nonlinear equation with Matlab's fzero always gives an error?

-a*c*log(exp(-pi/(a*d))+exp(pi/(a*d)))+c*pi/d=0
where
c=3.0e+8
d=345e-9
The initial value of a is around 5.5e+5, but solving this single-variable nonlinear equation with Matlab's fzero always gives the following error:
??? Error using ==> fzero
Function value at starting guess must be finite and real.
Error in ==> D:\NewData\Re-visite PBG Curve\FsolvePBGFun.m
On line 44 ==> x=fzero(inline('-x*(3.0e+8)*log(exp(-pi/(x*(345e-9))+exp(pi/
(x*(345e-9)))))+(3.0e+8)*pi/(345e-9)'),x0,options);
Isn't the initial value a=5.5e+5 finite and real? Other values give the same error too. What is going on? Thanks.
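
One possible reading, not confirmed anywhere in the thread: the parentheses in the inline string do not match the equation as written. The string computes exp(-u + exp(u)) where the equation has exp(-u) + exp(u), with u = pi/(x*d). The Python sketch below evaluates both forms at the starting guess. The function names are mine, and the conversion of the overflow to infinity stands in for what MATLAB's exp is expected to do.

```python
import math

C = 3.0e8
D = 345e-9

def f_equation(a):
    """The equation as written: log(exp(-u) + exp(u)), u = pi/(a*d)."""
    u = math.pi / (a * D)
    return -a * C * math.log(math.exp(-u) + math.exp(u)) + C * math.pi / D

def f_as_typed(x):
    """The inline string as posted: log(exp(-u + exp(u)))."""
    u = math.pi / (x * D)
    try:
        inner = math.exp(-u + math.exp(u))
    except OverflowError:
        inner = math.inf  # assumption: MATLAB's exp would overflow to Inf here
    return -x * C * math.log(inner) + C * math.pi / D

a0 = 5.5e5
print(math.isfinite(f_equation(a0)))  # True
print(math.isfinite(f_as_typed(a0)))  # False
```

If this reading is right, the complaint is about the function value at the starting guess, not about the guess itself.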

s*****n
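Posts: 2174

From thread: Statistics board - A problem using predict() in R

It is definitely still a variable-name problem.
Try putting everything into a data frame.
For example, in the original fit, use
gam(y~x+z, data=...)
Once everything is tidied up, you will definitely get 120 fits.
If you still get 1000 fits, it means your object and newdata are still not matched properly.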

m*********d
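Posts: 209

From thread: Statistics board - A SAS question, thanks!

How can I add a loop below (or something like a pointer in C) to read the data every few rows? For example, read one record every 4 rows, i.e. I want rows 4, 8, 12, 16, ... The total number of records in the original file is not known in advance.
data newdata;
set olddata;
run;
I don't use SAS much, so I just can't work it out. I need this data urgently. Many thanks for any help!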

q**j
Posts: 10612

From thread: Statistics board - A SAS question, thanks!

data newdata;
set olddata;
if mod(_N_,4)=0;
run;
or you can create a counter yourself, that is more flexible.
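
For readers who do not use SAS, the same filter can be sketched in Python. The helper name every_nth and the sample rows are made up for illustration. Rows are numbered from 1, as _N_ is, and a row is kept when its number is divisible by n.

```python
def every_nth(rows, n=4):
    """Keep rows n, 2n, 3n, ... using 1-based numbering like SAS _N_."""
    return [row for i, row in enumerate(rows, start=1) if i % n == 0]

rows = list(range(1, 18))  # stand-in for 17 records of unknown content
print(every_nth(rows))  # [4, 8, 12, 16]
```

The total number of rows does not need to be known in advance; a trailing partial group is simply dropped.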

o****o
Posts: 8077

From thread: Statistics board - An urgent SAS question

SAS can solve this fairly easily, too.
%macro wrap;
data &newdata;
set &olddata;
array _S[8] P01-P08;
%do j=1 %to 8;
_S[&j]=sum(of P0&j:);
%end;
keep P01-P08;
run;
%mend;
%wrap;

o******6
Posts: 538

From thread: Statistics board - [Collection] Help: an R newbie asks a rather silly question

☆─────────────────────────────────────☆
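sweetandlow (Pepper) wrote on (Wed Mar 18 23:30:41 2009):
I've just started using R and don't understand much yet; I can only copy examples, so please don't laugh at me. Could you take a look at this for me? It's quite urgent.
I ran into a problem today. I think it should be possible to get all the answers in one go, but I really don't know how, so I have to change one number each time and output X again, which is far too tedious. Please show me how to do it.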
...
fm <- glm(cbind(Mim, Total-Mim) ~ Age+ I(Age^2)+I(Age^3), mim, family=
binomial)
tfct <- function(x) predict(fm, newdata=data.frame(Age=x)) - zxx
zxx <- log(0.1/(1-0.1))
uniroot(tfct, range(mim$Age))$root ->X1
X1
zxx <- log(0.2/(1-0.2))
uniroot(tfct, range(mim$Age))

y*m
Posts: 102

From thread: Statistics board - how to avoid changing the raw dataset when you work on proc sort?

use out=newdata to avoid overwriting the old one.

D******n
Posts: 2836

From thread: Statistics board - How to split A/B in one Excel column into two columns?

Don't confine yourself to SAS; SAS is not good at handling dirty data sets.
This can be done under Linux in one line:
sed 's/\// /g' yourdata>newdata
And as I said, you don't even need this if you do it in Excel.

function!
save

o****o
Posts: 8077

From thread: Statistics board - A fairly specific algorithm question

if all ids are capitalized Char A-to-Z, then it is easy:
*-----------------------;
data yourdata;
input group_id id1 $ ID $;
datalines;
1 A B
1 A C
1 A D
2 B A
2 B C
2 B D
3 C B
4 D A
4 D B
5 B E
5 B K
5 B E
6 B F
7 E K
;
run;
data newdata;
array _C1[65:90] _temporary_;
do i=65 to 90; _C1[i]=0; end;
do until (eof);

o******6
Posts: 538

From thread: Statistics board - help need for SAS macro

%macro data(num);
%do i=1 %to &num;
data est&i;
infile "path\data&i";
input a b c d;
run;
proc append base=newdata data=est&i force;
run;
%end;
%mend;
%data(num=100);

o******6
Posts: 538

From thread: Statistics board - help need for SAS macro

%macro new(num);
filename combine (%do i=1 %to &num; "path\data&i..txt" %end;);
data newdata;
infile combine;
input a b c d;
run;
%mend;
%new(100);

num

w******a
Posts: 25

From thread: Statistics board - imputation question?thanks

Here is an R example that imputes one missing value in each record. Half of the code builds the sample data, so you probably only need the second half, but including it here helps you understand what is going on:
The data will look like
col1 col2
x
x x
x
x x
x x
...
library(Rlab)
alp = 1
Prob_R1 = 0.5
Prob_R0 = 1 - Prob_R1
len_Y1 = 200
K_delta = 2
Y1 = rnorm(len_Y1,mean=0,sd=1)
R1 = rbinom(n=len_Y1, size=1, prob=Prob_R1)
Y2 = rnorm(n=len_Y1,...

o****o
Posts: 8077

From thread: Statistics board - data set problem (SAS)--first two will have two(baozi) Happy holiday

proc means data=yourdata noprint;
by id;
var medication_:;
output out=newdata(where=(_STAT_='MAX') drop=_TYPE_ _FREQ_) ;
run;
baozi pls...

m**********r
Posts: 122

From thread: Statistics board - About data merge

I'm reading about data merge, and there is an example like this:
data newdata;
merge yourdata (in=a) otherdata (in=b);
by permno date;
What do (in=a) and (in=b) mean here?

N****n
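Posts: 1208

From thread: Statistics board - Has anyone built a CART model in R?

Thanks, man~~~
It is roughly like this:
Tree = rpart( response ~ explanatory, method="class")
pred=predict(Tree, newdata=data.frame(explanatory), type="prob")
where explanatory = [x1 x2 ..... x13]. The tree model was fitted with 700-odd observations, and I want to test it with 60-odd test data points.
There is no error, but the warning says the test set has 60 rows while the variables have 700. The output also has 700 values, but I want 20 predicted values .....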

N****n
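Posts: 1208

From thread: Statistics board - Has anyone built a CART model in R?

I roughly get what you mean.
I'd like to know how you defined TEST.DATA.
"Warning message:
'newdata' had 86 rows but variables found have 770 rows "
That is what I get, meaning the test data and the original model have different numbers of rows... Very weird, since they were never supposed to be equal. I must have gotten the syntax wrong somewhere... I'm dizzy... I've never managed to understand the R tutorial...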

r******m
Posts: 369

From thread: Statistics board - [SAS] how to do nested loop between 2 datasets?

Why use a nested do loop? Isn't the data step itself already a loop?
data newdata;
set dataset1;
do i= 1 to ndataset2;
set dataset2 nobs=ndataset2 point=i;
end;
run;

a*******g
Posts: 80

From thread: Statistics board - Help on nomogram

Hi, I used Dr. Harrell's rms package to make a nomogram.
Below is my code for nomogram and calculate total points and probability in
original data set used for building the nomogram. My question is how I get
the formula for calculating the survival probability for this nomogram. Then
I can use this formula to do validation by using other data set.
f1 <- cph(Surv(retime,dfs) ~ age+her2+t_stage+n_stage+er+cytcyt+Cyt_PCDK2 ,
data=data11,
surv=T, x=T, y=T, time.inc=5)
surv<- Survival(f1)
surv10 <- ...

D******n
Posts: 2836

From thread: Statistics board - R is a bit disappointing

well in R, i will transform this data structure into a list
newdata <- split(old_data,old_data$id)

in

o****o
Posts: 8077

From thread: Statistics board - Each ID should appear once with missing values removed; how do I do this in SAS?

proc means data=yourdata nway noprint;
class ID;
var m1 m2;
output out=newdata max(m1)=m1 max(m2)=m2;
run;
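
As a rough cross-check of what the step above produces, here is the same collapse sketched in Python. It assumes each ID has at most one non-missing value per column that matters, or that the maximum is the value wanted. The function name collapse_by_id and the sample rows are hypothetical, and None stands in for a SAS missing value.

```python
def collapse_by_id(rows):
    """One row per ID; each measure is the max of its non-missing values."""
    result = {}
    for row in rows:
        merged = result.setdefault(row["ID"], {"m1": None, "m2": None})
        for key in ("m1", "m2"):
            value = row[key]
            if value is not None and (merged[key] is None or value > merged[key]):
                merged[key] = value
    return result

rows = [
    {"ID": 1, "m1": 5, "m2": None},
    {"ID": 1, "m1": None, "m2": 7},
    {"ID": 2, "m1": 3, "m2": None},
]
print(collapse_by_id(rows))
# {1: {'m1': 5, 'm2': 7}, 2: {'m1': 3, 'm2': None}}
```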

s***1
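Posts: 343

From thread: Statistics board - [Baozi] Help with a survival problem; I'm a beginner, please bear with me

There are 100-plus subjects. Each subject has a given survival time and censoring value, and each also has corresponding gene expression data (about 30000 gene features). I first fitted a Cox model with these (the most routine kind: model<-coxph(Surv(survival$t.os,
survival$e.os)~expression...)
What I need to do now is use this model to predict the survival rate (e.g. 10-year survival) of other new subjects with given gene expression and time points.
I used survfit(model,newdata=data.frame(new_expression),individual=TRUE), expecting several curves, one for each of these individuals, but instead I got an error saying start and stop are missing.
I'd like to ask those with experience: do I need to add the time to predict, and how can I add it (I couldn't find it in the survfit help)?
Sorry that the question is so basic, but I really have been struggling with it for a while...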

t*****w
Posts: 254

From thread: Statistics board - How should I prepare for an R interview?

When I had my job interview, they always tested my SAS skill.However I use R
all the time. To help your preparation, read my R codes to see how much you
can understand it.
%in%
?keyword
a<-matrix(0,nrow=3,ncol=3,byrow=T)
a1 <- a1/(t(a1)%*%spooled%*%a1)^.5 #standadization in discrim
a1<- a>=2; a[a1]
abline(h = -1:5, v = -2:3, col = "lightgray", lty=3)
abline(h=0, v=0, col = "gray60")
abs(r2[i])>r0
aggregate(iris[,1:4], list(iris$Species), mean)
AND: &; OR: |; NOT: !
anova(lm(data1[,3]~data1[,1...

Z*******n
Posts: 694

From thread: Statistics board - Asking for help with a modeling/prediction question

Three people suggested survival model.
I am willing to learn and try.
I used the cox proportional hazard model long long time ago, and now I
forgot how to use it.
Some old R code is below (NOT for this problem at hand, but for some
exercise problem). Am I in the right track? Any tips/hint/R code snippets?
library(survival)
?coxph
coxph.m <- coxph(Surv(lifetime,notcensored1) ~ x1+x2, data=mydata)
summary(coxph.m)
plot(survfit(coxph.m),xlab='time',ylab='1-CDF')
predict(coxph.m, newdata=ldkfakdfj...

H*M
Posts: 22

From thread: Statistics board - Help with a SAS question

data newData (drop=preYear);
retain YGroup;
retain preYear;
set myData; /* original data; assumed to be sorted by year */
if _N_ = 1 then
YGroup = year;
else if (preYear+1 < year) then
YGroup = year;
preYear = year;
run;
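
The logic of that data step, as I read it, is: carry a group label forward and start a new group whenever the year jumps by more than one. A small Python sketch of the same idea follows; the function name assign_year_groups and the sample years are made up, and the input is assumed to be sorted ascending.

```python
def assign_year_groups(years):
    """Label each year with the first year of its consecutive run.

    Assumes the input is sorted ascending, as the SAS step does.
    """
    groups = []
    group_start = None
    prev_year = None
    for year in years:
        if prev_year is None or prev_year + 1 < year:
            group_start = year  # a gap of more than one year starts a new group
        groups.append(group_start)
        prev_year = year
    return groups

print(assign_year_groups([2001, 2002, 2003, 2005, 2006, 2010]))
# [2001, 2001, 2001, 2005, 2005, 2010]
```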

B******y
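Posts: 9065

From thread: Statistics board - Re: How to quickly split a single row into multiple rows

Suppose SAS had an unnest command, so that it could be done in one step:
data newdata;
set mydata;
by ID;
newindex = unnest(index, dlm=" &,AND");
drop index;
run;
Is something like this possible?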

t*****e
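Posts: 364

From thread: DataSciences board - Random forest: is it possible to guarantee that certain variables are always selected?

Sorry, I meant using the package tree; rpart seems to be only for regression trees.
Below is some sample code. It was written in a hurry, so there are bound to be bugs, but you can get a rough idea of the logic.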
require(tree)
A_train = matrix(rnorm(20000),nc=20)
Label = rnorm(nrow(A_train))
df_train = data.frame(A_train,Label)
A_test = matrix(rnorm(20000),nc=20)
Label_test = rnorm(nrow(A_test))
df_test = data.frame(A_test,Label)
Prob_all = NULL
for (k in 1:100) {
index = sample(1:length(Label),length(Label),replace = T)
indF = sample(NonSelected,floor(length(NonSele...

t*****e
Posts: 364

From thread: DataSciences board - Random forest: is it possible to guarantee that certain variables are always selected?
