w******a Posts: 25 | 1
     Patient_Code  Sex  Age  Color_used  DIAGNOSIS
 1             22    1   29           4          0
 2              1    1   25          10          0
 3              9    1   48           1          0
 4             13    1   27           3          0
 5             18    1   29           6          0
 6             20    1   24           6          0
 7             23    0   57           3          0
 8             25    1   62           1          0
 9             27    0   56           1          0
10              5    1   39           1          0
11 |
w******a Posts: 25 | |
S******y Posts: 1123 | 3 Supervised learning
Binary classifier - |
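d*******o Posts: 493 | 4 /* Assumes data set "one" holds the columns from post 1 */
proc logistic data=one;
  /* event='1' models the probability that DIAGNOSIS = 1 */
  model DIAGNOSIS(event='1') = Sex Age Color_used;
run; |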
w******a Posts: 25 | 5 > mylogit <- glm(DIAGNOSIS ~ Color_used, family = binomial(link = "logit"), na.action = na.pass)
> summary(mylogit)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.57289    1.13605  -1.385    0.166
Color_used   0.09416    0.18558   0.507    0.612
The coefficient table shows that Color_used is not statistically significant (p = 0.612). |
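To read the posted coefficient on a more familiar scale, here is a small sketch that turns the Color_used estimate and standard error from the output above into an odds ratio with a Wald 95% interval. The numbers are copied from the post; the variable names (beta, se, ci) are just for illustration.

```r
# Values copied from the summary(mylogit) output in the post above
beta <- 0.09416   # Estimate for Color_used
se   <- 0.18558   # Std. Error for Color_used

z <- qnorm(0.975)                      # about 1.96 for a 95% interval
odds_ratio <- exp(beta)                # odds ratio per one-unit increase in Color_used
ci <- exp(beta + c(-1, 1) * z * se)    # Wald interval on the odds-ratio scale

z_value <- beta / se                   # should reproduce the posted z value
p_value <- 2 * pnorm(-abs(z_value))    # two-sided p-value

print(round(c(odds_ratio = odds_ratio, lower = ci[1], upper = ci[2]), 3))
print(round(c(z = z_value, p = p_value), 3))
```

The interval runs from roughly 0.76 to 1.58 and so contains 1, which is the same message as the p-value of 0.612: with this sample the data cannot distinguish a positive effect from none.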
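w******a Posts: 25 | 6 If I use a classification tree, a Support Vector Machine, or KNN, the data set is too small to divide into a training set and a test set; however I split it, there is not enough data.
statsguy, how should I split it? Do I need to bootstrap? |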
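On the bootstrap question, one common option for small samples is to train on a bootstrap resample and test on the rows that were not drawn (the "out-of-bag" rows), repeated many times. The sketch below shows the idea with the same logistic model; the data frame is made up for illustration (it is not the study data), and B = 200 resamples is an arbitrary choice.

```r
# Made-up stand-in for the study data (illustration only, not real patients)
set.seed(4)
n <- 30
dat <- data.frame(Color_used = sample(1:10, n, replace = TRUE))
dat$DIAGNOSIS <- rbinom(n, 1, plogis(-2 + 0.4 * dat$Color_used))

B <- 200                         # number of bootstrap resamples (arbitrary)
oob_acc <- rep(NA_real_, B)

for (b in seq_len(B)) {
  idx <- sample(n, n, replace = TRUE)   # bootstrap training rows
  oob <- setdiff(seq_len(n), idx)       # rows never drawn act as the test set
  if (length(oob) == 0) next            # skip the rare resample with no test rows
  fit <- suppressWarnings(
    glm(DIAGNOSIS ~ Color_used, family = binomial(link = "logit"),
        data = dat[idx, ])
  )
  p <- predict(fit, newdata = dat[oob, , drop = FALSE], type = "response")
  oob_acc[b] <- mean(as.integer(p > 0.5) == dat$DIAGNOSIS[oob])
}

boot_accuracy <- mean(oob_acc, na.rm = TRUE)  # average out-of-bag accuracy
print(boot_accuracy)
```

Every patient ends up in the test role in some resamples and in the training role in others, so no data is permanently set aside. The estimate is still noisy when n is this small.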
w******a Posts: 25 | 7 Thanks, dapangmao, for the SAS code! So helpful! |
S******y Posts: 1123 | 8 naiveBayes - R package e1071
Neural Net - R package nnet
Collaborative Filtering (?) - Matlab?? |
w******a Posts: 25 | 9
e1071 seems to have only one method, called e1071-deprecated().
I don't know how to use it.
> e1071-deprecated()
Error: object 'e1071' not found
nnet:
> train <- read.csv("dataset_nee.csv", header=T)
> nnet(x=train[,2:4],y=train[,5])
Error in nnet.default(x = train[, 2:4], y = train[, 5]) :
element 1 is empty;
the part of the args list of 'c' being evaluated was:
(size, dim(y)[2]) |
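A few notes on the two errors above, with a sketch. R parses `e1071-deprecated()` as `e1071` minus `deprecated()`, hence "object 'e1071' not found"; the classifier in that package is the function `naiveBayes`. The nnet error appears because the `size` argument (number of hidden units) has no default and was not supplied. The data frame below is a made-up stand-in for dataset_nee.csv with the same column order as the posted table, and size = 2, decay = 0.01 and the rescaling are illustrative choices, not recommendations.

```r
library(nnet)  # recommended package, normally installed with R

# Made-up stand-in for dataset_nee.csv (illustration only)
set.seed(2)
n <- 40
train <- data.frame(
  Patient_Code = seq_len(n),
  Sex          = rbinom(n, 1, 0.7),
  Age          = round(runif(n, 20, 65)),
  Color_used   = sample(1:10, n, replace = TRUE)
)
train$DIAGNOSIS <- rbinom(n, 1, plogis(-2 + 0.4 * train$Color_used))

# Bring predictors onto a similar scale before fitting the network
x <- cbind(Sex = train$Sex,
           Age = train$Age / 100,
           Color_used = train$Color_used / 10)

# size (hidden units) must be given; entropy = TRUE suits a 0/1 outcome
fit <- nnet(x = x, y = train$DIAGNOSIS, size = 2,
            entropy = TRUE, decay = 0.01, maxit = 200, trace = FALSE)
p <- predict(fit, x)   # fitted probabilities of DIAGNOSIS = 1, one row per patient

# e1071 is optional here: only run this part if the package is installed
has_nb <- requireNamespace("e1071", quietly = TRUE)
if (has_nb) {
  nb_dat <- train[, c("Sex", "Age", "Color_used")]
  nb_dat$DIAGNOSIS <- factor(train$DIAGNOSIS)
  nb <- e1071::naiveBayes(DIAGNOSIS ~ Sex + Age + Color_used, data = nb_dat)
  nb_class <- predict(nb, newdata = nb_dat)   # predicted class labels
  print(table(predicted = nb_class, observed = nb_dat$DIAGNOSIS))
}
```

With the original call, adding `size = 2` (or another small number) to `nnet(x=train[,2:4], y=train[,5])` should get past the reported error.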
D******n Posts: 2836 | 10 Then you have to accept the truth....
[Quoting w******a's post]
: > mylogit<-glm(DIAGNOSIS~Color_used,family=binomial(link="logit"), na.action=na.pass)
: > summary(mylogit)
: Coefficients:
: Estimate Std. Error z value Pr(>|z|)
: (Intercept) -1.57289 1.13605 -1.385 0.166
: Color_used 0.09416 0.18558 0.507 0.612
: The table of coefficients shows that Color_used is statistically non-significant.
|
w******a Posts: 25 | 11 In the pilot study, only "Color_used" stood out. We hope to confirm it and
apply the result to save costs. I believe the sample size is the issue, so we
are still recruiting.
It is still really nice to be able to learn different analysis methods from
you guys!
Can anyone suggest a way to handle the training/test dataset splitting
problem? |
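For the splitting problem, one option that avoids a fixed training/test split is leave-one-out cross-validation: fit on all patients but one, predict the one left out, and repeat for every patient. A minimal sketch follows, again on made-up data rather than the study data, with the 0.5 cutoff as an arbitrary choice.

```r
# Made-up stand-in for the study data (illustration only, not real patients)
set.seed(1)
n <- 30
dat <- data.frame(
  Sex        = rbinom(n, 1, 0.7),
  Age        = round(runif(n, 20, 65)),
  Color_used = sample(1:10, n, replace = TRUE)
)
dat$DIAGNOSIS <- rbinom(n, 1, plogis(-2 + 0.4 * dat$Color_used))

# Leave-one-out: fit on n - 1 patients, predict the one held out
loo_pred <- numeric(n)
for (i in seq_len(n)) {
  fit <- suppressWarnings(
    glm(DIAGNOSIS ~ Color_used, family = binomial(link = "logit"),
        data = dat[-i, ])
  )
  loo_pred[i] <- predict(fit, newdata = dat[i, , drop = FALSE],
                         type = "response")
}

loo_class <- as.integer(loo_pred > 0.5)          # classify at a 0.5 cutoff
loo_accuracy <- mean(loo_class == dat$DIAGNOSIS) # share of held-out patients predicted correctly
print(loo_accuracy)
```

The same loop works for a tree, SVM, or KNN by swapping the model-fitting and prediction lines. It uses every observation for testing exactly once, which is about as much as a sample this size allows.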