All topics - Topic: imbalanced
D******n
Posts: 2836
l***a
Posts: 12410
2
From thread: Statistics board - how do you deal with sparse data?
I'll have to read it when I get back home. My office computer cannot access this
link :(
By the way, is my case called "imbalanced data"? I thought that was about the
balance of treatments, in other words about the independent variables. My case is
about a very rare event, which concerns the dependent variable. Maybe a silly
question.
D******n
Posts: 2836
3
From thread: Statistics board - how do you deal with sparse data?
Yes, it is called imbalanced data: very few 1s and a lot of 0s, or vice versa.

R******d
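Posts: 1436
4
I balanced the data because I tried different ratios and the balanced one gave the best performance.
I deliberately want very high specificity. Sensitivity and specificity can be traded off by setting different
prediction score thresholds. I think people usually report the values at the point where their sum is largest? The AUC
metric does not depend on whether the data are balanced or imbalanced. My AUC is 0.88.
My goal is just to find some positives, not all of them, so I set specificity very high. With my
approach, should PPV be computed on the balanced data or on the imbalanced data?
How is profit lift used?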
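On the PPV question: by Bayes' rule, PPV depends on the prevalence of positives as well as on sensitivity and specificity, so the same operating point gives different PPVs on balanced and imbalanced data. A minimal sketch of that arithmetic follows. The sensitivity, specificity, and prevalence values are made-up illustrations, not numbers from this thread.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical operating point: high specificity, modest sensitivity.
sens, spec = 0.50, 0.99

ppv_balanced = ppv(sens, spec, prevalence=0.50)    # 1:1 resampled data
ppv_imbalanced = ppv(sens, spec, prevalence=0.01)  # assumed real-world rate
print(round(ppv_balanced, 3), round(ppv_imbalanced, 3))
```

With sensitivity and specificity held fixed, the PPV at the lower prevalence is much smaller, so the PPV that matters in practice is the one at the prevalence of the population the model will actually be applied to.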

d***2
Posts: 341
5
From thread: Statistics board - Drop out immediately after randomization
For phase 1 studies, drop-outs are often replaced. When a
subject drops out, the next enrolled subject gets the same randomization
number, etc., as long as this is well defined in the protocol.
In phase II+ trials, an additional portion of subjects is added to the
study based on the estimated drop-out rate (such as 15-20%) to maintain
power. Imbalanced drop-out is another story; you will have to look into the
discontinuation mechanism using pattern-mixture models, etc.
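The enrollment inflation for drop-out can be sketched as below. This assumes the common convention of dividing the required sample size by the expected retention rate, which the post does not state explicitly. The function name and the required sample size of 200 are hypothetical.

```python
def enrolled_needed(n_required, dropout_pct):
    """Subjects to enroll so that about n_required remain after drop-out.

    dropout_pct is an integer percentage (e.g. 15 for 15%); integer
    arithmetic avoids floating-point surprises in the ceiling.
    """
    if not 0 <= dropout_pct < 100:
        raise ValueError("dropout_pct must be in [0, 100)")
    remaining_pct = 100 - dropout_pct
    # Ceiling division: smallest n with n * remaining_pct / 100 >= n_required
    return -(-n_required * 100 // remaining_pct)

# Hypothetical requirement of 200 completers at 15% and 20% drop-out.
for pct in (15, 20):
    print(pct, enrolled_needed(200, pct))
```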
a*z
Posts: 294
6
From thread: Statistics board - logistic regression question
Can SPSS do bagging? If so, you may try this out:
http://zyxo.wordpress.com/2009/03/28/mining-highy-imbalanced-da
Best of luck.
c*****l
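Posts: 1493
7
From thread: Statistics board - Random forests on imbalanced data (repost)
Do some down sampling to make the data relatively balanced (or use a two-stage classification).
You can also weight the two kinds of error and use that as the misclassification rate.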
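Both suggestions can be sketched in a few lines of plain Python. This is only an illustration: the helper names, the 2:1 ratio, the cost of 10 for a false negative, and the toy data are all assumptions, not anything the poster specified.

```python
import random

def downsample(rows, labels, ratio=1.0, seed=0):
    """Keep all positives (label 1) and a random subset of negatives (label 0).

    ratio is the desired number of negatives per positive in the output.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    keep = sorted(pos + rng.sample(neg, n_keep))
    return [rows[i] for i in keep], [labels[i] for i in keep]

def weighted_error(y_true, y_pred, fn_cost=1.0, fp_cost=1.0):
    """Cost-weighted misclassification rate."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return (fn_cost * fn + fp_cost * fp) / len(y_true)

# Toy data: 5 positives among 100 observations.
labels = [1] * 5 + [0] * 95
rows = list(range(100))
rows_ds, labels_ds = downsample(rows, labels, ratio=2.0)
print(sum(labels_ds), len(labels_ds))

# A classifier that always predicts the majority class.
y_pred = [0] * 100
print(weighted_error(labels, y_pred), weighted_error(labels, y_pred, fn_cost=10.0))
```

Weighting false negatives more heavily makes the always-negative classifier look much worse than its plain error rate suggests.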
z******n
Posts: 397
8
From thread: Statistics board - Random forests on imbalanced data (repost)
If the focus is on reducing FN, how do you balance that against ACC?

Y****a
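Posts: 243
9
From thread: Statistics board - Random forests on imbalanced data (repost)
Generally speaking, a model's built-in evaluation score targets overall performance, i.e. false
negatives + false positives. Some models let you set different penalties for different
classification errors; I'm not sure how that is done in the RF algorithm you are using. But since your main concern is
false negatives, don't rely too much on the acc value the model reports.
Random forest should give each observation a score for each class. The default is to assign the
observation to the class with the higher score, but you can also adjust this threshold. For example, with a negative
score of 0.4 and a positive score of 0.6, you could still label it negative even though the positive score
is higher. See whether that helps.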
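The threshold idea can be sketched as follows. The scores are hypothetical positive-class scores, not output from any particular random forest implementation, and the helper names are made up for illustration.

```python
def classify(pos_scores, threshold=0.5):
    """Label an observation positive (1) when its positive-class score
    reaches the threshold, otherwise negative (0)."""
    return [1 if s >= threshold else 0 for s in pos_scores]

def false_negatives(y_true, y_pred):
    """Count positives that were predicted as negative."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

# Hypothetical positive-class scores from some classifier.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.6, 0.45, 0.3, 0.4, 0.2, 0.1, 0.35, 0.05]

for thr in (0.7, 0.5, 0.3):
    pred = classify(scores, thr)
    print(thr, pred, false_negatives(y_true, pred))
```

A threshold of 0.7 reproduces the post's example, where a positive score of 0.6 is still labeled negative. When false negatives are the main concern, moving the threshold the other way (below 0.5) reduces them at the cost of more false positives.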
j****k
Posts: 46
10
From thread: Statistics board - Random forests on imbalanced data (repost)
Chapter 16 of this book addresses this issue in particular, including most
of the ideas you mentioned: http://appliedpredictivemodeling.com/
Hope this helps. I have a worse problem than yours: 300 "1"s versus 0.8m "0"s.
Hopefully it works for that as well :D

l******n
Posts: 9344
11
From thread: Statistics board - Random forests on imbalanced data (repost)
This is basically very hard.

y**3
Posts: 267
12
From thread: Statistics board - Random forests on imbalanced data (repost)
Couldn't you just use Firth penalization directly? Some argue it is better than oversampling.
A*****n
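Posts: 243
13
From thread: DataSciences board - Random forests on imbalanced data
In this situation, comparing ACC is probably not meaningful. Look at the ROC curves for your 4 cases; they may well
all look very similar.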
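To illustrate why accuracy can mislead on imbalanced data while the ROC curve stays informative, here is a minimal pure-Python sketch. The labels and scores are made up, and the helper names are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the score threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def accuracy(y_true, y_pred):
    """Fraction of observations labeled correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Toy data: 2 positives, 8 negatives; scores are made up.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.01]

majority_acc = accuracy(y_true, [0] * 10)  # always predict the majority class
model_auc = auc(roc_points(y_true, scores))
print(majority_acc, model_auc)
```

A classifier that never predicts the rare class already scores high accuracy here, which is why comparing models on the ROC curve (or AUC) is more telling than comparing ACC.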
y**3
Posts: 267
14
From thread: DataSciences board - Random forests on imbalanced data
What is acc? Is it AIC or AICc?
If we don't compare ACC, what should we compare? Please advise.
y**3
Posts: 267
15
From thread: DataSciences board - Random forests on imbalanced data
Just figured it out: ACC should be accuracy.
w*****a
Posts: 218
16
From thread: DataSciences board - Random forests on imbalanced data
This is the right answer.
If necessary, use a log scale for the X-axis, or for the Y-axis as well.
m******r
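Posts: 1033
17
From thread: DataSciences board - Help with an interview question
Let me get the ball rolling.
Seeing this 2.5% vs 97.5% split, could we use imbalanced sampling?
Also, why are there so many features? Some features look useless at a glance; just
garbage-collect them.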
s*****n
Posts: 134

Posts: 1
19
From thread: DataSciences board - Help with an interview question
Oversampling and undersampling techniques. From the link you provided, this only
applies to cases where the sample is biased relative to the population and you know it
beforehand. A confusion matrix and classification report may be one tool,
purposely adjusting the class probability and using the F-score as a measure.
There are a lot of features, so something probably needs to be done about them first. I feel
the dimensions need to be reduced first rather than only shrinking them.
I'm a rookie, so please feel free to comment.
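A confusion matrix and F-score of the kind mentioned above can be computed as in this minimal sketch. The labels and predictions are toy values and the helper names are hypothetical.

```python
def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def f_score(tp, fp, fn, beta=1.0):
    """F-beta score; returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Toy data: 4 positives among 20 observations.
y_true = [1] * 4 + [0] * 16
y_pred = [1, 1, 0, 0] + [1] + [0] * 15

tp, fp, fn, tn = confusion(y_true, y_pred)
print(tp, fp, fn, tn, f_score(tp, fp, fn))
```

Setting `beta` above 1 weights recall more heavily than precision, which may suit problems where missing a rare positive is the costlier error.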


: Combat Imbalanced Classes
t******g
Posts: 2253
20
From thread: DataSciences board - Help with an interview question
This is asking how to handle imbalanced samples, and then how to build a model in that situation.
i********r
Posts: 1153
21
From thread: _TexasHoldem board - Am I too aggressive?
then wouldn't that make your shoving range and 3bet range imbalanced?