Page 10 - Roundup of discussions about scorecard - 话题女王

j**********3
Posts: 305

From topic: Statistics board - A SAS Macro Calculating PDO

It was a very useful tool for score alignment in ongoing scorecard
monitoring and support. Ensuring that the same score represents the same actual
odds across all models in a multi-model environment is critical!
Thanks, man!

s*********e
Posts: 1051

From topic: Statistics board - evaluate a scorecard with weighting scheme

It should be able to handle weights of any positive value.
http://statcompute.wordpress.com/2013/05/04/a-sas-macro-for-sco
Hope you find it useful, and have a nice weekend.

w*****e
Posts: 806

From topic: Statistics board - evaluate a scorecard with weighting scheme

Very nice, thanks!!

m******e
Posts: 89

From topic: Statistics board - Download: Code for Credit Risk Scorecards: Development and Implementation Using SAS

Many people asked me for the code. Please download it from my dropbox:
https://dl.dropboxusercontent.com/u/44155643/ScorecardDev.rar
(I do not have the book)
Thanks,
Original post:
http://www.mitbbs.com/article_t/Statistics/31341815.html

w*****e
Posts: 806

From topic: Statistics board - Download: Code for Credit Risk Scorecards: Development and Implementation Using SAS

thanks a lot!!

h*********o
Posts: 46

From topic: Statistics board - Download: Code for Credit Risk Scorecards: Development and Implementation Using SAS

Thanks a lot!

l******h
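Posts: 855

From topic: Statistics board - (update) Got an offer; before I leave I plan to refer a new hire to my current company. Take a look if interested

Our company is in Roswell, Georgia. The basic requirements are SAS, SQL, and Excel. In my
experience, the SAS work mostly uses Base SAS, and you need to know macros. You also need SQL,
but only ordinary basic operations such as join, update, drop, add, and where.
The position is Data Analyst, working on credit risk. It includes data processing
(cleaning, formatting, etc.), data aggregation, analyzing files, applying
scorecards, and model validation in SAS, among other things.
If you're interested, you can PM me for more information.
Update: our boss said we will only consider candidates who are in-state and can start right away,
so to those who don't meet the conditions: sorry, this is as far as I can help.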

S*******1
Posts: 251

From topic: Statistics board - How to deal with the NULL value?

When building a scorecard for credit/risk, I am puzzled about how to deal with
null values. Three choices:
1. set as a constant such as: 0
2. set as average
Thanks!

d******e
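Posts: 7844

From topic: Statistics board - Can SVM (R) results be converted into a scorecard?

SAS has SVM.
http://support.sas.com/documentation/cdl/en/whatsnew/64209/HTML
There is no essential difference between SVM and logistic regression with a ridge penalty, so you
could try the implementation below.
http://www.sas-programming.com/2010/09/regularized-discriminant
I just found the link above; I don't use SAS, so I can't guarantee it's correct.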

s******n
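Posts: 189

From topic: Statistics board - Can SVM (R) results be converted into a scorecard?

:)
Are you saying the optimization problems for "SVM and logistic regression with a ridge penalty" are no different?
The SVM optimization problem has linear constraints, and its objective has a ridge penalty term plus a term in the slack variables.
Is the optimization problem for ridge-penalized logistic regression based on MLE?
Any books or papers? Thanks!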

d******e
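Posts: 7844

From topic: Statistics board - Can SVM (R) results be converted into a scorecard?

Logistic regression uses the logistic loss, whose shape is not very different from the SVM's hinge loss,
so in practice their performance does not differ much either.
As for the linear constraints you mention in SVM, they are just a reparametrization, because the hinge loss
can be expressed as a piecewise-linear function.
There are plenty of articles online on what the two losses have in common and how they differ:
https://www.google.com/search?q=hinge+loss+logistic+loss&safe=off&espv=210&es_sm=91&source=lnms&sa=X&ei=ual-UorQKMzl4APQzYCABA&ved=0CAYQ_AUoAA&biw=1525&bih=864&dpr=2
The advantage of logistic regression is that it is fully model-based, so the optimization problem is simply an MLE;
it is easy to interpret and can output posterior probabilities directly.
Getting probability output from an SVM is comparatively much more trouble, and it basically ends up imitating
logistic regression anyway, although hinge loss, compared with logistic loss, on outl...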
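To make the comparison in this post concrete, here is a minimal Python sketch (not from the thread) that evaluates both losses as functions of the margin m = y * f(x), with labels y in {-1, +1}. The function names are made up for illustration. It also shows the point about posterior probabilities: under a logistic model the linear score maps straight to P(y = +1 | x).

```python
import math


def hinge_loss(margin: float) -> float:
    """SVM hinge loss for margin m = y * f(x), with y in {-1, +1}."""
    return max(0.0, 1.0 - margin)


def logistic_loss(margin: float) -> float:
    """Logistic loss log(1 + exp(-m)), computed in a numerically stable way."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def posterior_probability(f_x: float) -> float:
    """P(y = +1 | x) under a logistic model whose linear score is f(x)."""
    if f_x >= 0:
        return 1.0 / (1.0 + math.exp(-f_x))
    e = math.exp(f_x)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Both losses grow roughly linearly for badly misclassified points;
    # hinge is exactly zero past margin 1, logistic only approaches zero.
    print(f"{'margin':>7} {'hinge':>8} {'logistic':>9}")
    for m in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"{m:7.1f} {hinge_loss(m):8.3f} {logistic_loss(m):9.3f}")
```

The hinge loss is a maximum of two linear functions, which is the piecewise-linear form the post refers to.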

o****o
Posts: 8077

From topic: Statistics board - Can SVM (R) results be converted into a scorecard?

For what drburnie said, see this paper, which covers it in some detail:
Zhang et al, "Modified Logistic Regression: An Approximation to SVM and Its
Applications in Large-Scale Text Categorization", ICML-2003

w********m
Posts: 1137

From topic: Statistics board - Can SVM (R) results be converted into a scorecard?

Guru, could you talk about the pros and cons of SVM for regression versus ordinary regression?

d******e
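Posts: 7844

From topic: Statistics board - Can SVM (R) results be converted into a scorecard?

I've always felt SVR isn't anything great: it has little advantage over regularized LAD regression and
comes with one extra tuning parameter.
So far I haven't found any clear advantage.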

D******n
Posts: 2836

From topic: Statistics board - Can SVM (R) results be converted into a scorecard?

Writing papers.

A*******4
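Posts: 218

From topic: Statistics board - Statistics in industry knowledge series

Statistics in industry knowledge training series: the small-group class on Analytics of Mortgage Banking
is now open; trial sessions and registration are under way
Class size: 3-5 people
Contents:
Overview of mortgage industry
SAS programming techniques
Statistical modeling and analytics
Resume calibration and interview tips
Hands-on Project: case study of a mortgage dataset
Those interested in attending a trial session, please join QQ group 303124359.
Hands-on industry statistics courses on Financial Scorecard, Marketing Mail Campaign, Healthcare,
Insurance, and Pharmaceutical will follow; feel free to join the group and ask.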

A*******4
Posts: 218

From topic: Statistics board - Statistics in industry knowledge series

Statistics in industry knowledge training series: the small-group class on Analytics of Mortgage Banking and
Healthcare Medicaid/Medicare Data is now open; trial sessions and registration are under way
Class size: 3-5 people
Those interested in attending a trial session, please join QQ group 303124359.
Contents:
The Financial instructor is a senior analyst at BOA and Fannie Mae. The course schedule is below.
Class 1: Introduction to Banking Industry and Mortgage Business
Class 2: Data Manipulation using SAS
Class 3: Data Merge: data step, proc sql, Microsoft SQL
Class 4: Statistical Modeling
Class 5: Resume and interview tips, behavior questions.
The Healthcare instructor is a senior statistician at Yale Medical Center. The following is...

f*******n
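Posts: 2665

From topic: Statistics board - A question about boosting

I want to use AdaBoost to predict fraud, but the client requires the model's results to be "transparent", e.g. like
a scorecard built with logistic regression, where the contribution of each variable (and each of its values) to the
score is clear at a glance. How transparent are AdaBoost results?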

h****t
Posts: 33

From topic: Statistics board - Asking the seniors on this board: what does this position actually do? Roughly what's the pay?

Wells Fargo's Quantitative Associate Program, Credit Risk track
Their description is as follows:
Our Credit Risk track, sponsored by Corporate Risk and the Consumer Lending
Group will give Associates the opportunity to work with various lines of
businesses to develop, maintain and validate statistical models for loss
forecasting, credit risk scorecard, risk segmentation, capital management,
and stress testing for a variety of lending products.
Required qualifications:
Qualifications
Completion of all requireme...

w******l
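Posts: 34

From topic: Statistics board - Chatted with an Indian guy today about segmented logistic regression

Could you explain more specifically what the additivity of log odds refers to?
I know logistic regression is designed with log odds as the dependent variable precisely so that the fit comes out
as an additive linear sum of independent variables. But if it's different scores being added together, does it
still make sense? For example,
score1 = log(p1/(1-p1)),
score2 = log(p2/(1-p2)),
then score1 + score2 seems hard to interpret, especially if p1 and p2 are not two independent events
(though it makes somewhat more sense than adding p1 + p2 directly).
A follow-up question: a scorecard model has to convert probability into a score, and I've never understood
why the conversion formula is defined the way it is :-)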

A*******s
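Posts: 3942

From topic: Statistics board - Chatted with an Indian guy today about segmented logistic regression

For consumers, the bureau score is really just a black box; there's no need to itemize it, so whether it's additive
or not doesn't matter. But for banks there's quite a lot of demand for an itemized scorecard, for example for
adverse action codes, or for setting specific underwriting policy.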

D******n
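Posts: 2836

From topic: Statistics board - Chatted with an Indian guy today about segmented logistic regression

What 精算 means is that you not only have to rank the reasons and take the top four, but also state how much each one lowers the score.
But I wasn't aware that FACTA or the CFPB had become this strict.
Might as well just publish the scorecard.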

y**3
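Posts: 267

From topic: Statistics board - A question about ln odds

What does ln odds mean? I was asked this in an interview.
We can use ln odds to compute a probability. I remember an expert on this board mentioning that in scorecards ln odds can be added
but probabilities cannot; is it related to that? Why can ln odds be added?
Please advise.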
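One way to see the additivity being asked about (a sketch with made-up numbers, not an answer from the thread): in a logistic regression, ln odds = intercept + sum of per-characteristic terms, so each characteristic contributes a fixed amount on the ln-odds scale no matter what the other characteristics are. The same contribution changes the probability by a different amount depending on the starting point, which is why probabilities do not add. The intercept and contribution values below are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Probability -> ln odds."""
    return math.log(p / (1.0 - p))


def inv_logit(z: float) -> float:
    """ln odds -> probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical pieces of a fitted logistic regression, on the ln-odds scale.
intercept = -2.0
contributions = {"utilization": 0.8, "delinquencies": 1.1, "account_age": -0.4}

total_log_odds = intercept + sum(contributions.values())
probability = inv_logit(total_log_odds)
print("ln odds:", round(total_log_odds, 4), "probability:", round(probability, 4))

# The same +0.8 in ln odds multiplies the odds by exp(0.8) from any baseline,
# but moves the probability by different amounts.
for base_p in (0.05, 0.50):
    new_p = inv_logit(logit(base_p) + 0.8)
    print(f"baseline p={base_p:.2f} -> p={new_p:.4f} (change {new_p - base_p:+.4f})")
```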

A*******s
Posts: 3942

From topic: Statistics board - measuring the change of intercept

Definitely do not use the intercept of a logit model directly, since the intercept
has no physical meaning at all. And think about what happens if the intercept is very
close to zero or is negative.
Go back to scorecard alignment practice: usually we use two metrics, target
odds at a reference score and points to double the odds, to build a linear
transformation between xbeta and score. The former stands for the absolute risk
level of the score (intercept), and the latter is about relative risk
(slope). So the answer t...
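A minimal Python sketch of the linear transformation described above, under stated assumptions: the reference point (score 600 at good:bad odds of 50:1) and the points to double the odds (20) are made-up illustration values, and xbeta is assumed to be the ln odds of "bad" from the logit model, so its sign is flipped to get ln(good:bad odds). None of these specifics come from the post.

```python
import math


def scaling_params(ref_score: float, ref_odds: float, pdo: float):
    """Return (factor, offset) so that score = offset + factor * ln(odds).

    factor is the slope (relative risk): adding `pdo` points doubles the odds.
    offset pins the absolute level: `ref_odds` maps exactly to `ref_score`.
    """
    factor = pdo / math.log(2.0)
    offset = ref_score - factor * math.log(ref_odds)
    return factor, offset


def score_from_xbeta(xbeta: float, factor: float, offset: float) -> float:
    """Assumes xbeta = ln(odds of bad), so ln(good:bad odds) = -xbeta."""
    return offset + factor * (-xbeta)


# Hypothetical alignment targets: 600 points at 50:1 odds, 20 points to double.
factor, offset = scaling_params(ref_score=600.0, ref_odds=50.0, pdo=20.0)

for good_odds in (25.0, 50.0, 100.0):
    xbeta = -math.log(good_odds)
    print(f"odds {good_odds:>5.0f}:1 -> score {score_from_xbeta(xbeta, factor, offset):.1f}")
```

Aligning several models to the same reference odds and the same points to double the odds is what makes a given score mean the same odds across all of them.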

s*******t
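Posts: 300

From topic: Statistics board - Seeking advice on a bank risk modeling job opportunity

I've done statistics and machine learning work and have developed software. I'm applying for
software developer and data scientist roles. I interviewed with a second company, which
mainly does scorecards and credit risk modelling, and the offer is below the industry
standard. So I'd like to ask: how is this direction, and does it have a future? I know things look decent
right now because of government requirements, but once regulation is relaxed in a few years, will it be in trouble again?
I've already picked the wrong field once and don't want to get it wrong again. What does career development look like
for a credit risk modeller? Thanks.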

S******3
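Posts: 66

From topic: Statistics board - Old question: for a statistics PhD, pharma or bank?

I'm only talking about going back to China to do statistics. I often see pharma offers reported on the returnees board,
and pharma headhunters have shown up on this board too.
When have you ever seen a returnee credit analyst/risk modeler at a commercial bank? MBA/IB/PE don't count. Banks in China
just don't care much for data analysis. I recently heard that some consulting firms quote commercial banks 50K for a Scorecard.... in RMB!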

a******c
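Posts: 291

From topic: Statistics board - How do I turn a model into a score?

There are several methods, depending on who will use the score and your own preference. Suppose you want the score to fall
within 1-100; I have used the following methods:
1) Multiply the probability score by a constant so that it becomes an integer, then cap it, e.g. if score
> 100 then score=100. Also tell users that this score can be used as a probability (just divide by 100).
But if your score is to be sold to a third party and you need to prevent others from reverse engineering it,
this method cannot be used.
2) Split the probability score into 100 equal-sized groups (PROC RANK), and give each group a score of 1, 2,
3, ..., 99, 100.
3) Look at the distribution of the probability score, take the median as the center and set it to score=50, then
spread out to both sides. This way you can "manufacture" a normally distributed score.
Methods 2) and 3) ensure that others cannot reverse engineer your score; at the same time, the score is only suitable for
rank ordering, not for estimation. For example, you can say score=90 is better than score=30, but you cannot say score=90,
compared with s...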
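Method 2) can be sketched in a few lines of Python. This is only loosely analogous to grouping with PROC RANK, not a reproduction of its exact formula or tie handling; the function name is made up, and ties are simply given the same group here.

```python
from bisect import bisect_left


def percentile_scores(probabilities, n_groups=100):
    """Map each probability to a group 1..n_groups by its rank in the sample.

    The rank is the number of observations strictly below the value, so tied
    probabilities always land in the same group.
    """
    ordered = sorted(probabilities)
    n = len(ordered)
    return [bisect_left(ordered, p) * n_groups // n + 1 for p in probabilities]


if __name__ == "__main__":
    probs = [0.91, 0.02, 0.40, 0.40, 0.13, 0.77, 0.05, 0.66]
    # With only 8 observations, use 4 groups so each group holds about 2.
    print(percentile_scores(probs, n_groups=4))
```

Because only the ordering survives, the score ranks cases but no longer estimates a probability, which is the trade-off described in the post. Whether a high group should mean "good" or "bad" depends on what the model predicts.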

r*****t
Posts: 91

From topic: Statistics board - Could someone help me download this paper?

Using a transactor/revolver scorecard to make credit and pricing decisions
Decision Support Systems
Volume 59, March 2014, Pages 143–151
Please send it to [email protected]
Many thanks!

E**********e
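Posts: 1736

From topic: Statistics board - Does anyone have this book: Credit Risk Scorecards: Development and Implementation Using SAS

Can anyone share the e-book? It's too expensive on Amazon, and buying the physical copy isn't worth it. I don't have
a Taobao account in China either, so I can't buy it there. I searched and found that someone posted this book on this board
a few years ago. If anyone has it, could you give me a copy? I'll send baozi in return.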

f**********0
Posts: 142

From topic: Statistics board - Does anyone have this book: Credit Risk Scorecards: Development and Implementation Using SAS

I have it. Your email, please!

E**********e
Posts: 1736

From topic: Statistics board - Does anyone have this book: Credit Risk Scorecards: Development and Implementation Using SAS

Yes. Do you have the English PDF version?

b********1
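Posts: 291

From topic: Statistics board - Does anyone have this book: Credit Risk Scorecards: Development and Implementation Using SAS

If you really want to read it, just buy a copy.
I read this book a while ago; it's decently written.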

s******n
Posts: 240

From topic: Statistics board - Does anyone have this book: Credit Risk Scorecards: Development and Implementation Using SAS

Sorry, couldn't find it either.

s******n
Posts: 240

From topic: Statistics board - Does anyone have this book: Credit Risk Scorecards: Development and Implementation Using SAS

Go for an alternative Chinese version.

E**********e
Posts: 1736

From topic: Statistics board - Friends who work on credit risk scorecards, please come in; I have a question

Reposting a thread from stackexchange.
http://stats.stackexchange.com/questions/27750/feature-selectio

n**********0
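Posts: 66

From topic: Statistics board - Friends who work on credit risk scorecards, please come in; I have a question

I haven't worked in the OP's area, but I've done some marketing analysis. When selecting variables I screen some out first,
kicking out those that would be hard to explain even if selected. On our side we ultimately need to make sense out of it, so
the ones that can't be explained get skipped. After that it's a matter of opinion, and there are many methods. Generally, once
AUC stops increasing much, I'd say don't go beyond 20 variables, especially when your sample size isn't large. Also, I think
the hold-out can be smaller so that you have more samples. Testing isn't strictly necessary either; you can do in-sample
cross validation.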

E**********e
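Posts: 1736

From topic: Statistics board - Friends who work on credit risk scorecards, please come in; I have a question

There is supervised variable selection and there is unsupervised variable selection; the former looks at the response variable,
the latter does not.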

l*******s
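Posts: 1258

From topic: Statistics board - Friends who work on credit risk scorecards, please come in; I have a question

If the training data is only around 4k rows but the dimensionality is in the thousands, that's the curse of dimensionality.
Reduce the dimensionality, or find more training data.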

E**********e
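Posts: 1736

From topic: Statistics board - Friends who work on credit risk scorecards, please come in; I have a question

Thanks for the reply. Dimensionality reduction isn't needed for now. In this field it's not standard practice. There's also
a lot of data cleaning to handle, e.g. missing-value imputation.
Of the data from the three major credit bureaus, three quarters is garbage. Among the remaining 200-300 variables, some are
unimportant. Selecting from them in a supervised way (looking at the response variable) introduces bias.
If you've actually worked on a risk modeling project, you may understand my question.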

x***4
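Posts: 1815

From topic: Statistics board - Friends who work on credit risk scorecards, please come in; I have a question

Dimensionality reduction won't help much; you need to do regularization.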

E**********e
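Posts: 1736

From topic: Statistics board - Friends who work on credit risk scorecards, please come in; I have a question

Yes. I'm already using regularization now. The problem is that the variables that enter change with every cross validation.
AUC improved a bit, but the original variables were selected based on data sets one and two. There may still be bias, and
performance on new data may be poor.
The question now is how to prescreen the important variables in an unsupervised way. Do modelers at big banks prescreen
variables based on p-values, IV, or clustering, like those modeling books describe?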
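For readers unfamiliar with the IV (information value) screen mentioned here, a minimal Python sketch using the standard weight-of-evidence definition on an already binned variable. The bin counts are invented and the function name is hypothetical. Note that IV is computed from good/bad counts, so it is itself a supervised screen.

```python
import math


def woe_iv(good_counts, bad_counts):
    """Per-bin weight of evidence and total information value.

    WOE_i = ln( (good_i / total_good) / (bad_i / total_bad) )
    IV    = sum_i (good_i / total_good - bad_i / total_bad) * WOE_i
    """
    total_good, total_bad = sum(good_counts), sum(bad_counts)
    woes, iv = [], 0.0
    for g, b in zip(good_counts, bad_counts):
        if g == 0 or b == 0:
            # WOE is undefined for an empty cell; merge bins or smooth first.
            raise ValueError("every bin needs at least one good and one bad")
        dist_good, dist_bad = g / total_good, b / total_bad
        woe = math.log(dist_good / dist_bad)
        woes.append(woe)
        iv += (dist_good - dist_bad) * woe
    return woes, iv


if __name__ == "__main__":
    # Made-up counts for one variable split into three bins.
    woes, iv = woe_iv(good_counts=[400, 350, 250], bad_counts=[20, 30, 50])
    print([round(w, 3) for w in woes], round(iv, 3))
```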

l*******s
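Posts: 1258

From topic: Statistics board - Friends who work on credit risk scorecards, please come in; I have a question

Could you ask the people in the business departments more? If they can provide some business insight, it might help.
Also, a model usually has to be translated into business terms anyway. If you select features according to rules they provide,
then even if the final result isn't ideal, you've pulled them into it and the responsibility won't be entirely yours.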

m******e
Posts: 89

From topic: Statistics board - Dell is hiring senior risk modeler/analyst at Austin

Requires 5-6+ years of experience; base is roughly 90K-110K, depending on experience and ability. If interested,
please send me your resume.
[email protected]
Thanks!
Data Scientist Consultant
Description
GRADE LEVEL: II
1. Must be hands-on
2. Working experience in statistical modeling and advanced analytics in a
financial service company, preferably have the work experience in
statistical modeling and analytics for Consumer and Small Business lending.
Successfully developed & implemented statistical models.
3. Master or PhD degree in a field where there is heavy training i...

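Posts: 1

From topic: Statistics board - Help with bank modeling

In a scorecard model, the KS is high for all the other segments and relatively low for just one segment. What should be done
in this situation?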
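As background for the question, a minimal Python sketch of how KS is usually computed per segment: the maximum gap between the cumulative score distributions of bads and goods. The record layout (segment, score, is_bad) and the function names are assumptions for illustration; the sketch only measures the gap and does not diagnose why one segment is weak.

```python
def ks_statistic(scores, is_bad):
    """Max absolute difference between the bad and good cumulative distributions."""
    pairs = sorted(zip(scores, is_bad))
    n_bad = sum(1 for _, bad in pairs if bad)
    n_good = len(pairs) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one good and one bad")
    cum_bad = cum_good = 0
    ks, i = 0.0, 0
    while i < len(pairs):
        j = i
        # Consume every observation tied at this score before comparing CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return ks


def ks_by_segment(records):
    """records: iterable of (segment, score, is_bad) tuples."""
    grouped = {}
    for segment, score, bad in records:
        scores, flags = grouped.setdefault(segment, ([], []))
        scores.append(score)
        flags.append(bad)
    return {seg: ks_statistic(s, f) for seg, (s, f) in grouped.items()}


if __name__ == "__main__":
    demo = [("A", 1, True), ("A", 2, True), ("A", 3, False), ("A", 4, False),
            ("B", 1, True), ("B", 1, False), ("B", 2, True), ("B", 2, False)]
    print(ks_by_segment(demo))
```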

E**********e
Posts: 1736

From topic: DataSciences board - Friends who work on credit risk scorecards, please come in; I have a question (repost)

Reposting a discussion thread from stackexchange.
http://stats.stackexchange.com/questions/27750/feature-selectio

w*r
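Posts: 2421

From topic: DataSciences board - Friends who work on credit risk scorecards, please come in; I have a question (repost)

I'd really like to know which company you're at. Basically nobody will answer this question of yours; the people who do this
make their living from it, and it's basically the company's IP.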

E**********e
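Posts: 1736

From topic: DataSciences board - Friends who work on credit risk scorecards, please come in; I have a question (repost)

I know. Just trying my luck. It's really not that big a secret, though. My main issue here is that the sample size is a bit
small. Solving this is a bit complicated. I'll have it figured out in half a year.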

E**********e
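Posts: 1736

From topic: DataSciences board - Friends who work on credit risk scorecards, please come in; I have a question (repost)

I'm currently using lasso logistic regression. The specific method can be set aside for now. The reason for splitting into
three data sets is to put the holdout (data set three) into the cross validation, so that applying the final model to data set
three gives a distribution and a confidence interval rather than a single value.
Cross validation is for checking whether the modeling approach is workable, not for building the final model. The principle is
to use data set one to select variables, data set two to test the model's reliability, and data set three to check the model's
robustness and stability.
However, data sets one and two already contain prescreened variables. Those variables were selected after looking at the
response variable, based on IV, WOE, common sense and business sense. No doubt this has already introduced bias.
That's how I used to do it, but now I can say the bias is confirmed: the model's performance is somewhat unstable. I just
wanted to chat with the experts here: the books say to narrow the final variables down to 7 to 15; is that feasible? In my
model, using LASSO, narrowing to 40 to 50 performs best. But the variable list changes in every cross validation, and the
model is unstable.
Of course, to automate things, I do automatic binning every time. But beforehand...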

g****e
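Posts: 1829

From topic: DataSciences board - Friends who work on credit risk scorecards, please come in; I have a question (repost)

No offense, but every sentence you say is right, and every sentence is recited from a book. By the end I was confused.
Lasso's k-fold can be used to decide the shrinkage; it is generally not used to select variables. Since each fold selects
different variables, how would you know which to keep and which to drop? If you want to use lasso, you can do this:
1) Merge samples 1 and 2, so that you have just one large sample plus a hold-out (the original sample 3).
2a) Run k-fold on the large sample from step 1 and look at the out-of-sample error to choose the lasso
shrinkage.
2b) Run lasso once on all the data in the large sample from step 1 and keep the variables that remain.
The above is your training process.
3) Apply the result of step 2 to sample 3 (the hold-out) and look at the prediction, or do benchmarking.
I don't know what you mean by a distribution; did you bootstrap? If so, that's a problem.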
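The procedure in this post can be sketched end to end in plain Python. Everything here is an illustration under assumptions: the data are synthetic stand-ins, the candidate penalties are arbitrary, and the lasso fit is a small proximal-gradient (ISTA) loop written for self-containment. In practice one would use an established library implementation rather than this loop.

```python
import math
import random


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam, n_iter=150, step=0.5):
    """L1-penalized logistic regression via proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def log_loss(X, y, w, b):
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)


def choose_lambda(X, y, lambdas, k=5):
    """Step 2a: pick the penalty with the lowest mean out-of-fold log loss."""
    folds = [list(range(i, len(X), k)) for i in range(k)]
    best_lam, best_loss = None, float("inf")
    for lam in lambdas:
        losses = []
        for held in folds:
            held_set = set(held)
            train = [i for i in range(len(X)) if i not in held_set]
            w, b = fit_lasso_logistic([X[i] for i in train], [y[i] for i in train], lam)
            losses.append(log_loss([X[i] for i in held], [y[i] for i in held], w, b))
        if sum(losses) / k < best_loss:
            best_lam, best_loss = lam, sum(losses) / k
    return best_lam


def make_data(n):
    """Synthetic stand-in: 4 features, only the first two drive the outcome."""
    X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    y = [1 if random.random() < sigmoid(1.5 * x[0] - 1.0 * x[1]) else 0 for x in X]
    return X, y


random.seed(0)
X_train, y_train = make_data(150)  # step 1: samples 1 and 2 merged
X_hold, y_hold = make_data(60)     # sample 3, touched only at the end

lam = choose_lambda(X_train, y_train, [0.01, 0.05, 0.2])   # step 2a
w, b = fit_lasso_logistic(X_train, y_train, lam)           # step 2b: refit on all
kept = [j for j, wj in enumerate(w) if wj != 0.0]          # variables that remain
holdout_loss = log_loss(X_hold, y_hold, w, b)              # step 3
print("lambda:", lam, "kept variables:", kept, "hold-out log loss:", round(holdout_loss, 4))
```

The hold-out is scored exactly once, after both the penalty and the variable list are fixed, so it gives a single benchmark number rather than a distribution.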

E**********e
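Posts: 1736

From topic: DataSciences board - Friends who work on credit risk scorecards, please come in; I have a question (repost)

I'm new to this field and have no expert to mentor me, so I can only start by learning from books. But the books only speak
in generalities, and the modeling process still differs somewhat from case to case.
Lasso's k-fold can decide a shrinkage, or you can pick one yourself; there's essentially no difference.
In the end, 40-50 variables really do work best (highest corresponding AUC with few variables). Your method yields just one
model, but by itself it can't tell you that you've found a good model. Cross validation is for checking whether your modeling
is workable and robust. Once the cross-validated model checks out, you can use data sets one, two, and three to build the
final model for predicting future data.
Right now I build the model with data sets one and two, and I fully agree the sample size is rather small. But the cross
validation is meant to avoid, as far as possible, using data sets one and two to prescreen variables (from the original 1000+
raw variables). That's not my idea; it's what machine learning recommends: it's best to select variables with unsupervised
methods (i.e., without looking at the response variable), and I've noticed this during my own modeling too.
My guess is that big banks' mode...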
