B**r Posts: 311 | 1 Tang has quite a few masters like this under him, huh.
X. Wang and X. Tang, "Random Sampling for Subspace Face Recognition," Int'l
J. Computer Vision (IJCV), vol. 70, no. 1, pp. 91-104, Oct. 2006.
X. Wang and X. Tang, "Hallucinating face by eigentransformation," IEEE Trans.
Systems, Man, and Cybernetics (SMC), Part C, vol. 35, no. 3, pp. 425-434,
Aug. 2005.
X. Wang and X. Tang, "Subspace Analysis Using Random Mixture Models," Proc.
IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 1,
pp. 574-580, Jun. 2005.
X. Wang... Read full post |
|
d****i Posts: 4809 | 2 Thanks. If I go straight to a standard classifier, how do I
classify a sequence? That is, given x1, x2, ..., x_N, classify this sequence
of vectors into either class A or class B? It is a basic question, please bear with me.
label)
structure |
|
z**r Posts: 17771 | 3 What are the criteria for XML application classification? I think
acceleration is the goal, but the key point is how to classify the application.
A traditional layer 7 switch can classify traffic based on the layer 3/4 and
application (layer 7) headers, but it does not do very deep packet inspection.
If XML routing/acceleration moves to DPI and a content basis, there would be a
huge challenge, right?
consideration. |
|
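o*****g Posts: 317 | 4 EVGA claims that its internal testing confirms that reference GeForce GTX 480 cards can fully support four-card SLI operation.
Every GTX 480 PCB has two bridge connectors (gold fingers), and NVIDIA also states that a card can connect to at most two
other cards to build a three-card SLI system. EVGA, however, has designed a special SLI bridge that links four GTX 480 cards
together into a four-card, or 4-way, SLI system.
The motherboard EVGA used for the demonstration is its own X58 Classified 4-way SLI. The recently released dual-socket
workstation-class board Classified SR-2 supports it as well, because both have enough PCI-E x16 expansion slots and a
roomy enough layout to give four dual-slot GTX 480 cards sufficient bandwidth and space. |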
|
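c****e Posts: 1453 | 5 Public cloud is still pretty expensive. Start by setting up your own machine, work out a business plan,
and run it at home for a while. Once people actually start using your service, then consider EC2 or Azure. In fact,
traditional hosting is fairly affordable. The key is how your whole pipeline works. For example, do you need to
periodically run a Hadoop job for training to produce the model? That in turn depends on what you use for your data store.
From what you describe, you are probably considering MongoDB, CouchDB, or the like; overall, EC2 is the most flexible to work with for this.
If you use the Microsoft platform, you can join BizSpark for free. For three years the Windows Server and SQL
Server licenses cost nothing. With Azure there is a free package worth $60K.
Classifiers such as decision trees and random forests are fast. Optimize your engine. The model
can be kept in a memory cache. If QPS really is high, you can easily increase your role instance
number.
From a cost-optimization standpoint, you can split requests into tiers. For example, some can... Read full post |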
|
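l*******s Posts: 1258 | 9 I work in NLP.
Automatically generating regexes is not something NLP can do at present.
I don't know what your specific task is, but basically you could take this approach:
1. Run the existing regexes once and use the results as the initial training data.
2. Use that training data to build a classifier, a named entity recognizer, or the like, depending on
your application.
3. When new data comes in, process it with what you built in the previous step.
4. Add the high-scoring data points back into the training data and retrain the classifier. Performance
will gradually improve.
If you are interested, look up active learning and reinforcement learning. |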
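The four steps above can be sketched as a small self-training loop. This is only an illustration under stated assumptions: the seed regex, the character-shape features, the thresholds `hi` and `lo`, and all function names are hypothetical and not from the post; a real system would use a proper classifier or named entity recognizer in place of the toy scorer.

```python
import re
from collections import Counter

# Hypothetical seed pattern standing in for "the existing regexes".
SEED_REGEX = re.compile(r"\b\d{3}-\d{4}\b")

def features(text):
    """Character-shape trigrams: letters become 'a', digits become '9'."""
    shape = re.sub(r"\d", "9", re.sub(r"[A-Za-z]", "a", text))
    return {shape[i:i + 3] for i in range(len(shape) - 2)}

def train(labeled):
    """Count how often each feature occurs in positive and negative examples."""
    pos, neg = Counter(), Counter()
    for text, label in labeled:
        (pos if label else neg).update(features(text))
    return pos, neg

def score(model, text):
    """Average smoothed positive rate over the features of the text."""
    pos, neg = model
    feats = features(text)
    if not feats:
        return 0.0
    return sum((pos[f] + 1) / (pos[f] + neg[f] + 2) for f in feats) / len(feats)

def self_train(seed_texts, pool, rounds=3, hi=0.65, lo=0.45):
    # Step 1: the regex output becomes the initial training data.
    labeled = [(t, bool(SEED_REGEX.search(t))) for t in seed_texts]
    pool = list(pool)
    # Step 2: build a classifier from it.
    model = train(labeled)
    for _ in range(rounds):
        remaining = []
        # Step 3: process new data with the current classifier.
        for text in pool:
            s = score(model, text)
            if s >= hi:
                labeled.append((text, True))
            elif s <= lo:
                labeled.append((text, False))
            else:
                remaining.append(text)
        if len(remaining) == len(pool):
            break  # nothing was confident enough to add
        # Step 4: confident points join the training data; retrain.
        pool = remaining
        model = train(labeled)
    return model, labeled

seed = ["call 555-1234", "hello world", "dial 867-5309 now", "good morning"]
new_data = ["phone 555-0000", "see you soon"]
model, labeled = self_train(seed, new_data)
```

One caveat with this design: errors in the confident predictions are fed back into training, so the thresholds should be conservative and a held-out, hand-checked set is worth keeping.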
|
m********n Posts: 15 | 10 Today I will briefly cover a very exciting topic, valuation of loans and
securities. As you may already understand, under FAS 115 securities can be
classified as held-to-maturity ("HTM"), available-for-sale ("AFS"), or
trading. Similarly, mortgage loans can be classified as loans held for
investment ("LHFI") or loans held for sale ("LHFS") under FAS 65. In general,
there are two accounting models for valuation: 1) amortized cost with
LOCOM; 2) fair value.
Assuming fair value option is not |
|
d**********s Posts: 103 | 11 Could everyone please take a look at this NFP question?
27. Which of the following is most likely to be classified as a
nongovernment not-for-profit vs. a governmental organization:
a. Hospital with the authority to issue debt that pays interest exempt
from federal taxation.
b. Hospital without the authority to issue debt that pays interest exempt
from federal taxation.
c. Hospital that properly utilizes GASB accounting standards
d. It is not possible for any of these organizations to be classified as
a nongovernment n |
|
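c****t Posts: 19049 | 12 Emblem is total garbage. Classifier still exists, but under a separate license; Emblem and
Classifier are each $3,500 per license per year. If SAS is a server license with many users,
the average cost per user may well be lower than a single Emblem license.
Emblem was developed by a UK software company, and still is. TW is just the sales front. It is popular on the
back of the GLM hype of the last couple of years. Emblem contains many (old and dated) modeling
procedures commonly used by P&C insurers, just as SAS contains many tests that (non-)statisticians like.
Just as anyone trained in programming thinks SAS is garbage, anyone trained in statistics would never use
garbage like Emblem to build a model.
Who told you SAS can do everything about GLM? P&C insurance data is at the very least repeated
measurements, nested data, high heterogeneity, zero-inflated, ... Read full post |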
|
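f**********e Posts: 1994 | 13 Your sampling method has a problem: you only listened to positive cases. Anyone who has built
a classifier knows that the training set must contain enough negative cases for the classifier
to be accurate. You should also seek out some long-term postdocs who took a shot at the market but ended up humiliated.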
. |
|
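k****o Posts: 728 | 14 I am handling a manuscript for an open access journal in Proteomics/Genomics and
need to find a few reviewers. The article title and abstract are below. The journal is not a great one, but it can still be useful.
If you are interested, contact me by site message. I need to send out the invitations soon; it has already been delayed several days.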
请给出name, institute, position, email, expertise, and representative
publications related to this manuscript.
Article Title : Classification of non-animals and invertebrates based on
amino acid composition of complete mitochondrial genomes
Abstract :
Amino acid compositions were predicted from data sets of 47 non-animal and
58 invertebrate animal complete mit... Read full post |
|
D*a Posts: 6830 | 15 Red meat
After thoroughly reviewing the accumulated scientific literature, a Working
Group of 22 experts from 10 countries convened by the IARC Monographs
Programme classified the consumption of red meat as probably carcinogenic to
humans (Group 2A), based on limited evidence that the consumption of red
meat causes cancer in humans and strong mechanistic evidence supporting a
carcinogenic effect.
This association was observed mainly for colorectal cancer, but associations
were also seen for panc... Read full post |
|
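Posts: 1 | 16 Unintentionally. Here is an example.
Say someone built a machine learning classifier, evaluated it with cross validation, got good results,
and published a paper. Because they did not fully understand where the data came from, a few years later they discover
that the data they used contained duplication. (This is quite normal: in much high-throughput data today each sample is
measured two or three times, and for ML collecting as much data as possible is also important, so unless one is especially
careful this can easily happen.) For example, one sample was measured 3 times with roughly the same signal each time, but
during cross validation 2 of the measurements may have gone into training and 1 into testing. The
testing accuracy is then naturally very high, but in reality this is an overfitting model, and the classifier performs
nowhere near as well as the original paper reported. Does this count as fraud? Does the paper need to be retracted? Should
the benefits that came with the publication, such as the PhD degree and green card, be revoked? |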
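The leak described above (replicates of one sample landing on both sides of a split) can be avoided by splitting on sample identity instead of on individual measurements. A minimal sketch, assuming each measurement carries a sample ID; the function name `group_k_fold` and the IDs are hypothetical. scikit-learn's `GroupKFold` offers the same idea as a library routine.

```python
import random
from collections import defaultdict

def group_k_fold(sample_ids, k, seed=0):
    """Yield (train_idx, test_idx) pairs in which all replicate
    measurements of a sample fall on the same side of the split."""
    groups = defaultdict(list)
    for idx, sid in enumerate(sample_ids):
        groups[sid].append(idx)
    unique = sorted(groups)
    random.Random(seed).shuffle(unique)
    # Deal whole samples (not single measurements) into k folds.
    folds = [unique[i::k] for i in range(k)]
    for fold in folds:
        held_out = set(fold)
        test = [i for sid in fold for i in groups[sid]]
        train = [i for sid in unique if sid not in held_out
                 for i in groups[sid]]
        yield train, test

# Six hypothetical samples, each measured three times.
sample_ids = [s for s in ["s1", "s2", "s3", "s4", "s5", "s6"] for _ in range(3)]
splits = list(group_k_fold(sample_ids, k=3))
for train, test in splits:
    train_samples = {sample_ids[i] for i in train}
    test_samples = {sample_ids[i] for i in test}
    # No sample contributes measurements to both training and testing.
    assert not (train_samples & test_samples)
```

With a split like this, the test accuracy reflects performance on samples the model has never seen, which is the figure the paper in the example should have reported.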
|
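Posts: 1 | 17 Once someone says this does not count as fraud, will you then seize the moment to declare that the 214 million in funding and the vice-ministerial-level treatment are justified?
If that is the plan, the following passage needs to be revised:
"For example, one sample was measured 3 times with roughly the same signal each time, but during cross
validation 2 of the measurements may have gone into training and 1 into testing. The
testing accuracy is then naturally very high, but in reality this is an overfitting model, and the classifier performs
nowhere near as well as the original paper reported?"
to:
"For example, there are a dozen or so samples in total, each measured 3 times with roughly the same signal each time, but during cross
validation, for every sample 2 of the measurements may have gone into training and 1 into testing.
The testing accuracy is then naturally very high, but in reality this is an overfitting model, and the
classifier performs
nowhere near as well as the original paper reported?" |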
|
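s******s Posts: 13035 | 20 I often need this, for example when doing population stratification or other molecular
profiling:
1. There are 10000 points, each with x and y, which can be shown on a 2D plot.
2. There may or may not be some points with known classification, say 1000 points whose population class is known,
which can be shown in different colors on the 2D plot.
I need to be able to select different regions of the plot with the mouse and classify by eye, and at the end be told which
cluster each point belongs to.
I know some R/Python visualization packages should be able to do this, but is there any concrete sample code?
An existing tool or website would be even better.
Also, for population stratification, is there a good classifier that works from smartpca output?
I tried knn;
it does acceptably on the 1kg selection, but the assignments for my population look wrong in many places.
My feeling is that doing it by hand may be more accurate. I am also looking at other methods such as tsne. Does anyone have a recommendation? |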
|
H*****e Posts: 160 | 21 When calculating operating cash flow, should depreciation be counted in operating cash flow? To
my understanding, it should not be classified as operating cash flow. The
question is, in the 2008 notes, Concept Checkers question 1, depreciation is
classified as part of the flow. |
|
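s**u Posts: 9035 | 22 From: oxthigh (老是肚饿), Board: FDU
Subject: Re: How authentic is the Young Thousand Talents (千青) claim? (ZZ)
Posted at: BBS 未名空间站 (Mon Jan 9 10:44:11 2012, US Eastern)
The Wigner fellowship (specifically for newly joined postdocs) is a fairly good honor at ORNL.
As for the adjunct position at UTK, most of them are in name only: no salary, no benefits
(except for subcontractors), no students, no teaching, no office, no mailbox, and the holder may not even know
which building the physics department's main office is in.
ORNL is actually quite tolerant of outside activity by staff not involved in classified research (of course, foreigners
basically cannot hold positions in classified departments); after all, it is an academic institution. The precondition is
that you are not sneaky about it. I know of Chinese scientists who openly spend 9 months in China and three months part-time at ORNL
(many cannot leave their kids over here).
As for ORNL's web pages, nobody maintains them at all. Many staff have no web page, and some people are still listed years after they left. |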
|
n**o Posts: 11 | 23 I need to classify the recessions from a discrete AR(1) process or a
continuous geometric Brownian motion. Because I do not explicitly specify
the regimes of recessions and booms in the model, I am thinking about using
the shocks from the AR model to determine the recession periods by following
the leading economic index (LEI) proposed by the Conference Board.
However, I cannot find their documents on how they determine the recessions.
From a Google search, some news articles say that the confe... Read full post |
|
r*****u Posts: 94 | 24 Seeking Postdocs and Ph.D. students in Wireless Communications/
Networking in the EECS Department at the University of Kansas (KU)
About the University: University of Kansas (KU) is one of the oldest
universities in the United States and is the flagship university in the
state of Kansas. KU is classified by USNews as a tier 1 university and is
classified by Carnegie Foundation as RU/VH (research university with very
high research activity). The university is also one of 63 members of the
Ass... Read full post |
|
N********y Posts: 15 | 25 Episodes of urticaria confined to a 6-week period are classified as acute
and are commonly
caused by an acute infection, medication, food, or pollen. Episodes of
urticaria that last longer than 6 weeks are classified as chronic. Many
patients with chronic urticaria have an IgG antibody to the IgE receptor;
others are reacting to a chronic infection or ingestant. As many as 50% of
cases of chronic urticaria have no identifiable cause.
It will go away; I had that for 2 months. |
|
t*****9 Posts: 33 | 27 It is an Indian consulting firm, but the upside is that you would be the first lead data science hire and,
after joining, work directly with big banks (like GS, JPM, MS).
The project is already lined up. They need someone now; anyone interested can give it a try. MS/PhD plus four years of experience.
-Need for a data scientist in NYC to build some scorecards for a lending
business
-The person should have knowledge of consumer lending, credit bureaus and
predictive modeling.
-Developing data-driven decisions to build consumer credit origination/
underwriting scorecards.
· Work with the teams to build the credit scoring, pricing, and risk
scorecards
· Implement and test data-mining project... Read full post |
|
u******3 Posts: 11 | 28 To test my first hypothesis, I would like to classify firms into firms with
diffuse ownership (most shareholders are individuals) and firms with
concentrated ownership (most shareholders are institutions). The variable IS
is used to refer to ownership. IS will be zero (one) if a firm is marked by
diffuse ownership (concentrated ownership).
To test my second hypothesis, I would further classify firms with
concentrated ownership (IS=1)into firms with most shareholders being short-
term oriented sh |
|
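A*******s Posts: 3942 | 29 By the way, I have a question: one camp says SVM is the best classifier ever, another camp says boosting
tree is the best off-the-shelf classifier... Is there any settled conclusion on how the two compare? |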
|
o*******e Posts: 61 | 31 Dear All,
I have two questions, as follows (I need to find some method/statistics):
1. How to determine whether a numeric variable is a classified (binning) variable.
2. How to run classified (binning) variables in Logistic Regression in SAS?
(use class statement??)
Thank you very much!!! |
|
p********a 发帖数: 5352 | 32 ☆─────────────────────────────────────☆
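tamuer (hoho) wrote on (Fri Oct 21 20:41:07 2011, US Eastern):
Someone asked earlier whether bootstrap can get closer to the truth. I am not an expert in this area, but I will offer
a few views of my own in the hope of exchanging ideas.
Overall, I do not think using bootstrap can be said to approach the truth. But sometimes it really is better than not
bootstrapping, or closer to the truth than using a single sample. The reason people feel bootstrap has no real use is
that all the bootstrap resamples are drawn from one sample, so the result will not be much better than using the
original sample. I agree with that too.
From another angle, though, a sample contains a wealth of information; have we fully used
the information in the sample we already have? As the simplest example, with a sample we often just use the sample
mean to summarize it, but how much of the original information in the sample does the sample mean ignore?
For instance, information about the various quantiles. Similarly, from another angle, bootstrap is... Read full post |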
|
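R*****d Posts: 420 | 33 1. What affects marketing?
2. If I want to predict car sales and build a regression model to do it, and want to put some non-marketing
variables into the model as independent variables, which variables would you include?
I still have not figured out which kinds of variables "non-marketing variables" refers to here. Does it mean variables
other than the somewhat technical finance-type variables such as APR?
3. What are the commonly used classifiers? What is the rule for judging whether a classifier is good or bad?
Discussion welcome. I hope this helps everyone. |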
|
B****n Posts: 11290 | 34 3. A classifier that classifies well (that is, has a low error rate) is a good one, haha.
There are too many commonly used classifiers; I suggest reading a machine learning book.
logistic regression, Bayesian (posterior probability), neural network, Fisher discriminant analysis, support vector machine, boosting, random forest |
|
b******2 Posts: 654 | 35 There are 175 events and 1059 non-events. The result of using a conditional tree for prediction
(classification table below) is really strange!! Why are the 175 events classified as non-event by the conditional tree?
And all 1059 non-events classified as non-event???
> iv = ctree(aa ~ a + b+ c+d+e+f+g+h+i+j+k+l+m+n, data=f)
>
> plot(iv)
>
> table(predict(iv), f$aa)
        No  Yes
  No  1059  175
  Yes    0    0
I am a beginner, please help!! Many thanks!! |
|
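s****e Posts: 1180 | 36 Why do you want to classify? The data is all "like" data. That is, everyone in this dataset likes this
movie. If you want to classify, it seems your understanding is that in the data some people like it and some do not. |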
|
s**********8 Posts: 25265 | 37 Federal Food, Drug, and Cosmetic Act
From Wikipedia, the free encyclopedia (Redirected from 510k)
Federal Food, Drug, and Cosmetic Act
Acronym FFDCA, "FD&C Act"
Enacted by the 75th United States Congress
Citations
Public Law 75-717
Stat. 52 US Stat. 1040
Codification
U.S.C. sections created 21 U.S.C. §... Read full post |
|
a**e Posts: 5094 | 38 My supporting evidence is as follows:
wiki "upper class in US"
"The main distinguishing feature of upper class is its ability to derive
enormous incomes from wealth through techniques such as investment and money
management, rather than simply engaging in wage-labor or salaried
employment.[2][3][4] Successful entrepreneurs, CEOs, politicians, investment
bankers, some lawyers and top flight physicians, heirs to fortunes,
successful venture capitalists, stockbrokers as well as celebrities are
considered members of this class b... Read full post |
|
e*******n Posts: 872 | 39 For the original questions see
http://www.mitbbs.com/article_t/DataSciences/10029.html
I am starting a dedicated thread to try to answer them one by one. I am a beginner and would welcome guidance from the experts.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
1. Given a coin, you don't know whether it's fair or unfair. Throw it 6 times and
get 1 tail and 5 heads. Determine whether it's fair or not. What's your
confidence value?
My answer is:
H0: the coin is fair
Ha: the coin is unfair
X is the number of heads
Rejection region: |X - 3| > 2, i.e., X = 0,1,5,or 6
significance level alpha:
alpha = P(reject H0 | H0 i... Read full post |
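The excerpt is cut off before the number is given, so the following is only a sketch of how the significance level of the stated rejection region could be computed, assuming X ~ Binomial(6, 0.5) under H0. The function name `rejection_probability` is hypothetical.

```python
from math import comb

def rejection_probability(n, p, rejection_region):
    """P(X in rejection_region) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in rejection_region)

# Rejection region from the post: |X - 3| > 2, i.e. X in {0, 1, 5, 6}.
alpha = rejection_probability(6, 0.5, [0, 1, 5, 6])
print(alpha)  # (1 + 6 + 6 + 1) / 64 = 0.21875
```

Under these assumptions the observed X = 5 falls in the rejection region, but only at a significance level of about 22%, which is far looser than the usual 5%.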
|
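w**2 Posts: 147 | 40 This question is testing evaluation metrics, such as precision, recall, and F1. The data given is
the target variable and the predicted target variable.
Start from the confusion matrix. Analyze, based on the business model, whether to optimize precision or
recall, and then how exactly to do it.
As for why not accuracy, you can say that accuracy can be very high and error very low in cases where the classifier
always predicts the majority class.
If there are multiple models, you can compare their AUC scores, which indicate how well a classifier
separates the two classes.
Don't worry; after a few more interviews you will be fine. |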
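The point about accuracy can be made concrete with a small calculation. This is a minimal sketch with made-up labels (10 positives, 90 negatives) and a hypothetical helper `binary_metrics`; it is not taken from the interview question itself.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from the confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical imbalanced data: 10 positives, 90 negatives.
y_true = [1] * 10 + [0] * 90
# A classifier that always predicts the majority class.
y_pred = [0] * 100
result = binary_metrics(y_true, y_pred)
print(result)  # accuracy is 0.9 while precision, recall, and F1 are all 0.0
```

The classifier never finds a single positive, yet accuracy alone reports 90%, which is why the choice between precision and recall has to come from the business cost of each error type.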
|
c*********r Posts: 19468 | 44 come on
the Ford Contour they tested weighs 2851 lbs
(they do publish the curb weight of the vehicle they tested)
that is not terribly light for its years (95-00).
in contrast, the 98-02 Accord weighs 3047 lbs
(at that time, the Jetta was still classified as a small car)
the difference is less than 200 lbs
with their adjustment, I'm pretty sure they can make it less than 100 lbs
now they put the Jetta to the midsize group, where it's a lighter car
at the same time, the Impreza is still classifie |
|
w*******y Posts: 60932 | 45 Great fire protective storage for birth certificates, collectibles, and
other important documents. Sentry's fire-safe security chest features UL
classified 1/2 hour fire and explosion protection so you can rest assured
your most valued documents will always be safe. Also comes with a key lock
for privacy, and a convenient carrying handle.
SentrySafe 1100 1/2 Hour FIRE-SAFE Chest, 0.18 Cubic Feet $15
Get them while they last
at walmart:
http://www.walmart.com/ip/Sentry-Fire-Safe-reg-Security-Ches... Read full post |
|
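d****7 Posts: 2241 | 46 [The following text is reposted from the PraticalTrading club]
From: googer (cutie), Board: PraticalTrading
Subject: 10 Bank Stocks Trading Below Book Value
Posted at: BBS 未名空间站 (Fri Jun 18 16:41:23 2010, US Eastern)
This article gives a fairly good overview of several small banks, including fbp, crbc, pcbc, bbx, prwt,
snv, bpop, cse, mi & c. For each it covers current status, capital adequacy, possible risks, asset quality, and
possible outlook. It may be useful as a reference.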
10 Bank Stocks Trading Below Book Value
FBP
First Bancorp (FBP) of San Juan, Puerto Rico was trading for just 0.2 times
tangible book value on Thursday, when shares closed at $1.06, down 54% year-
to-date.
TARP Status: ... Read full post |
|