See all walks of life through MITBBS

topics

All topics - Topic: subsetting
S*A
Posts: 7142
1
From thread: Programming board - What do the experts think of Object C?
A brief addition: C is a strict subset of Obj C.
C, however, is not a strict subset of C++. Some C99
features conflict quite badly with C++ syntax.
e****d
Posts: 333
2
From thread: Programming board - linux, find command question
That does not work. We don't want to discard every path that includes a timestamp;
we want the subset of paths that include a timestamp.
That subset is the parent level of the timestamp folders. If a timestamp file
is in a subfolder, the subfolder should be dropped from the results.
For example:
/parent/timestamp
/parent/children/timestamp
We want only /parent/timestamp in the results.
Thanks.
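One way to get that result is to post-filter the list of matches instead of tuning the find expression itself. A minimal Python sketch, assuming the matching paths have already been collected (for example from find's output); the function name `topmost_timestamp_paths` is made up for illustration:

```python
from pathlib import PurePosixPath

def topmost_timestamp_paths(paths):
    """Keep a timestamp path only if no ancestor directory also holds one."""
    # Directories that directly contain a timestamp entry.
    dirs = {PurePosixPath(p).parent for p in paths}
    kept = []
    for p in paths:
        parent = PurePosixPath(p).parent
        # parent.parents yields every proper ancestor of the containing directory.
        if not any(ancestor in dirs for ancestor in parent.parents):
            kept.append(p)
    return kept

print(topmost_timestamp_paths(["/parent/timestamp", "/parent/children/timestamp"]))
```

With the two example paths above, only /parent/timestamp survives, because /parent/children sits below a directory that already has a timestamp.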
p*******n
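Posts: 2697
3
From thread: Programming board - An algorithm question for the experts
The first pass is there to find a narrowed-down subset.
The second pass is the binary search inside that subset.
You can't just estimate the range up front the way you describe, because the estimate isn't accurate.
Whether the step grows in powers of 2, 10, 1024, or something larger depends entirely on your actual situation.
Doing a preprocessing scan first is just one way to approach it.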
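The original question is not shown here, so the following is only a guess at the setting: a sorted sequence, a first pass that probes at geometrically growing indices, and a second pass that bisects inside the bracket the first pass found. Under those assumptions the idea looks like exponential search; `exponential_search` and its `base` parameter are illustrative names:

```python
from bisect import bisect_left

def exponential_search(sorted_items, target, base=2):
    """Return the index of target in sorted_items, or -1 if absent."""
    n = len(sorted_items)
    if n == 0:
        return -1
    # Pass 1: grow the probe index geometrically until it reaches or passes the target.
    hi = 1
    while hi < n and sorted_items[hi] < target:
        hi *= base
    lo = hi // base
    hi = min(hi, n - 1)
    # Pass 2: binary search inside the narrowed range [lo, hi].
    i = bisect_left(sorted_items, target, lo, hi + 1)
    return i if i < n and sorted_items[i] == target else -1
```

A larger `base` means fewer probes in the first pass but a wider range left for the second pass, which is the trade-off the post alludes to.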
N********n
Posts: 8363
4
From thread: Programming board - A question about C# GC
If all you do is filter/sort, you could distribute the workload
onto different nodes, with each one holding a subset of the data. Let
each node filter/sort its subset and then merge the sorted results from
them. Merging sorted lists is reasonably fast.
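The thread is about C#, but the shape of the idea is language-independent. A minimal single-process Python sketch, where each "node" is just a list and the names `filter_sort_on_node` and `distributed_filter_sort` are hypothetical:

```python
import heapq

def filter_sort_on_node(shard, predicate):
    """Work one node would do: filter its shard, then sort what is left."""
    return sorted(x for x in shard if predicate(x))

def distributed_filter_sort(shards, predicate):
    """Merge the per-node sorted runs into one sorted list."""
    runs = [filter_sort_on_node(shard, predicate) for shard in shards]
    # heapq.merge performs a k-way merge of already-sorted inputs.
    return list(heapq.merge(*runs))
```

The merge step touches each element once and keeps only one candidate per run in its heap, which is why it stays cheap compared with re-sorting everything on a single machine.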
sg
Posts: 83
5
From thread: XML board - There is something about XML
I just read some posts on this board, and I believe some of the points made there
do not make sense at all.
1. Someone claimed that ASN.1 is better than XML
ASN.1 is essentially a primitive data declaration language. Like much
of OSI, it is large, complex, and not especially efficient. Compared to
ASN.1, XML is much more user-friendly and efficient. For your information,
XML is a subset of SGML (something IBM developed in the 1970s); the main
reason for subsetting SGML was usability and efficiency.
2. Why use
p**t
Posts: 160
6
From thread: Biology board - A question about mouse LN immunostaining
Please read this paper, you are misleading others.
-----------J Innate Immun. 2012;4(5-6):424-36. doi: 10.1159/000337007. Epub
2012 Apr 4.
Lymph node macrophages.
Gray EE, Cyster JG.
Source
Howard Hughes Medical Institute and Department of Microbiology and
Immunology, University of California San Francisco, San Francisco, CA 94143-0414,
USA.
Abstract
Lymph node (LN) macrophages have long been known for their efficient uptake
of lymph-borne antigens. A convergence of studies on innate and adapti...
z*********8
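Posts: 1203
7
From thread: Biology board - A question about regulatory T cells
1. CD25 is the IL-2 receptor. CD25 hi only tells you that a T cell is in an activated state; CD25 cannot
serve as a surrogate marker for Foxp3 Tregs. Tregs also come as both Foxp3+ and Foxp3-.
2. I don't know Tregs very well, but my impression is that the difference between using Helios and using neuropilin is that with the former
you need intracellular staining for FACS, while the latter is a surface marker that can go straight to FACS. Helios and
neuropilin, the latter especially, were discovered only recently, so whether they can reliably distinguish the two still has to stand
the test of time.
3. For Tregs themselves to expand, I think IL-2 is definitely needed. Beyond that I don't know; is TGF-beta etc. needed?
I don't know, sorry! T cell subsets have a lot of plasticity, and the identity of each subset is not necessarily
that clear-cut.
4. I have never actually done Treg experiments, so I don't know.
All of the above is what I learned in class; I can't guarantee it covers every correct answer 100%.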
s****9
Posts: 932
8
1. Do you see the B cell percentage increase in spleen, blood, and LNs?
2. Do you see the B cell subset changes in the lymphoid organs? MZ B cells,
FO B cells, GC B cells. In the BM, there are quite many immature B cells.
Which subset of B cells preferentially increases in the BM?
3. Do you see the level of changes in IgG, IgM, IgA in serum?
4. Is the effect B cell intrinsic? For example, in mixed chimeras, do you see
selective accumulation of KO B cells?
No one can give you any answer based on your l...
s******s
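Posts: 13035
9
I don't quite see what the mean of 1 is for; maybe it is meant to get you to mention that some algorithms
require the predictors to be normalized.
I haven't done biomarker work, but this question isn't asking you to build a model; it is asking you to find a subset.
You find a subset either automatically or by manual stepwise selection. For the former you can use lasso
or the like. For the latter you add predictors back one at a time, starting with those with the most significant p-values or the largest information gain,
until the predictive power (e.g. AUC) stops increasing. Of course,
in the end you need a binary classifier, so just wrap a logistic regression around it,
e.g. logistic lasso.
I haven't played with machine learning much. Once you have feature selection, a bit of naive Bayes
or a decision tree afterwards is enough. Random forests, neural networks and the like, not to mention
that they don't care about feature selection, also give results that are harder to interpret, so pharma companies generally
wouldn't use them. The clinical side generally wants things easy to interpret.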
r********r
Posts: 11248
10
NP is the set of problems which can be verified in poly-time, and P is a subset
of NP, and whether it's a proper subset or not is currently the biggest open
problem in computer science.
An example of non-NP was given in my previous posts, like the famous
halting problem {: program will never terminate} does not belong to
NP, coz you cannot verify a solution in poly-time.
NP-hard is basically the class of problems "harder than" (at least as hard as)
NP problems. In the scientific definition, if a p
S**********n
Posts: 250
11
From thread: EE board - Verilog vs VHDL
I am speaking for Verilog
"In Verilog, wire and reg definitions are all global"
We are not talking about software programming here.
Both Verilog and VHDL are hardware DESCRIPTION languages.
So, the only important thing is that you, as the circuit developer, make
sure your requirement analysis and circuit design are correct, then just
pick any language to DESCRIBE it.
That said, only a very small subset of Verilog is used by an engineer to
DESCRIBE a circuit.
Verilog can be viewed as two parts: synthesizable Verilog and non-...
D******D
Posts: 1022
12
I don't disagree with you on this. All I am suggesting is that if you have the
potential to get EB1B or EB1A, get the GC ASAP and then get into law. Life will be
much easier. There are tons of companies that qualify as EB1B sponsors; as
you already pointed out in your argument, one of the biggest drawbacks is that there is no
EB1B for law firms. (Essentially, I am arguing for one subset of
people, those who could easily qualify for 1B or 1A; you are arguing for
another subset of people.)
b****t
Posts: 114
14
From thread: Mathematics board - a simple top question please?
What do I call a subset of R^n that is "dense" in the sense that
it contains all the rational and irrational numbers within some bounds?
I hope I am presenting the question clearly.
E.g. {1, 2, 3, 4} is a subset of R, but suppose I have the set [1, 4], which contains all the
real numbers between 1 and 4. I cannot say this set [1, 4] is dense in R,
right?
What is an appropriate name for this set? Complete, or something else?
I am a newbie in pure math. Thanks for your help.
Beet
s*****p
Posts: 299
15
From thread: Mathematics board - A question about a proof on compact sets
Use the definition of compactness to prove: if S is a compact subset of R
and T is a closed subset of S, then T is compact.
Definition of compact set: a set S is said to be compact if, whenever it is
contained in the union of a family F of open sets, it is contained in
the union of some finite number of the sets in F. If F is a family of open
sets whose union contains S, then F is called an open cover of S. If Y<=F
and Y is also an open cover of S, then Y is called a subcover of S. Thus S
发帖数: 15
16
你看这个是不是:let X be the set of all real numbers. Y={A|A is a subset of
X, A or X\A is countable}. (X,Y) is not a topological space. Suppose it is a
topological space, then any one point subset is open. It follows that
interval [0,1] is open, which is not in Y. \sigma algebra 要求可数并, 拓扑
空间要求任意并,这大概是二者的本质差别。
e*******6
发帖数: 13
17
请教怎么证明这道题,没有想出来...
For a set E
1) m*(O-E) 2) m*(E-F) Prove they are equivalent.
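I have another question: prove that any measurable function is the limit of a sequence of
continuous functions almost everywhere.
This is a homework problem. I approximated the measurable function with step functions and then turned
that step function sequence into continuous functions, but it doesn't feel rigorous. The proofs I found in books
are all in GTM volumes and use functional analysis... Does anyone know a more accessible approach?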
w**********a
Posts: 107
18

However, the definition of connected components is: the maximal connected subsets of any topological
space are called the connected components of the space. Since this subset is
compact and connected, it is closed and bounded. I can't figure out why
its complement has more than one connected component.
A****s
Posts: 129
19
From thread: Mathematics board - a problem
[The following text is reposted from the Quant board]
Author: Allens (ffff), Board: Quant
Title: a problem
Posted from: BBS Unknown Space (未名空间站) (Mon Jan 18 21:16:40 2010, US Eastern)
Consider a cartesian product over n given finite sets,
X=X1×X2×...×Xn
Try to prove that X cannot be partitioned into fewer than 2^n subsets,
each of which is of the form Y1×Y2×...×Yn, where Y1...Yn are proper subsets
of X1...Xn, respectively.
N*******e
Posts: 10
20
From thread: Mathematics board - Urgent: help with a real analysis problem
Any pointers from the experts would be appreciated. Many thanks!
Let (X, d) be a metric space. For nonempty subsets E, F in X, the distance
between E and F is defined to be dist(E,F) = inf{d(x,y): x in E, y in F}
Suppose E and F are disjoint closed subsets of X, E is compact, and X = R^n,
prove that there exist x in E and y in F such that d(x,y) = dist(E,F)
h********0
Posts: 12056
21
From thread: Mathematics board - Question: linear transformations of a nonlinear subspace
it seems the subset you described is not a subspace
since the subset is obtained from a set of nonlinear constraints.
k*********g
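Posts: 791
22
I think so.
Of course, the old understanding of the ether was wrong: that the whole of cosmic space is one rigid, monolithic block.
That understanding came from classical mechanics: rigid body classical mechanics.
In reality, almost everything deforms and moves. Moreover, "deforms and moves" is the superset, and "does not deform
or move" is the subset. Absent any assumptions, we should default to the superset first; using the
superset needs no justification. If you want to use the subset, you need to give a reason.
So the ether should be something that flows, deforms, and is unevenly distributed, rather than one monolithic block.
Dark matter seems to be like this too.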
w**********y
Posts: 1691
23
From thread: Quant board - What does the By function in R mean? (repost)
Just run this in R yourself:
?by
and that's it... strange...
Apply a Function to a Data Frame split by Factors
Description
Function by is an object-oriented wrapper for tapply applied to data frames.
Usage
by(data, INDICES, FUN, ...)
Arguments
data an R object, normally a data frame, possibly a matrix.
INDICES a factor or a list of factors, each of length nrow(data).
FUN a function to be applied to data frame subsets of data.
... further arguments to FUN.
Details
A data frame is split by row into data frames subsetted by the values
p**e
Posts: 41
24
From thread: Quant board - Another interview question for everyone
One more. It's not hard, but I'm not sure my answer is right, because the result isn't neat.
Consider a set SIGMA with N distinct members, and a function f defined on
SIGMA that takes the values 0 or 1 such that (1/N)* sum_{x belongs to SIGMA}
f(x) = p.
For a subset S of SIGMA of size n, define the sample proportion:
q:= q(S) = (1/n)* sum_{x belongs to S} f(x)
If each subset of size n is chosen with equal probability, calculate the
expectation and standard deviation of the r.v. q
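For reference, this is sampling without replacement from a finite 0/1 population, so n*q is hypergeometric: E[q] = p and SD(q) = sqrt( p(1-p)/n * (N-n)/(N-1) ). The factor (N-n)/(N-1) is what keeps the result from looking neat. A small brute-force check in Python, a sketch with made-up helper names and a made-up population, enumerates every size-n subset with exact fractions:

```python
from fractions import Fraction
from itertools import combinations

def sample_proportion_moments(values, n):
    """Exact mean and variance of q over all size-n subsets of a 0/1 population."""
    # combinations works by position, so equal values still count as distinct members.
    qs = [Fraction(sum(s), n) for s in combinations(values, n)]
    mean = sum(qs) / len(qs)
    var = sum((q - mean) ** 2 for q in qs) / len(qs)
    return mean, var

def closed_form(N, n, p):
    """Mean and variance of q under sampling without replacement."""
    return p, p * (1 - p) / n * Fraction(N - n, N - 1)

population = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # N = 10, p = 3/10
print(sample_proportion_moments(population, 4))
print(closed_form(10, 4, Fraction(3, 10)))
```

Both lines print the same pair of fractions; the standard deviation is the square root of the second entry.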

x******a
Posts: 6336
25
A'= the transpose of A?
the null spaces should be the same?
Ax=0 => A'Ax=0 => Null(A)\subset Null (A'A)
A'Ay=0 => y'A'Ay=0, i.e, (Ay, Ay)=0 => Ay=0
=> Null(A'A)\subset Null(A).
m*******r
Posts: 98
26
Input:
a graph G(V,E), two vertices s and t, and k subsets of V: S1, S2,...,Sk
Output:
is there a path from s to t that covers at least one vertex from each
subset?
Show this problem is NP-complete.
Hint: reduce from 3SAT.
I thought about it all evening and couldn't work it out. Take a look, everyone.
l*******l
Posts: 248
27
From thread: Quant board - A few programming questions
1.Suppose you have the following code:
void function() {
int * =new(a);
int *= new(b);
a=b;
delete(a);
delete(b);
return(a);
}
What are all the problems with this code? (There might be more than one)
2.Let a[n] be a sorted array of integers. Let b be an integer.
Write an (efficient) algorithm which checks whether there are indices i,j
such that a[i]+a[j]=b .
3.Write an algorithm which outputs the set of all subsets of the list {1,2,...,n}.
How many subsets will there be?
4.Write an algorithm ...
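Questions 2 and 3 have short standard answers. The sketch below is in Python rather than the C++ of question 1, and it assumes that question 2 wants two different indices (i < j) and that the array is sorted in ascending order; the function names are illustrative:

```python
from itertools import combinations

def has_pair_with_sum(a, b):
    """Two-pointer scan over sorted a: is there i < j with a[i] + a[j] == b?"""
    i, j = 0, len(a) - 1
    while i < j:
        s = a[i] + a[j]
        if s == b:
            return True
        if s < b:
            i += 1   # need a larger sum, so move the left pointer right
        else:
            j -= 1   # need a smaller sum, so move the right pointer left
    return False

def all_subsets(n):
    """All subsets of {1, ..., n}, smallest first; there are 2**n of them."""
    items = range(1, n + 1)
    return [set(c) for k in range(n + 1) for c in combinations(items, k)]
```

The scan is O(n) because each step moves one pointer inward. The subset count is 2^n since every element is independently in or out.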
r**a
Posts: 536
28
From thread: Quant board - a stochastic calculus problem
Hmm, my statement there is still problematic. But let me first answer this question of yours: "W(u)-W(t) is
independent of F(t) if u>t"
Let X_u(\omega)=W_u(\omega)-W_t(\omega). Then the choice of A\in F_u has to make X_u
measurable, which means W_u and W_t must both be measurable, so in fact we need A\in F_t\subset F_u.
Then E(1_A|F_t)=1_A itself. So now \int_B E(1_A|F_t) dP(\omega)=\int_B 1_A
dP(\omega)=P(A\cap B) =\int_B P(1_A)(\omega) dP(\omega), where
P(1_A)=E(1_A|F_u)(\omega) is a constant function of \omega\in A. Further, since A\in F_t,
we can conclude that E(1_A|F_u)(\omega)=E(1_A|F_t)(\omega)
As for your original question, "W(u) is independe...
k*****a
Posts: 5
29
From thread: Quant board - Interview question help --set partion
Let N be an arbitrary set of ten distinct two-digit numbers.
Let X be the set of all subsets of N.
For any S \in X, let sigma(S) be the sum of numbers in S.
|X| = 2^10
| { sigma(S) : S \in X } | <= (90+99)*10/2 < 2^10
Therefore, there must be two distinct S1, S2 \in X such that sigma(S1) =
sigma(S2). Subtract away from S1 and S2 their intersection, respectively:
this gives us the required two subsets.
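The pigeonhole argument only shows that the two subsets exist. A brute-force Python sketch can actually produce them by remembering the first subset seen for each sum; the function name and the sample numbers below are made up for illustration:

```python
from itertools import combinations

def equal_sum_disjoint_subsets(numbers):
    """Return two disjoint, non-empty subsets of numbers with equal sums, or None."""
    seen = {}  # subset sum -> first subset found with that sum
    for k in range(1, len(numbers) + 1):
        for combo in combinations(numbers, k):
            s = sum(combo)
            if s in seen:
                a, b = set(seen[s]), set(combo)
                common = a & b
                # Removing the shared elements from both keeps the sums equal.
                return a - common, b - common
            seen[s] = combo
    return None

print(equal_sum_disjoint_subsets([12, 25, 31, 47, 52, 68, 73, 84, 90, 99]))
```

Because the numbers are distinct and positive, neither colliding subset can contain the other, so both results stay non-empty after the common part is removed.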
t********t
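Posts: 1264
30
Although a large card issuer's data is only a subset, it definitely counts as a representative subset; with an appropriate
rescale you can infer a reasonable sales figure. Predicting the trend should be even more accurate. Isn't betting all about wagering on
the trend against the level of market expectations? With this data, even if there is risk, it is extremely low.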
c*****x
Posts: 429
31
From thread: Science board - Re: Angry; a math question
The analytic solution for the feasible a_i's might be
very hard to get. But one knows that the set of such
(a_1,...,a_k) is convex as a subset of R^k,
since it is the (infinite) intersection
of half spaces of R^k of the form
1+\sum_{n=1}^k a_n cos(nt)>=0
over all t in [0,2\pi]. Actually, one of the
following sources (I forget which one) contains
a very nice plot of this subset in the
case k=2:
http://www-inst.eecs.berkeley.edu/~ee290n/handouts.html
http://www.stanford.edu/class/ee364/
This is useful only if you
t***r
发帖数: 3
32
来自主题: Science版 - Re: simple problem
In the case sigma={0,1}, one choice can be the subset of all
strings of length n that contain even (or alternatively,
odd) number of 1's. There is no hope of finding a subset
containing more than 2^(n-1) strings that satisfies the
specified condition (just organize all 2^n strings into
pairs, each pair consisting of two strings that differ at
exactly one position. For example, xxxx0 paired with xxx1,
xxx ranges over all combinations. Then each set satisfying
the specified condition may contain a
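The original problem statement is not shown, so the check below assumes the "specified condition" is that no two strings in the set differ at exactly one position, which is what the pairing argument suggests. Under that assumption, a short Python sketch (helper names are illustrative) confirms the even-parity construction for small n:

```python
from itertools import combinations, product

def even_parity_strings(n):
    """All binary strings of length n with an even number of 1's."""
    return ["".join(bits) for bits in product("01", repeat=n)
            if bits.count("1") % 2 == 0]

def hamming(u, v):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(u, v))

n = 4
strings = even_parity_strings(n)
print(len(strings) == 2 ** (n - 1))
print(all(hamming(u, v) >= 2 for u, v in combinations(strings, 2)))
```

Flipping a single bit changes the parity of the number of 1's, so two strings of the same parity can never be at distance exactly 1.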
y*****u
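Posts: 1
33
[The following text is reposted from the JobHunting board; the original follows]
Author: yuckfou (法可有), Board: JobHunting
Title: A very hard interview question (JP Morgan Chase & Co) that I still can't solve; experts, please help.
Posted from: The unknown SPACE (Wed Jul 10 14:27:56 2002) WWW-POST
A very hard interview question that I still can't solve; experts, please help.
Last month I interviewed at JP Morgan Chase & Co for a financial analyst position and got stumped.
There is a large pile of data S (for example, around millions of positive integers),
in no particular order. Of course you can read all the numbers to compute an exact distribution curve (histogram or
distribution).
Consider this situation: take a random group of numbers, i.e., a random subset of the whole set.
In theory this subset should have roughly the same distribution curve as the whole set, but with some deviation.
If we already know the maximum acceptable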
s*********d
Posts: 35
34
There are solutions for Fortran, C, and MATLAB.
http://www.csit.fsu.edu/~burkardt/cpp_src/subset/subset.html


l*****k
Posts: 587
35
From thread: Statistics board - How to selectively plot data in R
I guess you can write a simple function that takes an argument
like CA, AC or FL, subsets the data inside the function, and then does the plot.
You can make your function accept multiple arguments so you can
do more complex subsetting and plotting.
You can also let your program iterate through all the variables and
plot to a pdf or jpg file, so it runs automatically.
y****e
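Posts: 25
36
From thread: Statistics board - another R question
I have a list in which some components are subsets of other components.
Is there a simple R command to remove the subsets?
For example
A = list(c(1,2,3),c(2,3),c(1,3),c(3,4,5))
I want to get
A = list(c(1,2,3),c(3,4,5))
Thanks!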
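The question asks for R, and no R answer appears in this excerpt. As a language-neutral illustration of the logic only, here is a minimal Python sketch that keeps a component unless it is a proper subset of some other component; `drop_contained` is a made-up name:

```python
def drop_contained(components):
    """Drop every component that is a proper subset of another component."""
    sets = [set(c) for c in components]
    # s < other is Python's proper-subset test on sets.
    return [c for c, s in zip(components, sets)
            if not any(s < other for other in sets)]

print(drop_contained([[1, 2, 3], [2, 3], [1, 3], [3, 4, 5]]))
```

Using the proper-subset test means exact duplicates are both kept; deduplicate first if that is not wanted.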
s********p
Posts: 637
37
From thread: Statistics board - Processing massive SAS data
Thanks! But I said data cleaning and subsetting first, then random sampling.
Maybe you don't know what subset means!
k*******a
Posts: 772
38
A crude approach:
first subset the odd rows,
then subset the even rows,
and then merge them and you're done.
s*******o
Posts: 392
39
From thread: Statistics board - Clarifying and interpreting logistic regression results

Hi man, first of all, thanks a lot for your reply.
You are right, subset selection is a nice way. Correct me if I am wrong:
Subset: if you have 160 variables, SAS checks which model is best
with 1 variable, which is best with 2 variables, ... and which is
best with 159 variables. Overall, it then compares these best-of-size models
for 1, 2, 3, 4, ..., 160 variables to see which is best for prediction or modeling. I will
try it.
But here I did not use stepwise; I used backward, a method I read
somewhere telling...
a**j
Posts: 60
40

Using R:
#dendrogram for the unsplit efron2004 data set
install.packages("care")
library("care")
install.packages("rpart")
library("rpart")
install.packages("partykit")
library("partykit")
data(efron2004)
attach(efron2004)
efron2004
efron2004_rpart<-rpart(y~x[,1]+x[,2]+x[,3]+x[,4]+x[,5]+x[,6]+x[,7]+x[,8]+x[,9]+x[,10], data=efron2004, control=rpart.control(numsplit=10))
plot(as.party(efron2004_rpart),main = "Dendrogram of Y~ Xis for Efron data sets", font.main = 4)
#dendrograms for the split efron20...
t*****w
Posts: 254
41
From thread: Statistics board - How should I prepare for R in interviews?
When I had job interviews, they always tested my SAS skills. However, I use R
all the time. To help you prepare, read my R code and see how much of it you
can understand.
%in%
?keyword
a<-matrix(0,nrow=3,ncol=3,byrow=T)
a1 <- a1/(t(a1)%*%spooled%*%a1)^.5 #standardization in discrim
a1<- a>=2; a[a1]
abline(h = -1:5, v = -2:3, col = "lightgray", lty=3)
abline(h=0, v=0, col = "gray60")
abs(r2[i])>r0
aggregate(iris[,1:4], list(iris$Species), mean)
AND: &; OR: |; NOT: !
anova(lm(data1[,3]~data1[,1...
w*******9
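Posts: 1433
42
From thread: Statistics board - How do I transpose these two datasets with SAS?
1-->2: I'm not familiar with transpose, but there is a clumsy way: take two subsets and then append them. If you need to keep the original
order, add a column to the original data that marks the obs id.
2-->1: first output two subsets, then merge them with an SQL statement.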
e***w
Posts: 1160
43
From thread: Statistics board - A question about SAS data
Here are the actual results I have.
Data A, 15672 records, 63 variables, file size 6465K
Subset data A1: 5000 records, 63 variables, file size 40017K
Subset data A2: 10672 records, 63 variables, file size 85393K
w*****m
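Posts: 51
44
There is a subset drawn from a population.
How do we know how representative this subset is of the population?
For example, say our goal is to study the distribution (gender, age, ...) of people who frequently shop online.
The data we collected has only 10,000 records (one record per user).
How can we show whether this collected subset is representative? What statistical methods can be used?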
k*******a
Posts: 772
45
From thread: Statistics board - Scanning a matrix in R
## function to find sums of all subset matrix: k_row * k_col
move_sum <- function(x, k) diff(c(0, cumsum(x)), k)
move_sum_mat <- function(mat, k_row, k_col) {
new_mat <- t(apply(mat, 1, move_sum, k = k_col))
new_mat <- apply(new_mat, 2, move_sum, k = k_row)
new_mat
}
## test matrix
mat <- matrix(sample(1:10, 100, replace=T), nrow=10, ncol=10)
## get sums for all subset matrix with 4 rows and 9 cols
mymat <- move_sum_mat(mat, 4, 9)
## find starting row and col for max sub matrix
which(mym...
d******4
Posts: 132
46
Here we need to tell whether A is a subset of B, not whether A is similar to B.
If B outnumbers A by a large margin, then Jaccard would be very small even
if A is a true subset of B.
How about modifying Jaccard a little bit, such as
size(A intersection B)/size(A)?
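The proposed ratio is sometimes called a containment or overlap score. A minimal Python sketch contrasting it with plain Jaccard, assuming A and B are given as collections of hashable items; the conventions for empty inputs are choices made here, not part of the post:

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def containment(a, b):
    """|A ∩ B| / |A|: equals 1.0 exactly when A is a subset of B."""
    a, b = set(a), set(b)
    return len(a & b) / len(a) if a else 1.0

A, B = range(10), range(1000)
print(jaccard(A, B), containment(A, B))
```

With A fully inside a B that is 100 times larger, Jaccard comes out at 0.01 while containment is 1.0, which is the asymmetry the post is after.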
g****e
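Posts: 1829
47
Is your scorecard built with a tree? I haven't done one, so I'm not very clear on the specifics. If I knew the method at a slightly
higher level, I could give you some other suggestions, such as regularization.
At the very least, you can use Samples 1 and 2 to do k-fold, or repeated k-fold, for best subset selection: select
the variables on the first group and test on the second. Your claim that Samples 1 and 2 can't be used for selection is a misunderstanding on your part. What he
probably meant is that you can't run univariate tests on the combined sample and then fix the pool of independent variables.
In practice, with several hundred variables you can't do best subset anymore; you may have to use some shortcut such as
forward stepwise. It still depends on the specific method; many ways of reducing variance come down to the specific method.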
g****u
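Posts: 25
48
From thread: DataSciences board - Impressions from a technical phone interview & questions
While job hunting on the side, I interviewed today with a startup that seemed seriously impressive. I have already been rejected and feel thoroughly defeated. My biggest takeaway is that
when I do projects I just grab machine learning algorithms and use whichever one works well, but my understanding of the algorithms
is shallow and I have never thought about them in depth. Experts, how should the questions below be answered?
1) The first question: I had a project that used MAPE to evaluate the model, and the interviewer asked why I couldn't use RMSE.
I said MAPE was what the client required, so I hadn't thought much about it. I'm too embarrassed to repeat what I rambled at the time;
it was humiliating.
2) The interviewer asked what my favorite algorithm is. I said I don't have one, and that I mostly use logistic regression and random
forest, so the interviewer asked when I use LR and when I use RF. In theory I know
a bit about their respective characteristics, e.g. LR is fairly sensitive to outliers and the model is highly interpretable, while RF is
non-parametric, so it is insensitive to outliers and you don't have to worry about collinearity and so on. But when it came to
specific problems I couldn't say, because in fact I just try both, and if the performance of the two algorithms
differs a lot, I simply use the better one. On the spot I went by gut feeling and said that if the featu...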
A*********0
Posts: 144
49
From thread: DataSciences board - A few questions
What is the mean of "temp" when "month" is equal to 6?
What was the maximum ozone value in the month of May (i.e. Month is equal to 5)?
Extract the subset of rows of the data frame where Ozone values are above 31
and Temp values are above 90. What is the mean of Solar.R in this subset?
What is the value of Ozone in the 47th row?
How many missing values are in the Ozone column of this data frame?
What is the mean of the Ozone column in this dataset? Exclude missing values
(coded as NA) from this ca...
z*******n
Posts: 1034
50
From thread: MobileDevelopment board - The State of Go Language for Android Native Development
The State of Go Language for Android Native Development
by Sergio De Simone on Jun 27, 2014
A couple of different proposals from Google engineers and independent
developers are aiming at making it possible to write native Android apps in
Go. This will not go so far as allowing Go apps to access the full Android
NDK, but it could allow access to a subset of it.
According to David Crawshaw, engineer at Google and author of a proposal to
add partial Go support for Android apps, "providing...