Page 10 - Roundup of discussions about pipelined - 话题女王

s******n
Posts: 876

Instruction pipelining can increase throughput, but it has already been
pushed about as far as is humanly possible, and pipelines still stall very
frequently, so latency remains a significant factor in overall throughput.
At the scale of centimeters, the speed of light caps the clock rate at
roughly 10 GHz (light covers only about 3 cm in one 0.1 ns cycle), so
latency can't be cut much further.
Of course, CPU performance isn't the bottleneck anyway; memory access is the
deciding factor today. Replace a 3 GHz CPU with a 30 GHz one and the
computer won't appear to run any faster.
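
The speed-of-light argument above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it uses the vacuum speed of light (signals on a real chip travel slower), and the function name `cm_per_cycle` is made up for illustration.

```python
# Distance light travels in vacuum during one clock cycle.
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def cm_per_cycle(clock_hz):
    """Return how many centimeters light covers in one cycle at clock_hz."""
    seconds_per_cycle = 1.0 / clock_hz
    return SPEED_OF_LIGHT_M_PER_S * seconds_per_cycle * 100.0


if __name__ == "__main__":
    for ghz in (3, 10, 30):
        print(f"{ghz} GHz: {cm_per_cycle(ghz * 1e9):.2f} cm per cycle")
```

At 10 GHz the budget is about 3 cm per cycle, which is the centimeter scale the post refers to; at 30 GHz it shrinks to about 1 cm.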

g***b
Posts: 16

From thread: CS board - PHP Web Developer Co-op Opportunity

PHP Web Developer Co-op Opportunity
Job Description:
The co-op will work to extend the function of the proteomics pipeline and
build a protein annotation and function database:
• Develop web-based tools supporting the analysis of large proteomics
data sets.
• Focus on the coding, development and the maintenance of back-end
functionality.
• Design and develop pipelines to automate established analysis so
that same type data can be analyzed in a robust manner.
This is a full-time...

g***b
Posts: 16

From thread: CS board - PHP Web Developer Co-op Opportunity

https://xjobs.brassring.com/TGWebHost/jobdetails.aspx?jobId=707209&partnerid=169&siteid=5140&codes=JINDED

B*****g
Posts: 34098

From thread: Database board - Statistician/Data Analyst Postion in Santa Monica (repost)

[The following text is reposted from the JobHunting board]
Sender: vivianxia (redapple), Board: JobHunting
Title: Statistician/Data Analyst Postion in Santa Monica
Posted at: BBS 未名空间站 (Fri Aug 19 20:55:38 2011, US Eastern)
Data Analyst
Location: Santa Monica, CA
Who Are We Looking For? Right now, we’re looking for a Data Analyst who
will work on data management, data analytics, data pipeline implementation,
quality control, and special ad hoc projects. You will be utilizing the
latest technologies to solve challenging problems, create innovative...

b****u
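Posts: 1130

From thread: Database board - A design question about big data system architecture

I recently built a similar system. It really depends on what you ultimately
want to do with the data and on the performance requirements (whether it
has to be real time).
In many cases you need a data warehouse, and you also need to build a
pipeline to copy and sync the data. I use Scala and Spark, because Spark
has DataFrames, which make processing and transforming data very
convenient. We use Redshift as the data warehouse, and performance is good.
You also need some visualization software, such as Tableau, Looker, etc.
With a data warehouse in place, more advanced analysis is also easy;
Python, R, and Scala are all fine.
One special note: there are already many commercial data pipelines, but
syncing data generally requires the data source's log files to do CDC
(change data capture). We ended up building our own pipeline, which
actually gives us more peace of mind.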

m*******9
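Posts: 103

From thread: Database board - FB hiring data/BI engineer contractor

Our team urgently needs a DE contractor. We received dozens of résumés
submitted by agencies, all from Indian candidates. I'm posting here to see
whether any friends are interested. The school doesn't need to be great
(any US school beats the vast majority of Indian schools), and the major
doesn't have to be CS, but you need a few years of relevant experience. The
workload is heavy, so you must be genuinely interested. Very competitive
compensation. Core skills: SQL and Python. If you are interested or have
questions, email: [email protected]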
*******
Build data expertise and own data quality for the awesome pipelines you
build
Architect, build and launch new data models that provide intuitive analytics
to your customers
Design, build and launch extremely efficient & reliable data pipelines to
move data (both large ...

L***s
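Posts: 1148

From thread: Hardware board - Tips on Synology DSM disk group repair/extend

Over the past few days I did a large-scale restructuring and data migration
on an 8-bay Synology NAS and learned some lessons (I couldn't find any
related posts online), so here is a brief summary. In one sentence: RAID
repair/extend is the most time-consuming step, and under DSM's restrictions
it can only be done sequentially, not in parallel. To save time and
pipeline your operations, it is best to do repair/extend last. With the
wrong order, a job that could finish in three days may take more than a
week.
Specifically (why on earth is there no documentation or discussion), while
a disk group is being repaired/extended, the following operation is allowed:
0. Perform data scrubbing on another disk group
But the following operations are not allowed by DSM; you must wait until
the current disk group finishes repairing:
1. Abort the repair/extend of that disk group. The only way to abort is to
forcibly pull all the disks in that disk group, crash it, and start over.
2. Repair/extend another disk group. Disk groups must be repaired
sequentially, one at a time; it cannot be done concurrently.
3. Create or...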

B********s
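Posts: 3610

From thread: Programming board - A Linux shell programming question

Illustrative code follows:
The parse() function returns a pipeline p containing multiple commands.
execPipe() executes the commands in the pipeline.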
static void execPipe(Pipe p)
{
    int fds[2];
    pid_t pid, parent;

    if ((pid = fork()) < 0)
        exit(-1);
    else if (pid == 0) { // child
        for (all but the last command in this pipe) {

            else if ((parent = fork()) == -1)
                exit(-1);
            else if (parent) { // parent process
                dup2(fds[1], STDOUT_FILENO);
                close(fds[0]);
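
The excerpt above is cut off, so as a point of comparison here is a rough Python sketch of the same idea: each command's stdout feeds the next command's stdin. It uses the standard `subprocess` module instead of raw `fork`/`dup2`, and `run_pipeline` is a hypothetical name, not the poster's function.

```python
import subprocess
import sys


def run_pipeline(commands):
    """Run argv lists left to right, piping each stdout into the next stdin.

    Returns the bytes written to stdout by the last command.
    """
    if not commands:
        raise ValueError("need at least one command")
    procs = []
    prev_stdout = None  # the first command inherits our stdin
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=prev_stdout, stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # Drop our copy of the read end so only the new child holds it;
            # otherwise an upstream writer never sees its reader go away.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    output = procs[-1].communicate()[0]
    for proc in procs[:-1]:
        proc.wait()  # reap the upstream processes
    return output


if __name__ == "__main__":
    # Rough equivalent of: python -c "print('hello')" | python -c "<uppercase stdin>"
    produce = [sys.executable, "-c", "print('hello')"]
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(run_pipeline([produce, upper]))
```

Closing the parent's copy of each pipe end plays the same role as the `close(fds[0])` call in the C excerpt: every process should keep only the ends it actually uses.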

n**x
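Posts: 30

From thread: Programming board - The assembly mov instruction

It seems to be because there is only one data bus: an instruction can
perform only one memory operation in a single pipeline stage. With two
memory operations, the pipeline, whether in-order or out-of-order, would
need an extra stage.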

O*******d
Posts: 20343

From thread: Programming board - Beginner question: does the StencilBuffer work in screen coordinates rather than ModelCoordinate?

The diagram linked below shows the image rendering pipeline in OpenGL. The
stencil buffer is used in per-fragment operations, the last step in the
pipeline. By the time data reaches per-fragment operations, coordinates
have been resolved to screen pixels.
http://www.cs.kent.edu/~farrell/graphics/opengl/openglprog1.2/opengl/render.html

c***d
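Posts: 996

From thread: Programming board - The 即刻搜索 (Jike Search) story everyone has been talking about these past two days

A market price of two billion is the capex for a hundred thousand servers;
I don't know what they mean by saying a few thousand.
Moreover, equipment investment in search goes mainly into crawl and index;
serving doesn't cost much to begin with. Of pipeline, infrastructure, and
distribution, infrastructure and distribution have at least made progress.
What depresses me is that for all these years pipeline and ranking, the
core of search technology, have been utterly stagnant. The search platform
has become a pure commodity.
You say Jike has reached Google's 2004 level; frankly, I don't think
Google's search has improved much since 2004. It is precisely because the
technology hit a bottleneck that monopoly through scale and capital became
possible. Behind Google's dominance is the stagnation of the search
industry and of information-mining technology.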

S*A
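Posts: 7142

From thread: Programming board - The fanciest atoi

The problem is that many real-world languages are not strictly LR(k). If
there are extensions that are not LR(k), you're stuck.
Not necessarily. Auto-generated parsers use indirect jumps throughout,
which on Intel architectures will flush the instruction pipeline.
Hand-written ones mostly use direct calls, which do not flush the
instruction pipeline. If YACC were really that good, are the gcc and LLVM
people all fools for not using it? gcc in particular had to abandon YACC
and rewrite its parser. They were very reluctant but really had no other
choice.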

x**l
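Posts: 64

From thread: Programming board - Question: the kind you can't find an answer to on Google

Thanks for the reply.
The measurement method does have error, but it will never produce error on
the order of more than 1000 clocks. On Xeon the instruction pipeline is no
deeper than 50 stages. Although instructions execute out-of-order, entering
the pipeline (IF) and the final WB are definitely in order, so the error
will not exceed 50 clocks.
Of course, to make the result more accurate, you can change the rdtsc part
to the following, and then you should be able to compile it.
asm volatile(
    "cpuid;"   // serializing instruction before the read
    "rdtsc;"   // reads the TSC into EDX:EAX
    "cpuid;"   // caution: cpuid also writes EAX-EDX, overwriting the rdtsc result
    : "=a" (a), "=d" (d));
Even with the code changed this way, the final result should not change
much.
Moreover, even taking into account interrupts or serializing instructions
in the ISR, the effect on working() should be roughly the same in both runs
and should not produce a difference of 1100 vs 2500 clocks.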
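
The disagreement in this thread is largely about how much of a measured interval is the cost of the timer itself. Python cannot issue `rdtsc` directly, so the sketch below uses `time.perf_counter_ns` as a stand-in to show the general technique: estimate the timer's own overhead from back-to-back reads, then subtract it. The function names are made up for illustration, and the numbers it produces are nanoseconds, not cycles.

```python
import time


def timer_overhead_ns(samples=10_000):
    """Return the smallest gap observed between two back-to-back clock reads."""
    best = None
    for _ in range(samples):
        start = time.perf_counter_ns()
        end = time.perf_counter_ns()
        gap = end - start
        if best is None or gap < best:
            best = gap
    return best


def measure_ns(func, repeats=1_000):
    """Best-of-N elapsed time for func(), minus the timer's own overhead."""
    overhead = timer_overhead_ns()
    best = None
    for _ in range(repeats):
        start = time.perf_counter_ns()
        func()
        elapsed = time.perf_counter_ns() - start
        if best is None or elapsed < best:
            best = elapsed
    # Clamp at zero: for very short functions the overhead estimate can
    # exceed the measured interval.
    return max(best - overhead, 0)


if __name__ == "__main__":
    print("timer overhead (ns):", timer_overhead_ns())
    print("sum(range(1000)) (ns):", measure_ns(lambda: sum(range(1000))))
```

Taking the minimum over many runs filters out interrupts and scheduling noise, which is the same concern the post raises about ISRs.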

S*A
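Posts: 7142

From thread: Programming board - Question: the kind you can't find an answer to on Google

So I'm puzzled. You seem to know a lot of internal details, yet you
overlook many important technical ones. You don't measure the way Intel
recommends, and then you complain about unexpected results.
You still have no evidence to support your 1000-cycle claim. I know Intel
CPUs fairly well, and even I wouldn't dare speculate the way you do. My
feeling is that your measurement simply isn't reliable.
Even if I take a step back and follow your reasoning, with a 50-stage
pipeline, and assume that is how an Intel CPU works internally, your
argument still doesn't hold. A single rdtsc instruction alone takes 250-300
cycles. With just 4 instructions like rdtsc in the pipeline, your error
already reaches 1000 cycles. Having 4 rdtscp in a 50-stage pipeline is
quite possible, especially in the empty-loop example.
Unless you follow Intel's recommended method, your results are not
convincing. And I already have a counterexample.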

f*****e
发帖数: 57

来自主题: Programming版 - docker is next big deal

I agree that most of the time rolling back is the right choice. But our devs
argue that sometimes they want to roll forward, such as when the database
schemas are changed at the same time.
We are not in production yet, so all the debates are based on imagination.
We plan to do true CI/CD: when a change is merged to master, a Jenkins
pipeline will kick off to bake a new AMI and run tests, and the change will
be deployed to production without any human intervention within an hour.
And the devs...

z****e
Posts: 54598

From thread: Programming board - One million HTTP RPS w/o load balancing is easy

http://www.techempower.com/blog/2014/03/04/one-million-http-rps
Keyword: undertow
As we and our collaborators prepare Round 9 of our Framework Benchmarks
project, we had an epiphany:
With high-performance software, a single modern server processes over 1
million HTTP requests per second.
Five months ago, Google talked about load-balancing to achieve 1 million
requests per second. We understand their excitement is about the performance
of their load balancer. Part of what we do is performance consulti...

d******e
Posts: 2265

From thread: Programming board - Teacher Zhao, tell us about Flink

Flink flips this on its head. Whereas Spark is a batch processing framework
that can approximate stream processing, Flink is primarily a stream
processing framework that can look like a batch processor. Immediately you
get the benefit of being able to use the same algorithms in both streaming
and batch modes (exactly as you do in Spark), but you no longer have to turn
to a technology like Apache Storm if you require low-latency responsiveness.
You get all you need in one framework, without the ...

i*****9
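Posts: 3157

From thread: Programming board - Wang Yin on Kotlin (suggested retail price for this article: ¥15)

Server-side and client-side requirements differ. If an exception in a batch
job running in a pipeline goes unhandled, rerunning the whole pipeline can
mean half a month of time down the drain.
On the client, as long as you haven't written a deadlock, everything is
fixed by refreshing the page.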

t*****n
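Posts: 481

From thread: Programming board - Re: Asking the board's veterans about teams and future career direction

Nearly forty, switched careers midway
i.e., the proverbial old-timer
About 2 years of experience, mainly Java, Spark, and Scala (Scala only when using Spark)
One is Grubhub, offering RSUs
title: sr data engineer
Reports to the lead, who is Indian and seems fairly nice, though that is only an impression
Tech stack: s3, emr, spark, hive, c*, elastic search, presto, azkaban
Future work: the company's brand-new data platform, which counts as infra
Pain point 1: this lead has been around the industry for 20 years, you know what that means
Pain point 2: he told me about 10 people currently work on this platform: 4 offshore
(you know what that means) and 2 remote. I figure that once I join, besides him there
will be at most two other people in the office writing code
The other is AppNexus, offering options
title: sde ii
Reports to a sr sde, a young white guy who seems fairly nice
Tech stack: java, mapreduce, mysql, vertica, protocol buffer
Future work: the pipeline for video ad reporting, mainly...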

d******e
Posts: 1424

From thread: Biology board - Roche Dumps RNAi, Sends Shock Waves Through Alnylam, Tekmira

Interesting discussion on this topic recently:
http://pipeline.corante.com/archives/2010/11/17/roche_has_problems_but_rna_interference_has_more.php
And in 2008: http://pipeline.corante.com/archives/2008/08/21/rnai_bubble_or_not.php

c********e
Posts: 598

From thread: Biology board - Roche Dumps RNAi, Sends Shock Waves Through Alnylam, Tekmira

http://pipeline.corante.com/archives/2010/11/17/roche_has_problems_but_rna_interference_has_more.php
http://pipeline.corante.com/archives/2008/08/21/rnai_bubble_or_not.php
I found this:
8. non-pharma chemist on November 17, 2010 5:10 PM writes...
I'm not in pharma so I haven't been keeping myself abreast of
developments in this that or the other nucleic acids. However, one
friend of mine formerly worked at one of the companies with the siRNA
technology. Another friend formerly worked at the co...

c*********r
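Posts: 1312

From thread: Biology board - Seeking advice on ChIP-Seq, RNA-Seq and/or miRNA array

I ran four RNA-seq samples. The first two were mRNA-seq; for the last two I
removed rRNA myself with Invitrogen's RiboMinus, hoping to recover long
ncRNA. The samples were prepared by the university's Genomic core facility
at a bit over $400 each. Sequencing and analysis were done at a
collaborator's lab, which hasn't charged us so far, so I don't yet know
their prices.
However, the university Genomic core facility's quote is: 2x75 GA v5 kits: $2060/lane.
Last July I asked an institute in New Mexico for a quote, and the pricing
was roughly as follows:
Given:
- 2 libraries (you have already used the illumina SIPE kit to make paired
end library)
- Request for mRNA sequencing (PE)
- Sea urchin. genome size is ~850Mbases http://www.scienceonline.org/cgi/content/full/314/5801/941 so...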

K**********e
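Posts: 188

From thread: Biology board - Lu Bai on career choice: the differences between scientists in academia and industry (repost)

Chinese-American scientist Lu Bai on career choice: the differences between
scientists in academia and industry
Source: Xu Yanjun's blog
(Transcribed from a recording of a lecture at the Shanghai Jiao Tong
University School of Medicine) (reposted from the job-hunting board of the
丁香园 forum)
Thank you, everyone. I'm going to speak standing rather than sitting so
that I can interact with you; I want a taste of what it's like to be a
superstar singer. (Laughter) You are all graduate students, and graduate
students all do research, so you must be wondering what to do in the
future. For those taking the research path, there are generally two main
routes: one is academia and the other is industry, meaning drug research
and development at a big pharmaceutical company or a biotech company. You
know a fair amount about doing research at universities and institutes, but
how research is done in the pharmaceutical industry is less clear to you. I
myself worked in academia for a long time, but I now work on drug research
and development at a pharmaceutical company and have learned a lot from it.
I'd like to share with you how much research in academia and in industry
actually differs and where it is similar, and what different demands
academia and industry place on students and scientists.
1. Choose what suits you, what you like, and where your passion lies
Whatever you do in the future, there is one basic point: how to choose the
career that suits you best. Chinese society is developing faster and
faster. Even if today you feel you know what you want to do, look again in
10 or 20 years: how many people will have stuck with their original choice?
Society is advancing, the demands placed on people are changing, and
people's interests change too...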

w******y
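Posts: 8040

From thread: Biology board - Question: how should the data from whole exome capture be analyzed?

The simplest approach: pay someone to analyze it.
Hardware and system maintenance, understanding and choosing the parameters
in a pipeline, and the strengths and weaknesses of each pipeline: none of
these is easy for someone who doesn't do this for a living.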

j*p
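Posts: 411

From thread: Biology board - A brief introduction to bioinformatics tools for NGS analysis

1. I have not used CLC Genomics Workbench, so I don't know whether it is any good.
2. I have used quite a few licensed software packages, and my overall
impression is that they are not as easy to use as free (open source)
software. Paid software is far less flexible than free software, so I am
biased against paid software; of course, that is also because I'm rather
poor.
3. If you want an easy start, try Galaxy and GenePattern. The former has
many DNA-seq pipelines and very good tutorials; the latter has many RNA-seq
pipelines.
4. If your institution has a license, you can try Partek.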

t*d
Posts: 1290

From thread: Biology board - the Real Cost of Sequencing

http://www.genomeweb.com//node/980559?hq_e=el&hq_m=1103092&hq_l
----------------------------------------
A recent study by scientists at Yale University suggests that the actual
cost of sequencing may be much higher than some current estimates indicate
since those figures may not factor in the analysis costs that are necessary
for a successful sequencing project.
In the paper, published in Genome Biology last month, Yale's Mark Gerstein
and colleagues consider costs that weren’t taken into accou...

n*********4
Posts: 99

From thread: Biology board - bioinformatics postdoc poition($35,000 - $40,000)

BIOINFORMATICS for Next Generation-Sequencing analysis @ ICBR of University
of Florida
RESPONSIBILITIES:
• Provide high level bioinformatics support to molecular biologists
by applying information technology and statistics to genomic and proteomic
projects;
• Evaluate available bioinformatics tools for next generation-
sequencing data and prototype new methods and algorithms to enhance existing
analysis capabilities and pipelines;
• Provide computational support for in...

u*********1
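Posts: 2518

From thread: Biology board - Can anyone talk about what knowledge is needed to learn NGS data analysis from scratch?

As a former complete beginner, I have a lot to say about this.
A year ago I didn't even know what grep was in Linux. My boss said "grep",
and I said, "gre... what? greb?" He shook his head and said "you really
have a lot to learn..." But my boss is great and found ways to explain all
sorts of basics to me clearly, including what RAM is. Embarrassing.
For NGS/bioinformatics work, I think the core idea is still: how to use
computational means to solve biological problems. It sounds simple, but not
everyone really internalizes it. Python/bash/Perl and so on are quick to
get started with, but they are definitely not something you master in two
weeks. I've been working with Python for a year now and have still only
scratched the surface; it mostly depends on your own project. If all you
ever need is simple processing of your text, and the text is not large, say
100 MB, you can always use for line in text, or readlines(). But with a
very large text you can't use readlines(), because the cluster may not have
enough memory to load the whole text.
So I think you just learn as you go, unless you come from a formal
CS/computing background...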
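
The `readlines()` point above can be made concrete. The sketch below contrasts loading a whole file with iterating over it line by line; the function names and the idea of counting lines that contain a substring are only illustrative, not taken from the post.

```python
def count_matching_lines(path, needle):
    """Count lines containing needle while holding one line in memory at a time."""
    count = 0
    with open(path, "r", encoding="utf-8") as handle:
        # Iterating over the file object reads lazily, so memory use stays
        # roughly constant no matter how large the file is.
        for line in handle:
            if needle in line:
                count += 1
    return count


def count_matching_lines_eager(path, needle):
    """Same result, but readlines() loads every line into a list first."""
    with open(path, "r", encoding="utf-8") as handle:
        lines = handle.readlines()  # whole file in memory: fine for small inputs
    return sum(1 for line in lines if needle in line)
```

Both functions return the same count; only the first is safe for files larger than the memory available on a cluster node.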

u*********1
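Posts: 2518

From thread: Biology board - What does everyone think about the future of NGS?

I think there are several aspects to consider:
1. How do you define "a large genetic component"? To determine whether a
disease is familial or sporadic, you certainly need to collect enough
samples, and for some diseases samples are hard to come by. This is not
just a matter of manpower, effort, and funding: with some diseases patients
die quickly, or samples are inherently scarce, and it is always hard to
come across a good, large pedigree.
2. I'm of course very optimistic about NGS technology itself. It will
surely become more accurate, more mature, and cheaper. But how cheap can it
get? When will whole-genome sequencing be easy to do for everyone? How long
can read length get? (The longer the better, certainly.) Can computer
hardware keep up with the growth of NGS data?
3. Bioinformatics analysis. All I can say is that in current bioinformatics
pipelines, apart from read alignment and SNP calling, which have become
very mature (not only with high sensitivity/specificity, but also able to
support population-level analysis), everything else is either very
difficult or very messy. That is, there is no agreed-upon best pipeline: I
develop a method, you develop a piece of software, and in the end users are
left thoroughly confused. For example, indel cal...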

z*****7
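Posts: 367

From thread: Biology board - Does a biology PhD have an advantage in pharma equity research?

Just curious: when doing fundamental analysis, does a PhD really help with
understanding the pipeline? Or is an undergraduate science degree actually
enough?
Also, is this pipeline knowledge mostly useful for researching the primary
market? I don't know whether it's the same for the secondary market.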

d***e
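Posts: 1215

From thread: Biology board - How is R&D at big pharma?

In recent years quite a few companies have slashed projects, and the
quality of the surviving pipelines has improved a lot. It's actually the
companies that haven't cut many projects that are likely to have a lot of
filler in their pipelines.
The FDA won't go easy on an acquired drug; all the development and clinical
trials still have to be done properly. For things bought from small
companies, basically all the development has to be redone from scratch. At
least as far as development goes, I can tell you that the big companies are
all about equally good.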

b******k
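Posts: 2321

From thread: Biology board - Which way to go!

I think it's quite good; at least you don't have to worry about funding or
administrative chores. What you work on will of course be on the pipeline,
but if I remember correctly you can spend a certain percentage of your time
on your own things, and if you have an NIBR postdoc you can do even more
publishable work.
I think the work at NIBR is quite varied; they even have people working on
fly and fish, which is hard to imagine at other pharma companies.
Of course, if you are 100% set on becoming faculty, NIBR may not be the
best choice. If you are considering industry, it would be a very good
choice; NIBR PIs should also be involved, to some degree, in discussions
about the general direction of the pipeline.
I once had an offer for a similar position at NIBR. At the time I was dead
set on faculty and turned it down; later I really regretted it, haha.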

d*******e
Posts: 1649

From thread: Biology board - NGS data analysis workflow

If you trust the procedure and pipeline, it surely is.
I would suggest using different pipelines to get VCF files and comparing
them if you can. There are quite a lot of differences between methods.
Everybody claims their own method is the best.
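
Comparing the VCF output of two pipelines, as suggested above, can start with a simple set comparison. The sketch below is a deliberately minimal illustration: it keys each variant on (CHROM, POS, REF, ALT) from the first five tab-separated VCF columns, splits multi-allelic ALT fields on commas, and does no normalization or left-alignment of indels, which a real comparison would need. The function names are hypothetical.

```python
def load_variants(vcf_lines):
    """Return a set of (chrom, pos, ref, alt) tuples from VCF text lines."""
    variants = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the header row
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # one entry per alternate allele
            variants.add((chrom, int(pos), ref, allele))
    return variants


def compare_callsets(lines_a, lines_b):
    """Split two call sets into shared calls and calls unique to each side."""
    a = load_variants(lines_a)
    b = load_variants(lines_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


if __name__ == "__main__":
    pipeline_a = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.",
        "chr1\t200\t.\tC\tT,G\t60\tPASS\t.",
    ]
    pipeline_b = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t45\tPASS\t.",
        "chr2\t300\t.\tG\tA\t70\tPASS\t.",
    ]
    result = compare_callsets(pipeline_a, pipeline_b)
    for name, calls in result.items():
        print(name, sorted(calls))
```

With real files, passing open file handles instead of lists works the same way, since the function only iterates over lines.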

s******r
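Posts: 1245

From thread: Biology board - NGS data analysis workflow

How to analyze the VCF is indeed the main thing.
But the VCFs produced by different pipelines all differ.
If you have a previously built pipeline with all the parameters settled, it
saves a lot of trouble; if you're building one from scratch, you still need
to optimize it a bit.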

V******t
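Posts: 444

From thread: Biology board - Lu Bai: Career choice: the differences between scientists in academia and industry (repost)

Original link: http://www.sciencenet.cn/m/user_content.aspx?id=400372
Career choice: the differences between scientists in academia and industry
(Transcribed from a recording of a lecture at the Shanghai Jiao Tong
University School of Medicine)
Thank you, everyone. I'm going to speak standing rather than sitting so
that I can interact with you; I want a taste of what it's like to be a
superstar singer. (Laughter) You are all graduate students, and graduate
students all do research, so you must be wondering what to do in the
future. For those taking the research path, there are generally two main
routes: one is academia and the other is industry, meaning drug research
and development at a big pharmaceutical company or a biotech company. You
know a fair amount about doing research at universities and institutes, but
how research is done in the pharmaceutical industry is less clear to you. I
myself worked in academia for a long time, but I now work on drug research
and development at a pharmaceutical company and have learned a lot from it.
I'd like to share with you how much research in academia and in industry
actually differs and where it is similar, and what different demands
academia and industry place on students and scientists.
Whatever you do in the future, there is one basic point: how to choose the
career that suits you best. Chinese society is developing faster and
faster. Even if today you feel you know what you want to do, look again in
10 or 20 years: how many people will have stuck with their original choice?
Society is advancing, the demands placed on people are changing, and
people's interests change too. Nowadays, fairly good...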

c*********r
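Posts: 1312

From thread: Biology board - Is anyone thinking of starting a bioinformatics services company?

I very much agree with the views of aber, genetics123, and the other IDs
above.
My feeling is that, apart from conventional sequencing methods on
conventional model organisms, many projects have no standard pipeline for
analysis.
I've learned a lot over the past few days at the MSU NGS summer course. The
same software version with the same parameters gives very different results
on different species and samples. The software has to be tuned and trained
for each sample.
For the same species, different aligners and assemblers also give different
results. The mainstream tools each have their strengths, and it takes
patience to find a pipeline that suits a particular species and sample.
In addition, as technology advances and read length, accuracy, and
sequencing type change, existing software has to be improved to keep up.
Some assemblers can only handle short reads, some only long reads. I don't
know much about the low-level computing side, but longer reads must have a
big impact on computation. When nanopore comes out, whoever develops
effective analysis software first will definitely capture a large share of
this market.
The people attending this course work on all kinds of systems and
questions. Microbes, atypical model organisms, and metagenomics made up a
large portion. Although this may not reflect the market, ...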

f*******9
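Posts: 74

From thread: Biology board - Is anyone thinking of starting a bioinformatics services company?

A lot of the discussion is good, and I particularly agree with this view.
Many people working in bioinformatics have probably thought about starting
a company, so let me share my considerations:
1. User acceptance. How do you convince users that your pipeline is good,
better than others'? Do you have highly cited software and papers that the
field really recognizes, efficient and accurate, like
muscle/uclust/usearch..., each with several thousand citations, whose
author has a crowd of people coming to him for consulting without even
leaving home? Or do you have a certain standing in academia, as a
PI/Professor etc., with substantial accumulated work in this field, not
just a paper or two? Also, why would your paid pipeline be better than the
free ones? There are far too many free ones on the market. These days, to
publish a bioinformatics methods paper in a mainstream journal you have to
attach the program or source code, and some free software really is well
maintained, with clear and rich documentation, powerful features, and
simple usage, requiring no scripting from the user.
2. Staying technically ahead. Starting a company requires long-term
thinking; besides the market, the technology must be somewhat
irreplaceable. If your technology is not complicated, then after your
initial success others will want to do the same thing, so can you guarantee
your technology stays ahead? As for machine learning, the training-model
part the original poster mentioned, China is not behind. In fact basic math
education in China is very good, and there is plenty of talent...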

o*****h
Posts: 293

From thread: Biology board - Position available - Scientist (Data Curation/Analytics)

Scientist (Data Curation/Analytics)
Omicsoft - Cary, NC
Company Description:
Omicsoft Corporation is the leader in providing enterprise solutions for
next-generation sequencing and Omics data analysis, management, and
visualization. Our flagship product, Array Suite (Array Studio and Array
Server), provides full data management with server-client architecture.
Array Server stores and manages all of the large data for users, while Array
Studio provides the graphical user interface and full visual...

g**********y
Posts: 423

From thread: Biology board - NGS bioinformatics jobs

A tool I wrote for running pipelines in parallel on a cluster:
======================================================

../tools/run_cmd_pbs/run_cmd_pbs.py pipeline_rnaseq_gene_exp.sh -s 1-2 -p 2:mem=10gb
PBS setting when it starts running the commands:

mem = 16gb

nodes = 1
...

n****v
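Posts: 17

From thread: Biology board - A 35B pharma company is hiring a Sr Director of Bioinformatics

This is an InMail I received on LinkedIn; the sender is a recruiter at the
company. It was probably a selective mass mailing, but the number of
InMails should be fairly limited.
I'm not looking to change jobs for now, but this opportunity looks good, so
I'm advertising it here. The position is quite senior. If you have a
suitable background and are interested, send your LinkedIn profile to my
site inbox and I'll help connect at my discretion; the rest is up to you.
Apologies that I won't reply to everyone individually.
I am with the Talent Acquisition Team at XX pharmaceuticals and I would like
to network with you. I am seeking a talented leader (Sr. Director) to
create a Computational Genomics capability at XX which will provide advanced
bioinformatics and genomic analytics support to R&D activities across all
programs and phases of the dru...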

V******t
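Posts: 444

From thread: Biology board - Phoenix Weekly (凤凰周刊): NIBS (北生所) at a crossroads

FMI does basic research independently and has nothing to do with Novartis's
pipeline.
Novartis also has the Novartis Institutes for BioMedical Research, which is
the one that works on the pipeline.
FMI is just Novartis's showpiece.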

n******7
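Posts: 12463

From thread: Biology board - Switching careers to bioinformatics

Wow, almost exactly the same order as you!
At first it was homology modeling, MD (amber), protein docking
Then SVR for gene expression prediction (based on TFBS)
Then I moved to another place and proposed a microarray processing scheme,
used to build co-expression networks
After that everyone started doing NGS...
I also did some other projects along the way
I suspect you started around the same time as I did; the hot topics at any
given time are fairly limited
I think people like us would have no problem at all building pipelines at
many bioinfo companies, but that kind of work has little room for
advancement
It's just continually keeping up with newly popular techniques/software,
learning all their details, and then using your biological knowledge and
programming skills to build the most sensible workflow: raw data goes in,
and the various metrics the user needs come out
I realized this problem a few years ago and decided to do something with
real "technical content"
First I built something between a pipeline and a standalone tool,
consciously applying some simple statistical tools. I designed it
specifically for a special technique used by our collaborators; no existing
tool could do the job well. It got me a few decent...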
