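z********o Posts: 428 | 1 My boss wants me to work with RNA-Seq data. I moved into bioinformatics from another field and have no biology background. Can anyone
recommend what to read and which algorithms to use? How do I get started?
Thanks, everyone. |
c****y Posts: 373 | |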
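t*d Posts: 1290 | 3 The fastest way to learn it is to work through Trapnell's pipeline and get
familiar with all of its results.
[Quoting z********o] : My boss wants me to work with RNA-Seq data. I moved into bioinformatics from another field and have no biology background. Can anyone : recommend what to read and which algorithms to use? How do I get started? : Thanks, everyone.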
|
g******a Posts: 51 | 4 Stanford's SFG:
The Simple Fool’s Guide to Population Genomics via RNA-Seq
http://sfg.stanford.edu/ |
z********o Posts: 428 | |
h****n Posts: 2552 | |
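j****x Posts: 1704 | 7 The choice usually depends on the experimental design (depth, replicates). Generally speaking, limma is the first choice,
followed by DESeq. There is no harm in trying several methods and comparing the results.
[Quoting h****n] : Which is actually better, Cufflinks or DESeq?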
|
z********o Posts: 428 | 8 Could someone explain the concepts of depth, library, and library size? I spent a whole day browsing
online and still have not figured them out.
Thanks. |
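A rough sketch of how these terms are commonly used (this is not an answer given in the thread): a library is the pool of cDNA fragments prepared from one sample and loaded onto the sequencer, sequencing depth is how many reads were sequenced for it, and library size, in differential-expression packages, usually means the total number of reads counted for that sample. The sample names, gene names, and counts below are made up for illustration.

```python
# Hypothetical read counts per gene for two samples (one library each).
counts = {
    "sampleA": {"geneX": 500, "geneY": 1500, "geneZ": 8000},
    "sampleB": {"geneX": 1000, "geneY": 3000, "geneZ": 16000},
}


def library_size(sample_counts):
    """Total number of reads assigned to genes in one library."""
    return sum(sample_counts.values())


def counts_per_million(sample_counts):
    """Scale raw counts by library size so that samples are comparable."""
    total = library_size(sample_counts)
    return {gene: n * 1_000_000 / total for gene, n in sample_counts.items()}


lib_sizes = {name: library_size(c) for name, c in counts.items()}
cpm = {name: counts_per_million(c) for name, c in counts.items()}
```

Here sampleB was sequenced twice as deeply as sampleA, so its raw counts are doubled, but after dividing by library size the two samples give the same counts per million for every gene.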
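c********e Posts: 598 | 9
Why is limma the first choice?
[Quoting j****x] : The choice usually depends on the experimental design (depth, replicates). Generally speaking, limma is the first choice, : followed by DESeq. There is no harm in trying several methods and comparing the results.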
|
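a**r Posts: 352 | 10 limma has a long history and is a well-established package. It is fast and performs well.
As for DESeq, according to this paper its performance is fairly average. baySeq looks somewhat better.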
http://genomebiology.com/2013/14/9/R95 |
|
|
a******k Posts: 1190 | 11 I have to say, I have had a very bad experience with Trapnell's pipeline.
Many people complain that TopHat is very slow. If you have only tens of millions of
reads, you can go with TopHat, but with a huge dataset like mine it runs
forever (I admit that I have a billion reads). STAR is recommended by a lot of
people, and I also found it very good. It finishes the job in one hour on
our super cluster.
Cufflinks is also very slow. On my dataset it has been running for one week
(16 parallel jobs) and will likely need several more weeks, or never finish. It has
some bugs, or, to be fairer, very poor default parameter
settings (you can tweak them once you are familiar with it). For example,
it uses an extrapolation algorithm that gives very poor abundance estimates
for short transcripts (people complain about this everywhere). As another
example, it built transcript models for only a tiny fraction of my data in
a test run. The reason is that when it guesses there is a complex gene model in
a region that is longer than one of its default parameters, it skips
the whole bundle, i.e., all reads that fall into that region. So I get warning
messages like this:
Warning! Skip large bundles [chr1: 10000-20000000]. That's almost the entire
chromosome. WTF!
You can adjust the parameters, but that takes a lot of time to figure out, and
remember, it is f*** slow. Right now I have to stick with Cufflinks, because
it is more or less the only software that claims to reconstruct transcripts
with a reference transcriptome.
[Quoting t*d] : The fastest way to learn it is to work through Trapnell's pipeline and get : familiar with all of its results.
|
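h****n Posts: 2552 | 12 Is Cufflinks really so bad that nobody even wants to mention it?
[Quoting a**r] : limma has a long history and is a well-established package. It is fast and performs well. : As for DESeq, according to this paper its performance is fairly average. baySeq looks somewhat better. : http://genomebiology.com/2013/14/9/R95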
|
l**********1 Posts: 5204 | 13 Please check:
i) Trapnell C et al., (2012).
Differential gene and transcript expression analysis of RNA-seq experiments
with TopHat and Cufflinks.
Nat Protoc 7: 562–578.
ii) Li H et al., (2009).
The sequence alignment/Map format and SAMtools.
Bioinformatics 25: 2078–2079.
plus
Weikard R et al., (2013).
Identification of novel transcripts and noncoding RNAs in bovine skin by
deep next generation sequencing.
BMC Genomics. 14: 789. [Epub ahead of print]
>http://www.ncbi.nlm.nih.gov/pubmed/24225384
cited,
>Reannotation, mapping and bioinformatic data analysis
>Read alignment to the reference genome was performed
using the Bowtie/ TopHat/ Cufflinks/ Cuffmerge pipeline
[44]. A filtering step using SAMtools and Linux commands
[45] was performed to eliminate those reads showing
more than two mismatches to the reference genome
and reads with multiple mapping hits. A guided transcript
assembly using the bovine reference genome assembly
UMD3.1 (ftp://ftp.ncbi.nlm.nih.gov/genomes/Bos_taurus/,
downloaded 28/02/2012) on top of the Ensembl reference
annotation, release 66, (ftp://ftp.ensembl.org/pub/release-
66/gtf/bos_taurus/, downloaded 28/02/2012) was carried
out for each sample file separately. This strategy considered
the reference genome annotation and additionally,
allowed inclusion of sequence reads mapping to chromosome
regions or transcription units not yet annotated in
the underlying reference transcript assembly. The separate
analysis of the individual transcript assembly for each
sample enabled the identification of potential differently
spliced transcripts of pigmented and nonpigmented phenotypes.
Thus, the generated final transcriptome assembly
comprising transcripts from both phenotypes will provide
novel transcripts, genes and isoforms in addition to the
reannotated known reference loci.
Finally, the resulting individual transcript assemblies
were merged to form a single transcript assembly using
the Cuffmerge option. The merged transcript assembly
(final GTF file) was applied for locus and transcript quantification
using Cuffdiff v1.3. The final dataset represents
the joint transcriptome of pigmented and nonpigmented
skin samples including all transcripts (annotated and nonannotated)
that contain at least one exon and reveal expression
either in pigmented or nonpigmented skin
samples. A further filtering step was included to eliminate
transcripts having a very low expression level. All transcripts
which had a lower bound of zero for the 95% confidence
interval on the FPKM (fragments per kb for a
million reads) of the object were excluded from the dataset.
Transcript and locus assemblies were visualised by inspection
of the BAM files of the samples and the final
annotation with the IGV viewer [46].
Original post:
http://www.mitbbs.com/article_t/Biology/31863359.html
2nd floor
[Quoting z********o] : Could someone explain the concepts of depth, library, and library size? I spent a whole day browsing : online and still have not figured them out. : Thanks.
|
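The quoted methods filter transcripts on FPKM and on the lower bound of its 95% confidence interval. A minimal sketch of that arithmetic follows, assuming the usual FPKM definition (fragments per kilobase of transcript per million mapped fragments). The function names, the `conf_lo` field, and all numbers are hypothetical and are not taken from the paper or from Cuffdiff's output format.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


def keep_expressed(records):
    """Drop records whose 95% CI lower bound on FPKM is zero."""
    return [r for r in records if r["conf_lo"] > 0]


# Made-up transcripts: 'conf_lo' stands in for the reported lower CI bound.
records = [
    {"id": "tx1", "fpkm": fpkm(200, 2_000, 10_000_000), "conf_lo": 7.5},
    {"id": "tx2", "fpkm": fpkm(1, 2_000, 10_000_000), "conf_lo": 0.0},
]
kept = keep_expressed(records)
```

With these numbers, 200 fragments on a 2 kb transcript in a library of 10 million mapped fragments give an FPKM of 10, and only `tx1` survives the confidence-interval filter.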
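t**********y Posts: 374 | 14 limma was originally designed for microarrays.
[Quoting a**r] : limma has a long history and is a well-established package. It is fast and performs well. : As for DESeq, according to this paper its performance is fairly average. baySeq looks somewhat better. : http://genomebiology.com/2013/14/9/R95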
|
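For context on how an array-oriented package gets applied to counts: as far as I know, limma is usually run on RNA-seq data after the counts are converted to log2 counts per million (the voom approach). The sketch below shows only that transform, with the small offsets voom uses. It does not estimate the precision weights that voom also computes, and the function name is my own.

```python
import math


def log_cpm(count, library_size):
    """log2 counts per million, with offsets of 0.5 and 1 to avoid log(0)."""
    return math.log2((count + 0.5) / (library_size + 1.0) * 1_000_000)


# Hypothetical gene with 10 reads in a library of one million reads.
value = log_cpm(10, 1_000_000)
```

The offsets keep a zero count finite, which is what lets methods built for continuous array intensities work on the transformed values.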
B*********r Posts: 19 | 15 What do you mean by "reconstruct transcripts with reference transcriptome"?
[Quoting a******k] : I have to say, I have had a very bad experience with Trapnell's pipeline. : Many people complain that TopHat is very slow. If you have only tens of millions of : reads, you can go with TopHat, but with a huge dataset like mine it runs : forever (I admit that I have a billion reads). STAR is recommended by a lot of : people, and I also found it very good. It finishes the job in one hour on : our super cluster. : Cufflinks is also very slow. On my dataset it has been running for one week : (16 parallel jobs) and will likely need several more weeks, or never finish. It has : some bugs, or, to be fairer, very poor default parameter : settings (you can tweak them once you are familiar with it). For example
|
x***u Posts: 297 | 16 TopHat is slow, but its results are OK. Cufflinks has been reported to have issues.
Trinity + PASA can do transcript reconstruction:
http://pasa.sourceforge.net/
it is kind of the only software that declares to reconstruct transcripts
with reference transcriptome.
[Quoting a******k] : I have to say, I have had a very bad experience with Trapnell's pipeline. : Many people complain that TopHat is very slow. If you have only tens of millions of : reads, you can go with TopHat, but with a huge dataset like mine it runs : forever (I admit that I have a billion reads). STAR is recommended by a lot of : people, and I also found it very good. It finishes the job in one hour on : our super cluster. : Cufflinks is also very slow. On my dataset it has been running for one week : (16 parallel jobs) and will likely need several more weeks, or never finish. It has : some bugs, or, to be fairer, very poor default parameter : settings (you can tweak them once you are familiar with it). For example
|
a******k Posts: 1190 | 17 My understanding is that it first tries to reconcile aligned reads/
fragments with known transcripts in a reference transcriptome (user input).
Then, for the reads that have no supporting annotation, it tries to construct
novel transcripts.
[Quoting B*********r] : What do you mean by "reconstruct transcripts with reference transcriptome"?
|
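A toy illustration of the two-step idea described in the post above: first set aside reads that are explained by known transcripts, then hand the remainder to novel-transcript assembly. This is only a sketch of the concept and does not reproduce how Cufflinks implements it. The coordinates, transcript names, and the containment rule are all assumptions made for the example.

```python
# Known exons per transcript as (start, end) intervals; hypothetical coordinates.
known_transcripts = {
    "txA": [(100, 200), (300, 400)],
    "txB": [(1000, 1200)],
}


def supported_by(read, exons):
    """True if every aligned block of the read lies inside some exon."""
    return all(
        any(ex_start <= b_start and b_end <= ex_end for ex_start, ex_end in exons)
        for b_start, b_end in read
    )


def partition_reads(reads, transcripts):
    """Split reads into those explained by the annotation and the rest."""
    explained, unexplained = [], []
    for read in reads:
        if any(supported_by(read, exons) for exons in transcripts.values()):
            explained.append(read)
        else:
            unexplained.append(read)
    return explained, unexplained


# Each read is a list of aligned blocks; a spliced read has two blocks.
reads = [
    [(150, 200), (300, 340)],  # spliced read consistent with txA
    [(1050, 1100)],            # consistent with txB
    [(5000, 5050)],            # no annotation here: input for novel assembly
]
explained, unexplained = partition_reads(reads, known_transcripts)
```

In this toy run the first two reads are assigned to the annotation, and only the read at position 5000 is left over as evidence for a possible novel transcript.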
a******k Posts: 1190 | 18 That's approximately the same conclusion I reached.
Could you please be a little more specific about the problems Cufflinks has?
As far as I can tell, most of the problems come from its poor default
parameter settings. I am not sure whether it is fundamentally flawed.
It is strange that it tries to make decisions for biologists by applying algorithmic
adjustments to its output. I believe most researchers would
rather see results that come directly from the raw data.
There are many tools that can do transcript reconstruction, but I am
attracted by its idea of using knowledge of the existing transcriptome as guidance
during transcript reconstruction.
[Quoting x***u] : TopHat is slow, but its results are OK. Cufflinks has been reported to have issues. : Trinity + PASA can do transcript reconstruction: : http://pasa.sourceforge.net/ : : it is kind of the only software that declares to reconstruct transcripts : with reference transcriptome.
|
j****x Posts: 1704 | 19 Is there any fundamental difference between this and annotated-genome-guided transcript reconstruction?
[Quoting a******k] : My understanding is that it first tries to reconcile aligned reads/ : fragments with known transcripts in a reference transcriptome (user input). : Then, for the reads that have no supporting annotation, it tries to construct : novel transcripts.
|
l**********1 Posts: 5204 | 20 Now we have GIIRA.
Please check:
https://sourceforge.net/projects/giira/
GIIRA – RNA-Seq Driven Gene Finding Incorporating Ambiguous Reads
Posted on October 16, 2013 By RNA-Seq Blog Administrator
The reliable identification of genes is a major challenge in genome research
since further analysis depends on the correctness of this initial step.
With high-throughput RNA-Seq data reflecting currently expressed genes, a
particularly meaningful source of information has become commonly available
for gene finding. However, practical application in automated gene
identification is still not the standard case. A particular challenge in
including RNA-Seq data is the difficult handling of ambiguously mapped reads.
Researchers at the Robert Koch-Institute, Germany, have developed GIIRA (Gene
Identification Incorporating RNA-Seq data and Ambiguous reads), a novel
prokaryotic and eukaryotic gene finder that is exclusively based on a RNA-
Seq mapping and inherently includes ambiguously mapped reads. GIIRA extracts
candidate regions supported by a sufficient number of mappings and
reassigns ambiguous reads to their most likely origin using a maximum-flow
approach. This avoids the exclusion of genes that are predominantly
supported by ambiguous mappings. Evaluation on simulated and real data and
comparison with existing methods incorporating RNA-Seq information highlight
the accuracy of GIIRA in identifying the expressed genes.
AVAILABILITY: GIIRA is implemented in Java and is available from the HTTPS
link above.
CONTACT: renardB'@'rki.de.
GIIRA – RNA-Seq Driven Gene Finding Incorporating Ambiguous Reads is a post
from: RNA-Seq Blog
http://www.informaticsblogs.com/author/rna-seq-blog-administrator/
or
Zickmann F et al., (2013).
GIIRA--RNA-Seq driven gene finding incorporating ambiguous reads.
Bioinformatics. Oct 27. [Epub ahead of print]
http://www.ncbi.nlm.nih.gov/pubmed/24123675
or the slide link (PPTX, now converted to PDF format):
http://mendel.informatics.indiana.edu/~yye/lab/teaching/get.php?course=fall2013-I519&name=GIIRA_paper3.pdf
or the attached figure below
[attached figure not preserved]
[Quoting a******k] : That's approximately the same conclusion I reached. : Could you please be a little more specific about the problems Cufflinks has? : As far as I can tell, most of the problems come from its poor default : parameter settings. I am not sure whether it is fundamentally flawed. : It is strange that it tries to make decisions for biologists by applying algorithmic : adjustments to its output. I believe most researchers would : rather see results that come directly from the raw data. : There are many tools that can do transcript reconstruction, but I am : attracted by its idea of using knowledge of the existing transcriptome as guidance : during transcript reconstruction.
|
|
|
l**********1 Posts: 5204 | 21 Sure,
plus Trans-ABySS, Oases, Scripture, etc.
Please refer to this review:
Martin JA et al., (2011).
Next-generation transcriptome assembly.
Nat Rev Genet. 12: 671-82.
and its Table 2
also 'SOAPdenovo'
noted on another review,
Yandell M et al., (2012).
A beginner's guide to eukaryotic genome annotation.
Nat Rev Genet. 13: 329-42.
http://www.ncbi.nlm.nih.gov/pubmed/22510764
or the PDF link:
http://www.yandell-lab.org/publications/pdf/euk_genome_annotation_review.pdf
[Quoting x***u] : TopHat is slow, but its results are OK. Cufflinks has been reported to have issues. : Trinity + PASA can do transcript reconstruction: : http://pasa.sourceforge.net/ : : it is kind of the only software that declares to reconstruct transcripts : with reference transcriptome.
|
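t***q Posts: 65 | 22 Just use CLC! It only costs about $5,000 now, which is practically dirt cheap.
Talk your boss into buying it and life gets easy afterwards, haha. Of course, don't let your boss find out how extremely powerful CLC is. |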
a******k Posts: 1190 | 23 That sounds like the same idea. What other software can you recommend besides Cufflinks?
[Quoting j****x] : Is there any fundamental difference between this and annotated-genome-guided transcript reconstruction?
|
a******k Posts: 1190 | 24 Have you actually read the posts carefully?
[Quoting l**********1] : Sure, : plus Trans-ABySS, Oases, Scripture, etc. : Please refer to this review: : Martin JA et al., (2011). : Next-generation transcriptome assembly. : Nat Rev Genet. 12: 671-82. : and its Table 2 : also 'SOAPdenovo' : noted on another review, : Yandell M et al., (2012).
|
j****x Posts: 1704 | 25 http://www.ncbi.nlm.nih.gov/pubmed/24185837
[Quoting a******k] : That sounds like the same idea. What other software can you recommend besides Cufflinks?
|
a******k Posts: 1190 | 26 I've seen that paper. My impression is that Cufflinks remains the only one
that can do reference-based transcript assembly. Please correct me if I am
wrong.
Some tools, like Augustus, were originally designed to identify transcripts (gene
models) without RNA-seq data. They can now use RNA-seq data, but
interestingly their performance does not improve much with the extra
information. Most of the other tools do de novo transcript reconstruction.
SLIDE seems to be another unusual tool that does transcript abundance estimation with
a reference transcriptome (i.e., no de novo discovery). I only tried it briefly,
so I am not quite sure.
[Quoting j****x] : http://www.ncbi.nlm.nih.gov/pubmed/24185837
|
l**********1 Posts: 5204 | 27 RE:
>is that Cufflink remains the only one
Yes, among tools that work on an overlap graph, Cufflinks is the only one,
but now we have 'Traph', which works on a splicing graph
and uses the principle of minimum-cost network flows.
For details, please check:
Tomescu AI et al., (2013).
A Novel Combinatorial Method for Estimating
Transcript Expression with RNA-Seq:
Bounding the Number of Paths
Abstract. RNA-Seq technology offers new high-throughput ways for transcript
identification and quantification based on short reads, and has recently
attracted great interest. The problem is usually modeled by a weighted
splicing graph whose nodes stand for exons and whose edges
stand for split alignments to the exons.
[...]
In order to obtain a practical tool,
we implement three optimizations and heuristics, which achieve better
performance on real data, and similar or better performance on simulated
data, than state-of-the-art tools Cufflinks, IsoLasso and SLIDE. Our tool,
called Traph, is available at http://www.cs.helsinki.fi/gsa/traph/
PDF link,
>http://arxiv.org/pdf/1307.7811
or from
http://www.cs.helsinki.fi/en/gsa/traph/
There is also a slide file (PPTX) posted in the floor above:
|
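The abstract quoted above models the problem as a weighted splicing graph whose nodes are exons and whose edges are split alignments. A toy sketch of that data structure follows, with exon names and alignments invented for the example. It only builds the graph and lists exon paths as candidate transcripts. It does not implement the minimum-cost flow step that Traph is described as using.

```python
from collections import defaultdict


def build_splicing_graph(split_alignments):
    """Edge weight = number of reads whose split alignment joins two exons."""
    graph = defaultdict(dict)
    for src_exon, dst_exon in split_alignments:
        graph[src_exon][dst_exon] = graph[src_exon].get(dst_exon, 0) + 1
    return graph


def enumerate_paths(graph, node, sink):
    """All exon paths from node to sink; each one is a candidate transcript."""
    if node == sink:
        return [[node]]
    paths = []
    for nxt in sorted(graph.get(node, {})):
        for tail in enumerate_paths(graph, nxt, sink):
            paths.append([node] + tail)
    return paths


# Hypothetical split alignments: two reads join e1-e2, one e2-e3, one skips e2.
split_alignments = [("e1", "e2"), ("e1", "e2"), ("e2", "e3"), ("e1", "e3")]
graph = build_splicing_graph(split_alignments)
paths = enumerate_paths(graph, "e1", "e3")
```

The two paths found here correspond to an isoform that includes exon e2 and one that skips it. Deciding which paths are real and how abundant each is is the part the paper addresses with network flows.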
l**********1 Posts: 5204 | 28 Sorry,
please check this one now:
a historical overview of almost all the software used for the RNA-seq de novo assembly
task.
Link to its slides (n.b. the PPTX file has already been converted to PDF format):
http://www.cs.helsinki.fi/u/tomescu/traph/TKRM-HITSEQ.pdf
or the table attached here:
[attached table not preserved]
From: aablackk (black), Board: Biology
Subject: Re: How to process RNA-Seq
Site: BBS 未名空间站 (Sun Jan 5 02:03:36 2014, US Eastern)
[omitted]
Most of the other tools do de-novo transcript reconstruction. SLIDE seems
another unique tool that do transcript abundance estimation with
reference transcriptome (i.e., no de novo finding). I only tried it briefly
so not quite sure.
>>
[Quoting a******k] : Have you actually read the posts carefully?
|
l**********1 Posts: 5204 | 29 The 2nd slide is attached here,
taken from the PPTX file link in the floor below.
[Quoting l**********1] : RE: : >is that Cufflink remains the only one : Yes, among tools that work on an overlap graph, Cufflinks is the only one, : but now we have 'Traph', which works on a splicing graph : and uses the principle of minimum-cost network flows. : For details, please check: : Tomescu AI et al., (2013). : A Novel Combinatorial Method for Estimating : Transcript Expression with RNA-Seq: : Bounding the Number of Paths
|
l**********1 Posts: 5204 | 30 RNA-seq by SGS (second-generation sequencing, "Next Generation Sequencing 2.0") may already be an older protocol
(n.b. like cellular phone 3G service).
Please see this new paper on a hybrid protocol combining TGS (third-generation sequencing, "Next Generation Sequencing 3.0") with SGS RNA-seq
(n.b. like cell phone 3.5G service):
Au KF et al., (2013).
Characterization of the human ESC transcriptome by hybrid sequencing.
Proc Natl Acad Sci U S A. 110: E4821-30.
http://www.ncbi.nlm.nih.gov/pubmed/24282307
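[Quoting z********o] : My boss wants me to work with RNA-Seq data. I moved into bioinformatics from another field and have no biology background. Can anyone : recommend what to read and which algorithms to use? How do I get started? : Thanks, everyone.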
|
|
|
j****x Posts: 1704 | 31 For genome-guided assembly similar to Cufflinks there is of course also Scripture, which is faster but
seems to do poorly on low-abundance transcripts. A friend recently recommended RNA-eXpress, but I have not tried it yet.
Also, an earlier post mentioned CLC Genomics; if money is no object, it really is a good choice.
In addition, DRUT and RABT are auxiliary tools worth considering and may help with what you need.
BTW, I don't know SLIDE, but its author should count as doing quite well ("pp") in this field, haha.
[Quoting a******k] : I've seen that paper. My impression is that Cufflinks remains the only one : that can do reference-based transcript assembly. Please correct me if I am : wrong. : Some tools, like Augustus, were originally designed to identify transcripts (gene : models) without RNA-seq data. They can now use RNA-seq data, but : interestingly their performance does not improve much with the extra : information. Most of the other tools do de novo transcript reconstruction. : SLIDE seems to be another unusual tool that does transcript abundance estimation with : a reference transcriptome (i.e., no de novo discovery). I only tried it briefly, : so I am not quite sure.
|
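a******k Posts: 1190 | 32 There are plenty of tools for genome-guided assembly; I was talking about transcriptome-guided assembly.
Haha, I think the SLIDE author has become an AP (assistant professor).
[Quoting j****x] : For genome-guided assembly similar to Cufflinks there is of course also Scripture, which is faster but : seems to do poorly on low-abundance transcripts. A friend recently recommended RNA-eXpress, but I have not tried it yet. : Also, an earlier post mentioned CLC Genomics; if money is no object, it really is a good choice. : In addition, DRUT and RABT are auxiliary tools worth considering and may help with what you need. : BTW, I don't know SLIDE, but its author should count as doing quite well ("pp") in this field, haha.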
|
D*a Posts: 6830 | 33 This course starts tomorrow:
University of Toronto
Bioinformatic Methods I
https://www.coursera.org/course/bioinfomethods1 |
l**********1 Posts: 5204 | 34 If you can't get an answer on a Chinese-language forum,
how about asking on the English-language TopHat Google Group?
https://groups.google.com/forum/#!topic/tuxedo-tools-users/HQkjCNXx2-Y
https://groups.google.com/forum/#!forum/tuxedo-tools-users
from
http://tophat.cbcb.umd.edu/igenomes.shtml
[Quoting a******k] : There are plenty of tools for genome-guided assembly; I was talking about transcriptome-guided assembly. : Haha, I think the SLIDE author has become an AP (assistant professor).
|