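This page is an excerpt and archive of the corresponding MITBBS (未名空间) thread. Posts less than a week old show at most 50 characters; posts older than a week show up to 500.
Biology board - How to process RNA-Seq data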
z********o
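Posts: 428
1
My boss wants me to work with RNA-Seq data. I moved into bioinformatics from
another field and am a complete novice in biology. Can anyone recommend what
to read and which algorithms to use? How should I get started?
Thanks, everyone.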
c****y
Posts: 373
t*d
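Posts: 1290
3
The fastest way to learn it is to work through Trapnell's pipeline and get
familiar with all of its results.

[z********o wrote:]
: My boss wants me to work with RNA-Seq data. I moved into bioinformatics from
: another field and am a complete novice in biology. Can anyone recommend what
: to read and which algorithms to use? How should I get started?
: Thanks, everyone.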

g******a
Posts: 51
4
Stanford's SFG:
The Simple Fool's Guide to Population Genomics via RNA-Seq
http://sfg.stanford.edu/
z********o
Posts: 428
5
Thanks, everyone.
Happy New Year.
h****n
Posts: 2552
6
Which is actually better, Cufflinks or DESeq?
j****x
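Posts: 1704
7
It usually depends on the specific experimental design (depth, replicates).
Generally speaking, limma is the first choice, followed by DESeq. There is
also no harm in running several methods and comparing the results.

[h****n wrote:]
: Which is actually better, Cufflinks or DESeq?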
z********o
Posts: 428
8
Can anyone explain the concepts of depth, library, and library size? I spent
a whole day browsing online and still have not figured them out.
Thanks.
c********e
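Posts: 598
9
Why is limma the first choice?

[j****x wrote:]
: It usually depends on the specific experimental design (depth, replicates).
: Generally speaking, limma is the first choice, followed by DESeq. There is
: also no harm in running several methods and comparing the results.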

a**r
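Posts: 352
10
limma has a long history and is well-established software. It is fast and
performs well.
As for DESeq, according to this paper its performance is fairly average;
baySeq looks somewhat better.
http://genomebiology.com/2013/14/9/R95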
a******k
Posts: 1190
11
I have to say, I have had a very bad experience with Trapnell's pipeline.
Many people complain that TopHat is very slow. If you have only tens of
millions of reads, you can go with TopHat, but with a huge dataset like mine
(I admit that I have a billion reads) it runs forever. STAR is recommended by
a lot of people, and I have also found it very good. It finished the job in
one hour on our cluster.
Cufflinks is also very slow. On my dataset it has been running for a week
(16 parallel jobs) and will likely need several more weeks, or never finish.
It has some bugs or, to be fairer, very poor default parameter settings (you
can tweak them once you are familiar with it). For example, it uses an
extrapolation algorithm that gives very poor abundance estimates for short
transcripts (people complain about this everywhere). As another example, in a
test run it built transcript models for only a tiny fraction of my data. The
reason is that it guesses there is a complex gene model in a region that is
longer than one of its default parameters allows, and then it skips the whole
bundle, i.e., all reads falling into that region. So I get warning messages
like this:
Warning! Skip large bundles [chr1: 10000-20000000].
That's almost the entire chromosome. WTF!
You can adjust the parameters, but it takes a lot of time to figure out, and
remember, it is f*** slow. Right now I have to stick with Cufflinks, because
it is more or less the only software that claims to reconstruct transcripts
using a reference transcriptome.

[t*d wrote:]
: The fastest way to learn it is to work through Trapnell's pipeline and get
: familiar with all of its results.
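
To see how much of a run is affected by the bundle skipping described in post 11, one option is to scan the log for the warning and add up the skipped spans. The Python sketch below assumes the message looks like the one quoted in that post, which may not match the exact wording real Cufflinks versions print. The 3,500,000 bp threshold is likewise an assumption, based on what the Cufflinks manual is believed to give as the default for its `--max-bundle-length` option; check your version before relying on it.

```python
import re
from typing import List, NamedTuple


class SkippedBundle(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        """Genomic span of the skipped bundle in base pairs."""
        return self.end - self.start


# Assumed message shape, copied from the warning quoted in the post;
# real log lines may be formatted differently.
BUNDLE_RE = re.compile(r"\[(?P<chrom>[^:\]]+):\s*(?P<start>\d+)-(?P<end>\d+)\]")


def skipped_bundles(log_lines: List[str]) -> List[SkippedBundle]:
    """Collect every bundle region mentioned in a 'bundle' warning line."""
    found = []
    for line in log_lines:
        if "bundle" not in line.lower():
            continue
        match = BUNDLE_RE.search(line)
        if match:
            found.append(
                SkippedBundle(match["chrom"], int(match["start"]), int(match["end"]))
            )
    return found


def over_limit(bundles: List[SkippedBundle],
               max_bundle_length: int = 3_500_000) -> List[SkippedBundle]:
    """Bundles longer than the (assumed) maximum bundle length."""
    return [b for b in bundles if b.length > max_bundle_length]
```

Feeding it the quoted warning yields one bundle on chr1 spanning 19,990,000 bp, far above the assumed limit, which is consistent with the whole region being dropped.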

h****n
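Posts: 2552
12
Is Cufflinks really so bad that nobody is even willing to mention it?

[a**r wrote:]
: limma has a long history and is well-established software. It is fast and
: performs well.
: As for DESeq, according to this paper its performance is fairly average;
: baySeq looks somewhat better.
: http://genomebiology.com/2013/14/9/R95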

l**********1
Posts: 5204
13
Please check:
i) Trapnell C et al., (2012).
Differential gene and transcript expression analysis of RNA-seq experiments
with TopHat and Cufflinks.
Nat Protoc 7: 562–578.
ii) Li H et al., (2009).
The sequence alignment/Map format and SAMtools.
Bioinformatics 25: 2078–2079.
plus
Weikard R et al., (2013).
Identification of novel transcripts and noncoding RNAs in bovine skin by
deep next generation sequencing.
BMC Genomics. 14: 789. [Epub ahead of print]
>http://www.ncbi.nlm.nih.gov/pubmed/24225384
which states:
>Reannotation, mapping and bioinformatic data analysis
>Read alignment to the reference genome was performed
using the Bowtie/ TopHat/ Cufflinks/ Cuffmerge pipeline
[44]. A filtering step using SAMtools and Linux commands
[45] was performed to eliminate those reads showing
more than two mismatches to the reference genome
and reads with multiple mapping hits. A guided transcript
assembly using the bovine reference genome assembly
UMD3.1 (ftp://ftp.ncbi.nlm.nih.gov/genomes/Bos_taurus/,
downloaded 28/02/2012) on top of the Ensembl reference
annotation, release 66, (ftp://ftp.ensembl.org/pub/release-66/gtf/bos_taurus/,
downloaded 28/02/2012) was carried
out for each sample file separately. This strategy considered
the reference genome annotation and additionally,
allowed inclusion of sequence reads mapping to chromosome
regions or transcription units not yet annotated in
the underlying reference transcript assembly. The separate
analysis of the individual transcript assembly for each
sample enabled the identification of potential differently
spliced transcripts of pigmented and nonpigmented phenotypes.
Thus, the generated final transcriptome assembly
comprising transcripts from both phenotypes will provide
novel transcripts, genes and isoforms in addition to the
reannotated known reference loci.
Finally, the resulting individual transcript assemblies
were merged to form a single transcript assembly using
the Cuffmerge option. The merged transcript assembly
(final GTF file) was applied for locus and transcript quantification
using Cuffdiff v1.3. The final dataset represents
the joint transcriptome of pigmented and nonpigmented
skin samples including all transcripts (annotated and nonannotated)
that contain at least one exon and reveal expression
either in pigmented or nonpigmented skin
samples. A further filtering step was included to eliminate
transcripts having a very low expression level. All transcripts
which had a lower bound of zero for the 95% confidence
interval on the FPKM (fragments per kb for a
million reads) of the object were excluded from the dataset.
Transcript and locus assemblies were visualised by inspection
of the BAM files of the samples and the final
annotation with the IGV viewer [46].
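
The read-filtering step in the passage above (drop reads with more than two mismatches and reads with multiple mapping hits) can be sketched in plain Python over SAM text. The paper does not give its exact SAMtools and Linux commands, so this is only one possible reading. It assumes the aligner writes the optional `NM` tag (edit distance, which also counts indels, so it is only a stand-in for "mismatches") and the `NH` tag (number of reported alignments). Both tags are optional in the SAM format, and reads lacking them are dropped here.

```python
from typing import Dict, Iterable, Iterator, List


def optional_tags(fields: List[str]) -> Dict[str, str]:
    """Collect SAM optional fields (TAG:TYPE:VALUE) into a dict keyed by TAG."""
    tags = {}
    for field in fields[11:]:          # the first 11 columns are mandatory
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def keep_read(sam_line: str, max_edit_distance: int = 2) -> bool:
    """True for a mapped read with NM <= max_edit_distance and NH == 1."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:                     # 0x4 = segment unmapped
        return False
    tags = optional_tags(fields)
    # Assumption: NM and NH are present; reads without them are rejected.
    if "NM" not in tags or "NH" not in tags:
        return False
    return int(tags["NM"]) <= max_edit_distance and int(tags["NH"]) == 1


def filter_sam(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged plus the alignment lines that pass."""
    for line in lines:
        if line.startswith("@") or keep_read(line):
            yield line
```

In practice the input would be the text produced by `samtools view -h` on the BAM file, and the filtered lines would be piped back into `samtools view -b` to produce a new BAM.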
Original post:
http://www.mitbbs.com/article_t/Biology/31863359.html
(2nd floor)

[z********o wrote:]
: Can anyone explain the concepts of depth, library, and library size? I spent
: a whole day browsing online and still have not figured them out.
: Thanks.
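
The FPKM value in the passage quoted in post 13 is one place where library size shows up directly. A minimal Python sketch follows. It takes "library size" to mean the total number of mapped fragments in a sample, which is a common but not universal usage. The record keys `fpkm` and `conf_lo` are hypothetical names chosen for illustration, not the column names of any particular tool.

```python
from typing import List


def fpkm(fragments: float, transcript_length_bp: float, library_size: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions = library_size / 1_000_000
    return fragments / (kilobases * millions)


def drop_zero_lower_bound(records: List[dict]) -> List[dict]:
    """Keep transcripts whose 95% CI lower bound on FPKM is above zero.

    Each record is a dict with the hypothetical keys 'id', 'fpkm', 'conf_lo'.
    """
    return [r for r in records if r["conf_lo"] > 0.0]
```

For example, 500 fragments on a 2,000 bp transcript in a library of 20 million mapped fragments gives `fpkm(500, 2000, 20_000_000) == 12.5`. The same 500 fragments in a library sequenced twice as deeply (40 million) give 6.25, which is why counts from libraries of different sizes cannot be compared without normalization.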

t**********y
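Posts: 374
14
limma was originally designed for microarrays.

[a**r wrote:]
: limma has a long history and is well-established software. It is fast and
: performs well.
: As for DESeq, according to this paper its performance is fairly average;
: baySeq looks somewhat better.
: http://genomebiology.com/2013/14/9/R95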

B*********r
Posts: 19
15
What do you mean by "reconstruct transcripts with reference transcriptome"?

[a******k wrote:]
: I have to say, I have had a very bad experience with Trapnell's pipeline.
: Many people complain that TopHat is very slow. [...]

x***u
Posts: 297
16
TopHat is slow, but its results are OK. Cufflinks has been reported to have
issues.
Trinity + PASA can do transcript reconstruction:
http://pasa.sourceforge.net/

Re: "it is more or less the only software that claims to reconstruct
transcripts using a reference transcriptome."

[a******k wrote:]
: I have to say, I have had a very bad experience with Trapnell's pipeline.
: Many people complain that TopHat is very slow. [...]

a******k
Posts: 1190
17
My understanding is that it first tries to reconcile aligned reads/fragments
with known transcripts in a user-supplied reference transcriptome. Then, for
reads that have no supporting annotation, it tries to construct novel
transcripts.

[B*********r wrote:]
: What do you mean by "reconstruct transcripts with reference transcriptome"?
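
The two-stage idea described in post 17 can be shown with a deliberately simplified Python toy. It is not Cufflinks' actual algorithm: reads are unspliced intervals on a single chromosome, a read counts as "explained" only if it lies inside one annotated exon, and leftover reads are merged into candidate novel regions by overlap. All names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def is_explained(read: Interval, exons: List[Interval]) -> bool:
    """True if the read lies entirely inside one annotated exon."""
    start, end = read
    return any(ex_start <= start and end <= ex_end for ex_start, ex_end in exons)


def split_reads(reads: List[Interval],
                exons: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Stage 1: separate reads supported by the annotation from the rest."""
    known, leftover = [], []
    for read in reads:
        (known if is_explained(read, exons) else leftover).append(read)
    return known, leftover


def candidate_regions(leftover: List[Interval]) -> List[Interval]:
    """Stage 2: merge overlapping or touching leftover reads into regions."""
    merged: List[Interval] = []
    for start, end in sorted(leftover):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With exons `[(100, 200), (300, 400)]` and reads `[(110, 150), (320, 360), (500, 550), (540, 600), (190, 210)]`, the first two reads are assigned to the annotation and the remainder collapse into the candidate regions `[(190, 210), (500, 600)]`. A real assembler would also handle spliced reads, strand, and coverage thresholds.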
a******k
Posts: 1190
18
That's approximately the same conclusion I reached.
Could you please be a little more specific about the problems Cufflinks has?
As far as I can tell, most of the problems come from its poor default
parameter settings. I am not sure whether it is fundamentally flawed.
Strangely, it tries to make decisions for biologists by applying algorithmic
adjustments to its output, but I believe most researchers would rather see
results that directly reflect the raw data.
Many software tools can do transcript reconstruction, but I am attracted by
its idea of using knowledge of the existing transcriptome as guidance during
transcript reconstruction.

[x***u wrote:]
: TopHat is slow, but its results are OK. Cufflinks has been reported to have
: issues.
: Trinity + PASA can do transcript reconstruction:
: http://pasa.sourceforge.net/

j****x
Posts: 1704
19
Is this fundamentally any different from annotated-genome-guided transcript
reconstruction?

[a******k wrote:]
: My understanding is that it first tries to reconcile aligned reads/fragments
: with known transcripts in a user-supplied reference transcriptome. Then, for
: reads that have no supporting annotation, it tries to construct novel
: transcripts.

l**********1
Posts: 5204
20
Now we have GIIRA. Please check:
https://sourceforge.net/projects/giira/
GIIRA – RNA-Seq Driven Gene Finding Incorporating Ambiguous Reads
Posted on October 16, 2013 by RNA-Seq Blog Administrator
The reliable identification of genes is a major challenge in genome research
since further analysis depends on the correctness of this initial step.
With high-throughput RNA-Seq data reflecting currently expressed genes, a
particularly meaningful source of information has become commonly available
for gene finding. However, practical application in automated gene
identification is still not the standard case. A particular challenge in
including RNA-Seq data is the difficult handling of ambiguously mapped reads.
Researchers at the Robert Koch-Institute, Germany, have developed GIIRA (Gene
Identification Incorporating RNA-Seq data and Ambiguous reads), a novel
prokaryotic and eukaryotic gene finder that is exclusively based on a RNA-Seq
mapping and inherently includes ambiguously mapped reads. GIIRA extracts
candidate regions supported by a sufficient number of mappings and
reassigns ambiguous reads to their most likely origin using a maximum-flow
approach. This avoids the exclusion of genes that are predominantly
supported by ambiguous mappings. Evaluation on simulated and real data and
comparison with existing methods incorporating RNA-Seq information highlight
the accuracy of GIIRA in identifying the expressed genes.
AVAILABILITY: GIIRA is implemented in Java and is available from the HTTPS
link above.
CONTACT: renardB'@'rki.de.
"GIIRA – RNA-Seq Driven Gene Finding Incorporating Ambiguous Reads" is a post
from RNA-Seq Blog:
http://www.informaticsblogs.com/author/rna-seq-blog-administrator/
or
Zickmann F et al., (2013).
GIIRA--RNA-Seq driven gene finding incorporating ambiguous reads.
Bioinformatics. Oct 27. [Epub ahead of print]
http://www.ncbi.nlm.nih.gov/pubmed/24123675
or the slides (PPTX, now converted to PDF):
http://mendel.informatics.indiana.edu/~yye/lab/teaching/get.php?course=fall2013-I519&name=GIIRA_paper3.pdf
or the attached figure.

[a******k wrote:]
: That's approximately the same conclusion I reached.
: Could you please be a little more specific about the problems Cufflinks has?
: [...]

l**********1
Posts: 5204
21
Sure, plus Trans-ABySS, Oases, Scripture, etc.
Please refer to this review and its Table 2:
Martin JA et al., (2011).
Next-generation transcriptome assembly.
Nat Rev Genet. 12: 671-82.
SOAPdenovo is also noted in another review:
Yandell M et al., (2012).
A beginner's guide to eukaryotic genome annotation.
Nat Rev Genet. 13: 329-42.
http://www.ncbi.nlm.nih.gov/pubmed/22510764
or the PDF link:
http://www.yandell-lab.org/publications/pdf/euk_genome_annotation_review.pdf

[x***u wrote:]
: TopHat is slow, but its results are OK. Cufflinks has been reported to have
: issues.
: Trinity + PASA can do transcript reconstruction:
: http://pasa.sourceforge.net/

t***q
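Posts: 65
22
Just use CLC! It only costs about $5,000 now, which is practically dirt cheap.
Talk your boss into buying it and life gets easy from then on, haha. Of
course, don't let your boss find out how super powerful CLC is.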
a******k
Posts: 1190
23
Sounds like the same idea. What other software can you recommend besides
Cufflinks?

[j****x wrote:]
: Is this fundamentally any different from annotated-genome-guided transcript
: reconstruction?

a******k
Posts: 1190
24
Have you even read the posts carefully?

[l**********1 wrote:]
: Sure, plus Trans-ABySS, Oases, Scripture, etc.
: Please refer to this review and its Table 2:
: Martin JA et al., (2011).
: Next-generation transcriptome assembly.
: Nat Rev Genet. 12: 671-82.
: [...]

j****x
Posts: 1704
25
http://www.ncbi.nlm.nih.gov/pubmed/24185837

[a******k wrote:]
: Sounds like the same idea. What other software can you recommend besides
: Cufflinks?
a******k
Posts: 1190
26
I've seen that paper. My impression is that Cufflinks remains the only one
that can do reference-based transcript assembly. Please correct me if I am
wrong.
Some tools, like Augustus, were originally designed to identify transcripts
(gene models) without RNA-seq data. They can now use RNA-seq data, but
interestingly their performance does not improve much with the extra
information. Most of the other tools do de novo transcript reconstruction.
SLIDE seems to be another unique tool: it does transcript abundance
estimation with a reference transcriptome (i.e., no de novo discovery). I
have only tried it briefly, so I am not quite sure.

[j****x wrote:]
: http://www.ncbi.nlm.nih.gov/pubmed/24185837

l**********1
Posts: 5204
27
RE:
>is that Cufflink remains the only one
Yes, among overlap-graph methods Cufflinks is the only one,
but now we have 'Traph', which works on a splicing graph
and uses the minimum-cost network flow principle.
For details, please check:
Tomescu AI et al., (2013).
A Novel Combinatorial Method for Estimating
Transcript Expression with RNA-Seq:
Bounding the Number of Paths
Abstract. RNA-Seq technology offers new high-throughput ways for transcript
identification and quantification based on short reads, and has recently
attracted great interest. The problem is usually modeled by a weighted
splicing graph whose nodes stand for exons and whose edges
stand for split alignments to the exons.
[...]
In order to obtain a practical tool,
we implement three optimizations and heuristics, which achieve better
performance on real data, and similar or better performance on simulated
data, than state-of-the-art tools Cufflinks, IsoLasso and SLIDE. Our tool,
called Traph, is available at http://www.cs.helsinki.fi/gsa/traph/
PDF link:
http://arxiv.org/pdf/1307.7811
or from
http://www.cs.helsinki.fi/en/gsa/traph/
There is also a slide file (PPTX), posted in the post above.
l**********1
Posts: 5204
28
Sorry.
Now please check this one, a historical overview of almost all the software
used for the RNA-seq de novo assembly task.
Link to its slides (PPTX, n.b. already converted to PDF):
http://www.cs.helsinki.fi/u/tomescu/traph/TKRM-HITSEQ.pdf
or see the table attached here.
>
From: aablackk (black), Board: Biology
Subject: Re: How to process RNA-Seq data
Site: BBS 未名空间站 (Sun Jan 5 02:03:36 2014, US Eastern)
[...]
Most of the other tools do de novo transcript reconstruction. SLIDE seems to
be another unique tool: it does transcript abundance estimation with a
reference transcriptome (i.e., no de novo discovery). I have only tried it
briefly, so I am not quite sure.
>>

[a******k wrote:]
: Have you even read the posts carefully?
l**********1
Posts: 5204
29
The 2nd slide is attached here,
from the PPTX file link in the post below.

[l**********1 wrote:]
: RE:
: >is that Cufflink remains the only one
: Yes, among overlap-graph methods Cufflinks is the only one,
: but now we have 'Traph', which works on a splicing graph
: and uses the minimum-cost network flow principle.
: [...]

l**********1
Posts: 5204
30
RNA-seq on SGS (Next Generation Sequencing 2.0) may already be an older
protocol (n.b. like 3G cellular phone service).
Please refer to this new paper on a hybrid protocol combining TGS (Next
Generation Sequencing 3.0) and SGS RNA-seq (n.b. like 3.5G cell phone
service):
Au KF et al., (2013).
Characterization of the human ESC transcriptome by hybrid sequencing.
Proc Natl Acad Sci U S A. 110: E4821-30.
http://www.ncbi.nlm.nih.gov/pubmed/24282307

[z********o wrote:]
: My boss wants me to work with RNA-Seq data. I moved into bioinformatics from
: another field and am a complete novice in biology. Can anyone recommend what
: to read and which algorithms to use? How should I get started?
: Thanks, everyone.

j****x
Posts: 1704
31
For genome-guided assembly similar to Cufflinks, there is of course also
Scripture, which is faster but seems to perform poorly on low-abundance
transcripts. A friend recently recommended RNA-eXpress, though I have not
tried it yet.
Also, an earlier post mentioned CLC Genomics; if money is no object, it
really is a good choice.
In addition, DRUT and RABT are auxiliary tools worth considering and may help
with what you need.
BTW, I don't know SLIDE, but its author should count as fairly "pp" in this
field, haha.

[a******k wrote:]
: I've seen that paper. My impression is that Cufflinks remains the only one
: that can do reference-based transcript assembly. Please correct me if I am
: wrong.
: [...]

a******k
Posts: 1190
32
There are plenty of genome-guided assemblers; what I meant is
transcriptome-guided.
Haha, it seems the SLIDE author has made AP.

[j****x wrote:]
: For genome-guided assembly similar to Cufflinks, there is of course also
: Scripture, which is faster but seems to perform poorly on low-abundance
: transcripts. A friend recently recommended RNA-eXpress, though I have not
: tried it yet.
: [...]

D*a
Posts: 6830
33
This course starts tomorrow:
University of Toronto
Bioinformatic Methods I
https://www.coursera.org/course/bioinfomethods1
l**********1
Posts: 5204
34
If you can't get an answer on a Chinese-language forum, how about asking on
the English-language TopHat Google Group?
https://groups.google.com/forum/#!topic/tuxedo-tools-users/HQkjCNXx2-Y
https://groups.google.com/forum/#!forum/tuxedo-tools-users
linked from
http://tophat.cbcb.umd.edu/igenomes.shtml

[a******k wrote:]
: There are plenty of genome-guided assemblers; what I meant is
: transcriptome-guided.
: Haha, it seems the SLIDE author has made AP.
