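n******7 posts: 12463 | 1 I have been processing some data recently and identified several hundred
unique transcripts corresponding to about a hundred genes, so many of the
transcripts are really different isoforms of the same gene. Annotating these
isoforms is giving me a headache.
1. Where can I find isoform data that is both high quality and reasonably
complete?
I want to use a set of known isoforms as a reference to determine which of
the isoforms we identified have been reported before and which are newly
identified by us.
I personally like RefSeq, but after checking a few genes I found that RefSeq
records rather few isoforms.
UCSC Known Genes has not gone through human curation, and many records are
based only on GenBank data. I worry that it contains many artificial
sequences.
I have never quite understood the GENCODE/Ensembl data. GENCODE level 1+2
data seem to be of decent quality, but I do not know the details of their
annotation pipeline.
CCDS seems to be just the intersection of Ensembl and RefSeq, so coverage is
probably a problem.
I am not very familiar with alternative splicing data. I have looked at some
databases, very... |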
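One way to separate previously reported isoforms from newly identified ones is to compare intron chains against whichever reference set is chosen. This is only a minimal sketch under stated assumptions: the function names, the dict layout, and the toy coordinates are hypothetical, and exon coordinates are assumed to be already parsed from the reference annotation and from the identified transcripts.

```python
def intron_chain(exons):
    """Return the introns between consecutive exons as a tuple of (start, end)."""
    ordered = sorted(exons)
    return tuple((left[1], right[0]) for left, right in zip(ordered, ordered[1:]))


def classify_isoforms(observed, reference):
    """Label each observed transcript as 'known', 'novel', or 'single_exon'.

    Both arguments map a transcript name to (chrom, strand, [(start, end), ...]).
    Only splice junctions are compared, so differing transcript ends are ignored.
    """
    known = {
        (chrom, strand, intron_chain(exons))
        for chrom, strand, exons in reference.values()
    }
    labels = {}
    for name, (chrom, strand, exons) in observed.items():
        chain = intron_chain(exons)
        if not chain:
            # No junctions to compare; these need a separate overlap check.
            labels[name] = "single_exon"
        elif (chrom, strand, chain) in known:
            labels[name] = "known"
        else:
            labels[name] = "novel"
    return labels


# Toy data (hypothetical coordinates, not from any real annotation).
reference = {"ref_iso1": ("chr1", "+", [(100, 200), (300, 400), (500, 600)])}
observed = {
    "t1": ("chr1", "+", [(90, 200), (300, 400), (500, 650)]),  # same junctions
    "t2": ("chr1", "+", [(100, 200), (500, 600)]),             # skips an exon
    "t3": ("chr1", "+", [(100, 200)]),                         # single exon
}
labels = classify_isoforms(observed, reference)
print(labels)
```

Matching on junctions rather than full exon coordinates is a design choice: transcript ends are often imprecise, so requiring identical ends would probably overcall novel isoforms.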
|
d****m posts: 3 | 2 NCBI Reference Sequences
Definition: The NCBI Reference Sequence project (RefSeq) provides reference
sequence standards for the naturally occurring molecules of the central
dogma, from chromosomes to mRNAs to proteins. Toward this goal, intermediate
larger genomic regions, instantiated as accessions of the format NG_123456
(genomic sequence with curated annotation) or NT_123456 (computed assembly
and annotation) are also produced. RefSeq standards provide a foundation for
the functional anno |
|
j*p posts: 411 | 3 For example:
Go to this link:
http://genome.ucsc.edu/cgi-bin/hgTables?hgsid=184582003&clade=m
Choose the right clade/genome/assembly, choose RefSeq Genes, and click
summary/statistics.
If that doesn't work, enter a file name and click "get output". This
downloads a file, and you can count in Excel how many RefSeq genes are
annotated. |
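The counting step can also be done without Excel. The following is a minimal sketch, assuming the downloaded file is tab-delimited with a header line beginning with "#" and that the columns "name" (RefSeq accession) and "name2" (gene symbol) are present; check the header of the actual download first. The function name and the toy rows are hypothetical.

```python
import csv
import io


def count_refseq(lines):
    """Count distinct transcripts and genes in a tab-delimited table export."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # header line is assumed to start with '#'
    name_col = header.index("name")    # assumed: RefSeq accession column
    gene_col = header.index("name2")   # assumed: gene symbol column
    transcripts, genes = set(), set()
    for row in reader:
        if not row:
            continue  # skip blank lines
        transcripts.add(row[name_col])
        genes.add(row[gene_col])
    return len(transcripts), len(genes)


# Toy input with made-up accessions, standing in for the downloaded file.
sample = io.StringIO(
    "#bin\tname\tchrom\tname2\n"
    "1\tNM_000001\tchr1\tGENEA\n"
    "1\tNM_000002\tchr1\tGENEA\n"
    "2\tNM_000003\tchr2\tGENEB\n"
)
n_transcripts, n_genes = count_refseq(sample)
print(n_transcripts, n_genes)
```

For a real download, pass an open file handle instead of the toy input, for example the result of open() on whatever file name was entered before clicking "get output".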
|
e******6 posts: 6 | 4 Job Description
Company Introduction
GenScript is a contract research organization (CRO) specializing in
biological research and drug discovery services. Since its inception in
2002, GenScript has experienced rapid, constant, and organic growth.
GenScript has become a leading biology contract research organization (CRO)
in the world, with a global operating team of over 900 dedicated scientists
and staff.
Headquartered in Piscataway, New Jersey, GenScript has become a leading
Biology C... |
|
n******7 posts: 12463 | 5 Also, I recall that NCBI's FTP site has genomic coordinate files for RNAs, at least for RefSeq. |
|
q******c posts: 741 | 6 The experiment is on mouse cells. I mapped the reads to mm9 (the latest
version of the mouse genome). For the annotation, you can select RefSeq,
UCSC genes, Ensembl, etc. |
|
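R***R posts: 166 | 7 If I have information on more than ten thousand genes from microarray
results, such as RefSeq IDs, how can I quickly find the functional
annotations for these genes? Looking them up one by one seems like too much
work.
Is there any software you can recommend? Thanks to everyone on the board. |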
|
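j*p posts: 411 | 8 I do pure data analysis in a wet lab. For NGS data analysis, here is a brief
introduction to some tools I have used and found quite useful. It is a bit
of a grab bag, offered only to get the discussion started, so corrections
and suggestions are welcome.
Next-gen sequencing (NGS) and the 3rd-gen sequencing now under development
will be used more and more widely in biological research. Believe it or not,
I do. First, as experimental costs fall ($1k whole-genome sequencing is
coming), more and more labs can run it. Second, it provides far more data
and information than low-throughput experiments, revealing much that could
not be seen before. Third, the accuracy of the sequencers themselves is
gradually improving, so the inherent experimental error rate is dropping.
Fourth, mature algorithms make it possible to filter out many experimental
errors by analyzing the data after it has been produced. By library
preparation, NGS mainly divides into DNA-seq and RNA-seq.
DNA-seq is usually used as ChIP-seq to study transcription factor (TF)-DNA
bi... |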
|
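n******7 posts: 12463 | 11 These sequences are all cDNA clones.
After comparing the sequencing results with the Y2H results, something seems
odd to me.
For example, 'F07' in the attachment: compared with the corresponding RefSeq
sequence, it contains only the sequence at the two extreme ends, and it
really does not look like a splicing product to me. |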
|
j*p posts: 411 | 12 The input file is just a list of genes.
Gene symbol, Gene ID, RefSeq ID, Affy ID, and so on all work.
No instructions needed; try it and you will see how simple it is. |
|
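u*********1 posts: 2518 | 14 As a former newbie who started from zero, I know this well.
A year ago I did not even know what grep was in Linux. My boss said "grep"
and I said "gre... what? greb?" With a shake of the head, my boss said "you
really have a lot to learn..." But my boss is great and found ways to
explain all kinds of basics to me, including what RAM is. Embarrassing.
For NGS/bioinformatics, I think the core idea is still how to use
computational means to solve biological problems. It sounds simple, but not
everyone truly appreciates it. Python, bash, Perl, and so on are quick to
get started with, but you definitely will not master them in two weeks. I
have been working with Python for a year now and have only scratched the
surface; how deep you go mostly depends on your own project. If you only
ever need to do simple processing of your text, and the text is not large,
say 100 MB, you can always use for line in text, or readlines(). But with a
very large text you cannot use readlines(), because the cluster may not have
enough memory to load the whole text.
So I think you learn as you go, unless you come from a formal CS background
in computing... |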
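The memory point can be made concrete. In Python, iterating over an open file handle reads one line at a time, whereas readlines() builds a list holding every line in memory. Below is a minimal grep-like sketch; the function name and the temporary demo file are hypothetical and only there so the example runs on its own.

```python
import os
import tempfile


def count_matching_lines(path, needle):
    """Count lines containing `needle` without loading the whole file."""
    count = 0
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:  # streams line by line; memory use stays small
            if needle in line:
                count += 1
    return count


# Small demo file standing in for a large text file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("chr1\tgeneA\nchr2\tgeneB\nchr1\tgeneC\n")
    demo_path = tmp.name

n_chr1 = count_matching_lines(demo_path, "chr1")
os.remove(demo_path)
print(n_chr1)
```

The same loop works unchanged on a file of many gigabytes, because only the current line is held in memory at any time.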
|
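l******g posts: 1623 | 15 I do not want to advertise, but just take the RefSeq ID from your OriGene
entry, add "cDNA clone" or something similar, and google it. You will find
many companies selling them. |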
|
l***s posts: 841 | 16 That's weird. We have tried thousands of primers from the database, and they
mostly worked well (>90%). As to isoforms, the primers were designed to
cover all known isoforms of a gene based on NCBI RefSeq. This was done by
designing primers from the common regions of all isoforms. Not sure why your
Ct values are low. Did you follow the protocol from the website (primer
conc, annealing temperature, etc)? Another consideration is that not all
genes are expressed in all cells.
WELL |
|
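c*********r posts: 1312 | 17 The Hebei paper cites a 2014 review on Argonaute that analyzed only the 487
Ago proteins in RefSeq, among them the Natronobacterium gregoryi Ago from
the Hebei paper. If they started by picking candidate genes from this
review, then many Agos still have not been studied. |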
|
k******d posts: 76 | 18 NM_020975.4:c.135G>A is a RefSeq accession; my understanding is that it is independent of the genome build. |
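To see why the description itself carries no genome coordinates, it helps to split it into its parts: a versioned transcript accession, then a "c." position counted along that transcript's coding sequence. The following is a minimal sketch that handles only simple substitutions of this exact shape; the regular expression and function name are hypothetical, not part of any HGVS library.

```python
import re

# Matches only simple coding substitutions such as NM_020975.4:c.135G>A.
HGVS_CODING_SUB = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+)\.(?P<version>\d+)"
    r":c\.(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_coding_substitution(text):
    """Split a simple c. substitution into accession, version, position, bases."""
    match = HGVS_CODING_SUB.match(text)
    if match is None:
        raise ValueError(f"not a simple c. substitution: {text!r}")
    fields = match.groupdict()
    fields["version"] = int(fields["version"])
    fields["position"] = int(fields["position"])
    return fields


variant = parse_coding_substitution("NM_020975.4:c.135G>A")
print(variant)
```

Every field refers to the transcript record (here version 4 of NM_020975), so a genome build only comes into play when the position is mapped onto genomic coordinates.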
|
k******d posts: 76 | 19 Thanks. This may be the reason for the inconsistency:
https://www.ncbi.nlm.nih.gov/refseq/rsg/about/
Criterion 2. Standard allele
The default implementation of 'standard allele' will be the sequence from
the public reference assembly. If, however, there is published evidence,
evidence from locus-specific databases, or evidence from clinical testers,
that the sequence in the Reference assembly is not standard, the RefSeqGene
sequence will be constructed from an alternate source sequence, or locally
modified. |
|
c***y posts: 615 | 20 I am working on mouse data and would like to integrate information from
RefSeq, the UCSC Genome Browser, the GO Consortium, and UniProtKB. They all
use different IDs. Is there an easy way to convert those IDs across the
different databases?
Thank you very much for any input! |
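Whatever resource supplies the cross-references, the conversion itself usually comes down to a lookup built from a mapping table. Here is a minimal sketch, assuming a tab-delimited table with one column per database has already been exported from some mapping resource; the column names, the IDs, and the function name are all made up for illustration.

```python
import csv
import io


def build_id_lookup(lines, source_col, target_col):
    """Map each source ID to the set of target IDs listed alongside it."""
    lookup = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        source, target = row[source_col], row[target_col]
        if source and target:  # skip rows with a missing ID on either side
            lookup.setdefault(source, set()).add(target)
    return lookup


# Toy cross-reference table (hypothetical columns and IDs).
mapping_table = io.StringIO(
    "refseq\tucsc\tuniprot\n"
    "NM_TOY1\tuc_toy1\tP_TOY1\n"
    "NM_TOY1\tuc_toy2\tP_TOY1\n"
    "NM_TOY2\tuc_toy3\t\n"
)
refseq_to_uniprot = build_id_lookup(mapping_table, "refseq", "uniprot")
print(refseq_to_uniprot)
```

Values are kept as sets because ID mappings between databases are often one-to-many, and collapsing them to a single value would silently drop information.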
|