Biology board - Scientists call for attention to contamination in global genome databases (Contaminated genomes)
g*********d
Posts: 233
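Scientists call for attention to contamination in global genome databases
Sample handling may be the main cause of widespread contamination in DNA databases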
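A study published on February 16 in PLoS ONE reports that Mark Longo, a geneticist at the University of Connecticut, and colleagues found human DNA contamination in about one fifth of the bacterial, plant and non-primate animal genome data held in databases built from sequences supplied by top public sequencing facilities. Sample handling may be the main cause of this widespread contamination. The report drew close attention from biologists and from leading media outlets: both The Scientist and Nature covered it promptly on their websites.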
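In the report, Mark Longo and colleagues call on scientists to work harder to ensure that the genomes they sequence are not contaminated. They also urge scientists to screen genomes obtained from public genome databases for potential contamination.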
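"Genome contamination is a big problem, but it is not a new one," said Jonathan Eisen. Eisen is an evolutionary biologist at the University of California and head of the phylogenomics program at the US Department of Energy Joint Genome Institute. "This paper might help remind people of the issue."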
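Contamination can enter a genome sequence at any stage of sequencing. Airborne bacteria may settle into a sample, or DNA fragments may remain in reagents after sterilization. But the most common source of contamination is probably the scientists themselves, for example when one of their own cells falls into a sample before amplification.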
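"Researchers wear gloves not only to protect themselves, but also to keep their samples from being contaminated by them," said Rachel O'Neill, a molecular geneticist at the University of Connecticut and one of the paper's authors.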
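A student in O'Neill's lab was screening genome databases for conserved sequences and was excited to find the same sequence in a large number of species. When he tried to reproduce the result in the lab, however, he failed. He began to suspect that the database genomes were contaminated.

Together with colleagues in the lab, he screened every non-primate genome stored in four public databases for Alu elements, a human-specific repetitive sequence. The databases were:
- the UCSC Genome Browser
- NCBI's GenBank
- the DOE Joint Genome Institute
- Ensembl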
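Of the 2,057 raw sequence genomes searched, the researchers found 454 (about 22.39% of the total) contaminated with human Alu sequences. "The level of contamination we found is high enough to warrant attention," O'Neill said. "And that is only contamination from humans. Imagine how much contamination there may be from species common in the lab, such as E. coli."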
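Eisen noted: "Many papers have reported horizontal gene transfer between species, and one now has to ask whether these findings are simply the result of human DNA contaminating the data. The frequency of human contamination will force scientists to go beyond standard experiments to confirm their results."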
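"High levels of sequence contamination could have serious consequences when it comes to human sequencing," O'Neill said. "Finding an Alu element in a fish sample is very easy. But finding one person's DNA in another person's sample is very difficult. Basing personalized treatment decisions on such heavily contaminated sequences could lead to unimaginable tragedy."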
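"As research moves forward, scientists must invest more money in quality control, but under the pressure to produce data, quality control gets pushed aside," Eisen said. "It would be wonderful if everyone recognized the importance of data quality, but at present that seems hard to achieve."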
Read more: Contaminated genomes - The Scientist - Magazine of the Life Sciences
http://www.the-scientist.com/news/display/57990/
Contaminated genomes
Human DNA sequences are found in nearly a quarter of the publicly available
non-primate genomes, emphasizing the need for better quality-control
measures
[Published 16th February 2011 10:00 PM GMT]
More than 20 percent of non-primate genome sequences from the top public
sequencing facilities are contaminated with human DNA, reports a study
published today (February 16) in PLoS ONE.
[Image: A Sanger sequencing read. Credit: Wikimedia Commons, Loris]
This research calls for scientists to work harder to ensure that the
genomes they're sequencing do not become contaminated during the
sequencing process, and, more importantly, to check for potential
contamination in genomes pulled from the public databases on which
genomes are normally deposited.
"Genome contamination is a big problem -- but it's not new," said
Jonathan Eisen, evolutionary biologist at the University of California,
Davis and lead of the phylogenomics program at the United States
Department of Energy Joint Genome Institute. "This paper might help
remind people of this [issue]."
Contamination can be introduced into a genomic sequence at any number of
stages. It could be airborne bacteria landing in a sample, or even DNA
fragments floating around in reagents, left behind after sterilization.
But probably the most common contaminant is the scientist herself. It
just takes a skin cell falling into the sample before amplification.
"Are you wearing gloves to protect yourself from your sample or your
sample from you?" Rachel O'Neill, paper author and molecular geneticist
at the University of Connecticut, wondered. "I think it's a little bit
of both."
A graduate student in O'Neill's lab was screening genome databases for
conserved sequences, and was excited to find the same sequence across
diverse species. However, when he tried to replicate the results in the
lab, he failed, suggesting that the database genomes were contaminated.
So he decided to screen all non-primate genomes housed in four public
databases -- University of California, Santa Cruz's genome browser,
National Center for Biotechnology Information's GenBank, the Joint
Genome Institute, and Ensembl -- for human-specific repetitive sequences
known as AluY elements.
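The screen described above can be sketched roughly as follows. This is not the pipeline Longo and colleagues used, since the article does not describe their method. A real screen would more likely rely on alignment, for example BLAST searches against an Alu consensus. This minimal Python sketch instead assumes exact k-mer matching as a crude stand-in for alignment. Several details are hypothetical illustration choices:
- the function names
- the k-mer size of 21 and the three-hit threshold
- the probe sequence, which is invented and is not a real Alu sequence

```python
from typing import Dict, Iterable, List, Set

# Complement table for uppercase DNA; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Iterable[str]:
    """Yield every overlapping k-mer of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_probe_index(probe: str, k: int) -> Set[str]:
    """Index the probe's k-mers on both strands, because a contaminating
    fragment may have been sequenced from either strand."""
    probe = probe.upper()
    return set(kmers(probe, k)) | set(kmers(reverse_complement(probe), k))


def screen(records: Dict[str, str], probe: str, k: int = 21,
           min_hits: int = 3) -> List[str]:
    """Return the IDs of records that share at least `min_hits` distinct
    k-mers with the probe (HYPOTHETICAL thresholds; exact matching only)."""
    index = build_probe_index(probe, k)
    flagged = []
    for rec_id, seq in records.items():
        hits = {km for km in kmers(seq.upper(), k) if km in index}
        if len(hits) >= min_hits:
            flagged.append(rec_id)
    return flagged


if __name__ == "__main__":
    # HYPOTHETICAL probe, invented for illustration; NOT a real Alu/AluY sequence.
    probe = "ACGTTGCATGCCGATAGCTAGGCTTACGATCGGATCCAGT"
    records = {
        "clean": "T" * 40,
        "contaminated": "GGGG" + probe + "GGGG",
        "rc_contaminated": reverse_complement(probe),
    }
    print(screen(records, probe))  # prints ['contaminated', 'rc_contaminated']
```

The sketch makes two design choices. Both strands are indexed because a contaminating read can come from either strand. A record is flagged only after several distinct shared k-mers, so a single chance match does not count as contamination.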
Of the 2,057 raw sequence genomes searched, 454 contained this human DNA
sequence, or 22.39 percent. "The level of contamination we have found is
high enough to show concern," said O'Neill. And that's just
contamination from human sources, she added -- just imagine how much
contamination could exist from species like E. coli or others commonly
found in the lab.
Eisen noted the flurry of papers reporting horizontal gene transfers
between species, such as the report this week in mBio of human DNA
acquired by gonorrhea, and wondered if this could simply be an issue of
human DNA contaminating the data.
The frequency of human contamination requires scientists to do extra
experiments, to go above and beyond the norm to confirm their results,
Eisen argued. "All you need is one cell to do something weird and you
have the potential for all kinds of anomalies."
"There is always that lingering doubt," Mark Pallen, a microbial
genomicist at the University of Birmingham, said of the gonorrhea
sequence, though he added he thinks the gonorrhea example is probably a
case of bona fide DNA transfer.
The high level of sequence contamination could spell real trouble when
it comes to human sequencing, O'Neill said. "Finding an Alu element from
a human in a fish sample is very straightforward," she said. "Finding a
human sample in a human sample is where the difficulty comes in."
Relying on sequencing with such high human contamination to make
decisions about personal health could be catastrophic.
Moving forward, scientists must invest more in quality control, Eisen
said, but the importance of this step can be lost behind the pressure to
generate more data. "It would be nice if everybody took a step back and
said that the quality of data is also important," he said. "But it's a
hard argument to win; it's hard to convince myself in some cases."
Longo, M.S., et al. "Abundant Human DNA Contamination Identified in Non-Primate Genome Databases." PLoS ONE, DOI: 10.1371/journal.pone.0016410
==================
Genome databases suffer from the human touch
Contamination of non-primate DNA archives with human sequences indicates
that better screening is needed.
Melissa Lee Phillips
http://www.nature.com/news/2011/110216/full/news.2011.99.html
[Image caption: Sample handling may be responsible for widespread contamination of DNA databases with human sequences. Credit: Ingram Publishing]
Around a fifth of non-primate genome databases seem to be contaminated
with human DNA sequences, according to a study.
Mark Longo, a geneticist at the University of Connecticut in Storrs, and
his colleagues found that 18% of public databases of bacterial, plant
and animal genome sequences contain stretches of human DNA, possibly as
a result of researchers handling samples during sequencing. Their
findings are published today in PLoS ONE [1].
David Haussler, a biomolecular engineer at the University of California,
Santa Cruz, says that many genomics researchers are already aware of the
presence of human DNA artefacts in genome assemblies, but this
contamination has never been quantified.
Forensic scientists and researchers working with ancient DNA take
extreme measures to avoid contamination, but most sequencing projects
aren't so stringent. "It would be hugely expensive," says Haussler.
Filter failure
Longo and his colleagues decided to investigate genome-database
contamination after discovering human sequences in a project on the
genome of the zebrafish (Danio rerio). They scanned non-primate genome
databases for genetic 'Alu' elements — short stretches of DNA
characterized by the action of a particular enzyme — that are abundant
in the human genome and are known to be specific to primates.
The researchers found human DNA sequences in 492 out of 2,749 sequencing
archives that they checked. Contamination showed up in raw sequencing
data and in final assemblies of data that had been pieced together by
computers to compile a complete genome sequence.
Most contamination in assembled sequences consisted of only a few
hundred DNA bases at a time, although in a minority of cases stretches
of more than a thousand human bases were seen in assembled non-primate
sequences.
It is not surprising to find contamination in raw sequencing reads, says
Robert Waterston, a genomicist at the University of Washington in
Seattle. These are "unvarnished raw data", he says, and "some
contamination seems inevitable". However, "the presence of human
sequences in assemblies is another matter", he adds.
The computer algorithms that assemble sequences ought to spot
contamination and remove artefacts, says Haussler. The latest findings
"represent a failure of that filter".
Longo and his team speculate that contaminating sequences could come
either from skin and hair cells from the people who handle samples, or
from other DNA libraries that are kept at the same facilities. The team
also recorded contamination from species other than humans, which lends
credence to the latter possibility, says Rachel O'Neill, a cell
biologist at the University of Connecticut and the study's lead author.
The researchers found, for example, evidence that databases of platypus
(Ornithorhynchus anatinus) DNA contain some sequences that probably
originate from tammar wallabies (Macropus eugenii).
Spring cleaning
"It would be great if we could clean up" these artefacts, says Haussler.
Using forensics-level precautions would be prohibitively expensive for
most projects, but bioinformatics specialists need to improve their
contamination filters before final assemblies are released, he says.
Most sequences are routinely updated, so "I would hope that the next
versions will have this contamination eliminated", he adds.
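Haussler does not describe how such contamination filters work. One common approach is k-mer-based read decontamination before assembly. Under this approach, a read is discarded when most of its k-mers also occur in a known contaminant reference, such as a human sequence. The minimal Python sketch below illustrates that idea under stated assumptions. The function names, k = 15, the 50% threshold and the test sequences are all hypothetical. They are not parameters from the study or from any particular assembler.

```python
from typing import List, Set, Tuple

# Complement table for uppercase DNA; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (uppercased first)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of seq, uppercased (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_reads(reads: List[str], contaminant_refs: List[str], k: int = 15,
                 max_fraction: float = 0.5) -> Tuple[List[str], List[str]]:
    """Split reads into (kept, removed). A read is removed when more than
    `max_fraction` of its distinct k-mers occur, on either strand, in any
    contaminant reference. HYPOTHETICAL defaults chosen for illustration."""
    contaminant: Set[str] = set()
    for ref in contaminant_refs:
        contaminant |= kmer_set(ref, k) | kmer_set(reverse_complement(ref), k)

    kept: List[str] = []
    removed: List[str] = []
    for read in reads:
        read_kmers = kmer_set(read, k)
        if not read_kmers:
            # Too short to judge with k-mers of this size; keep it.
            kept.append(read)
            continue
        fraction = len(read_kmers & contaminant) / len(read_kmers)
        (removed if fraction > max_fraction else kept).append(read)
    return kept, removed


if __name__ == "__main__":
    # HYPOTHETICAL "human" reference fragment, invented for illustration.
    human = "ACGTTGCATGCCGATAGCTAGGCTTACGATCGGATCCAGT"
    reads = ["G" * 20, human[:30], reverse_complement(human)[5:35], "ACG"]
    kept, removed = filter_reads(reads, [human])
    print(len(kept), len(removed))  # prints 2 2
```

Real decontamination tools index whole reference genomes and tolerate sequencing errors. This sketch uses exact matches only, so a contaminating read that carries a few errors may slip through the filter.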
O'Neill says that the major concern surrounding contamination lies not
so much in the errors that have been introduced so far, but, "looking
forward, in translating the next wave of genomics into clinical
practice".
It is often straightforward to spot human contamination in non-human
genomes, but it will not be so easy to identify contamination of one
human genome with another. As more labs and companies begin to sequence
whole genomes of individual people for personalized medicine or to study
how genetic differences affect disease, potential contamination "is
going to be very difficult to track", says O'Neill.