Citation: Kryukov K, Imanishi T (2016) Human Contamination in Public Genome Assemblies. PLoS ONE 11(9):

Editor: Deyou Zheng, Yeshiva University Albert Einstein College of Medicine, UNITED STATES

Received: Ap; Accepted: J; Published: September 9, 2016

Copyright: © 2016 Kryukov, Imanishi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: This research was partially supported by the Health Labour Sciences Research Grant from the Ministry of Health, Labour and Welfare of Japan, and by the Japan Initiative for Global Research Network on Infectious Diseases (J-GRID) Grant from the Ministry of Education, Culture, Sports, Science and Technology of Japan, and Japan Agency for Medical Research and Development (AMED). Funding for open access charge was provided by Tokai University School of Medicine.

Competing interests: The authors have declared that no competing interests exist.

Databases of reference genome sequences are an important resource in a vast number of biological and medical studies. For example, in metagenomics a good reference set of genome sequences is important. Contamination present in a reference genome sequence could lead to incorrect or confusing results. The problem of contamination has been known for over two decades. Humans are another important source of contamination, since humans are present at all stages of sample handling and laboratory procedures. Ancient DNA is particularly affected by human contamination. However, outside the field of ancient DNA this problem receives little attention. An earlier study investigated human contamination in non-primates by looking at SINE sequence in 2,749 genomes. In this study we set out to systematically survey available genome sequences for evidence of contamination from non-repetitive human DNA.

Ultraconserved sequences of 100% identity spanning over 200 bp are known to exist between human and mouse. Therefore homology alone is not enough to conclude that contamination has occurred. In this study we used massive homology search and lineage specificity in order to discard instances of true conservation and detect signals consistent with contamination. We were able to find numerous instances of sequence that can only be reasonably explained as contamination by human DNA.

Prokaryote genomes containing at least 2 kbp of likely human-originated (LHO) sequence.

We provide all LHO sequences in FASTA format in S1 Dataset, which contains the following files: “LHO-sequences-non-primate-mammals.fna”, “LHO-sequences-non-mammal-vertebrates.fna”, “LHO-sequences-non-vertebrate-eukaryotes.fna”, and “LHO-sequences-prokaryotes.fna”. The name of each LHO sequence consists of the original sequence name, followed by the 1-based coordinates of the LHO within that sequence, inclusive on both ends. We also provide the coordinates of LHO regions in BED format in S2 Dataset. The BED files allow masking the LHO regions in a non-human genome using bedtools, with the following command: “bedtools maskfasta -fi genome.fa -bed lho.bed -fo masked.fa”.

Tables 1–4 list genomes containing over 2 kbp of apparently human sequence. Among mammals, the cat genome is the most contaminated, containing over 15 kbp of human DNA. It is surprising to see many LHOs in the rat genome, as it has been studied and refined extensively and is now at build number 6.0. Among other vertebrates, the genome of the green turtle (Chelonia mydas) leads with 33 kbp of human sequence.
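The LHO sequence names use 1-based coordinates that are inclusive on both ends, whereas BED intervals are conventionally 0-based and half-open. The relationship between the two, and the effect of hard-masking an interval, can be illustrated with a minimal Python sketch. The function names `to_bed_interval` and `mask_sequence` and the toy sequence are hypothetical and are not part of the authors' datasets or of bedtools; the masking function only imitates the default behaviour of “bedtools maskfasta”, which replaces masked bases with N.

```python
from typing import List, Tuple


def to_bed_interval(start_1based: int, end_1based: int) -> Tuple[int, int]:
    """Convert a 1-based, end-inclusive range to a 0-based, half-open BED range."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts by one; the inclusive end equals the exclusive BED end.
    return start_1based - 1, end_1based


def mask_sequence(seq: str, bed_intervals: List[Tuple[int, int]]) -> str:
    """Replace every base inside the given BED intervals with 'N' (hard masking)."""
    bases = list(seq)
    for start, end in bed_intervals:
        # Clip the interval to the sequence length so oversized ranges are harmless.
        for i in range(start, min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


# Hypothetical example: an LHO at 1-based positions 3..5 of a 10 bp sequence.
interval = to_bed_interval(3, 5)
masked = mask_sequence("ACGTACGTAC", [interval])
print(interval, masked)
```

For real genomes the bedtools command quoted in the text is the practical route; the sketch is only meant to make the coordinate convention explicit.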