Paper

Peer-reviewed
June 2015

Necessary relations for nucleotide frequencies

JOURNAL OF THEORETICAL BIOLOGY
  • Robert Sinclair

Volume
374
First page
179
Last page
182
Language
English
Publication type
DOI
10.1016/j.jtbi.2015.03.025
Publisher
ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD

Genome composition analysis of di-, tri- and tetra-nucleotide frequencies is known to be evolutionarily informative, and useful in metagenomic studies, where binning of raw sequence data is often an important first step. Patterns appearing in genome composition analysis may be due to evolutionary processes or purely mathematical relations. For example, the total number of dinucleotides in a sequence is equal to the sum of the individual totals of the sixteen types of dinucleotide, and this is entirely independent of any assumptions made regarding mutation or selection, or indeed any physical or chemical process. Before any statistical analysis can be attempted, a knowledge of all necessary mathematical relations is required. I show that 25% of di-, tri- and tetra-nucleotide frequencies can be written as simple sums and differences of the remainder. The vast majority of organisms have circular genomes, for which these relations are exact and necessary. In the case of linear molecules, the absolute error is very nearly zero, and does not grow with contiguous sequence length. As a result of the new, necessary relations presented here, the foundations of the statistical analysis of di-, tri- and tetra-nucleotide frequencies, and k-mer analysis in general, need to be revisited. (C) 2015 Elsevier Ltd. All rights reserved.
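As a minimal illustration of what a "purely mathematical" relation looks like, the Python sketch below checks two simple identities on a toy sequence: the dinucleotide total equals the sequence length for a circular molecule, and for each base the number of dinucleotides starting with it equals the number ending with it. These identities are chosen here for illustration only and are not claimed to be the full set of relations derived in the paper; the function names and the toy sequence are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def dinucleotide_counts(seq, circular=True):
    """Count overlapping dinucleotides; wrap around the end if circular."""
    extended = seq + seq[0] if circular and seq else seq
    return Counter(extended[i:i + 2] for i in range(len(extended) - 1))

def marginal_gaps(counts):
    """For each base b: (# dinucleotides starting with b) - (# ending with b)."""
    return {
        b: sum(counts[b + x] for x in BASES) - sum(counts[x + b] for x in BASES)
        for b in BASES
    }

seq = "ACGTTGCAAGCTTAGGCAT"  # arbitrary toy sequence, not real data

circ = dinucleotide_counts(seq, circular=True)
lin = dinucleotide_counts(seq, circular=False)

# Circular case: one dinucleotide per position, and every gap is exactly zero.
print(sum(circ.values()) == len(seq))
print(marginal_gaps(circ))

# Linear case: only the first and last bases can create a gap, so each gap
# is at most 1 in absolute value regardless of sequence length.
print(marginal_gaps(lin))
```

Because every gap vanishes for a circular sequence, some dinucleotide counts are fixed by the others through sums and differences alone. For a linear sequence the gaps are bounded by one, which is consistent with the abstract's remark that the absolute error does not grow with sequence length.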


Links
DOI
https://doi.org/10.1016/j.jtbi.2015.03.025
PubMed
https://www.ncbi.nlm.nih.gov/pubmed/25843217
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000354667000017&DestApp=WOS_CPL
URL
http://europepmc.org/abstract/med/25843217
URL
http://orcid.org/0000-0002-9646-4322
IDs
  • DOI : 10.1016/j.jtbi.2015.03.025
  • ISSN : 0022-5193
  • eISSN : 1095-8541
  • ORCID Put Code : 17091914
  • PubMed ID : 25843217
  • Web of Science ID : WOS:000354667000017
