Peer-reviewed
Jun 25, 2015

CG-containing oligonucleotides and transcription factor-binding motifs are enriched in human pericentric regions

Genes and Genetic Systems
  • Yoshiko Wada
  • Yuki Iwasaki
  • Takashi Abe
  • Kennosuke Wada
  • Ikuo Tooyama
  • Toshimichi Ikemura

Volume : 90
Number : 1
First page : 43
Last page : 53
Language : English
Publishing type : Research paper (scientific journal)
DOI : 10.1266/ggs.90.43
Publisher : Genetics Society of Japan

Unsupervised data mining capable of extracting a wide range of information from big sequence data without prior knowledge or particular models is highly desirable in an era of big data accumulation for research on genes, genomes and genetic systems. By handling oligonucleotide compositions in genomic sequences as high-dimensional data, we have previously modified the conventional SOM (self-organizing map) for genome informatics and established BLSOM for oligonucleotide composition, which can analyze more than ten million sequences simultaneously and is thus suitable for big data analyses. Oligonucleotides often represent motif sequences responsible for sequence-specific binding of proteins such as transcription factors. The distribution of such functionally important oligonucleotides is probably biased in genomic sequences, and may differ among genomic regions. When constructing BLSOMs to analyze pentanucleotide composition in 50-kb sequences derived from the human genome in this study, we found that BLSOMs did not classify human sequences according to chromosome but revealed several specific zones, which are enriched for a class of CG-containing pentanucleotides; these zones are composed primarily of sequences derived from pericentric regions. The biological significance of enrichment of these pentanucleotides in pericentric regions is discussed in connection with cell type- and stage-dependent formation of the condensed heterochromatin in the chromocenter, which is formed through association of pericentric regions of multiple chromosomes.
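
As a rough illustration of the input such an analysis works from, the sketch below turns a sequence into pentanucleotide frequency vectors over fixed-size windows and sums the frequencies of CG-containing pentanucleotides. It is a minimal sketch under stated assumptions, not the authors' BLSOM pipeline: the function names (`kmer_frequencies`, `window_compositions`, `cg_fraction`) are hypothetical, the windows are non-overlapping, the two strands are not merged, and the SOM training step itself is omitted.

```python
from collections import Counter
from itertools import product

K = 5                      # pentanucleotides, as in the abstract
ALPHABET = "ACGT"
# All 4**5 = 1024 pentanucleotides in a fixed order (the vector dimensions).
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]


def kmer_frequencies(seq, k=K):
    """Return the k-mer frequency vector of seq, ordered like KMERS.

    Assumption: k-mers containing anything other than A/C/G/T
    (e.g. N) are skipped rather than counted.
    """
    seq = seq.upper()
    valid = set(ALPHABET)
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in KMERS]


def window_compositions(genome, window=50_000):
    """Split genome into non-overlapping windows (50 kb by default)
    and return one frequency vector per complete window."""
    return [
        kmer_frequencies(genome[start:start + window])
        for start in range(0, len(genome) - window + 1, window)
    ]


def cg_fraction(freqs):
    """Sum the frequencies of all pentanucleotides containing 'CG'."""
    return sum(f for km, f in zip(KMERS, freqs) if "CG" in km)
```

Each vector returned by `window_compositions` has 1024 components and could serve as one input sample for a self-organizing map; comparing `cg_fraction` across windows gives a crude view of where CG-containing pentanucleotides are concentrated.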

Link information
DOI : https://doi.org/10.1266/ggs.90.43
PubMed : https://www.ncbi.nlm.nih.gov/pubmed/26119665
ID information
  • DOI : 10.1266/ggs.90.43
  • ISSN : 1880-5779
  • ISSN : 1341-7568
  • PubMed ID : 26119665
  • SCOPUS ID : 84934783155
