Peer-reviewed International journal
Mar 10, 2022

Unsupervised explainable AI for molecular evolutionary study of forty thousand SARS-CoV-2 genomes.

BMC Microbiology
  • Yuki Iwasaki
  • Takashi Abe
  • Kennosuke Wada
  • Yoshiko Wada
  • Toshimichi Ikemura

Volume
22
Number
1
First page
73
Last page
73
Language
English
Publishing type
Research paper (scientific journal)
DOI
10.1186/s12866-022-02484-3

BACKGROUND: Unsupervised AI (artificial intelligence) can obtain novel knowledge from big data without particular models or prior knowledge, and is highly desirable for unveiling hidden features in such data. SARS-CoV-2 poses a serious threat to public health, and one important issue in characterizing this fast-evolving virus is to elucidate various aspects of its genome sequence changes. We previously established an unsupervised AI method, the BLSOM (batch-learning self-organizing map), which can analyze five million genomic sequences simultaneously. The present study applied the BLSOM to the oligonucleotide compositions of forty thousand SARS-CoV-2 genomes.

RESULTS: Although only the oligonucleotide composition was given, the obtained clusters of genomes corresponded primarily to the known main clades and to internal divisions within those clades. Because the BLSOM is an explainable AI, it reveals which features of the oligonucleotide composition are responsible for the clade clustering. The BLSOM also provided information concerning a special genomic region possibly undergoing RNA modifications.

CONCLUSIONS: The BLSOM has powerful image display capabilities and enables efficient knowledge discovery about viral evolutionary processes, and it can complement phylogenetic methods based on sequence alignment.
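To make the input of the analysis concrete, the following is a minimal sketch of how an oligonucleotide composition vector might be computed for a single genome. It is illustrative only and is not the authors' BLSOM pipeline: the function name `oligo_composition`, the default oligonucleotide length `k=3`, the handling of ambiguous bases, and the normalization to relative frequencies are all assumptions, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product


def oligo_composition(seq, k=3):
    """Return relative k-mer frequencies for a nucleotide sequence.

    Hypothetical helper: k, the alphabet handling, and the normalization
    are assumptions made for illustration.
    """
    # Treat RNA and DNA notation alike by mapping U to T.
    seq = seq.upper().replace("U", "T")
    alphabet = set("ACGT")
    # Fixed ordering of all 4**k possible oligonucleotides.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Skip windows that contain ambiguous bases such as N.
        if set(window) <= alphabet:
            counts[window] += 1
    total = sum(counts.values())
    return {km: (counts[km] / total if total else 0.0) for km in kmers}


# Toy sequence, not real SARS-CoV-2 data.
vec = oligo_composition("ACGTACGT", k=2)
print(len(vec), vec["AC"])
```

Each genome then becomes a fixed-length vector (4**k values), which is the kind of input a self-organizing map can cluster without sequence alignment.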

Link information
DOI
https://doi.org/10.1186/s12866-022-02484-3
PubMed
https://www.ncbi.nlm.nih.gov/pubmed/35272618
PubMed Central
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8907386
ID information
  • DOI : 10.1186/s12866-022-02484-3
  • Pubmed ID : 35272618
  • Pubmed Central ID : PMC8907386
