Evaluation of critical data processing steps for reliable prediction of gene co-expression from large collections of RNA-seq data.

PloS one

Alexis Vandenbon

Volume: 17
Issue: 1
First page: e0263344
Last page:
Language: English
Publication type: Research paper (academic journal)
DOI: 10.1371/journal.pone.0263344

MOTIVATION: Gene co-expression analysis is an attractive tool for leveraging the enormous amount of public RNA-seq data for the prediction of gene functions and regulatory mechanisms. However, the optimal data processing steps for accurately predicting gene co-expression from such large datasets remain unclear. In particular, the importance of batch effect correction is understudied.

RESULTS: We processed RNA-seq data of 68 human and 76 mouse cell types and tissues using 50 different workflows into 7,200 genome-wide gene co-expression networks. We then conducted a systematic analysis of the factors that result in high-quality co-expression predictions, focusing on normalization, batch effect correction, and the measure of correlation. We confirmed the key importance of high sample counts for high-quality predictions. However, choosing a suitable normalization approach and applying batch effect correction can further improve the quality of co-expression estimates, equivalent to a >80% and >40% increase in samples, respectively. In larger datasets, batch effect removal was equivalent to more than doubling the sample size. Finally, Pearson correlation appears more suitable than Spearman correlation, except for smaller datasets.

CONCLUSION: A key point for accurate prediction of gene co-expression is the collection of many samples. However, paying attention to data normalization, batch effects, and the measure of correlation can significantly improve the quality of co-expression estimates.
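
To illustrate why batch effects matter for co-expression estimates, the following is a minimal sketch, not the workflow evaluated in the paper. It assumes a made-up toy dataset of two genes measured in two batches, and it uses per-batch mean centering as a deliberately simple stand-in for batch effect correction. The function names (`pearson`, `spearman`, `center_by_batch`) and all expression values are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def center_by_batch(values, batches):
    """Subtract each batch's mean (a simple stand-in for batch correction)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, b in zip(values, batches):
        sums[b] += v
        counts[b] += 1
    return [v - sums[b] / counts[b] for v, b in zip(values, batches)]


# Hypothetical log-scale expression values: within each batch the two genes
# are uncorrelated, but batch "B" is shifted upward by 5 for both genes.
batches = ["A"] * 4 + ["B"] * 4
gene_a = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
gene_b = [2.0, 4.0, 1.0, 3.0, 7.0, 9.0, 6.0, 8.0]

raw_pearson = pearson(gene_a, gene_b)
raw_spearman = spearman(gene_a, gene_b)

corrected_a = center_by_batch(gene_a, batches)
corrected_b = center_by_batch(gene_b, batches)
corrected_pearson = pearson(corrected_a, corrected_b)
corrected_spearman = spearman(corrected_a, corrected_b)

print("raw:      ", raw_pearson, raw_spearman)
print("corrected:", corrected_pearson, corrected_spearman)
```

In this toy case the shared batch shift alone produces a strong apparent correlation between the two genes, which disappears once each batch is centered. Real batch effects are rarely a pure shift, so this only shows the direction of the problem and says nothing about the size of the effect reported above.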

Links

DOI: https://doi.org/10.1371/journal.pone.0263344
PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35089979
PubMed Central: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8797241

ID information

DOI : 10.1371/journal.pone.0263344
PubMed ID : 35089979
PubMed Central article ID : PMC8797241
