An unsupervised gene selection method based on multivariate normalized mutual information of genes

Authors: Mohsen Rahmanian, Eghbal G. Mansoori
Journal: Chemometrics and Intelligent Laboratory Systems
Page number: 104512
Volume number: 2222
Paper Type: Full Paper
Published At: 2022-01-15
Journal Grade: ISI (WOS)
Journal Type: Electronic
Journal Country: Netherlands
Journal Index: Current Contents, Analytical Abstracts, ASCA, BioSciences Information Service, Cambridge Scientific Abstracts, Chemical Abstracts, Chromatography Abstracts, Current Index to Statistics, Embase, INSPEC, Web of Science, Scopus, CIS


Gene expression data analysis has always been challenging because of the complexity and high dimensionality of the data. Generally, in microarray gene expression data the number of samples is much smaller than the number of genes. Treating such imbalanced data as a machine learning task risks producing an over-fitted model, reducing predictive power, and making the genetic data hard to interpret. These problems can be significantly reduced by choosing the more informative genes. Unsupervised gene selection techniques can estimate the relations among genes well. Although mutual information and symmetric uncertainty estimate gene relevancy well, they are bivariate measures and so ignore possible dependencies among several genes. To address this issue, we propose an unsupervised gene selection scheme based on information-theoretic measures. It uses a similarity-based algorithm for gene clustering and then introduces virtual genes as representatives of the gene clusters. Each representative gene shares the most information with the genes in its own cluster and has the least similarity with the representatives of other clusters. Experimental results on benchmark microarray gene expression datasets demonstrate the effectiveness of our approach compared with several information-theoretic schemes as well as prototype- and density-based clustering methods, in both unsupervised and supervised scenarios.
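
The limitation of bivariate measures noted in the abstract can be illustrated with a small sketch. It uses the standard textbook definitions of mutual information, symmetric uncertainty, and total correlation on already-discretized expression levels. It is not the paper's multivariate normalized mutual information, whose exact form is not given in the abstract. The function names and the toy genes `g1`, `g2`, `g3` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())


def mutual_information(x, y):
    """Bivariate mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def symmetric_uncertainty(x, y):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)), defined as 0 if both are constant."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    return 2 * mutual_information(x, y) / (hx + hy)


def total_correlation(*columns):
    """Total correlation: sum of marginal entropies minus the joint entropy."""
    return sum(entropy(c) for c in columns) - entropy(*columns)


# Toy, assumed data: three binary "genes" where g3 is the XOR of g1 and g2.
g1 = [0, 0, 1, 1]
g2 = [0, 1, 0, 1]
g3 = [a ^ b for a, b in zip(g1, g2)]

# Every pair looks independent to a bivariate measure...
pairwise_su = [
    symmetric_uncertainty(g1, g2),
    symmetric_uncertainty(g1, g3),
    symmetric_uncertainty(g2, g3),
]
# ...while the three genes taken together are strongly dependent.
tc = total_correlation(g1, g2, g3)

print("pairwise SU:", pairwise_su)
print("total correlation (bits):", tc)
```

In this toy case each pairwise score is zero, yet the total correlation is one bit, because any two genes determine the third. A multivariate measure can capture that kind of joint dependency; pairwise scores cannot.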


tags: Unsupervised gene selection, Gene clustering, Microarray gene expression, Information theory, Total correlation, Multivariate normalized mutual information