An unsupervised gene selection method based on multivariate normalized mutual information of genes

Authors: Mohsen Rahmanian - Eghbal G. Mansoori
Journal: Chemometrics and Intelligent Laboratory Systems
Pages: 104512
Volume: 2222
Impact factor (IF): 3.491
Article type: Full Paper
Publication date: 2022-01-15
Journal rank: ISI (WOS)
Journal type: Electronic
Country of publication: Netherlands
Journal indexing: Current Contents, Analytical Abstracts, ASCA, BioSciences Information Service, Cambridge Scientific Abstracts, Chemical Abstracts, Chromatography Abstracts, Current Index to Statistics, Embase, INSPEC, Web of Science, Scopus, CIS

Abstract

Gene expression data analysis has always been challenging because the data are complex and high-dimensional. In microarray gene expression data, the number of samples is generally much smaller than the number of genes. Treating such imbalanced data as a machine learning task risks producing an over-fitted model, reducing predictive power, and making the genetic data hard to interpret. These problems can be substantially reduced by selecting the more informative genes. Unsupervised gene selection techniques can estimate the relations among genes well. Although mutual information and symmetric uncertainty estimate the relevance of genes well, they are bivariate measures and so ignore possible dependencies among several genes. To address this issue, we propose an unsupervised gene selection scheme based on information-theoretic measures. It uses a similarity-based algorithm for gene clustering and then introduces virtual genes as representatives of the gene clusters. Each representative gene shares the most information with the genes in its own cluster and has the least similarity to the representatives of other clusters. The experimental results on benchmark microarray gene expression datasets demonstrate the effectiveness of our approach compared with several information-theoretic schemes, as well as prototype- and density-based clustering methods, in both unsupervised and supervised scenarios.
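
To illustrate why bivariate measures can miss dependencies among several genes, the following is a minimal sketch and not the authors' implementation. It assumes gene expression values have already been discretized, and it uses the textbook definitions of mutual information, symmetric uncertainty, and total correlation. The function names and the toy genes `g1`, `g2`, `g3` are hypothetical, and the normalization used in the paper's multivariate normalized mutual information is not reproduced here.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """Bivariate mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def symmetric_uncertainty(x, y):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)), defined as 0 for constant inputs."""
    denom = entropy(x) + entropy(y)
    return 2.0 * mutual_information(x, y) / denom if denom > 0 else 0.0


def total_correlation(*columns):
    """Multivariate total correlation: sum of marginal entropies minus joint entropy."""
    return sum(entropy(c) for c in columns) - entropy(*columns)


# Toy data (an assumption for illustration): two independent binary genes
# and a third gene that is their XOR, over four equally likely samples.
pairs = list(product([0, 1], repeat=2))
g1 = [a for a, b in pairs]
g2 = [b for a, b in pairs]
g3 = [a ^ b for a, b in pairs]

# Every pairwise measure reports no relation between g3 and either parent...
print(mutual_information(g1, g3))      # 0.0 bits
print(symmetric_uncertainty(g2, g3))   # 0.0
# ...while the multivariate measure detects the three-way dependency.
print(total_correlation(g1, g2, g3))   # 1.0 bit
```

In this toy case each pair of genes looks independent, yet any two of them fully determine the third, which is the kind of joint dependency a multivariate measure is meant to capture.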

tags: Unsupervised gene selection, Gene clustering, Microarray gene expression, Information theory, Total correlation, Multivariate normalized mutual information