Cell Journal, Volume 25, Issue 5, Pages 347-353


Title: The Performance Evaluation of The Random Forest Algorithm for A Gene Selection in Identifying Genes Associated with Resectable Pancreatic Cancer in Microarray Dataset: A Retrospective Study
Abstract
Objective: In microarray datasets, hundreds to thousands of genes are measured in a small number of samples, and problems that occur during the experiment sometimes cause the expression values of some genes to be recorded as missing. Identifying the genes that cause a disease or cancer among such a large number of genes is difficult. This study aimed to find the genes associated with pancreatic cancer (PC). First, the K-nearest neighbor (KNN) imputation method was used to handle missing values (MVs) of gene expression. Then, the random forest algorithm was used to identify the genes associated with PC.
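The KNN imputation step can be illustrated with a minimal sketch. It assumes genes are rows and samples are columns, that missing values are stored as `None`, and that each missing entry is replaced by the mean of the same sample's values in the k most similar genes, with similarity measured by Euclidean distance over the samples both genes have observed. This is one common variant of KNN imputation; the abstract does not state which variant, value of k, or distance the authors used, so the default `k=3` and the helper names `distance` and `knn_impute` are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # one gene's expression across samples; None = missing


def distance(a: Row, b: Row) -> float:
    """Euclidean distance over samples observed in both genes.

    The squared distance is rescaled to the full number of samples so that
    gene pairs sharing fewer observed samples are not unfairly favoured.
    Returns math.inf when the two genes share no observed sample.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(matrix: List[Row], k: int = 3) -> List[List[float]]:
    """Fill each None in a genes x samples matrix with the mean value of
    that sample in the k nearest genes that were observed at that sample."""
    result = []
    for i, row in enumerate(matrix):
        filled = list(row)  # copy, so donors always come from the original data
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donor genes: every other gene that was measured at sample j.
            donors = []
            for g, other in enumerate(matrix):
                if g == i or other[j] is None:
                    continue
                d = distance(row, other)
                if math.isfinite(d):
                    donors.append((d, other[j]))
            if not donors:
                raise ValueError(f"gene {i}: no donor observed at sample {j}")
            donors.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in donors[:k]]
            filled[j] = sum(nearest) / len(nearest)
        result.append(filled)
    return result
```

Because donors are always taken from the originally observed values, the result does not depend on the order in which genes are processed.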
Materials and Methods: In this retrospective study, 24 samples from the GSE14245 dataset were examined: 12 from patients with PC and 12 from healthy controls. After preprocessing, 29482 genes were available for analysis, and the fold-change technique was applied to them. The KNN imputation method was used to impute the missing values of any gene that had MVs. The genes most strongly associated with PC were then selected using the random forest algorithm. Finally, the dataset was classified using support vector machine (SVM) and naïve Bayes (NB) classifiers, and the F-score and Jaccard indices were reported.
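The fold-change filter can be sketched as follows. The Results report keeping genes with a fold change greater than 3, but the abstract does not say whether expression was on a linear or log scale or whether down-regulated genes were counted. This sketch assumes positive, linear-scale intensities, computes the ratio of mean PC expression to mean control expression, and keeps genes whose change exceeds the threshold in either direction; the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def fold_change(case: List[float], control: List[float]) -> float:
    """Mean case expression divided by mean control expression.

    Assumes positive, linear-scale (not log-transformed) intensities.
    """
    return mean(case) / mean(control)


def filter_by_fold_change(expr: Dict[str, List[float]],
                          is_case: List[bool],
                          threshold: float = 3.0) -> Dict[str, float]:
    """Return {gene: fold change} for genes changed more than `threshold`-fold.

    A gene qualifies if it is up-regulated (fc > threshold) or
    down-regulated (1 / fc > threshold) in cases relative to controls.
    """
    kept = {}
    for gene, values in expr.items():
        case = [v for v, c in zip(values, is_case) if c]
        control = [v for v, c in zip(values, is_case) if not c]
        fc = fold_change(case, control)
        if max(fc, 1.0 / fc) > threshold:
            kept[gene] = fc
    return kept
```

If the expression values are log2-transformed, as is common for microarray data, the usual analogue is to compare the absolute difference of the group means with log2(3).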
Results: Of the 29482 genes, 1185 genes with fold changes greater than 3 were selected. Random forest selection then identified the 21 genes with the highest importance values. S100P and GPX3 had the highest and lowest importance values, respectively. The F-score and Jaccard values of the SVM and NB classifiers were 95.5, 93, 92, and 92 percent, respectively.
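The F-score and Jaccard index can be computed from the counts of true positives (TP), false positives (FP) and false negatives (FN): F = 2TP / (2TP + FP + FN) and J = TP / (TP + FP + FN). The abstract does not say whether the scores refer to the PC class alone or are averaged over both classes, so this sketch assumes PC (label 1) is the positive class; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def f_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score, 2TP / (2TP + FP + FN); assumes TP + FP + FN > 0."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    return 2 * tp / (2 * tp + fp + fn)


def jaccard(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Jaccard index, TP / (TP + FP + FN); assumes TP + FP + FN > 0."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fp + fn)
```

For example, with 12 PC and 12 control samples and a hypothetical classifier that misses a single PC sample, TP = 11, FP = 0 and FN = 1, giving F = 22/23 ≈ 95.7% and J = 11/12 ≈ 91.7%. For a single class the two indices are related by J = F / (2 − F).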
Conclusion: By applying the fold-change technique, the imputation method, and the random forest algorithm, this study found strongly associated genes that many other studies had not identified. We therefore suggest that researchers use the random forest algorithm to detect genes related to the disease of interest.
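The random forest gene-ranking step can be illustrated with a small, dependency-free sketch. It grows trees on bootstrap samples, considers a random subset of about √p genes at each split, and scores each gene by its total Gini impurity decrease across all trees (the "mean decrease in impurity" measure), normalized to sum to 1. The abstract does not state the importance measure, the number of trees, the tree depth or any other setting, so these choices, the default values, and every function name here are assumptions. For real data, a tested implementation such as scikit-learn's `RandomForestClassifier` (its `feature_importances_` attribute) or R's `randomForest` package would be the usual choice.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]  # samples x genes


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(X: Matrix, y: List[int], idx: List[int], genes: List[int]):
    """Return (weighted Gini decrease, gene, threshold) of the best split
    of rows `idx` on any gene in `genes`, or None if no split exists."""
    parent = gini([y[i] for i in idx])
    best = None
    for g in genes:
        values = sorted({X[i][g] for i in idx})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y[i] for i in idx if X[i][g] <= thr]
            right = [y[i] for i in idx if X[i][g] > thr]
            # impurity decrease weighted by the number of rows at the node
            gain = len(idx) * parent - len(left) * gini(left) - len(right) * gini(right)
            if best is None or gain > best[0]:
                best = (gain, g, thr)
    return best


def grow_tree(X, y, idx, depth, max_depth, mtry, rng, importance):
    """Grow one tree recursively, adding every split's gain to `importance`.
    Only the importances are kept, since they are all the ranking needs."""
    if depth >= max_depth or len({y[i] for i in idx}) < 2:
        return
    candidates = rng.sample(range(len(X[0])), mtry)  # random gene subset
    split = best_split(X, y, idx, candidates)
    if split is None or split[0] <= 0:
        return
    gain, g, thr = split
    importance[g] += gain
    grow_tree(X, y, [i for i in idx if X[i][g] <= thr], depth + 1, max_depth, mtry, rng, importance)
    grow_tree(X, y, [i for i in idx if X[i][g] > thr], depth + 1, max_depth, mtry, rng, importance)


def rf_gene_importance(X: Matrix, y: List[int], n_trees: int = 200,
                       max_depth: int = 5, seed: int = 0) -> List[float]:
    """Impurity-based importance of each gene (column of X), summing to 1."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = max(1, int(math.sqrt(p)))
    importance = [0.0] * p
    for _ in range(n_trees):
        bootstrap = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        grow_tree(X, y, bootstrap, 0, max_depth, mtry, rng, importance)
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance
```

Sorting genes by these scores in decreasing order and keeping the top-ranked ones corresponds to the selection step in the Results, where 21 genes were retained; how that cutoff was chosen is not stated in the abstract.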
Keywords: Classification, Microarray Analysis, Neoplasms, Pancreas

Authors:
Niloofar Rabiei
Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran

Ali Reza Soltanian
Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran

Maryam Farhadian
Research Center for Health Sciences, Hamadan University of Medical Sciences, Hamadan, Iran

Fatemeh Bahreini
Department of Molecular Medicine and Genetics, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran


URL: https://www.celljournal.org/article_701828_d353673e2ff3e7778f08a76086db265f.pdf
Language: English