(Publisher of Peer Reviewed Open Access Journals)

International Journal of Advanced Technology and Engineering Exploration (IJATEE)

ISSN (Print):2394-5443    ISSN (Online):2394-7454
Volume-8 Issue-82 September-2021
Paper Title : A comparative performance of breast cancer classification using hyper-parameterized machine learning models
Author Name : Kristoffersen Edward Mayce R. Lomboy and Rowell M. Hernandez
Abstract :

Breast cancer is the second most common cancer and has the second-highest mortality rate among all cancer types in women. Accurate diagnosis plays a major part in breast cancer treatment. Machine learning methods have become popular for cancer classification and can accurately distinguish malignant (cancerous) from benign (non-cancerous) breast tumors. This paper applies three machine learning methods to this task: Support Vector Machine (SVM), Logistic Regression (LR), and Neural Network (NN). For each method, multiple models were tested, each with a unique set of parameter values. The study used the Breast Cancer Wisconsin Diagnostic (BCWD) dataset. Model performance was evaluated using k-fold cross-validation and the confusion matrix. The results show that SVM outperformed both LR and NN in classification accuracy, precision, recall, and specificity under k-fold cross-validation. When a train-test split was used to validate the proposed model instead, the NN outperformed both SVM and LR, achieving an accuracy of 99.4%.
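The four evaluation measures named in the abstract all follow from the confusion matrix, and k-fold cross-validation amounts to rotating which slice of the data is held out for testing. The sketch below illustrates both ideas in plain Python. It is not the authors' code: the function names (`confusion_counts`, `metrics`, `k_fold_indices`), the 0/1 label encoding with 1 standing for malignant, and the unshuffled contiguous folds are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 (assumed: malignant) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,        # sensitivity
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds (no shuffling, an
    illustrative simplification); each fold is the test set exactly once."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        splits.append((train, test))
        start = stop
    return splits


# Toy labels, invented for illustration only (not taken from the BCWD dataset).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = metrics(*confusion_counts(y_true, y_pred))
folds = k_fold_indices(10, 3)
```

In a cross-validated evaluation, a model would be fitted on each `train` index list and scored on the matching `test` list, and the per-fold metrics would then be averaged. Practical setups usually shuffle or stratify the samples before splitting, which this sketch leaves out.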

Keywords : Breast cancer, Breast cancer Wisconsin (diagnostic) data set, Support vector machines, Logistic regression, Neural network.
Cite this article : Lomboy KE, Hernandez RM. A comparative performance of breast cancer classification using hyper-parameterized machine learning models. International Journal of Advanced Technology and Engineering Exploration. 2021; 8(82):1080-1101. DOI:10.19101/IJATEE.2021.874380.