(Publisher of Peer Reviewed Open Access Journals)

International Journal of Advanced Computer Research (IJACR)

ISSN (Print):2249-7277    ISSN (Online):2277-7970
Volume-12 Issue-61 July-2022
Paper Title : Improving medical diagnostics with machine learning: a study on data classification algorithms
Author Name : Abhishek Kumar and Sujeet Gautam
Abstract :

This paper evaluates the logistic regression (LR) and random forest (RF) algorithms for breast cancer classification on the Breast Cancer Wisconsin Dataset, which consists of 699 instances and 10 attributes. After the data are pre-processed and feature extraction is applied to retain relevant information, the dataset is split into training, validation, and test portions to evaluate both algorithms. LR achieves an accuracy of 96% to 97% across different split ratios, and its error rate decreases as the training set grows. RF achieves an accuracy of 96% to 98% across different split ratios. The results indicate that both algorithms classify the data effectively, and the figures show how the split ratio affects accuracy and error rate. Choosing an appropriate split ratio is therefore essential for obtaining reliable results.
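
The split-ratio evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses synthetic two-feature data rather than the Breast Cancer Wisconsin Dataset, a plain gradient-descent logistic regression written with the standard library only, a train/test split without a validation portion, and hypothetical training fractions of 0.6, 0.7, and 0.8 (the abstract does not state the ratios used). RF is omitted for brevity. All function names are illustrative assumptions.

```python
import math
import random


def make_synthetic(n=300, seed=0):
    """Two Gaussian classes in 2 features (synthetic, NOT the Wisconsin data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 2.0 if label else -2.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


def split(X, y, train_frac, seed=0):
    """Shuffle indices and cut them into a training and a test portion."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_frac)
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in te], [y[i] for i in te])


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on the mean logistic loss; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    """Predict class 1 when the linear score is non-negative (probability >= 0.5)."""
    w, b = model
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0 for xi in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_split_ratios(ratios=(0.6, 0.7, 0.8)):
    """Map each training fraction to (accuracy, error rate) on the held-out part."""
    X, y = make_synthetic()
    results = {}
    for r in ratios:
        X_tr, y_tr, X_te, y_te = split(X, y, r)
        acc = accuracy(y_te, predict(fit_logistic(X_tr, y_tr), X_te))
        results[r] = (acc, 1.0 - acc)
    return results


if __name__ == "__main__":
    for r, (acc, err) in evaluate_split_ratios().items():
        print(f"train fraction {r:.1f}: accuracy {acc:.3f}, error rate {err:.3f}")
```

The error rate here is simply one minus the accuracy on the held-out portion, so the two figures always move in opposite directions. With a larger training fraction the test portion shrinks, which makes each accuracy estimate noisier; that trade-off is one reason the choice of split ratio matters.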

Keywords : LR, RF, Machine learning, Data selection.
Cite this article : Kumar A, Gautam S. Improving medical diagnostics with machine learning: a study on data classification algorithms. International Journal of Advanced Computer Research. 2022; 12(61):31-42. DOI:10.19101/IJACR.2021.1152067.