
International Journal of Advanced Technology and Engineering Exploration (IJATEE)

ISSN (Print): 2394-5443; ISSN (Online): 2394-7454
Volume 7, Issue 73, December 2020
Paper Title : An efficient distance estimation and centroid selection based on k-means clustering for small and large dataset
Author Name : Girdhar Gopal Ladha and Ravi Kumar Singh Pippal
Abstract :

This paper presents an efficient distance estimation and centroid selection approach based on k-means clustering for small and large datasets. The PIMA Indian diabetes dataset was used throughout the study and analysis. The dataset was first pre-processed, after which distance and centroid estimation were performed. Centroids were initially selected at random and then updated until the predetermined number of iterations (epochs) was reached. The distance measures used were Euclidean distance (Ed), Pearson coefficient distance (PCd), Chebyshev distance (Csd) and Canberra distance (Cad). The results indicate that all four distance measures produced comparable clustering results, but in terms of execution time Cad outperformed the other measures.
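The abstract names the four distance measures and the centroid procedure without giving formulas. The Python sketch below shows one plausible reading: textbook definitions of Ed, Csd and Cad, PCd assumed to be 1 minus the Pearson correlation coefficient, and plain k-means with randomly sampled initial centroids updated for a fixed number of epochs. The function names (`euclidean`, `chebyshev`, `canberra`, `pearson_distance`, `kmeans`), the handling of zero denominators and constant vectors, and the use of a mean-based centroid update under every measure are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    """Euclidean distance (Ed): square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def chebyshev(x: Vector, y: Vector) -> float:
    """Chebyshev distance (Csd): largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def canberra(x: Vector, y: Vector) -> float:
    """Canberra distance (Cad). Assumption: 0/0 terms contribute 0."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def pearson_distance(x: Vector, y: Vector) -> float:
    """Pearson coefficient distance (PCd), assumed here to be 1 - r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # assumption: a constant vector is treated as uncorrelated (r = 0)
    return 1.0 - cov / (sx * sy)


def kmeans(data: List[Vector], k: int, epochs: int,
           dist: Callable[[Vector, Vector], float],
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """k-means with random initial centroids and a fixed number of update epochs."""
    rng = random.Random(seed)
    # Initial selection: k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = [0] * len(data)

    def nearest(p: Vector) -> int:
        # Index of the centroid closest to p under the chosen distance.
        return min(range(k), key=lambda j: dist(p, centroids[j]))

    for _ in range(epochs):
        # Assignment step: every point joins its nearest centroid.
        labels = [nearest(p) for p in data]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


if __name__ == "__main__":
    # Toy 2-D points (not the PIMA data): two well-separated groups.
    points = [[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 8.5]]
    for name, fn in [("Ed", euclidean), ("Csd", chebyshev), ("Cad", canberra)]:
        _, labels = kmeans(points, k=2, epochs=10, dist=fn)
        print(name, labels)
```

The arithmetic-mean update is only guaranteed to reduce the within-cluster cost for squared Euclidean distance; with the Chebyshev, Canberra or Pearson measures it is a common heuristic rather than an exact minimiser. PCd is omitted from the toy demo because correlation between two-dimensional points is degenerate; it is only informative on multi-attribute records such as those in the PIMA dataset.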

Keywords : K-means, Distance estimation, Centroid selection, Distance methods.
Cite this article : Ladha GG, Pippal RK. An efficient distance estimation and centroid selection based on k-means clustering for small and large dataset. International Journal of Advanced Technology and Engineering Exploration. 2020; 7(73):234-240. DOI:10.19101/IJATEE.2020.762109.