An Improved K-Means Clustering Algorithm

Authors : Ekta Joshi, Dr. D. A. Parikh

The rapid spread of computing technologies has led to an abundance of large data sets, creating a need to find similarities and define groupings among their elements. Data clustering is one way to find these similarities. Several data clustering algorithms exist, differing in application area and efficiency. Increases in computational power and algorithmic improvements have reduced the time needed to cluster big data sets. Nevertheless, big data sets often cannot be processed whole because of hardware and computational restrictions. Clustering techniques such as K-Means are useful for analyzing data in a parallel fashion. K-Means depends heavily on proper initialization to produce good results.
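The dependence on initialization can be illustrated with a minimal sketch of the standard K-Means (Lloyd) iteration. This is not the improved algorithm proposed in the article, whose details are not given on this page; the function names (`squared_distance`, `assign`, `kmeans`) and the four-point toy data set are assumptions chosen only for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point (assignment step)."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def kmeans(points, init_centers, max_iter=100):
    """Standard Lloyd iteration starting from caller-supplied centers.

    Returns (centers, labels, inertia), where inertia is the sum of squared
    distances from each point to its assigned center.
    """
    centers = [tuple(float(x) for x in c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Update step: move the center to the mean of its members.
                new_centers.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(old)
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, inertia


# Toy data (hypothetical): corners of a 4 x 1 rectangle, two natural clusters.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Initial centers near the left and right sides recover the natural grouping.
_, _, good_inertia = kmeans(points, [(0, 0.5), (4, 0.5)])

# Initial centers on the bottom and top edges get stuck in a poor local optimum.
_, _, bad_inertia = kmeans(points, [(2, 0), (2, 1)])

print(good_inertia, bad_inertia)  # prints "1.0 16.0"
```

Both runs converge, but the second stops at a clustering whose inertia is sixteen times larger, which is why seeding strategies are a common target for K-Means improvements.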

Authors and Affiliations

Ekta Joshi
Computer Engineering, L.D. College of Engineering, Ahmedabad, India
Dr. D. A. Parikh
HOD Computer Engineering, L.D. College of Engineering, Ahmedabad, India

Keywords : K-Means, Clustering, Data Mining, Big Data

Publication Details

Published in : Volume 4 | Issue 2 | January-February 2018
Date of Publication : 2018-01-20
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 239-244
Manuscript Number : IJSRSET184240
Publisher : Technoscience Academy

Print ISSN : 2395-1990, Online ISSN : 2394-4099

Cite This Article :

Ekta Joshi, Dr. D. A. Parikh, "An Improved K-Means Clustering Algorithm", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN : 2395-1990, Online ISSN : 2394-4099, Volume 4, Issue 2, pp. 239-244, January-February 2018.