Comparison of Cluster Ensemble and Two Step Cluster Methods on Clustering with Mixed Type Data

Authors

  • Fera Hermawati, Department of Statistics, Bogor Agricultural University, Bogor, West Java, Indonesia
  • Budi Susetyo, Department of Statistics, Bogor Agricultural University, Bogor, West Java, Indonesia
  • Agus Mohamad Soleh, Department of Statistics, Bogor Agricultural University, Bogor, West Java, Indonesia

Keywords:

Clustering, Cluster Ensemble, Two Step Cluster, Mixed Type Data

Abstract

Health development is supported by the availability of adequate health facilities and personnel. To help the government set policy, regions need to be grouped so that areas requiring improvement in health facilities and personnel can be identified. Cluster analysis groups objects based on the similarity of their characteristics and is generally applied to numerical data. Health facility and health personnel data, however, contain both categorical and numerical variables (mixed-type data), so a clustering method designed for mixed-type data is required. This study compares the cluster ensemble method and the two step cluster method for clustering mixed-type data. The comparison criterion is the ratio of within-cluster variability (S_w) to between-cluster variability (S_b); a smaller ratio indicates a better method. The results show that the cluster ensemble method performs better than the two step cluster method in clustering mixed-type data.
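The comparison criterion can be made concrete with a short Python sketch. The abstract does not give the exact formulas, so the code assumes one common definition (after Bunkers and Miller, 1996): S_w is the average of the within-cluster standard deviations, and S_b is the standard deviation of the cluster means around the overall mean, with K − 1 in the denominator for K clusters. The function name `sw_sb_ratio` and the toy data are hypothetical; in practice `labels` would hold the assignments produced by the cluster ensemble or two step cluster method, and the calculation is shown here for a single numeric variable.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def sw_sb_ratio(values, labels):
    """Return (S_w, S_b, S_w / S_b) for one numeric variable.

    Assumed definitions (after Bunkers and Miller, 1996):
      S_w = average of the within-cluster standard deviations S_k
      S_b = sqrt( sum_k (mean_k - grand_mean)^2 / (K - 1) )
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")

    # Group the observations by their cluster label.
    groups = defaultdict(list)
    for x, g in zip(values, labels):
        groups[g].append(x)

    k = len(groups)
    if k < 2:
        raise ValueError("need at least two clusters")

    # Within-cluster sample SD; a singleton cluster is assigned SD 0 (assumption).
    s_k = [stdev(xs) if len(xs) > 1 else 0.0 for xs in groups.values()]
    s_w = sum(s_k) / k

    # Spread of the cluster means around the grand mean of all observations.
    grand = mean(values)
    s_b = math.sqrt(sum((mean(xs) - grand) ** 2 for xs in groups.values()) / (k - 1))

    return s_w, s_b, s_w / s_b


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
    compact = [0, 0, 0, 1, 1, 1]      # well-separated partition
    interleaved = [0, 1, 0, 1, 0, 1]  # poorly separated partition
    print("compact ratio:", sw_sb_ratio(values, compact)[2])
    print("interleaved ratio:", sw_sb_ratio(values, interleaved)[2])
```

Under these assumptions the compact partition yields a much smaller ratio than the interleaved one, consistent with the rule that a smaller S_w/S_b indicates more homogeneous, better separated clusters.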

Published

2018-07-30

Section

Research Articles

How to Cite

Fera Hermawati, Budi Susetyo, Agus Mohamad Soleh, "Comparison of Cluster Ensemble and Two Step Cluster Methods on Clustering with Mixed Type Data," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 4, Issue 9, pp. 135-141, July-August 2018.