Breast Cancer Prediction Using Machine Learning Algorithm with Big Data Concept

Authors

  • R. Nirmalan, Assistant Professor, Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode, Tamil Nadu, India
  • M. Javith Hussain Khan, UG Student, Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode, Tamil Nadu, India
  • V. Sounder, UG Student, Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode, Tamil Nadu, India
  • A. Manikkaraja, UG Student, Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode, Tamil Nadu, India

DOI:

https://doi.org/10.32628/IJSRSET1207232

Keywords:

Classification, Machine Learning, SVM, DNA

Abstract

Advances in modern computing technology generate enormous amounts of data. Traditional machine learning algorithms may not scale to big data. This paper discusses and implements a solution to this problem in the context of breast cancer prediction with big data. Two types of data are used for the prediction: DNA methylation (DM) and gene expression (GE). The main objective is to classify each data set separately. To achieve this, we use the Apache Spark platform and apply three classification algorithms: decision tree, random forest, and support vector machine (SVM). Each algorithm is used to build a model for breast cancer prediction, and the models are analyzed to determine which one yields the highest accuracy and the lowest error rate. In addition, the Weka and Spark platforms are compared to determine which performs better on large data. The results show that the scalable SVM classifier outperforms the other classifiers, achieving the lowest error rate and the highest accuracy on the GE data set.
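
The comparison step described in the abstract (ranking classifiers by accuracy and error rate) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' Spark implementation: the function names `accuracy_and_error` and `best_classifier`, the label encoding (1 = tumour, 0 = normal), and the toy labels are all assumptions made up for this example and are not data or results from the paper.

```python
from typing import Dict, List, Tuple


def accuracy_and_error(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Return (accuracy, error rate) for one classifier's predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    return accuracy, 1.0 - accuracy


def best_classifier(y_true: List[int], predictions: Dict[str, List[int]]) -> str:
    """Return the name of the classifier with the highest accuracy.

    Ties are resolved in favour of the classifier listed first.
    """
    return max(
        predictions,
        key=lambda name: accuracy_and_error(y_true, predictions[name])[0],
    )


# Hypothetical toy labels for illustration only (assumed: 1 = tumour, 0 = normal).
# These values are arbitrary and are NOT results reported in the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_predictions = {
    "decision_tree": [1, 0, 0, 1, 0, 1, 1, 0],
    "random_forest": [1, 0, 1, 1, 0, 1, 1, 0],
    "svm": [1, 0, 1, 1, 0, 0, 1, 0],
}

for name, y_pred in toy_predictions.items():
    acc, err = accuracy_and_error(y_true, y_pred)
    print(f"{name}: accuracy={acc:.3f}, error rate={err:.3f}")
print("best:", best_classifier(y_true, toy_predictions))
```

In a real pipeline the predicted labels would come from models trained on the DM or GE data set; here they are hard-coded so the metric computation can be read in isolation.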

Published

2020-04-30

Section

Research Articles

How to Cite

[1] R. Nirmalan, M. Javith Hussain Khan, V. Sounder, A. Manikkaraja, "Breast Cancer Prediction Using Machine Learning Algorithm with Big Data Concept," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 7, Issue 2, pp. 123-127, March-April 2020. Available at doi: https://doi.org/10.32628/IJSRSET1207232