Breast Cancer Prediction Using Machine Learning Algorithm with Big Data Concept

Authors (4): R. Nirmalan, M. Javith Hussain Khan, V. Sounder, A. Manikkaraja

Advances in modern computing technology produce huge amounts of data. Traditional machine-learning algorithms may not scale to big data. Here we discuss and implement a solution to this problem in the context of predicting breast cancer from big data. Two types of data are used for the prediction: DNA methylation (DM) and gene expression (GE). The main objective is to classify each data set separately. To achieve this, we used the Apache Spark platform and applied three classification algorithms: decision tree, random forest, and support vector machine (SVM). Each algorithm was used to build a model for breast cancer prediction, and the models were analyzed to find which algorithm gives the better result, with higher accuracy and a lower error rate. Additionally, the Weka and Spark platforms were compared to determine which performs better when dealing with huge data. The results indicate that the scalable SVM classifier performed better than the other classifiers, achieving the highest accuracy and the lowest error rate on the GE data set.
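The comparison criterion mentioned above (accuracy and error rate) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' Spark pipeline: the function names, the placeholder model names (`model_a`, `model_b`, `model_c`), and the labels are all hypothetical and made up for the example, and the error rate is assumed to be defined as 1 minus accuracy.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of misclassified samples (assumed here to be 1 - accuracy)."""
    return 1.0 - accuracy(y_true, y_pred)


def best_model(y_true: Sequence[int], predictions: Dict[str, Sequence[int]]) -> str:
    """Name of the model with the highest accuracy on the given labels."""
    return max(predictions, key=lambda name: accuracy(y_true, predictions[name]))


# Made-up labels for illustration only (1 = tumor, 0 = normal).
# These are NOT results from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 0, 1, 0, 1, 1, 0],  # hypothetical classifier output
    "model_b": [1, 0, 1, 1, 0, 1, 1, 0],  # hypothetical classifier output
    "model_c": [0, 0, 1, 0, 0, 1, 1, 0],  # hypothetical classifier output
}

for name, y_pred in predictions.items():
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.3f}, "
          f"error rate={error_rate(y_true, y_pred):.3f}")
print("best:", best_model(y_true, predictions))
```

In a Spark setting the predicted labels would come from the fitted decision tree, random forest, and SVM models on a held-out test split; the comparison logic itself stays the same.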

Authors and Affiliations

R. Nirmalan
Assistant Professor, Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode, Tamil Nadu, India
M. Javith Hussain Khan
UG Student, Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode, Tamil Nadu, India
V. Sounder
UG Student, Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode, Tamil Nadu, India
A. Manikkaraja
UG Student, Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode, Tamil Nadu, India

Keywords: Classification, Machine Learning, SVM, DNA

Publication Details

Published in : Volume 7 | Issue 2 | March-April 2020
Date of Publication : 2020-04-30
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 123-127
Manuscript Number : IJSRSET1207232
Publisher : Technoscience Academy

Print ISSN : 2395-1990, Online ISSN : 2394-4099

Cite This Article :

R. Nirmalan, M. Javith Hussain Khan, V. Sounder, A. Manikkaraja, "Breast Cancer Prediction Using Machine Learning Algorithm with Big Data Concept," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 7, Issue 2, pp. 123-127, March-April 2020. Available at doi: https://doi.org/10.32628/IJSRSET1207232
Journal URL : https://ijsrset.com/IJSRSET1207232
