Multivariate Missing Data Handling with Iterative Bayesian Additive Lasso (IBAL) Multiple Imputation in Multicore Environment on Cloud

Authors

  • Lavanya. K, Research Scholar, Department of Computer Science & Engineering, JNTUA College of Engineering, Ananthapuramu, Andhra Pradesh, India
  • L. S. S. Reddy, Professor, Department of Computer Science and Engineering, KL University, Vaddeswaram, Guntur (Dt.), Andhra Pradesh, India
  • B. Eswara Reddy, Professor, Department of Computer Science and Engineering, JNTUA College of Engineering, Kalikiri, Chittoor (Dt.), Andhra Pradesh, India

DOI:

https://doi.org/10.32628/IJSRSET196319

Keywords:

Multiple Imputation, Regularized Regression, Additive Lasso, High Dimensional, Multicore Environment

Abstract

Multivariate analysis of missing data is especially difficult when the data are high dimensional, that is, when the number of variables p exceeds the number of samples n (p > n). Such data arise in many fields, notably social science, economics, and medical research; genomic studies are a typical example, in which the samples are few compared with the variables under study. The analysis combines large covariate vectors with response and non-response effects of unknown functional form related to the response variable of interest. This calls for regularized regression models with a smoothing component, and this work combines the two by incorporating different types of covariates. Although regularization approaches fit this framework, they rely on penalized estimation and are computationally demanding in high-dimensional analysis. The proposed solution is to implement regularization within an iteration-based smoothing approach. The resulting algorithm, Iterative Bayesian Additive Lasso (IBAL), is compared with standard methods in a medical analysis and produced unbiased results. All work was carried out in a multicore environment provided by the Microsoft Azure cloud service. Performance is assessed with Standard Error (SE), Mean Square Error (MSE), and Confidence Interval (CI).
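The abstract names SE, MSE, and CI as its evaluation measures but does not say how they are computed. The sketch below shows one common way to obtain them in a multiple imputation setting: pooling the estimates from m completed data sets with Rubin's rules. It is an illustration under stated assumptions, not the authors' procedure. The function names `pool_rubin` and `mse` and the sample numbers are hypothetical, and the interval uses a normal approximation rather than Rubin's t reference distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pool_rubin(estimates, variances, level=0.95):
    """Pool m completed-data estimates with Rubin's rules (illustrative sketch).

    estimates: point estimate of one parameter from each imputed data set
    variances: squared standard error of that estimate in each data set
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(estimates)            # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    se = sqrt(total)                   # pooled standard error
    # Assumption: normal approximation; Rubin's rules proper use a t distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return q_bar, se, (q_bar - z * se, q_bar + z * se)


def mse(estimates, truth):
    """Mean square error of estimates against a known true value (simulation setting)."""
    return mean([(q - truth) ** 2 for q in estimates])


# Hypothetical numbers: five imputations of one regression coefficient.
est = [1.9, 2.1, 2.0, 2.2, 1.8]
var = [0.04] * 5
pooled, pooled_se, ci = pool_rubin(est, var)
error = mse(est, truth=2.0)
```

The between-imputation term is what distinguishes the pooled SE from a single-imputation SE: it widens the interval to reflect uncertainty about the missing values themselves.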

Published

2019-06-30

Section

Research Articles

How to Cite

[1]
Lavanya. K, L. S. S. Reddy, B. Eswara Reddy, "Multivariate Missing Data Handling with Iterative Bayesian Additive Lasso (IBAL) Multiple Imputation in Multicore Environment on Cloud", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 6, Issue 3, pp. 194-200, May-June 2019. Available at doi: https://doi.org/10.32628/IJSRSET196319