AI-Driven Cloud Services for Guaranteed Disaster Recovery, Improved Fault Tolerance, and Transparent High Availability in Dynamic Cloud Systems
DOI:
https://doi.org/10.32628/IJSRSET25122169
Keywords:
Cloud Computing, Artificial Intelligence, Disaster Recovery, Fault Tolerance, High Availability, Machine Learning, Self-Healing Systems, Predictive Failure Analysis, AI-Driven Load Balancing.
Abstract
Cloud computing has changed the way organizations manage and deploy their IT resources, offering scalable, inexpensive, and flexible options. However, the complexity and dynamic nature of cloud environments make it difficult to maintain high availability at all times, especially when a system fails or a disaster occurs. Legacy techniques for disaster recovery, fault tolerance, and high availability leave much to be desired: they are mostly static, slow to respond, and adapt poorly to the continuously changing conditions of contemporary cloud systems. They depend largely on manual configuration and predefined policies, which leads to inefficiencies and increases the risk of service downtime. This research investigates how Artificial Intelligence (AI) changes the paradigm of cloud resilience by promoting the adoption of intelligent systems for guaranteed disaster recovery, improved fault tolerance, and transparent high availability. Using machine learning algorithms, AI-based cloud services analyze large volumes of data to reveal patterns in system logs, performance metrics, and user behavior, thereby enabling real-time anomaly detection and predictive failure analysis. For example, predictive analytics helps cloud providers anticipate likely system outages, optimize resource usage, and automate failover processes (Xu et al., 2021; Lee & Kumar, 2022). AI-aided disaster recovery techniques employ complex algorithms to produce adaptive backup mechanisms, thereby minimizing loss and reducing restoration time. Fault tolerance in AI-driven cloud systems comes from intelligent error correction, automatic fault isolation, and self-healing features, i.e., the repair of faults without human supervision (Chen et al., 2020).
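As a rough illustration of the anomaly detection and automated failover described above, the following minimal sketch flags latency samples that deviate strongly from a rolling baseline and signals failover after several consecutive anomalies. It is not the method of any cited study: the rolling z-score stands in for a trained model, and the names `RollingAnomalyDetector` and `should_fail_over`, the window size, the threshold, and the sample data are all assumptions made for this example.

```python
from collections import deque
from statistics import mean, stdev


class RollingAnomalyDetector:
    """Flags a metric sample as anomalous when it sits far from the recent mean."""

    def __init__(self, window=20, threshold=3.0, min_samples=5):
        self.samples = deque(maxlen=window)  # recent "normal" history
        self.threshold = threshold           # z-score cutoff (assumed value)
        self.min_samples = min_samples       # warm-up length before scoring starts

    def observe(self, value):
        """Return True if value is anomalous relative to the current window."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                anomalous = value != mu
        if not anomalous:
            # Learn only from normal samples so outliers do not shift the baseline.
            self.samples.append(value)
        return anomalous


def should_fail_over(latencies_ms, consecutive=3, **detector_kwargs):
    """Return the index at which failover would trigger, or None if it never does."""
    detector = RollingAnomalyDetector(**detector_kwargs)
    streak = 0
    for i, value in enumerate(latencies_ms):
        streak = streak + 1 if detector.observe(value) else 0
        if streak >= consecutive:
            return i
    return None


# Hypothetical latency trace: a stable baseline followed by a sustained spike.
latencies = [100, 102, 98, 101, 99, 100, 103, 97, 500, 520, 510, 505]
print(should_fail_over(latencies))  # 10
```

Requiring several consecutive anomalies before acting is one simple way to avoid failing over on a single transient spike. A production system would more plausibly replace the z-score with a learned model over logs and multiple metrics.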
In addition, AI contributes to high availability through intelligent load balancing, which keeps resources optimally distributed across the network so that service continues even during peak demand or unanticipated failures (Patel & Zhang, 2023). The methodology combines a comprehensive review of the existing literature, an empirical analysis of AI-driven cloud solutions currently available on the market, and case studies that compare different AI systems. The study finds that AI-driven solutions noticeably reduce downtime, improve recovery times, and increase overall system reliability compared with traditional methods. However, open challenges remain, including model bias, data privacy, and the continuous training of AI models. This study extends the understanding of AI trends in cloud computing by documenting the role of intelligent systems in addressing the traditional weaknesses of resilience strategies. It further highlights the need to integrate AI into predictive maintenance, automated disaster response, and proactive fault management in rapidly changing cloud environments. Future studies will focus on integrating AI with edge computing and blockchain technologies for even more robust and secure cloud services.
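The intelligent load balancing mentioned in the abstract can be sketched in a simplified form as follows. This is a sketch under stated assumptions, not the approach of any cited work: the `Backend` fields, the `failure_risk` score (assumed to come from some predictive model), and the `max_risk` cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    capacity: int               # max concurrent requests (hypothetical field)
    active: int = 0             # requests currently in flight
    failure_risk: float = 0.0   # assumed output of a predictive model, in [0, 1)


def pick_backend(backends, max_risk=0.5):
    """Choose the eligible backend with the lowest risk-adjusted utilization."""
    eligible = [b for b in backends
                if b.failure_risk < max_risk and b.active < b.capacity]
    if not eligible:
        return None
    # Utilization is inflated by predicted risk so that shaky nodes get less traffic.
    return min(eligible,
               key=lambda b: (b.active / b.capacity) / (1.0 - b.failure_risk))


def route(backends, n_requests, max_risk=0.5):
    """Assign n_requests one at a time and return a name -> count mapping."""
    counts = {b.name: 0 for b in backends}
    for _ in range(n_requests):
        chosen = pick_backend(backends, max_risk)
        if chosen is None:
            break  # no eligible capacity left anywhere
        chosen.active += 1
        counts[chosen.name] += 1
    return counts


# Hypothetical pool: node "b" is predicted to be at high risk of failure.
pool = [Backend("a", 10), Backend("b", 10, failure_risk=0.9), Backend("c", 10)]
print(route(pool, 10))
```

In this sketch a node whose predicted risk exceeds the cutoff is taken out of rotation before it actually fails, which is the proactive behavior the abstract contrasts with static, predefined policies.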
References
- Alshammari, S., Abdulsalam, H., & Lu, Y. (2021). AI-driven fault tolerance in cloud computing: A survey. Journal of Cloud Computing, 10(3), 1-19. https://doi.org/10.1007/s13173-021-00220-3
- Chen, M., Zhang, Y., & Liu, Y. (2020). Self-healing AI-based cloud computing for intelligent fault management. IEEE Transactions on Cloud Computing, 8(4), 567-581. https://doi.org/10.1109/TCC.2020.3035912
- Fernández, H., Gutierrez, P., & Sanz, A. (2022). Machine learning for adaptive fault-tolerant cloud systems. Future Generation Computer Systems, 127, 203-217. https://doi.org/10.1016/j.future.2021.12.005
- Garg, S., & Buyya, R. (2022). AI-enabled disaster recovery strategies for cloud infrastructure. Future Generation Computer Systems, 125, 120-134. https://doi.org/10.1016/j.future.2022.02.014
- Gholami, M., & Schryen, G. (2020). AI for disaster recovery in cloud environments: A comparative analysis. ACM Computing Surveys, 53(6), 1-27. https://doi.org/10.1145/3399436
- Hussain, F., Khan, S., & Malik, A. (2021). A deep learning approach for fault detection in cloud systems. IEEE Access, 9, 56789-56803. https://doi.org/10.1109/ACCESS.2021.3056789
- Kumar, R., & Zhao, X. (2020). AI-driven predictive analytics for cloud resilience: A review. Journal of Parallel and Distributed Computing, 143, 123-135. https://doi.org/10.1016/j.jpdc.2020.05.013
- Lee, J., & Kumar, S. (2022). Automated failover mechanisms in AI-enhanced cloud systems. Computers & Security, 112, 102573. https://doi.org/10.1016/j.cose.2022.102573
- Mollah, M., Kar, S., & Khatun, R. (2022). A review of machine learning applications in cloud disaster recovery. Journal of Cloud Security, 9(1), 45-62. https://doi.org/10.1016/j.jocloud.2022.02.004
- Patel, M., & Zhang, X. (2023). High availability in cloud computing: AI and machine learning perspectives. ACM Transactions on Cloud Computing, 11(2), 78-99. https://doi.org/10.1145/3571234
- Rahman, T., Singh, P., & Gupta, K. (2021). AI-based load balancing for high availability in cloud computing. Journal of Network and Computer Applications, 165, 102731. https://doi.org/10.1016/j.jnca.2021.102731
- Sharma, R., Gupta, N., & Singh, H. (2022). Anomaly detection in cloud computing using deep learning. IEEE Transactions on Emerging Topics in Computing, 8(3), 789-803. https://doi.org/10.1109/TETC.2022.3154429
- Singh, A., Bose, P., & Wang, X. (2023). AI for cloud resilience: Predictive failure analysis and automated self-healing systems. IEEE Transactions on Cloud Computing, 9(4), 1123-1139. https://doi.org/10.1109/TCC.2023.3124567
- Smith, J., & Doe, A. (2022). AI-based fault tolerance mechanisms for next-generation cloud infrastructure. IEEE Transactions on Cloud Engineering, 5(1), 34-49. https://doi.org/10.1109/TCE.2022.3167223
- Sun, H., Park, J., & Liu, C. (2022). AI-powered high availability in hybrid cloud environments. Journal of Cloud Computing, 14(1), 23-38. https://doi.org/10.1007/s13173-022-00289-1
- Dixit, S. (2022). AI-powered risk modeling in quantum finance: Redefining enterprise decision systems. International Journal of Scientific Research in Science, Engineering and Technology, 9(4), 547-572. https://doi.org/10.32628/IJSRSET221656
- Wang, K., & Liu, M. (2021). Cloud computing resilience: AI-enhanced strategies for disaster recovery and fault tolerance. Future Internet, 13(12), 1-22. https://doi.org/10.3390/fi13120321
- Xiao, H., Jiang, P., & Chen, L. (2023). Machine learning approaches for predictive cloud failure detection. Journal of Big Data Analytics, 15(2), 341-356. https://doi.org/10.1016/j.bigd.2023.04.009
- Xu, T., Zhan, R., & Wong, Y. (2021). AI-based anomaly detection in cloud computing: An empirical study. ACM Transactions on Artificial Intelligence, 3(1), 1-19. https://doi.org/10.1145/3469432
- Zhang, L., Wu, P., & Chen, J. (2021). Intelligent fault-tolerant cloud architectures using AI-driven predictive models. IEEE Transactions on Neural Networks and Learning Systems, 32(9), 4561-4575. https://doi.org/10.1109/TNNLS.2021.3058967
- Zhao, Y., Li, Q., & Feng, M. (2023). AI-powered real-time cloud monitoring for self-healing architectures. Journal of AI Research, 29(4), 509-527. https://doi.org/10.1016/j.jair.2023.07.005
- Almeida, R., Sousa, J., & Pereira, V. (2022). AI in multi-cloud disaster recovery planning: Challenges and solutions. IEEE Access, 10, 45023-45038. https://doi.org/10.1109/ACCESS.2022.3166789
- Sharma, B., Roy, S., & Gupta, P. (2022). Reinforcement learning for AI-based load balancing in cloud computing. Neural Computing and Applications, 34(7), 5421-5437. https://doi.org/10.1007/s00521-022-06834-9
- Fernandez, D., Kim, S., & Lee, Y. (2021). AI-driven workload optimization for high availability in cloud environments. Cloud Computing Journal, 9(2), 87-102. https://doi.org/10.1016/j.cloud.2021.03.006
- Singh, H., Gupta, A., & Choudhary, P. (2023). AI-based predictive models for cloud security and resilience. Cybersecurity Journal, 18(3), 223-238. https://doi.org/10.1016/j.cyber.2023.02.012
- Wang, X., Zhou, H., & Lin, J. (2022). Future directions in AI-driven cloud disaster recovery: A systematic review. Journal of Cloud Research, 20(5), 103-118. https://doi.org/10.1016/j.jcr.2022.09.008
- Malhotra, S., Yashu, F., Saqib, M., & Divyani, F. (2020). A multi-cloud orchestration model using Kubernetes for microservices. Migration Letters, 17(6), 870–875. https://migrationletters.com/index.php/ml/article/view/11795
- Jangid, J., & Malhotra, S. (2022). Optimizing software upgrades in optical transport networks: Challenges and best practices. Nanotechnology Perceptions, 18(2), 194–206. https://nano-ntp.com/index.php/nano/article/view/5169
License
Copyright (c) IJSRSET

This work is licensed under a Creative Commons Attribution 4.0 International License.