Improving Performance of Data Extracts Using Window-Based Refresh Strategies

Authors

  • Swethasri Kavuri  Independent Researcher, USA
  • Suman Narne  Independent Researcher, USA

DOI:

https://doi.org/10.32628/IJSRSET2310631

Keywords:

Data extracts, Window-based refresh, ETL optimization, Data warehousing, Big data, Performance tuning, Incremental updates

Abstract

This research paper investigates the application of window-based refresh strategies to enhance the performance of data extracts in large-scale data management systems. Traditional extract, transform, load (ETL) processes often struggle with the increasing volume and velocity of data in modern environments. Window-based refresh strategies offer a promising solution by focusing on specific subsets of data during each refresh cycle. This study examines various window-based techniques, including time-based, size-based, and hybrid approaches, and evaluates their effectiveness in improving extract performance. Through extensive analysis and empirical testing, we demonstrate that window-based strategies can significantly reduce processing time and resource utilization while maintaining data consistency and integrity. The paper also explores optimization techniques, challenges, and future research directions in this field.
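
As a rough illustration of the time-based variant mentioned above, the following minimal sketch refreshes only rows whose modification timestamp falls inside a recent window, rather than reloading the whole extract. It is not the authors' implementation: the function name `window_refresh`, the `updated_at` and `id` fields, and the in-memory dictionary standing in for the extract are all assumptions made for this example.

```python
from datetime import datetime, timedelta

def window_refresh(source_rows, extract, now, window):
    """Upsert into `extract` only the source rows modified within `window` of `now`.

    Hypothetical helper: rows are dicts with assumed `id` and `updated_at` keys,
    and `extract` is a dict keyed by row id that stands in for the stored extract.
    """
    cutoff = now - window
    refreshed = 0
    for row in source_rows:
        # Rows older than the cutoff are assumed unchanged and are skipped.
        if row["updated_at"] >= cutoff:
            extract[row["id"]] = row
            refreshed += 1
    return refreshed

now = datetime(2021, 10, 30, 12, 0)
source = [
    {"id": 1, "value": "a", "updated_at": now - timedelta(days=10)},
    {"id": 2, "value": "b2", "updated_at": now - timedelta(hours=3)},
    {"id": 3, "value": "c", "updated_at": now - timedelta(hours=1)},
]
# The existing extract holds a stale copy of row 2 and has not yet seen row 3.
extract = {
    1: source[0],
    2: {"id": 2, "value": "b", "updated_at": now - timedelta(days=2)},
}
count = window_refresh(source, extract, now, timedelta(days=1))
```

With a one-day window, only rows 2 and 3 are touched and row 1 is left alone. A size-based or hybrid variant would presumably bound the work by row count as well as by time, but how the paper defines those variants is not stated in the abstract.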

Published

2021-10-30

Section

Research Articles

How to Cite

[1]
Swethasri Kavuri and Suman Narne, "Improving Performance of Data Extracts Using Window-Based Refresh Strategies," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 8, Issue 5, pp. 359-377, September-October 2021. Available at: https://doi.org/10.32628/IJSRSET2310631