Bit Reduction based Compression Algorithm for DNA Sequences
DOI:
https://doi.org/10.32628/IJSRSET218529
Keywords:
Data compression, lossy and lossless compression, DNA, bases, bit reduction, hexadecimal format, variable length code, Huffman codes
Abstract
Deoxyribonucleic acid (DNA) is the smallest fundamental unit that bears the genetic instructions of a living organism. It is used in the development and functioning of all known living organisms. Current DNA sequencing equipment produces very large volumes of genomic data. Nucleotide databases such as GenBank grow two to three times larger each year, and the increase in genomic data outstrips the increase in storage capacity. This massive amount of genomic data needs efficient storage, fast transfer, and good performance. Compression algorithms are used to reduce both the storage required for this data and the expense of storing it. Typical general-purpose compression approaches perform poorly on these sequences, so novel compression algorithms have been introduced to achieve a better compression ratio. Performance is compared in terms of compression ratio, a ratio computed from the size of the compressed file, and compression/decompression time, the time taken to compress or decompress the sequence. In the proposed work, the input DNA sequence is compressed by reconstructing the sequence into varied formats. The input DNA sequence is first subjected to bit reduction. The binary output is then converted to hexadecimal format, followed by encoding. Thus, the compression ratio of the biological sequence is improved.
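The first two stages described in the abstract, bit reduction followed by conversion to hexadecimal, can be illustrated with a minimal Python sketch. The abstract does not give the code table, so the 2-bit mapping (A=00, C=01, G=10, T=11), the zero padding, and the function names dna_to_hex, hex_to_dna, and bits_per_base are all assumptions made for illustration. The final encoding stage is not shown, because the abstract does not describe it in enough detail.

```python
# Assumed 2-bit code table; the source does not specify the actual mapping.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def dna_to_hex(sequence: str) -> str:
    """Map each base to 2 bits, then write the bit string as hex digits."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence.upper())
    # Pad with zeros so the bit string splits evenly into 4-bit groups.
    bits += "0" * (-len(bits) % 4)
    return "".join(
        format(int(bits[i:i + 4], 2), "x") for i in range(0, len(bits), 4)
    )


def hex_to_dna(hex_text: str, length: int) -> str:
    """Invert dna_to_hex; the original length is needed to drop padding."""
    bits = "".join(format(int(digit, 16), "04b") for digit in hex_text)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, 2 * length, 2))


def bits_per_base(sequence: str) -> float:
    """Bits per base after bit reduction, counting each hex digit as 4 bits."""
    return 4 * len(dna_to_hex(sequence)) / len(sequence)


sequence = "ACGTTGCA"
encoded = dna_to_hex(sequence)
print(encoded)                                      # hex form of the sequence
print(hex_to_dna(encoded, len(sequence)) == sequence)  # round-trip check
print(bits_per_base(sequence))                      # bits spent per base
```

Under this assumed mapping every base costs 2 bits instead of the 8 bits of a one-character-per-base text file, and each hexadecimal digit carries two bases. The sketch handles only the four bases A, C, G, and T; other symbols would need an extended table.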
License
Copyright (c) IJSRSET

This work is licensed under a Creative Commons Attribution 4.0 International License.