Cancerous/Disease DNA Prediction Using Fixed Length Motifs/Frequent Patterns Matching

Authors: Adnan Ferdous Ashrafi, Shah S Mahin, Tarikuzzaman Emon

In the rapidly evolving field of bioinformatics, one interesting and rather concerning area of research is predicting cancer-affected genes from a set of DNA samples of a species. The problem is challenging because knowledge of how cancers affect the genes of a species is limited and the patterns of mutation are not always the same. Gene prediction can be carried out through several techniques, such as frequent pattern mining, neural networks or sequence alignment, but these traditional approaches have achieved only limited predictive success. This paper presents a new method based on frequent patterns/motifs as a strategy for predicting genes in DNA. Because motifs are conserved regions of DNA, they are well suited to gene prediction and alignment. The proposed method samples fixed-length motifs from a reference genome sequence and then aligns other samples against the most frequent motifs to establish their relevance to the reference genome.
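The two steps named in the abstract, sampling fixed-length motifs from a reference sequence and matching other samples against the most frequent ones, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the hash-table counting via `Counter`, the `top_n` cutoff and the simple "fraction of motifs present" relevance score are all assumptions chosen for illustration, and exact substring matching stands in for whatever alignment the paper actually uses.

```python
from collections import Counter


def count_motifs(sequence, k):
    """Count every overlapping length-k substring of the sequence (hash table)."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def frequent_motifs(reference, k, top_n):
    """Return the top_n most frequent length-k motifs of the reference.

    top_n is a hypothetical cutoff; the paper may use a support threshold instead.
    """
    return [motif for motif, _ in count_motifs(reference, k).most_common(top_n)]


def relevance(sample, motifs, k):
    """Fraction of the given motifs that occur exactly in the sample.

    Assumption: exact matching is used here in place of a real alignment step.
    """
    if not motifs:
        return 0.0
    present = set(count_motifs(sample, k))
    return sum(1 for motif in motifs if motif in present) / len(motifs)


if __name__ == "__main__":
    reference = "ACGACGACGTT"           # toy reference sequence
    motifs = frequent_motifs(reference, k=3, top_n=3)
    print(motifs)
    print(relevance("TTACGTT", motifs, k=3))
```

A sample that shares many of the reference's frequent motifs scores close to 1, while an unrelated sample scores close to 0; any decision threshold on that score would have to be chosen empirically.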

Authors and Affiliations

Adnan Ferdous Ashrafi
Department of CSE, Stamford University Bangladesh, Dhaka, Bangladesh
Shah S Mahin
Department of CSE, Stamford University Bangladesh, Dhaka, Bangladesh
Tarikuzzaman Emon
Department of CSE, Stamford University Bangladesh, Dhaka, Bangladesh

Keywords: Gene Prediction; Cancer Cell Prediction; Motifs; Hash Table; Frequent Pattern Matching

Publication Details

Published in : Volume 2 | Issue 5 | September-October 2016
Date of Publication : 2016-10-30
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 15-22
Manuscript Number : IJSRSET16253
Publisher : Technoscience Academy

Print ISSN : 2395-1990, Online ISSN : 2394-4099

Cite This Article :

Adnan Ferdous Ashrafi, Shah S Mahin, Tarikuzzaman Emon, "Cancerous/Disease DNA Prediction Using Fixed Length Motifs/Frequent Patterns Matching", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN : 2395-1990, Online ISSN : 2394-4099, Volume 2, Issue 5, pp. 15-22, September-October 2016.