Review of Advances in Digital Recognition of Indian Language Manuscripts

Authors

  • Bhavesh Kataria, Research Scholar, Gujarat Technological University, Ahmedabad, Gujarat, India
  • Dr. Harikrishna B. Jethva, Associate Professor, Department of Computer Engineering, Government Engineering College, Patan, Gujarat, India

DOI:

https://doi.org/10.32628/IJSRSET1841215

Keywords:

OCR, Indic Scripts, Pattern Recognition, Character Recognition, Devanagari

Abstract

Digital content creation and document management in Indian languages are still developing. Optical character recognition (OCR) has become an administrative requirement for effective governance and daily activities. Scripts from medieval to contemporary times are of literary and political importance. Present research initiatives highlight the importance of, and the need for, efforts in recognizing printed and handwritten documents written in languages of Indian origin. This paper reviews the state of various scripts in use from the medieval to the present era, explores the prospects for digital recognition of handwritten and printed texts, and thereby points toward future trends in developing restoration software for Indic scripts. While OCR systems for Indic scripts such as Devanagari have attained good results and continue to improve in accuracy, very few attempts have been made for many medieval and ancient scripts. The challenge stems from the number of languages and the diversity of their scripts, and the scarcity of digitized linguistic resources makes the task harder. The paper also highlights the characteristics and challenges of recognizing scripts of Indic origin. Digital recognition is largely limited to simple numerals and isolated characters. The paper enumerates the highest known performance of OCR attempts for important Indic scripts and suggests possible approaches, including statistical and soft computing methods, for recognizing medieval scripts in particular.

Published

2018-02-28

Section

Research Articles

How to Cite

[1]
Bhavesh Kataria, Dr. Harikrishna B. Jethva, "Review of Advances in Digital Recognition of Indian Language Manuscripts", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 4, Issue 1, pp. 1302-1318, January-February 2018. Available at doi: https://doi.org/10.32628/IJSRSET1841215