Impact of Character Models Choice on Arabic Text Recognition Performance
Type of publication: | Inproceedings |
Citation: | fouad10:icfhr |
Booktitle: | 12th International Conference on Frontiers in Handwriting Recognition (ICFHR 2010) |
Year: | 2010 |
Month: | November |
Location: | Kolkata (India) |
URL: | http://www.hennebert.org/downl... |
Abstract: | In this paper, we analyze the impact of sub-model choice on automatic recognition of printed Arabic text based on Hidden Markov Models (HMMs). In our approach, sub-models correspond to character shapes, which are assembled to compose word models. One peculiarity of Arabic writing is that characters take different shapes depending on their position in the word. With 28 basic characters, there are over 120 different shapes. Ideally, there should be one sub-model for each distinct shape. However, some shapes are less frequent than others and, because training databases are finite, the learning process yields less reliable models for the infrequent shapes. We show that an optimal set of models must therefore be found by trading off two goals: having more models to capture the intricacies of individual shapes, and grouping the models of similar shapes together. We propose different sets of sub-models and evaluate them on the Arabic Printed Text Image (APTI) Database, which is freely available to the scientific community. |
Keywords: | arabic, HMM, machine learning, OCR |
Authors | |