A New Arabic Printed Text Image Database and Evaluation Protocols
Type of publication: Inproceedings
Citation: slim09:icdar
Booktitle: International Conference on Document Analysis and Recognition (ICDAR 09), July 26 - 29, Barcelona, Spain
Year: 2009
Pages: 946--950
URL: http://www.hennebert.org/downl...
DOI: 10.1109/ICDAR.2009.155
Abstract: We report on the creation of a database of images of printed Arabic words. The database is intended for large-scale benchmarking of open-vocabulary, multi-font, multi-size and multi-style Arabic text recognition systems. The challenges it addresses lie in the variability of the sizes, fonts and styles used to generate the images. Particular attention is given to low-resolution images, where anti-aliasing introduces noise on the characters to be recognized. The database is synthetically generated from a lexicon of 113,284 words, 10 Arabic fonts, 10 font sizes and 4 font styles. It contains 45,313,600 single-word images, totaling more than 250 million characters. Ground-truth annotation is provided for each image. The database is called APTI, for Arabic Printed Text Images.
Keywords: Benchmarking, Document Engineering, OCR, Pattern Recognition
Authors: Slimane, Fouad
Ingold, Rolf
Kanoun, Slim
Alimi, Adel
Hennebert, Jean
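
The image count quoted in the abstract can be cross-checked from the other figures given there. The sketch below assumes that every lexicon word is rendered exactly once for each combination of font, size and style. This record does not state that explicitly, and the function and variable names are illustrative only.

```python
# Figures quoted in the abstract of this record.
LEXICON_WORDS = 113_284
FONTS = 10
FONT_SIZES = 10
FONT_STYLES = 4
REPORTED_IMAGES = 45_313_600


def expected_image_count(words: int, fonts: int, sizes: int, styles: int) -> int:
    """Image count under the assumption of one rendering per
    (word, font, size, style) combination."""
    return words * fonts * sizes * styles


# Number of distinct renderings of a single word.
renderings_per_word = FONTS * FONT_SIZES * FONT_STYLES

total = expected_image_count(LEXICON_WORDS, FONTS, FONT_SIZES, FONT_STYLES)

print(f"renderings per word: {renderings_per_word}")
print(f"expected images:     {total}")
print(f"matches reported:    {total == REPORTED_IMAGES}")
```

Under that assumption each word appears in 400 renderings, and the product agrees with the 45,313,600 images reported in the abstract.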