Component-based Segmentation of words from handwritten Arabic text

J. H. AlKhateeb, J. Jiang, Jinchang Ren, S. Ipson

Research output: Contribution to journal › Article

Abstract

Efficient preprocessing is essential for the automatic recognition of handwritten documents. This paper presents techniques for segmenting words in handwritten Arabic text. First, connected components (CCs) are extracted and the distances between different components are analyzed. The statistical distribution of these distances is then used to determine an optimal threshold for word segmentation. In addition, an improved projection-based method is employed for baseline detection. The proposed method was successfully tested on the IFN/ENIT database, which consists of 26,459 Arabic words handwritten by 411 different writers, and the results are encouraging for more accurate baseline detection and word segmentation ahead of further recognition.
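The abstract does not specify how inter-component distance is measured, how the "optimal" threshold is read from its distribution, or what the improvement to the projection method is. The sketch below is therefore only an illustration of the general pipeline under stated assumptions, not the authors' implementation. It assumes a binarized image given as a list of 0/1 rows, 8-connectivity, right-to-left ordering of components by their right edge, a horizontal gap measured against all components already seen to the right, Otsu's between-class-variance criterion as a stand-in for the paper's threshold selection, and the plain (not the improved) horizontal projection for the baseline. All function names are hypothetical.

```python
from collections import deque

# Hypothetical helpers illustrating the pipeline described in the abstract.
# Assumption: the image is already binarized, as a list of rows (1 = ink).

def connected_components(img):
    """Return one bounding box (min_row, min_col, max_row, max_col) per
    8-connected group of ink pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes

def otsu_threshold(values):
    """Assumed stand-in for the paper's threshold: the split t maximizing the
    between-class variance of {v <= t} versus {v > t} (Otsu's criterion)."""
    if not values:
        return 0
    vals = sorted(values)
    n = len(vals)
    best_t, best_score = vals[-1], -1.0
    for i in range(1, n):
        if vals[i] == vals[i - 1]:
            continue  # only split between distinct gap values
        low, high = vals[:i], vals[i:]
        w0, w1 = len(low) / n, len(high) / n
        m0, m1 = sum(low) / len(low), sum(high) / len(high)
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_t, best_score = vals[i - 1], score
    return best_t

def segment_words(img):
    """Group components into words, right to left; return word bounding boxes."""
    comps = sorted(connected_components(img), key=lambda b: b[3], reverse=True)
    if not comps:
        return []
    # Gap from each component to everything already seen on its right;
    # horizontally overlapping parts (e.g. dots) get a gap of 0.
    gaps, left = [], comps[0][1]
    for r0, c0, r1, c1 in comps[1:]:
        gaps.append(max(0, left - c1 - 1))
        left = min(left, c0)
    t = otsu_threshold(gaps)
    words = [list(comps[0])]
    for (r0, c0, r1, c1), g in zip(comps[1:], gaps):
        if g > t:
            words.append([r0, c0, r1, c1])  # large gap: start a new word
        else:
            wb = words[-1]
            wb[:] = [min(wb[0], r0), min(wb[1], c0), max(wb[2], r1), max(wb[3], c1)]
    return [tuple(wb) for wb in words]

def projection_baseline(img):
    """Plain horizontal projection: the row containing the most ink pixels."""
    profile = [sum(row) for row in img]
    return max(range(len(profile)), key=profile.__getitem__)

# Toy line: three "words" on row 3, plus a diacritic-like dot on row 0.
img = [[0] * 40 for _ in range(5)]
for a, b in [(34, 38), (31, 32), (20, 25), (17, 18), (5, 10), (2, 3)]:
    for c in range(a, b + 1):
        img[3][c] = 1
img[0][22] = 1
print(segment_words(img))        # [(3, 31, 3, 38), (0, 17, 3, 25), (3, 2, 3, 10)]
print(projection_baseline(img))  # 3
```

In the toy line the gaps are [1, 5, 0, 1, 6, 1], Otsu's criterion picks 1, and the two larger gaps become word boundaries while the dot merges into the word beneath it. Note that Otsu's split always produces two classes, so a line containing a single word would still be cut wherever its gaps differ; a practical system would need an extra guard, which the abstract does not describe.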
Language: English
Journal: International Journal of Computer Systems Science and Engineering
Volume: 5
Issue number: 1
Publication status: Published - 2009

Keywords

  • ocr
  • offline recognition
  • baseline estimation
  • word segmentation

Cite this

@article{19db98931a1f4dbab855ab47461d8a08,
title = "Component-based Segmentation of words from handwritten Arabic text",
abstract = "Efficient preprocessing is very essential for automatic recognition of handwritten documents. In this paper, techniques on segmenting words in handwritten Arabic text are presented. Firstly, connected components (ccs) are extracted, and distances among different components are analyzed. The statistical distribution of this distance is then obtained to determine an optimal threshold for words segmentation. Meanwhile, an improved projection based method is also employed for baseline detection. The proposed method has been successfully tested on IFN/ENIT database consisting of 26459 Arabic words handwritten by 411 different writers, and the results were promising and very encouraging in more accurate detection of the baseline and segmentation of words for further recognition.",
keywords = "ocr, offline recognition, baseline estimation, word segmentation",
author = "AlKhateeb, {J. H.} and J. Jiang and Jinchang Ren and S. Ipson",
year = "2009",
language = "English",
volume = "5",
journal = "International Journal of Computer Systems Science and Engineering",
issn = "1307-3699",
number = "1",

}

Component-based Segmentation of words from handwritten Arabic text. / AlKhateeb, J. H.; Jiang, J.; Ren, Jinchang; Ipson, S.

In: International Journal of Computer Systems Science and Engineering, Vol. 5, No. 1, 2009.


TY - JOUR

T1 - Component-based Segmentation of words from handwritten Arabic text

AU - AlKhateeb, J. H.

AU - Jiang, J.

AU - Ren, Jinchang

AU - Ipson, S.

PY - 2009

Y1 - 2009

N2 - Efficient preprocessing is very essential for automatic recognition of handwritten documents. In this paper, techniques on segmenting words in handwritten Arabic text are presented. Firstly, connected components (ccs) are extracted, and distances among different components are analyzed. The statistical distribution of this distance is then obtained to determine an optimal threshold for words segmentation. Meanwhile, an improved projection based method is also employed for baseline detection. The proposed method has been successfully tested on IFN/ENIT database consisting of 26459 Arabic words handwritten by 411 different writers, and the results were promising and very encouraging in more accurate detection of the baseline and segmentation of words for further recognition.

AB - Efficient preprocessing is very essential for automatic recognition of handwritten documents. In this paper, techniques on segmenting words in handwritten Arabic text are presented. Firstly, connected components (ccs) are extracted, and distances among different components are analyzed. The statistical distribution of this distance is then obtained to determine an optimal threshold for words segmentation. Meanwhile, an improved projection based method is also employed for baseline detection. The proposed method has been successfully tested on IFN/ENIT database consisting of 26459 Arabic words handwritten by 411 different writers, and the results were promising and very encouraging in more accurate detection of the baseline and segmentation of words for further recognition.

KW - ocr

KW - offline recognition

KW - baseline estimation

KW - word segmentation

UR - http://www.waset.org/journals/waset/v41/v41-61.pdf

M3 - Article

VL - 5

JO - International Journal of Computer Systems Science and Engineering

T2 - International Journal of Computer Systems Science and Engineering

JF - International Journal of Computer Systems Science and Engineering

SN - 1307-3699

IS - 1

ER -