Feature Extraction

Several methods have been analysed in depth. The two most efficient of them have been improved further by adding complementary techniques, which has led to two new methods that appear to be highly efficient. On the other hand, vectorization of the characters has proved to be a failure for off-line character recognition, because information about the writing process is not available in that case.

The first method is called the Averaged Pixel (AP) method. It is the simplest to implement and consists of reducing each character to a normalized size. A grid is superimposed on the image of the character, and the average value of the pixels in each cell is computed. To retain information about the original aspect of the character, the ratio between its initial width and height is also computed and included in the feature vector (figure 1).

Figure 1 - The Averaged Pixel method.
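
A minimal sketch of the Averaged Pixel method in plain Python, assuming a binary input image given as a list of rows (1 = ink, 0 = background). The cropping to the character's bounding box, the default grid size (grid=4) and the function names are illustrative assumptions, not details of the original system. The sketch folds the size normalization and the grid averaging into a single pass by averaging the pixels that fall into each cell.

```python
from typing import List

Image = List[List[int]]  # binary image as a list of rows: 1 = ink, 0 = background


def bounding_box(img: Image):
    """Return (top, bottom, left, right) of the ink pixels; bottom and right are exclusive."""
    rows = [r for r, row in enumerate(img) if any(row)]
    if not rows:
        raise ValueError("empty character image")
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def averaged_pixel_features(img: Image, grid: int = 4) -> List[float]:
    """Mean ink value of each of the grid x grid cells, followed by the width/height ratio."""
    top, bottom, left, right = bounding_box(img)
    char = [row[left:right] for row in img[top:bottom]]  # crop to the character
    h, w = len(char), len(char[0])

    features = []
    for i in range(grid):
        r0 = i * h // grid
        r1 = max(r0 + 1, (i + 1) * h // grid)  # every cell covers at least one row
        for j in range(grid):
            c0 = j * w // grid
            c1 = max(c0 + 1, (j + 1) * w // grid)  # ... and at least one column
            cell = [char[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cell) / len(cell))
    features.append(w / h)  # aspect ratio of the original character
    return features


if __name__ == "__main__":
    letter_l = [[1, 0], [1, 0], [1, 1], [1, 1]]
    print(averaged_pixel_features(letter_l, grid=2))  # [1.0, 0.0, 1.0, 1.0, 0.5]
```

With a grid of G x G cells, the feature vector has G * G + 1 components: the cell averages plus the width/height ratio.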

The second method is called the Normalized Contour Analysis method. A contour analysis is performed on the normalized characters obtained with the first method. It consists of sending probes toward the character from several directions (figure 2). The length of each probe is the distance, measured along the search direction from the border where the probe starts, to the first non-background pixel encountered. To obtain scale invariance and values normalized between 0 and 1, the length of each probe is divided by its highest possible value. In addition, the number of intersections between the strokes of the character and a set of vertical and horizontal lines is also taken into account.

Figure 2 - The Normalized Contour Analysis method.
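
A sketch of these contour features under stated assumptions: the input is a normalized character grid such as the cell averages of the Averaged Pixel method arranged as rows (values in [0, 1]); a cell counts as non-background when its value exceeds a hypothetical threshold of 0.5; probes are sent from the four borders along every row and column; and intersections are counted as background-to-ink transitions along every row and column. The original text does not specify the number of probe directions or which lines are used for the intersection counts.

```python
from typing import List

Grid = List[List[float]]  # normalized character, e.g. averaged pixel values in [0, 1]


def _probe(line, threshold):
    """Number of background cells a probe crosses before hitting the character."""
    for k, v in enumerate(line):
        if v > threshold:
            return k
    return len(line)  # the probe went all the way through: highest possible value


def _crossings(line, threshold):
    """Number of background-to-ink transitions along one line."""
    count, inside = 0, False
    for v in line:
        ink = v > threshold
        if ink and not inside:
            count += 1
        inside = ink
    return count


def contour_features(grid: Grid, threshold: float = 0.5) -> List[float]:
    h, w = len(grid), len(grid[0])
    rows = grid
    cols = [[grid[r][c] for r in range(h)] for c in range(w)]
    feats = []
    # Horizontal probes from the left and right borders, divided by the width.
    feats += [_probe(row, threshold) / w for row in rows]
    feats += [_probe(row[::-1], threshold) / w for row in rows]
    # Vertical probes from the top and bottom borders, divided by the height.
    feats += [_probe(col, threshold) / h for col in cols]
    feats += [_probe(col[::-1], threshold) / h for col in cols]
    # Stroke intersections along every horizontal and vertical line.
    feats += [float(_crossings(row, threshold)) for row in rows]
    feats += [float(_crossings(col, threshold)) for col in cols]
    return feats
```

Dividing each probe length by the width (or height) of the grid maps it into [0, 1], as the text describes; a row or column containing no ink yields the maximum value 1.0.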

Finally, a Discriminant Analysis is performed on the output of each of these feature extraction methods to eliminate redundancy and noise, so that only the most discriminant features are passed to the classifier.

Send comments to gosselin@tcts.fpms.ac.be.