2004 AutomaticWriterIdentification

From GM-RKB

Subject Headings: Large Multiclass Classification Task.

Notes

Quotes

Abstract

  • In this paper, a new technique for offline writer identification is presented, using connected-component contours (COCOCOs or CO^3s) in uppercase handwritten samples. In our model, the writer is considered to be characterized by a stochastic pattern generator, producing a family of connected components for the uppercase character set. Using a codebook of CO^3s from an independent training set of 100 writers, the probability-density function (PDF) of CO^3s was computed for an independent test set containing 150 unseen writers. Results revealed a high sensitivity of the CO^3 PDF for identifying individual writers on the basis of a single sentence of uppercase characters. The proposed automatic approach bridges the gap between image-statistics approaches on one end and manually measured allograph features of individual characters on the other end. Combining the CO^3 PDF with an independent edge-based orientation and curvature PDF yielded very high correct identification rates.
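The codebook-based PDF described in the abstract can be illustrated with a minimal sketch. It assumes that each connected-component contour has already been reduced to a fixed-length numeric vector and that the codebook is a list of such vectors; the function name `codebook_pdf`, the Euclidean matching rule, and the toy 2-D values are hypothetical choices made for illustration and are not taken from the paper.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def codebook_pdf(contours, codebook):
    """Return a discrete PDF over codebook entries for one writing sample."""
    counts = [0] * len(codebook)
    for vec in contours:
        # Index of the codebook entry closest to this contour vector.
        nearest = min(range(len(codebook)), key=lambda k: dist(vec, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no contours")
    # Normalize the occurrence counts so that they sum to 1.
    return [c / total for c in counts]


# Toy data: 2-D stand-ins for fixed-length contour vectors (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
sample = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (2.1, 0.1)]
pdf = codebook_pdf(sample, codebook)
```

Under these assumptions, each writing sample becomes a normalized histogram of codebook usage, which can then be compared across samples.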

==

  • Objection 4. Currently, powerful methods for class separation exist, such as the multi-layer perceptron (MLP) and the support-vector machine (SVM). One would expect that the use of these methods will yield higher performances than reported on the simple distance measures and nearest-neighbor search.
  • Reply. The use of a technique like the SVM is not trivial in the writer-identification problem. The number of writers in a realistic problem may exceed 20,000. Training writer-specific SVMs using, e.g., a one-vs-others training scheme becomes prohibitive. A more realistic solution would entail the use of a trained distance function between two given sample feature vectors. Although the idea of trained distance functions as such is appealing, preliminary experiments revealed that the results were not much better than those obtained by nearest-neighbor search. The number of contrasting classes (writers) is large, and it is difficult to find a distance function which suits all local sample configurations with a smooth margin separating 'near' (same-identity) from 'far' (different-identity) samples. At this moment, the combination of a comparable or lower performance with the additional cost of training efforts and additional parameters seems unattractive. However, more research is needed here, indeed.
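The nearest-neighbor search with a simple distance measure that the reply refers to might look roughly like the sketch below. It assumes each writer is represented by one reference PDF of equal length; the chi-square distance is used here as one plausible simple measure, and the names `chi_square`, `identify_writer`, and the toy values are hypothetical.

```python
def chi_square(p, q):
    """Chi-square distance between two discrete PDFs of equal length."""
    # Bins that are empty in both PDFs contribute nothing and are skipped.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def identify_writer(query_pdf, reference_pdfs):
    """Return (writer_id, distance) for the reference PDF closest to the query."""
    return min(
        ((writer, chi_square(query_pdf, pdf)) for writer, pdf in reference_pdfs.items()),
        key=lambda pair: pair[1],
    )


# Toy reference set: one PDF per known writer (hypothetical values).
references = {
    "writer_a": [0.6, 0.3, 0.1],
    "writer_b": [0.2, 0.5, 0.3],
    "writer_c": [0.1, 0.2, 0.7],
}
query = [0.25, 0.5, 0.25]
best_writer, best_distance = identify_writer(query, references)
```

No per-writer training is involved in this scheme: adding a writer only adds one reference PDF, which is why it scales to very large numbers of classes more easily than a one-vs-others classifier per writer.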


Page: 2004 AutomaticWriterIdentification
Author: Lambert Schomaker, Marius Bulacu
Title: Automatic Writer Identification Using Connected-Component Contours and Edge-based Features of Uppercase Western Script
Title URL: http://dx.doi.org/10.1109/TPAMI.2004.18
DOI: 10.1109/TPAMI.2004.18