Visual Document

Context:
- It can be an input to a Visual Document Information Extraction (IE) Task.
- …
Example(s):
- a Visual Form.
- a Visual Invoice Document.
- a Scanned Taxi Receipt, such as in (Zhao, Niu et al., 2019).
- …
See: Visual Document Representation, PDF File.

References

(Zhao, Niu et al., 2019) ⇒ Xiaohui Zhao, Endi Niu, Zhuo Wu, and Xiaoguang Wang. (2019). “CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor.” In: arXiv preprint arXiv:1903.12363.
- QUOTE: ... Extracting key information from documents, such as receipts or invoices, and preserving the interested texts to structured data is crucial in the document-intensive streamline processes of office automation in areas that includes but not limited to accounting, financial, and taxation areas. ...
  ... In this paper, we propose to harness the effective information from both semantic meaning and spatial distribution of texts in documents. ...
  
  Figure 2. Example of scanned taxi receipt images. We provide two colored rectangles to help readers find the key information about distance of travel and total amount with blue and red, respectively. Note the different types of spatial layouts and key information texts in these receipt images.
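
NOTE: The idea of using both the semantic meaning and the spatial distribution of texts can be pictured with a small sketch. The code below is an illustration only, not the method as stated in the quote: it assumes OCR output is available as text tokens with normalized centre coordinates, and places each token into a coarse grid cell so that layout is kept alongside the text. The names `Token` and `tokens_to_grid` and the sample receipt values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """A hypothetical OCR token: its text plus a normalized centre position."""
    text: str
    x: float  # horizontal centre, assumed to lie in [0, 1]
    y: float  # vertical centre, assumed to lie in [0, 1]


def tokens_to_grid(tokens: List[Token], rows: int, cols: int) -> List[List[str]]:
    """Place each token into a rows x cols grid according to its position.

    Tokens that fall into the same cell are joined with a space; this is a
    simplification chosen for the sketch.
    """
    grid = [["" for _ in range(cols)] for _ in range(rows)]
    for tok in tokens:
        # Clamp so that a coordinate of exactly 1.0 stays inside the grid.
        r = min(int(tok.y * rows), rows - 1)
        c = min(int(tok.x * cols), cols - 1)
        grid[r][c] = f"{grid[r][c]} {tok.text}" if grid[r][c] else tok.text
    return grid


# Made-up tokens loosely resembling a taxi receipt (not data from the paper).
receipt_tokens = [
    Token("Distance", 0.1, 0.2),
    Token("12.5km", 0.6, 0.2),
    Token("Total", 0.1, 0.7),
    Token("38.00", 0.6, 0.7),
]

receipt_grid = tokens_to_grid(receipt_tokens, rows=4, cols=4)
for row in receipt_grid:
    print(row)
```

In such a grid a key (e.g. "Total") and its value end up in the same row, which is the kind of spatial cue a downstream extractor could exploit together with the token text.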

(Csurka et al., 2016) ⇒ Gabriela Csurka, Diane Larlus, Albert Gordo, and Jon Almazán. (2016). “What is the Right Way to Represent Document Images?” In: arXiv preprint arXiv:1603.01076.
- ABSTRACT: In this article we study the problem of document image representation based on visual features. We propose a comprehensive experimental study that compares three types of visual document image representations: (1) traditional so-called shallow features, such as the RunLength and the Fisher-Vector descriptors, (2) deep features based on Convolutional Neural Networks, and (3) features extracted from hybrid architectures that take inspiration from the two previous ones.
  We evaluate these features in several tasks (i.e. classification, clustering, and retrieval) and in different setups (e.g. domain transfer) using several public and in-house datasets. Our results show that deep features generally outperform other types of features when there is no domain shift and the new task is closely related to the one used to train the model. However, when a large domain or task shift is present, the Fisher-Vector shallow features generalize better and often obtain the best results.