Information Extraction from Business Documents

Authors

GELETKA Martin, BANKOVIČ Mikuláš, MELUŠ Dávid, ŠČAVNICKÁ Šárka, ŠTEFÁNIK Michal, SOJKA Petr

Year of publication 2022
Type Article in Proceedings
Conference Recent Advances in Slavonic Natural Language Processing (RASLAN 2022)
MU Faculty or unit

Faculty of Informatics

Keywords OCR; Multi-modal learning; Information extraction; Transformers; Structured Documents
Description Document AI is a relatively new research topic that refers to techniques for automatically reading, understanding, and analyzing business documents. Today, many companies extract data from business documents manually, a process that is time-consuming and expensive and that requires manual customization or configuration. This paper describes techniques to address these problems, applies them to real-world data, and implements them in an end-to-end solution for automatic information extraction from business documents.