Information Extraction from Business Documents

Warning

This publication does not fall under the Faculty of Arts but under the Faculty of Informatics. The official page of the publication is on the muni.cz website.

Authors	GELETKA Martin, BANKOVIČ Mikuláš, MELUŠ Dávid, ŠČAVNICKÁ Šárka, ŠTEFÁNIK Michal, SOJKA Petr
Year of publication	2022
Type	Article in proceedings
Conference	Recent Advances in Slavonic Natural Language Processing (RASLAN 2022)
MU Faculty / Department	Faculty of Informatics
Keywords	OCR; Multi-modal learning; Information extraction; Transformers; Structured Documents
Description	Document AI is a relatively new research topic that refers to techniques for automatically reading, understanding, and analyzing business documents. Nowadays, many companies extract data from business documents through manual efforts that are time-consuming, expensive, and require manual customization or configuration. This paper describes techniques that address these problems, applies them to real-world data, and implements them in an end-to-end solution for automatic information extraction from business documents.
Related projects	Applied research in the areas of search, analysis and visualization of large-scale data, natural language processing, and applied artificial intelligence; Intelligent Back Office