Pre-processing Large Resources for Family Names Research
Authors | |
---|---|
Year of publication | 2016 |
Type | Article in Proceedings |
Conference | RASLAN 2016: Recent Advances in Slavonic Natural Language Processing |
MU Faculty or unit | |
Citation | |
Web | PDF full paper |
Field | Informatics |
Keywords | DEB platform; lexicography; big data; family names; data conversion |
Description | This paper describes the methodology and tools used to pre-process historical archive documents in various formats and to convert them into a unified format. The resources were used to investigate the origins and geographical distribution of surnames in the United Kingdom as part of the Family Names in Britain and Ireland research project. The data extracted from the documents, together with the links between them, proved to be a valuable research resource that helped speed up the lexicographic work. |
Related projects | |