Named Entity Discovery and Alignment in Parallel Data.

Warning

This publication does not belong to the Faculty of Arts but to the Faculty of Informatics. The official publication page is on the muni.cz website.
Authors

NEVĚŘILOVÁ Zuzana

Year of publication 2025
Type Article in proceedings
Conference Proceedings of the 17th International Conference on Agents and Artificial Intelligence (ICAART 2025)
MU Faculty or unit

Faculty of Informatics

Citation NEVĚŘILOVÁ, Zuzana. Named Entity Discovery and Alignment in Parallel Data. Online. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence (ICAART 2025). Volume 3. Porto (Portugal): SCITEPRESS – Science and Technology Publications, Lda., 2025, pp. 1215-1220. ISBN 978-989-758-737-5.
www https://www.insticc.org/node/TechnicalProgram/ICAART/2025/presentationDetails/133113
Keywords Named Entity Recognition; Named Entity Alignment; Named Entity Discovery; Named Entity Linking
Description The paper describes two experiments on named entity discovery and alignment in English-Czech parallel data. In previous work, we enriched the Parallel Global Voices corpus with named entity recognition (NER) annotations for both languages and named entity linking (NEL) annotations for English. The alignment experiment uses sentence transformers and cosine similarity to identify translations of named entities (NEs) from English into Czech, and possibly into other languages. The discovery experiment uses the same method to find possible translations of English named entities among Czech n-grams. The method achieves an F1 score of 0.94 when aligning already recognized entities, and it can also discover previously unknown named entities with an F1 score of 0.70. The results indicate that the method can be used to recognize named entities in parallel data when no NER model of sufficient quality is available.
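
The similarity-based matching described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `embed` function below is a toy character-trigram stand-in for a sentence-transformer embedding, and the function names, the greedy best-match strategy, and the `threshold` value of 0.3 are all assumptions made for illustration. The abstract does not state which model, threshold, or matching strategy was actually used.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy stand-in for a sentence-transformer embedding (assumption):
    a sparse vector of character n-gram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ngrams(tokens, max_len=3):
    """All word n-grams of length 1..max_len, used as discovery candidates."""
    return [
        " ".join(tokens[i:i + k])
        for k in range(1, max_len + 1)
        for i in range(len(tokens) - k + 1)
    ]


def align(english_entities, czech_candidates, threshold=0.3):
    """Map each English entity to its most similar Czech candidate,
    keeping only matches at or above the (hypothetical) threshold."""
    pairs = {}
    for en in english_entities:
        scored = [(cosine(embed(en), embed(cs)), cs) for cs in czech_candidates]
        if not scored:
            continue
        score, best = max(scored)
        if score >= threshold:
            pairs[en] = best
    return pairs


if __name__ == "__main__":
    # Alignment: candidates are already recognized Czech entities.
    print(align(["Prague"], ["Praze", "Václav Havel"]))
    # Discovery: candidates are all n-grams of the Czech sentence.
    czech_tokens = "Václav Havel žil v Praze".split()
    print(align(["Václav Havel", "Prague"], ngrams(czech_tokens, 2)))
```

In the alignment setting the candidate list holds entities a Czech NER model has already recognized. In the discovery setting it holds every n-gram of the aligned Czech sentence, so no Czech NER model is needed. Replacing `embed` with a real multilingual sentence encoder would change the scores, so the threshold would have to be tuned again.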
