Developing a multilingual, multimodal, and machine learning-driven annotation infrastructure for the analysis, creation, enrichment and cross-cultural comparison of historical formulaic text corpora

01 May 2024 → 30 April 2028
Research Foundation - Flanders (FWO)
Research disciplines
  • Humanities
    • Greek language
    • Latin language
    • Computational linguistics
  • Natural sciences
    • Information technologies
Keywords
historical formulaic text corpora
Project description

Formulaic texts constitute a core interest of researchers working at UGent. Within the Greek section, there are two major ongoing research projects about such genres: an ERC project about ‘everyday’ texts such as letters, petitions and contracts, and a GOA project about poetic paratexts accompanying Byzantine manuscripts. In addition, other formulaic genres, such as inscriptions, are explored by individual researchers working on Greek and other languages, such as Latin, Arabic, Coptic, and Medieval Italian. To varying degrees, these projects combine a more traditional, manual annotation approach with innovative, computational annotation methods. We now intend to develop a multilingual, multimodal, and machine learning-driven annotation platform that (i) allows manual annotation and validation of automatic annotation, through communication with APIs integrating machine-learning models; (ii) allows researchers to benefit as much as possible from each other’s expertise and technological advances, thus incentivizing innovative research approaches and faster, more reliable, and more extensive annotation of corpora; (iii) stimulates collaboration between researchers working on different historical corpora and languages, and within different disciplines; and (iv) offers a user-friendly research environment that can feed data back to existing project databases, while remaining open to smaller research projects and individual researchers.