Project

Fine-tuning GPT models for lexicography

Code
bof/baf/4y/2024/01/1120
Duration
01 January 2024 → 31 December 2025
Funding
Regional and community funding: Special Research Fund
Research disciplines
  • Natural sciences
    • Natural language processing
  • Humanities and the arts
    • African languages
    • English language
    • Lexicography
Keywords
English, Bantu, Large Language Model, Lexicography, Generative AI
 
Project description

Soon after the release of ChatGPT, the state of the art of generative AI in lexicography was surveyed (cf. de Schryver 2023). If one is to believe that survey, as well as many subsequent studies (esp. Lew and colleagues 2024), generative AI has now made lexicographers, as well as dictionaries themselves, redundant. However, these studies conveniently assume that because generative AI works for English, it will work for any other language. It is time to reveal the truth. Pairing any other language with English only produces look-alikes: the lexicographic material appears to be sound, until one scratches the surface and realises that what was generated is ‘translated English’. When it comes to dictionaries for languages of limited diffusion, existing models mostly produce gibberish. In this research project, various comparisons will be made between out-of-the-box, customised and fine-tuned GPT models for lexicography, with a focus on monolingual dictionaries for undocumented Bantu languages.