A Respeaking and Collaborative Game-Based Approach to Building a Parsed Corpus of European Spanish Dialects

01 May 2018 → 30 April 2022
Research Foundation - Flanders (FWO)
Research disciplines
  • Humanities
    • Language studies
    • Literary studies
Spanish dialects
Project description

The study of dialectal microvariation in the Spanish spoken in Spain has, until recently, focused mainly on lexical and phonetic features. The morphosyntax of these dialects, by contrast, remains largely unexplored despite the recent surge of interest in dialect grammars, owing to the lack of large annotated dialect corpora. The proposed project aims to fill this gap by creating the first morphosyntactically annotated and parsed corpus of European Spanish dialects. The corpus will be geographically balanced, and its material will be drawn from the COSER corpus (Corpus Oral y Sonoro del Español Rural, 'Audible Corpus of Spoken Rural Spanish'), which is the largest collection of oral data in the Spanish-speaking world but remains largely untranscribed. Because transcription and annotation are expensive and labor-intensive, the project takes a respeaking and collaborative game-based approach to building the parsed corpus. Specifically, we intend to obtain automatic transcriptions using a speech recognizer. These transcriptions will then be processed with Natural Language Processing tools, and the output will serve as the basis for a crowdsourced game in which members of the public help co-create the parsed corpus by providing annotations as they play.