Project

A Respeaking and Collaborative Game-Based Approach to Building a Parsed Corpus of European Spanish Dialects

Code

319107818

Duration

01 May 2018 → 30 April 2022

Funding

Research Foundation - Flanders (FWO)

Promotor

Veronique Hoste

Research disciplines

Humanities and the arts
- Language studies
- Literary studies

Keywords

Spanish dialects, Communication, Language technology, Linguistics

Project description

The study of dialectal microvariation in the Spanish spoken in Spain has, until recently, focused mainly on lexical and phonetic features. The morphosyntax of these dialects, by contrast, remains largely unexplored, despite the recent surge of interest in dialect grammars. This is due to the lack of large annotated dialect corpora. The proposed project aims to fill this gap by creating the first morphosyntactically annotated and parsed corpus of European Spanish dialects. The corpus will be geographically balanced, and its material will be drawn from the COSER corpus (Corpus Oral y Sonoro del Español Rural, 'Audible Corpus of Spoken Rural Spanish'), which is the largest collection of oral data in the Spanish-speaking world but remains largely untranscribed. Because transcription and annotation are expensive and labor-intensive, the project takes a respeaking and collaborative game-based approach to building the parsed corpus. In other words, we intend to obtain automatic transcriptions using a speech recognizer. These transcriptions will then be processed with Natural Language Processing tools, and the output will be used to create a crowdsourced game in which members of the public contribute to the co-creation of the parsed corpus by providing annotations as they play.