Project

Spoken Corpus of the Southern-Dutch Dialects (GCND)

Code

319300220

Duration

01 May 2020 → 30 April 2024

Funding

Research Foundation - Flanders (FWO)

Promotor

Anne Breitbarth

Research disciplines

Humanities and the arts
- Corpus linguistics
- Diachronic linguistics
- Dialectology
- Grammar
- Historical linguistics
- Morphology
- Sociolinguistics
- Syntax

Keywords

dialectology, dialects, southern Dutch dialects, corpus, corpus linguistics, History, Linguistics, Language technology

Project description

This application proposes the creation of the first corpus of spoken Dutch dialects. The project aims to make accessible a unique collection of dialect recordings from 768 places in Belgium, France and the south of the Netherlands, 740 of them originally recorded between 1963 and 1976, with speakers who are generally non-mobile, rural, unschooled and born around 1900. For the GCND, the recordings are transcribed (an urgent task in times of rapidly progressing dialect loss) using a newly developed two-tier protocol. They are then linguistically annotated using existing software tools, i.e. with information on the word class of the individual words (‘POS tags’) and the syntactic functions of word groups (‘parsing’). Compared to other data collections on Dutch dialects, the GCND will be unique in being based exclusively on spontaneous speech. The dialect recordings represent a historical stage of the language (in the case of French Flemish, even the last witness of a now all but extinct language variety) and will now finally be searchable for word forms and syntactic patterns. The GCND will therefore (i) make it possible to track language change through time and space, (ii) enable a new perspective on the functional strength of dialect features in real life and (iii) facilitate serendipitous research into previously unnoticed structures. Audio, transcriptions and annotations will be made available online, together with query tools. The GCND will as such form an unparalleled corpus of dialect data.