Project

Towards an open-source universal dependency treebank for spoken Spanish

Code
01CD12323
Duration
01 October 2023 → 29 February 2024
Funding
Regional and community funding: Special Research Fund
Research disciplines
  • Humanities and the arts
    • Computational linguistics
    • Corpus linguistics
    • Dialectology
Keywords
Part-of-speech tagging, spoken Spanish, parsing, Universal Dependencies
 
Project description
This project aims to develop a treebank for spoken Spanish from the transcriptions of the Audible Corpus of Spoken Rural Spanish (COSER), supporting advances in lemmatization, part-of-speech (PoS) tagging, and parsing. It further proposes to validate lemmas and PoS tags, evaluate the accuracy of existing libraries, and ultimately train a model for spoken Spanish while advancing the guidelines for Universal Dependencies relations.