Project

Exploring BERT for distributional semantics in French Construction Grammar

Code
bof/baf/1y/2026/01/005
Duration
01 January 2026 → 31 December 2026
Funding
Regional and community funding: Special Research Fund
Research disciplines
  • Humanities and the arts
    • Computational linguistics
    • Corpus linguistics
    • Semantics
    • Syntax
Keywords
distributional semantics, French, BERT, productivity
 
Project description

Distributional semantics provides a well-established method for quantifying semantic similarity on the basis of natural language corpora. In research on syntactic productivity, semantic measures derived from this approach complement productivity metrics: the latter assess only the lexical generality and diversity of constructions, and fail to capture their semantic generality and diversity, which is another dimension of their “openness”.
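The core idea of quantifying semantic similarity from corpora can be shown with a toy sketch: words that occur in similar contexts receive similar count vectors, and their similarity is measured by the cosine of the angle between those vectors. The lemmas and co-occurrence counts below are invented for illustration, not corpus-derived.

```python
import math

# Toy co-occurrence counts (rows: hypothetical French lemmas,
# columns: four context words). Real counts would come from a corpus.
vectors = {
    "voiture": [12, 3, 0, 7],
    "camion":  [10, 4, 0, 5],
    "livre":   [0, 1, 15, 6],
}

def cosine(u, v):
    """Cosine similarity between two count vectors (1.0 = identical direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Lemmas sharing contexts score higher than lemmas with disjoint contexts.
sim_close = cosine(vectors["voiture"], vectors["camion"])
sim_far = cosine(vectors["voiture"], vectors["livre"])
```

In this toy space, *voiture* and *camion* share most of their contexts and therefore come out as far more similar to each other than either is to *livre*.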

With the advent of deep learning, traditional count-based distributional semantic models were first challenged by static, type-based neural models such as Word2Vec, and are currently being superseded by contextual, transformer-based models such as BERT, which produce token-level embeddings.
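The contrast with static models can be made concrete: a transformer yields a distinct vector for each occurrence of a word, so the two readings of an ambiguous type get different embeddings. The sketch below assumes the Hugging Face `transformers` library with the published `camembert-base` checkpoint; the example sentences and the helper name are invented, and the helper assumes the SentencePiece tokenizer keeps the target word as a single token.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModel.from_pretrained("camembert-base")
model.eval()

def occurrence_embedding(sentence: str, word: str) -> torch.Tensor:
    """Contextual embedding of one occurrence of `word` in `sentence`.
    Assumes the tokenizer renders `word` as the single piece '▁' + word."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    target_id = tokenizer.convert_tokens_to_ids("▁" + word)
    matches = (enc["input_ids"][0] == target_id).nonzero()
    assert len(matches) > 0, f"'{word}' is not a single token here"
    return hidden[matches[0].item()]

# Same word type "vol" ('flight' vs. 'theft') in two different contexts:
v_flight = occurrence_embedding("Le vol a été retardé d'une heure.", "vol")
v_theft = occurrence_embedding("Le vol a été commis pendant la nuit.", "vol")
similarity = F.cosine_similarity(v_flight, v_theft, dim=0)
```

A static model such as Word2Vec would assign a single vector to the type *vol*; here the two occurrences receive distinct vectors, which is what makes token-level (word-sense-level) similarity modelling possible.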

The aim of this project is to develop, document and evaluate a pipeline for fine-tuning BERT (more specifically, one of its French variants, e.g. FlauBERT or CamemBERT) to create embeddings for modelling semantic similarity at the word-sense level within grammatical constructions in French.