New statistical tools for analysing complex survey data in the social sciences

01 October 2015 → 30 September 2018
Regional and community funding: Special Research Fund
Research disciplines
  • Social sciences
    • Psychometrics
    • Statistics and data analysis
Structural Equation Modeling, multilevel data, categorical data
Project description

Social research increasingly requires investigators to gather large and complex data that are frequently cross-national in nature. Large questionnaires are used to measure skills, attitudes, and traits, and the responses are stored in huge datasets. A typical example of a complex dataset is the PISA study (Programme for International Student Assessment), a triennial international survey that aims to evaluate education systems worldwide by testing 15-year-old students’ skills and knowledge. Statistical techniques for analysing these complex data have to deal adequately with the combination of 1) the clustering of students within countries, 2) the categorical response options of the questionnaires, such as “correct”/“incorrect”, and 3) missing values in the data when respondents do not answer all questions. Unfortunately, the currently available techniques fail to deal adequately with such complex data and, as a consequence, researchers often adopt suboptimal analysis techniques. In this project, I aim to develop new statistical techniques to analyse large and complex data in a correct and practical manner. To this end, I will 1) develop a general statistical method for analysing complex data, 2) provide solutions for clustered data with missing values and develop fit measures that indicate whether a model fits the data, and 3) develop freely available software for researchers. Moreover, this research will lead to clear guidelines for researchers in the social sciences dealing with large and complex data.