Investigating the added value of machine learning and Twitter data for public health organizations during COVID19

01 September 2020 → 31 August 2024
Regional and community funding: Special Research Fund
Research disciplines
  • Social sciences
    • Market research
    • Marketing models
    • Data collection and data estimation methodology, computer programs
    • Mathematical and quantitative methods not elsewhere classified
social media, Artificial Intelligence, Machine Learning
Project description

The aim of this research project is to provide public health organizations with a decision support system that enables them to respond to issues that arise during crisis situations, such as the one caused by COVID19. The current COVID19 crisis is a pandemic that has led most countries around the world to go into lockdown. Such circumstances can lead to public unrest, and it is therefore crucial that governments and health authorities respond appropriately. Two aspects seem especially relevant. First, governments should monitor the population’s wellbeing, as crisis situations can have a significant impact on mental health. Second, authorities should ensure that citizens get correct information by identifying and rectifying rumours. With this research project we want to show how these two aspects can be addressed by analysing public tweets regarding COVID19. During previous pandemics, Twitter was used to spread information as well as opinions and experiences. Therefore, tweets can be used for real-time analysis, allowing governments to gain insights into ongoing issues and respond appropriately.
To do so, we have been collecting public Twitter data regarding COVID19 since January 2020. Since we are mainly interested in the content of the tweet, we only collected information related to the tweet itself. Regarding the user, we only gathered an anonymized user ID.
A first objective of this project is to investigate the impact of different events (e.g., lockdown) on wellbeing for different industries (e.g., supermarkets and hairdressers). To do so, we will identify the relevant events and industries from the text of the tweet. We will use the content of the tweet to explain the impact of several events on wellbeing in different industries. We would like to emphasize that wellbeing will be analysed at the industry level and that no individual user-level analyses will be conducted.
A second objective is to detect rumours about COVID19. A rumour in this case is defined as a piece of information that has not yet been confirmed by government officials but is already spreading on social media. For example, at the beginning of the pandemic several false rumours were spread about the symptoms of the coronavirus. In our project, we will focus on detecting tweets that contain rumours. To do so, we will build a classification model that predicts whether or not a tweet is a rumour based on several linguistic, content-based, and network features. We will use state-of-the-art deep learning methods to classify tweets as rumours. We would like to stress that no individually identifiable user-level data will be used to classify a tweet as a rumour.
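To illustrate what linguistic, content-based, and network features of a tweet could look like, the following is a minimal sketch in Python. It is not the project's implementation: the `Tweet` fields, the `HEDGE_WORDS` list, and the `extract_features` function are all hypothetical names chosen for illustration, and the project description does not specify which features or which deep learning model will actually be used. In line with the description, the sketch uses only tweet-level information and no user data.

```python
import re
from dataclasses import dataclass

# Hypothetical list of hedging cues; an assumption for illustration only.
HEDGE_WORDS = {"apparently", "allegedly", "reportedly", "unconfirmed", "heard"}

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


@dataclass
class Tweet:
    """Tweet-level record; the field names are assumptions, not a real API schema."""
    text: str
    retweet_count: int  # assumed network signal
    reply_count: int    # assumed network signal


def extract_features(tweet: Tweet) -> dict:
    """Return a flat feature dictionary built from tweet-level information only."""
    # Remove URLs before tokenizing so they do not inflate the token count.
    text_without_urls = URL_RE.sub(" ", tweet.text)
    tokens = TOKEN_RE.findall(text_without_urls.lower())
    return {
        # Linguistic features
        "n_tokens": len(tokens),
        "n_hedges": sum(token in HEDGE_WORDS for token in tokens),
        "has_question": "?" in tweet.text,
        # Content-based features
        "n_urls": len(URL_RE.findall(tweet.text)),
        "n_hashtags": len(HASHTAG_RE.findall(tweet.text)),
        # Network features
        "retweet_count": tweet.retweet_count,
        "reply_count": tweet.reply_count,
    }


# Made-up example tweet, used only to show the shape of the output.
example = Tweet(
    text="Apparently garlic cures the virus? #covid19 https://example.org",
    retweet_count=12,
    reply_count=3,
)
features = extract_features(example)
```

A feature dictionary of this kind could be fed to any classifier; a deep learning model, as mentioned in the description, would more likely learn text representations directly and combine them with network signals such as these.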