Serious games: development of a standard for measuring the effectiveness of games in terms of cognitive learning outcomes

Serious game
01 January 2013 → 31 December 2016
Regional and community funding: IWT/VLAIO
Research disciplines
  • Social sciences
    • Animal experimental and comparative psychology
    • Applied psychology
    • Human experimental psychology
cognitive learning outcomes educational videogames
Project description

1. Introduction

A large heterogeneity in study designs assessing the effectiveness of digital game-based learning (DGBL), together with questions raised regarding the reliability and validity of certain study design characteristics, has resulted in a need for a more common methodology. The aim of this Ph.D. project was to develop a standardized procedure for assessing the effectiveness of DGBL, with a primary focus on games that target cognitive learning outcomes (i.e., knowledge transfer, as opposed to skill development or attitudinal/behavioral change). In a first phase, a first version of the procedure was developed. For this purpose, we first mapped the study design characteristics of published DGBL effectiveness studies aimed at cognitive learning outcomes by means of a systematic literature review (chapter 2). Secondly, we conceptualized and operationalized effectiveness of DGBL by means of a user requirements analysis among relevant stakeholder groups (chapter 3). Thirdly, we defined best practices for assessing the effectiveness of DGBL by means of expert interviews (chapter 4) in order to finalize the first version of the procedure. In a second phase, we tested the feasibility of the procedure by means of experimental studies that used it as a guideline, in order to further optimize it. Based on our experiences in this second phase, we developed a second version of the procedure.

2. Summary of main findings

Issues in published effectiveness studies of DGBL

The systematic literature review pointed towards three main issues in the field of effectiveness research on DGBL. Firstly, heterogeneity exists among separate studies, and the main causes of this heterogeneity significantly impact results. More specifically, the three main causes are the different activities implemented in the control group, the different measures used to assess effectiveness and the different statistical techniques used for quantifying learning outcomes. A second issue in the DGBL effectiveness research field is suboptimal study designs as a result of confounds. More specifically, these confounds are a) the addition of extra elements to the intervention (e.g., required reading, a debriefing session), b) the presence, type (i.e., familiar vs unfamiliar) and role (i.e., supervision, procedural help or guidance) of the instructor during the intervention and c) practice effects as a result of the same test being administered pre- and post-intervention. These elements make it difficult to know whether the same beneficial effects would have been found without them. A third and last issue concerns the replicability of studies. Very little information is given about how the interventions were implemented (e.g., who was present? In which context was the game played? Was gameplay individual?), how sampling occurred, how similarity was attained between the experimental and control group, which tests were implemented and, if tests were developed by the researchers themselves, how these were developed.

Conceptualizing and operationalizing DGBL effectiveness  

Results of the user requirements analysis have shown that effectiveness of DGBL is a multidimensional construct consisting of three categories of desired outcomes: learning, motivational and efficiency outcomes. For every category of outcomes, several indicators can be used. For learning outcomes these can be a) an increased interest in the subject matter, b) performance (e.g., on a knowledge test) and c) transfer. Motivational outcomes can refer to a) creating a more enjoyable learning experience compared to current instructional media or b) making learners more motivated to learn using DGBL. Efficiency outcomes refer to a) reducing the time needed for teaching a certain content matter or b) providing a more cost-effective solution for teaching a content matter to a certain group of learners. Higher motivational and efficiency outcomes are not a stand-alone reason to implement DGBL: they should go hand in hand with learning outcomes similar to those achieved by more traditional media.

Best practices for assessing DGBL effectiveness

In chapter 4 we defined best practices for assessing the effectiveness of DGBL, based on semi-structured interviews with experts on intervention research coming from the fields of psychology and educational science. In this chapter, we detected two potential areas for improvement in the field of DGBL effectiveness research: the implementation of the intervention and the methods employed to assess effectiveness. Regarding implementation of the interventions in both the experimental and control group, several practices were defined that are preferably avoided during the intervention in order to reduce confounds (such as guidance by the instructor or extra elements that consist of substantive information), as well as elements that could be allowed (e.g., procedural help, a training session). Moreover, variables on which similarity between the experimental and control condition should be attained were determined (e.g., time exposed to the intervention, instructor, day of the week). With regard to the methods dimension, proposed improvements related to assignment of participants to conditions (e.g., variables to take into account when using a blocked randomized design), general design (e.g., necessity of a pre-test and control group), test development (e.g., developing and piloting parallel tests) and testing moments (e.g., a follow-up after a minimum of 2 weeks). In sum, this chapter provides best practices that cover all aspects of the study design. While several suggestions have previously been made regarding the research design of DGBL effectiveness studies, these do not cover all aspects of the research design, such as the aspects on which similarity should be attained between the experimental and control group, the role of the instructor and the implementation of the intervention.

Empirical findings on DGBL effectiveness

Besides testing the feasibility of the procedure and optimizing it, the experimental studies also provided us with some insights regarding the effectiveness of DGBL. Our first feasibility study (chapter 5) showed that, while no significant difference could be found at the first post-test between the group that had learned English vocabulary using DGBL and the group that had received a traditional class by the teacher, at the second post-test (three weeks later) the group that had received the traditional class outperformed the group that was instructed by DGBL. Thus, in the longer term the traditional class proved to be more effective. This supports previously made claims on short-term effects in computer-based learning. Moreover, adding a debriefing session to the game-only condition did not add value regarding learning and motivational outcomes. This goes against part of the literature, as it has been suggested that a debriefing is indispensable in digital game-based learning. This has raised a number of questions regarding the delineation of DGBL characteristics that require a debriefing. More specifically, nuances regarding game type, complexity of learning content, explicit/implicit learning goals, game characteristics and possibly other factors need to be explored.

In our second feasibility study (chapter 6) we found that adding a pre-test to an effectiveness study of DGBL can influence results, as pre-test sensitization occurred only in the group that received a slide-based lecture. More specifically, the participants that received a pre-test before the slide-based lecture had significantly higher post-test scores than the slide-based group that did not receive a pre-test before the lecture. In the game group, no significant differences could be found between those participants that received a pre-test before the game-based training and those that did not. This makes comparison of DGBL and more traditional classes in a pre-test post-test control group design rather difficult, as post-test scores of the traditional class might be positively biased. However, the fact that pre-test sensitization does not occur in the DGBL group also confirms the effectiveness of DGBL, as the interactivity of the game required participants to be attentive, regardless of whether or not they received a pre-test before the DGBL intervention. Furthermore, both game groups still outperformed the slide-based group that received a pre-test before the lecture, confirming the higher effectiveness of the game. This study has thus also shown the added value of using a Solomon four-group design in the context of DGBL.
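The logic of the four-group comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sensitization_contrast`, the condition labels and the scores are all hypothetical, the numbers are made up and are not data from the study, and a real analysis would use a significance test (for example a two-way analysis of variance) rather than raw mean differences.

```python
from statistics import mean


def sensitization_contrast(scores):
    """Return the pre-test effect per condition and the difference between them.

    `scores` maps (condition, pretested) to a list of post-test scores.
    """
    effect = {}
    for condition in ("lecture", "game"):
        # Pre-test effect: pre-tested group mean minus non-pre-tested group mean.
        effect[condition] = (
            mean(scores[(condition, True)]) - mean(scores[(condition, False)])
        )
    # Interaction contrast: how much more the pre-test helped the lecture group.
    interaction = effect["lecture"] - effect["game"]
    return effect, interaction


# Made-up post-test scores, for illustration only (NOT data from the study).
toy_scores = {
    ("lecture", True): [14, 15, 16],
    ("lecture", False): [10, 11, 12],
    ("game", True): [17, 18, 19],
    ("game", False): [17, 18, 19],
}

effect, interaction = sensitization_contrast(toy_scores)
print(effect, interaction)
```

A non-zero interaction contrast in such a layout is the pattern that a two-group pre-test post-test design cannot reveal, because that design has no groups without a pre-test.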

In our third feasibility study (chapter 7), which took place in a corporate context, the interactivity of the game was found not to add value compared to a passive instructional video that delivers exactly the same content. The motivation rationale behind DGBL did not hold true in this case, pointing to the need for careful consideration as to where to use interactive content.

3. Reflection

Based on this dissertation, we can state that effectiveness of DGBL is a complex construct. The several dimensions and subdimensions defined in chapter 3 can be approached in different ways. An important distinction that comes forward in this dissertation is the difference between absolute effectiveness (i.e., achievement of predefined goals) and relative effectiveness (i.e., comparison of learning, motivational and efficiency outcomes with other instructional media). Which type of effectiveness is required will ultimately depend on the research question and on the types of media that are currently available for teaching a certain content matter.

Regarding assessment of DGBL effectiveness, control of as many elements as possible (which is desired in experimental research) can be problematic in the context of DGBL. A first reason for this is the complex environments in which DGBL is often implemented, such as natural collectives in which one does not always have control over observed and unobserved variables. Another reason why control is not always possible or desirable is that the main rationale behind implementing games as instructional tools is one of motivation. Hence, implementing a game in a controlled lab setting would provide us with limited insight into motivational outcomes, as this is a highly artificial environment. The complexity of DGBL effectiveness also results in a trade-off between control and ecological validity. For instance, keeping instructional time equal between the experimental and control group is not always desirable. In a context where learners are paid employees, a reduction of training time and, as a result, higher cost-efficiency is often a desired outcome. Hence, keeping instructional time equal is incompatible with the efficiency outcomes of DGBL. In such cases, instructional time should be treated as an outcome and research should focus on investigating whether learners learn as much or more in less time using the game-based method.

To summarize, a balance between ecological validity and control is best achieved in two ways. Firstly, internal validity can be increased by reducing the influence of confounding variables (e.g., instructor support, extra material during the intervention) during implementation of the intervention(s) as much as possible and by keeping potential confounds equal in the experimental and control group (e.g., day and time of the intervention, context of implementation). If there is an imbalance between groups regarding relevant participant variables that might influence the outcomes (e.g., differences regarding prior knowledge), these variables can be added to the analysis in order to take this difference into account. Secondly, external validity can be maximized by ensuring similarity between elements present in the real-world implementation environment and the implementation for the effectiveness assessment, such as implementation in a context in which the game is intended to be used, implementation in natural collectives such as existing class groups (i.e., randomization on a classroom level or blocked random assignment), the presence of a familiar teacher in a classroom, the provision of procedural help, etc.
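Blocked random assignment, mentioned above as one way to work with natural collectives, could look roughly as follows. This is a minimal sketch and not the procedure developed in the dissertation: the function name `blocked_assignment`, the participant fields and the choice of prior knowledge as the blocking variable are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def blocked_assignment(participants, block_key, seed=0):
    """Assign participants to 'game' or 'control', balanced within each block."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    blocks = defaultdict(list)
    for person in participants:
        blocks[block_key(person)].append(person)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # random order within the block
        for index, person in enumerate(members):
            # Alternate conditions so each block is split as evenly as possible.
            assignment[person["id"]] = "game" if index % 2 == 0 else "control"
    return assignment


# Hypothetical participants, blocked on a made-up prior-knowledge level.
toy_participants = [
    {"id": i, "prior_knowledge": "low" if i < 4 else "high"} for i in range(8)
]
result = blocked_assignment(toy_participants, lambda p: p["prior_knowledge"], seed=1)
print(result)
```

Because assignment alternates within each shuffled block, both conditions end up with the same number of participants from every block whenever the block size is even.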

4. Conclusion

This dissertation has shown that a more standardized approach for assessing DGBL effectiveness is not only possible, but required. Firstly, a more streamlined approach to outcome measures is now possible thanks to our conceptualization and operationalization of DGBL effectiveness. Secondly, a more standardized approach regarding actual study design characteristics is possible up to a certain point (e.g., pre-test, post-test, follow-up, a context that is representative for real-world implementation, procedural help during implementation). However, for some study design aspects, complete standardization is not possible, and the choices made will ultimately depend on the research question and the needs of the people for whom the study is conducted (e.g., assessment of absolute effectiveness, trade-off between ecological validity and experimental control). Important here is accurate reporting by researchers in order to provide readers with a more nuanced view on factors potentially influencing the outcomes of interest. Finally, a more standardized approach is required in order to make assumptions about the magnitude of the effect of DGBL by means of effect sizes, make claims on a more general level and further investigate features of DGBL that contribute to its effectiveness.
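As an illustration of what quantifying the magnitude of an effect by means of an effect size can involve, the sketch below computes Cohen's d with a pooled standard deviation for two independent groups. Cohen's d is chosen here as an assumption, since the text does not name a specific effect size measure, and the scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Made-up post-test scores, for illustration only (NOT data from the study).
toy_game = [12, 14, 16]
toy_control = [10, 12, 14]
print(cohens_d(toy_game, toy_control))
```

Standardized measures of this kind are only comparable across studies when control activities, outcome measures and testing moments are reported consistently, which is the point made above about accurate reporting.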