Making comparisons between scores on different tests is challenging because test products differ in their design, purpose, and format (Taylor, 2004; Lim et al., 2013); the greater the difference in design, the more problematic the comparison becomes. Nonetheless, test score users often want to know how results on two differing tests may compare.
The inclusion in this document of two separate reports, each reflecting a very different methodology, highlights the need to consider any equivalence estimate from two distinct perspectives:
- Construct.
- Measurement.
The construct approach typically entails a detailed evaluation of how the tasks and items contained in a test reflect the target language construct. For test scores to be truly meaningful, we cannot focus simply on the language itself. Instead, we broaden our focus to the underlying cognitive demands of the test tasks (do they reflect those of real-world language use?) while also understanding the impact of the social conditions of language use (particularly relevant for the productive skills, where social parameters such as the speaker/interlocutor relationship are always likely to affect performance).
The measurement approach compares the scores achieved across the different sections of the two tests. This allows us to characterise the statistical relationship between them, for example, answering questions such as how well performance on one test predicts performance on the other (a minimal illustration follows).
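As a sketch only (neither study reported here necessarily uses this exact method), one common way of expressing such a score relationship is a linear equating function, which maps a score on one test to an estimated equivalent score on the other by matching means and standard deviations:

```latex
% Illustrative linear equating function; an assumption for exposition,
% not a description of the method used in either study reported here.
% x                 : an observed score on Test X
% \mu_X, \sigma_X   : mean and standard deviation of scores on Test X
% \mu_Y, \sigma_Y   : mean and standard deviation of scores on Test Y
% l(x)              : the estimated equivalent score on Test Y
\[
  l(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,\bigl(x - \mu_X\bigr)
\]
```

On this view, the correlation between paired scores on the two tests summarises how well one test predicts the other, while a function of this kind maps individual scores between the two scales.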
By combining two studies that take these different approaches, we would expect to gain a much more nuanced understanding of the relationship between the two tests under investigation than either approach could provide alone. This is certainly the case in the two studies reported here.