Keynote Speech: From standards to operational rating criteria: exploring the issue of context in scale design
Speaker: Ute Knoch, University of Melbourne
Associate Professor Ute Knoch is the Director of the Language Testing Research Centre at the University of Melbourne. Her research interests are in the areas of writing assessment, rating processes, assessing languages for academic and professional purposes, and placement testing. She has been successful in securing grant funding, including grants from the Educational Testing Service in the US, IELTS, the British Council, Pearson and the Australian Research Council. She was the Co-president of the Association for Language Testing and Assessment of Australia and New Zealand (ALTAANZ) from 2015 to 2016 and has served on the Executive Board of the International Language Testing Association (ILTA) from 2011 to 2014 and again since 2017. In 2014, Dr Knoch was awarded the TOEFL Outstanding Young Scholar Award by the Educational Testing Service (Princeton, US), recognizing her contribution to language assessment. In 2016, Dr Knoch received a Thomson Reuters Women in Research citation award.
Abstract: One function of language framework documents, also called language standards, is to provide those working in language testing and assessment contexts with an expected learning progression which can be used both for the design of tests and for the process of scoring. The Common European Framework of Reference for Languages (Council of Europe, 2001), as an example of a language standards document adopted in many contexts globally, states, for example, that one of its possible uses is ‘for stating the criteria for the attainment of a learning objective, both in relation to the assessment of a particular spoken or written performance, and in relation to continuous teacher-, peer- or self-assessment’ (Council of Europe, 2001, p. 19). However, the CEFR level descriptors have been criticized for containing impressionistic terms and inconsistencies across levels (Alderson, 2007; Fulcher, 2012), and for being too generic in nature. Galaczi et al. (2011) found that the CEFR descriptors were unusable as ready-made rating instruments (see also Weir, 2005). This is of course not surprising, considering that frameworks such as the CEFR are designed to be abstract and applicable across a range of contexts (Fulcher, 2004) and have a reporting function, rather than being designed for scoring (Harsch & Hartig, 2015). Rating criteria designed for operational rating, regardless of whether they are related to a framework document such as the CEFR, have traditionally been divided into two groups: (1) those designed drawing on intuitive methods and (2) those created using a data-driven design (Fulcher, 2003; COE, 2001). It has been argued that the latter of these methods is more sensitive to the local testing context, as criteria are designed on the basis of real learner performances, while the former method is less context-sensitive. It is clear from this description that the issue of context is highly relevant to the design of rating scale criteria.
In this talk, I will explore the issue of context in the design of rating criteria. I will discuss different aspects of context that are relevant to this, and explore how these may come into play both in the design and adaptation of operational scale criteria. I will do this by drawing both on the growing literature on rating scale design and on a range of projects I have worked on in a variety of contexts. I will argue that context-sensitive criteria can either increase the precision of the inferences that can be made based on the test scores or threaten generalisability, depending on which contextual issues are considered.