Validation in Language Assessment

November 1999 — Volume 4, Number 2

Validation in Language Assessment

Antony John Kunnan (1998)
Mahwah, NJ: Lawrence Erlbaum Associates
Pp. xiii + 290
ISBN 0-8058-2753-6 (paper)
US $32.50 (also available in cloth, $59.95)

Validation in Language Assessment is a collection of selected papers from the 17th Language Testing Research Colloquium. The volume opens with an introduction to approaches to validation in second and foreign language assessment, as reflected in research published over the last 15 years. In this chapter, Antony John Kunnan asserts that the idea of validation has been at the center of intense language assessment research. He sheds light on the foci of language assessment researchers by categorizing key studies according to Messick’s (1989) framework, which recommended that a unified validity framework be constructed. Messick’s progressive matrix of validity details two interconnected facets of the unitary validity concept. One facet is the source of justification of testing, based on appraisal of either evidence or consequence. The other facet is the function of the test, either interpretation or use. Kunnan’s study reveals what he calls an “imbalance” in the attention researchers have given to certain facets of this framework.

Kunnan’s introductory chapter is followed by 11 chapters that are presented in 3 parts: Part I presents four papers that focus on validation through the stages of the test development and test-taking process. Part II presents six papers that focus on validation by examining data from test-taker characteristics and test-taker feedback. Part III presents an analytical assessment of the presentations at past Language Testing Research Colloquiums.

Dorry Kenyon’s chapter is the lead article in Part I, which illustrates the conventional approach to assessment validation research. Kenyon investigates foreign language students’ perceived difficulty in performing various speaking tasks in the American Council on the Teaching of Foreign Languages (ACTFL) Speaking Proficiency Guidelines hierarchy. The purpose of the study is to examine the validity of the demands of oral proficiency tasks constructed according to the criteria contained in the ACTFL Guidelines.

Three chapters on test development follow, each one focusing on a single assessment concern: Read on a new test format for vocabulary; Fortus, Coriat, & Fund on item difficulty in reading comprehension; and Wigglesworth on planning time in language assessment.

John Read’s study is a contribution to the ongoing validation of the word associates format, a selected-response type of test item designed to measure depth of vocabulary knowledge. Read draws on test performance data from New Zealand to provide evidence for concurrent validity of the new format. [-1-]

Ruth Fortus, Rikki Coriat, and Susan Fund examine the difficulty levels of items in the reading section of an English test used in Israel. The purpose of their study is to isolate factors affecting difficulty level so that item pools can be developed in accordance with specific needs. Their major argument is that by isolating these factors, the test developer’s knowledge and understanding of the construct validity of the test will increase.

Gillian Wigglesworth’s study focuses on an important aspect of the test-taking process: the presence or absence of planning time. Her paper describes a study of the effects of planning time on second language oral test discourse in an oral interaction test in Australia. She uses techniques from discourse analysis to examine the nature and significance of differences in the areas of fluency, accuracy, and complexity in the second language.

Part II of Validation in Language Assessment presents six papers that focus on validation by examining test-taker characteristics and feedback. James Purpura’s paper details the development and construct validation of a questionnaire that measures the reported cognitive strategy and cognitive background characteristics of test takers in the United States. Purpura’s primary goal is to design an instrument that allows test takers to report the cognitive strategies they think they use in second language acquisition, use, and testing so that these processes can ultimately be related to second language test performance.

Caroline Clapham’s chapter examines the effect of language proficiency and background knowledge on students’ reading comprehension in the United Kingdom. The aim of Clapham’s study is to investigate the effects of background knowledge on reading comprehension, and to examine whether students should be given reading proficiency tests in their own academic content areas.

April Ginther and Joseph Stevens investigate the internal construct validity of an advanced Spanish-language placement exam in order to determine whether the traditional four-factor examination structure (listening, speaking, reading, and writing) is invariant for certain subpopulations. Ginther and Stevens analyze data drawn from Latin American Spanish-speaking, Mexican Spanish-speaking, Mexican Spanish-English bilingual, White English-speaking, and Black English-speaking examinees.

Annie Brown and Noriko Iwashita’s paper examines the role of native language background in the validation of a computer-adaptive test. Their study investigates the performance of learners of Japanese from different language backgrounds (native speakers of English, Chinese, and Korean) on a 225-item multiple-choice question computer-adaptive grammar test.

Kathryn Hill investigates the effect of test-taker characteristics on reactions to an oral English proficiency test called the Access Test, used to assess the English language proficiency of prospective migrants to Australia. Hill seeks reactions to the test’s validity by means of a questionnaire.

Bonny Norton and Pippa Stein’s chapter addresses issues of textual meaning, testing, and pedagogy on the basis of data drawn from piloting a college reading exam in English for black students in South Africa. Their findings call into question a number of assumptions about language and language assessment. [-2-]

Liz Hamp-Lyons and Brian Lynch provide what Kunnan calls “a fitting conclusion” to a volume on validation in language assessment. They examine research practices of the second- and foreign-language testing community as seen through the abstracts of papers presented at the Language Testing Research Colloquium throughout the last 15 years. Hamp-Lyons and Lynch focus their analysis on the ways in which test validity and reliability have been addressed in language testing research. This interesting concluding chapter explores the extent to which the Language Testing Research Colloquium community has already engaged itself with newer modes of inquiry beyond the psychometric.

Validation in Language Assessment is an essential read for those working in the language testing community. Kunnan has done an admirable job in selecting the studies represented in this volume, which present diverse approaches to test validity from an international perspective. A particularly useful feature of the book is the annotated list of suggested readings presented at the end of each chapter. Kunnan’s volume is a valuable addition to the body of knowledge in language assessment.


Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (pp. 13-103). New York: Macmillan.

Christine Coombe
Dubai Men’s College-Higher Colleges of Technology

© Copyright rests with authors. Please cite TESL-EJ appropriately.

Editor’s Note: Dashed numbers in square brackets indicate the end of each page for purposes of citation.