The CHILDES Project (3rd ed.)

Vol. 5, No. 1

R-19

April 2001

The CHILDES Project (3rd ed.). Volume I: Tools for Analyzing Talk: Transcription Format and Programs

Brian MacWhinney (2000)
Mahwah, NJ: Lawrence Erlbaum
Pp. ix + 385
ISBN: 0-8058-2995-4
US $49.95 (cloth)
The CHILDES Project (3rd ed.). Volume II: Tools for Analyzing Talk: The Database

Brian MacWhinney (2000)
Mahwah, NJ: Lawrence Erlbaum
Pp. viii + 418
ISBN: 0-8058-3572-5
US $49.95 (cloth)
(CD-ROM also available, US $39.95)

According to Brian MacWhinney (1995), "the dream of establishing a system for sharing child language transcript data has a long history . . . [starting from] mimeographed copies of Roger Brown's original Adam, Eve, and Sarah transcripts . . ." (p. ix). Studies of child language acquisition, especially diary studies of individual children made by their parent(s), have an even longer history. The question of how children learn their language has intrigued humankind throughout the ages. Unfortunately, one of the limitations of diary and small group studies is the extent to which one can generalize the findings to larger populations.

"Conducting an analysis on a small and unrepresentative sample may lead to incorrect conclusions. Because child language data are so time-consuming to collect and to process, many researchers may actually avoid using empirical data to test their theoretical predictions. Or they may try to find one or two sentences that illustrate their ideas, without considering the extent to which their predictions are important for the whole of the child's language. In the case of studies of pronoun omission, early claims based on the use of a few examples were reversed when researchers took a broader look at larger quantities of transcript data" (vol. 1, p. 3).

Thanks to advances in computer technology over the last twenty years, the Child Language Data Exchange System (CHILDES), first conceived in 1981, now helps researchers overcome the limitations of small group studies by giving them access to approximately 300 million characters (300 megabytes) of child language data contributed by researchers around the world. With CHILDES, researchers can code and store transcript data in a standardized format, which can then be analyzed and shared. This greatly increases researchers' ability to test theoretical concepts and to identify general or specific tendencies in language acquisition.

The CHILDES corpora are divided into five major directories: English data, non-English data, data from clinical populations, bilingual acquisition data, and second-language acquisition (SLA) data. Transcripts include data on the learning of over 26 different languages, including bilingual acquisition data. Additionally, the database includes a bibliography of work on child language and the MacArthur Communicative Development Inventory (CDI) database. [-1-]

Brian MacWhinney has provided a comprehensive discussion and explanation of the CHILDES Project in his two new books: The CHILDES Project (3rd ed.), Volume I: Tools for Analyzing Talk: Transcription Format and Programs and The CHILDES Project (3rd ed.), Volume II: Tools for Analyzing Talk: The Database. The limitations of small group studies are also a problem in some areas of second language acquisition research, especially classroom-focused research and studies of individual learners. This makes these volumes of interest to SLA researchers as well as to those studying child language acquisition.

This two-volume third edition is a detailed and comprehensive handbook and manual for users of the CHILDES database. Volume I gives a more detailed account of the transcription tools and formats required for using the database than the second edition (MacWhinney, 1995) did.

Volume I is divided into two parts: Transcription Format and The Programs. Part 1 (Transcription Format) is organized into the following sections: "Introduction," "Principles," "CHAT Outline," "File Headers," "Words," "Morphemes," "Utterances," "Scoped Symbols," "Dependent Tiers," "CA Transcription," "Signed Languages (BTS)," "Extending CHAT," "UNIBETS," "Error Coding," "Speech Act Codes," "Morphosyntactic Coding," "Word Lists," "Recording Techniques," "Symbol Summary," "References," and "Index." Part 2 (The Programs) is organized into the following sections: "Introduction," "Tutorial," "The Editor," "Features," "Analysis Commands," "Options," "Exercises," "References," and "Index."

This format, while quite informative, is somewhat confusing: the book is literally divided into two parts, with Part 2 beginning halfway through the volume and page numbering restarting at 1. Because this arrangement is not clearly explained at the beginning of the volume, some readers might never find the second part, and it could also confuse users trying to make use of the index (e.g., which page 45?).

Volume II provides in-depth descriptions of the various contributions to the database. For each contribution, it identifies the contributors and gives a detailed sketch of the subjects and type of data, covering the participants, their ages, other family members, the recording conditions, interactional contexts, transcription conventions, and coding methods. The data files themselves are included on an accompanying CD-ROM. Contents include "Introduction," "English Corpora" (42 different contributions), "Bilingual Corpora" (15 different contributions), "Clinical Corpora" (19 different contributions), "Narrative Corpora" (5 different contributions), "Germanic Languages" (13 different contributions, including speakers of Danish, Dutch, German, and Swedish), "Romance Languages" (19 different contributions, including speakers of Catalan, French, Italian, Portuguese, and Spanish), and "Other Languages" (21 different contributions, including speakers of Cantonese, Estonian, Greek, Hebrew, Hungarian, Irish, Japanese, Mambilla, Mandarin, Polish, Russian, Tamil, Turkish, and Welsh).

Volume II would be of interest to researchers looking for specific types of data (e.g., child narratives or age- or language-specific acquisition sequences). For researchers interested in SLA and/or working with clinical populations, the book also clearly identifies the cross-linguistic and clinical corpora available. Cross-linguistic and bilingual acquisition data could interest a range of readers, including language and other teachers at the primary and elementary levels and researchers exploring similarities between first and second language acquisition processes. Such data also make it easier to compare small research populations.

In summary, Brian MacWhinney has provided a comprehensive discussion and explanation of the CHILDES Project in these books. The issues addressed should make them of interest to SLA researchers, as well as to the main audience of child language researchers.

Reference

MacWhinney, B. (1995). The CHILDES project (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.

Karen Woodman
University of New England, Australia
<kwoodman@metz.une.edu.au>

Editor's Note: Dashed numbers in square brackets indicate the end of each page for purposes of citation.


[-2-]