Second Language Incidental Vocabulary Learning: The Effect of Online Textual, Pictorial, and Textual Pictorial Glosses

December 2009 — Volume 13, Number 3

Seyed Abdollah Shahrokni
Iran University of Science and Technology (IUST)


This empirical study investigates the effect of online textual, pictorial, and textual pictorial glosses on the incidental vocabulary learning of 90 adult elementary Iranian EFL learners. The participants were selected from a pool of 140 volunteers based on their performance on an English placement test as well as a knowledge test of the target words in the study. Afterward, they were randomly assigned to 3 groups of 30 and subsequently exposed to the research treatment. During 3 sessions of instruction, 5 computerized reading texts including 25 target words were studied. The participants read the texts for comprehension and, at the same time, were able to consult the glosses attached to the target words. Having read each text under each research condition, the participants were tested on their incidental vocabulary learning through two research instruments, word and picture recognition tests. The results of a one-way ANOVA analysis of the data indicate that a combination of text and still images resulted in significantly better incidental vocabulary learning, confirming the Dual-coding Theory (Paivio, 1971, 1990).


Research suggests that a large portion of the vocabulary children learn in their L1 is incidental in nature, a by-product of reading (Huckin & Coady, 1999) or listening (Nagy, Anderson & Herman, 1987) which provides at least three benefits for language learners:

  1. a richer grasp of the contextual meaning and use
  2. the concurrency of the two activities (e.g., reading/listening and vocabulary learning)
  3. a more learner-centered learning process.

Likewise, it is generally accepted that a considerable percentage of learners’ L2 vocabulary is acquired incidentally. Huckin and Coady (1999) highlight the importance of incidental vocabulary learning by referring to several studies indicating that learners gain more vocabulary knowledge through extensive reading with guessing at the meaning of unknown words.

However, despite the obvious advantages, incidental vocabulary learning also has a number of disadvantages. For example, research suggests that contextual information is often too unclear for language learners to make correct inferences (Bensoussan & Laufer, 1984; Mondria & Wit-de Boer, 1991), leading learners to make wrong inferences and, thus, run the risk of learning words incorrectly. Interestingly, however, one way such a disadvantage might be alleviated is by using marginal glosses (Hulstijn, Hollander & Greidanus, 1996), which have proved quite effective in printed materials.

Despite the mixed views (James, 1996) towards the potential of Computer Assisted Language Learning (CALL), one can consider the element of time highly influential in judging technology-related issues. According to Jones (2000), the availability of many current electronic resources provides numerous opportunities for making texts more comprehensible to learners. Indeed, one of the recent developments in making texts more comprehensible to readers is the use of computerized glosses/annotations.


According to Lomicka (1998), the concept of glossing “dates back to the Middle Ages when students struggling with a foreign text, usually Latin, produced them as they moved along during the reading process” (p. 41). They are “typically located in the side or bottom margins of a page, [and] are most often supplied for unfamiliar words, which may help to limit continual dictionary consultation that may hinder and interrupt the L2 reading comprehension process” (p. 42). However, this learner-oriented technique was soon adopted by teachers and pedagogues so that they could present a short definition or note for unknown words to facilitate the reading comprehension process for L2 learners. The issue of glossing is by no means a medieval phenomenon now. Leloup and Ponterio (2000) refer to the current status of glossing as: “[T]he cues that appear when the reader clicks on the glossed vocabulary take various forms. Some are text explanations only, generally using a combination of target language and English words. Others are pictorial representations of the meaning of the word or phrase” (p. 7).

Roby’s (1998) taxonomy of the present types and significance of glosses in teaching serves to depict almost comprehensively the different layers of such contemporary teaching aids. In fact, researchers nowadays take the usefulness of glosses as a point of departure, and it is the investigation of the different gloss types that constitutes much of the current research (Yoshii, 2006).

Therefore, the following presents a brief review of the related literature on the effectiveness of non-CALL and CALL glosses as used in vocabulary acquisition research.

Non-CALL Glosses

Concerning the possible usefulness of glosses in assuaging the disadvantages of incidental vocabulary learning, several studies have investigated glossed printed materials. Working with American students studying Spanish as an L2, Jacobs, Dufon, and Fong (1994) found that the gloss condition performed significantly better on a vocabulary test administered immediately after the treatment. This study also compared the effectiveness of L1 and L2 glosses and found no significant difference between the two.

Further supporting evidence comes from Hulstijn et al. (1996), who conducted their research with Dutch students learning French as an L2. This study showed that having access to L1 marginal glosses was more effective than using bilingual dictionaries or, similarly, having no access to dictionaries or marginal glosses.

Watanabe (1997) investigated how text modification and task would affect incidental vocabulary learning. This study, which was carried out with Japanese university students, indicated that the use of L2 glosses in the texts helped the participants retain more vocabulary compared to when they worked with texts containing no modifications, or appositives. This study also established no significant difference in the effectiveness of L1 and L2 glosses. Furthermore, the research compared single- and multiple-choice glosses. The participants were required to choose the correct definition from the two alternatives offered, which revealed no significant difference in the effectiveness of the two types. However, this finding might be slightly different from what Nagata (1999) revealed based on a Japanese courseware program called Banzai Readings (please see the section on CALL glosses).

Working with American students learning German as an L2 in a second semester course, Kost, Foss, and Lenzini (1999) compared three types of L1 glosses, Text-only, Picture-only, and Text-and-Picture. The results indicated that the Text-and-Picture (combination) condition was the most effective of the three types. A similar study by Yoshii and Flaitz (2002), by comparison, examined the learners’ incidental vocabulary learning through incorporating the task into an online computerized environment (please see below).

CALL Glosses

Chun and Plass (1996) investigated the effect of multimedia annotations on the incidental vocabulary learning of 160 university German students. Using their computerized reading program, they conducted three studies, employing a within subjects design. The students used the same version of the program and worked with the program in a realistic L2 learning situation. Afterward, the participants were tested on their incidental vocabulary learning and overall reading comprehension while being free to choose the available annotations. The results indicated that “the recall protocol for visual annotations (that is, words annotated with text and pictures, text and video) was higher than for words annotated with text alone” (p. 189).

Lyman-Hager and Davis (1996) carried out an experiment using their interactive reading program, employing two conditions: computerized reading, and non-computerized reading. The first group had access to multimedia annotations while the other group consulted printed text with the same glosses. After the experiment, a written recall protocol as well as a vocabulary quiz of the target words provided the researchers with the conclusion that students who worked with the multimedia program were better able to retain vocabulary words than students who worked with non-computerized text.

Nagata (1999) examined the single- or multiple-choice glosses as used in a Japanese courseware program. The single-gloss version of the program provided an English translation for each target word or grammatical structure in the reading text, and the multiple-choice version provided two alternative translations in a multiple-choice format followed by immediate feedback on the participants’ choice. The results revealed that the multiple-choice condition significantly outperformed the single-gloss condition, since it helped learners with deeper lexical processing as well as feedback on the errors. The findings of this study can well be compared with those provided by Watanabe (1997, please see the previous section).

Al-Seghayer (2001) examined the effect of dynamic video or still pictures on vocabulary learning. Thirty participants studying at an American university participated in the study. The students were exposed to one of three conditions: textual gloss alone, textual gloss and still pictures, and textual gloss and dynamic video. The participants were subsequently evaluated on their vocabulary gains through recognition and production tests. The results indicated that when learners looked up a combination of video clips and text definitions, they learned unknown vocabulary items better than when they looked up definitions alone or in combination with still images.

Investigating the effect of visible/invisible links on L2 reading, De Ridder (2002) conducted a study with advanced learners of French as a second language. The research was carried out under two conditions, visible and invisible links. In the former, the students read the text with access to highlights on glossed words and, in the latter, the learners were presented with the same text but with no highlights on the glossed words. The results indicated that the participants’ clicking behavior resulted in higher vocabulary gains and, incidentally, did not impair reading comprehension. Furthermore, the two groups were not significantly different in comprehension level, but merely different in their vocabulary gains. In addition, the results of a delayed vocabulary test showed that no significant differences existed in the performance of the two groups.

Yoshii and Flaitz (2002) studied the effect of annotation type on learners’ incidental vocabulary learning. There were 151 adult ESL learners at beginning and intermediate language proficiency levels in their study who read a short story under one of three conditions: text-only, picture-only, or text-and-picture (combination). In these three treatments the glosses were attached to the verbs in the reading text. Having read the text, the participants were tested on their vocabulary gains by taking immediate and delayed word and picture recognition tests, as well as definition-supply tests. The results indicated that the combination group outperformed the other two groups on all measures both in immediate and delayed tests, even though the differences were smaller in delayed tests. This study was a replication of the study by Kost et al. (1999), which was reported in the section on non-CALL studies.

A study carried out by Yeh and Wang (2003) investigated the effect of three types of multimedia glosses, text-only, text and picture, and text, picture and sound, on the incidental vocabulary learning of 82 university students in Taiwan. In addition, the researchers used both L1 (Chinese translation) and L2 (English explanation) in textual glosses. The results indicated that the combination of text and picture was the most effective type of annotation.

Yoshii (2006) compared the effectiveness of L1 and L2 glosses on the incidental vocabulary learning of 195 Japanese university students. There were four groups in the study, L1-text-only, L2-text-only, L1-text-plus-picture, and L2-text-plus-picture. The research instruments were immediate and delayed definition-supply and word recognition tests. However, the results indicated that there were no significant differences between the two language gloss types. Significant differences were found between picture (text-plus-picture) and no-picture (text-only) glosses for definition-supply test. Delayed tests, on the other hand, showed that the L1 text-only group outperformed the L2 text-only and L2 text plus picture groups in recalling the target words.

A more recent study in the field has been carried out by Yanguas (2009) following the theoretical framework of attention (Robinson, 1995). Applying four treatments, namely textual, pictorial, textual plus pictorial, and a control condition for comparison, with 94 students of fourth semester college-level Spanish, he used a think-aloud technique together with reading comprehension, recognition, and production measures to investigate the effects of different types of multimedia glosses when the goal was comprehension of a computerized text. The results indicated that, first of all, all the multimedia groups outperformed the control group on noticing and recognition measures. Secondly, there was no significant difference in the performance of the groups on the production measures. Finally, the combination group outperformed all other groups on the comprehension measures. The results of this study suggest that a combination condition is ideal for text comprehension.

Overall, the studies reported here assigned a positive role to CALL in improving the quality of (incidental) vocabulary learning. Consequently, this study, in line with the theoretical framework of Dual Coding Theory (Paivio, 1971, 1990), attempts to shed light on the effectiveness of textual, pictorial, and textual pictorial glosses in the incidental vocabulary learning of adult Iranian EFL learners at the elementary level. The present study, hence, attempts to address the question: “Is there any significant difference in the incidental vocabulary learning of the participants when exposed to three different modes of multimedia annotations in the course of reading?”



The participants were 90 (n=90) male Iranian EFL learners enrolled in an English as a Foreign Language (EFL) course in Iran, who were selected from an initial pool of 140 volunteers. They were invited to participate in the research by an announcement at a private institute, recruiting Interchange intro students. Although the participants were purportedly homogeneous in terms of their perceived level at the Interchange course, they were given an additional standardized English placement test. They ranged in age from 16 to 22 and had scores ranging from 105 to 119, which indicated that they were elementary-limited users, based on the OPT Language Level specification, or A2 Waystage in keeping with the Common European Framework. Moreover, the participants were assessed based on their knowledge of the target words in the study. Therefore, besides homogeneity in the level of English language proficiency, total lack of familiarity with the final pool of 25 target words constituted the second criterion for participant selection.


The reading passages used in this study were selected from the book Communicative Reading Skills (CRS) based on the materials prepared by Root and Blanchard (2004) and edited by Mirhassani and Alavi (2004). The texts were checked against the Flesch readability formula to verify their readability level (please refer to Appendix B and C). The texts used in this study, therefore, fell within the readability levels “fairly easy” and “standard,” in line with the readability of the texts the participants had studied in the Interchange course. The target words were the focus of instruction in the passages selected.
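
As a rough illustration of the readability check described above, the following minimal sketch computes the Flesch Reading Ease score from raw counts and maps it to Flesch's descriptive bands. The word, sentence, and syllable counts in the example are hypothetical and are not taken from the study's texts; how the original texts were actually counted is not reported here.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease score computed from raw text counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_band(score: float) -> str:
    """Map a score to Flesch's descriptive difficulty label."""
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "very confusing"


# Hypothetical counts for a short passage (assumed values, for illustration only).
score = flesch_reading_ease(words=120, sentences=10, syllables=170)
band = flesch_band(score)
print(f"{score:.3f} -> {band}")
```

Under this scheme, scores from 70 up to 80 are labelled “fairly easy” and scores from 60 up to 70 “standard,” the two bands reported for the texts in this study.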

In order to gloss the target words in the three modes of instruction, it was required that clear definitions as well as pictures be provided. The textual definitions were extracted from Oxford Learner’s Dictionary (1991) and the pictorial definitions were extracted from the Internet. Great care was exercised to find clear and contextually appropriate textual and pictorial definitions, and these selections were further evaluated by two raters. The following table demonstrates the guessability of pictures as determined by the raters:

Table 1. Inter-rater Reliability of Pictorial Cues

Pictorial Cue    Correlation Coefficient (r)    Shared Variance (r²)
Convertible      .99                            .9801
Pony             .99                            .9801
Cone             .98                            .9604
Snack            .95                            .9025
Cookies          .99                            .9801
Pet              .97                            .9409
Tornado          .94                            .8836
Storm            .97                            .9409
Funnel           1.00                           1.0000
Basement         .92                            .8464
Floor            .98                            .9604
Ravine           .89                            .7921
Disc             .94                            .8836
Teammate         .85                            .7225
Referee          .99                            .9801
Coach            .92                            .8464
Athlete          .91                            .8281
Bike             1.00                           1.0000
Competition      .87                            .7569
Ocean            .97                            .9409
Brain            .99                            .9801
Octopus          .99                            .9801
Crab             .99                            .9801
Jar              1.00                           1.0000
Mammal           .90                            .8100
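
The last column of Table 1 is consistent with the square of the correlation coefficient in each row. The short sketch below reproduces that relationship for a few rows of the table; reading the column as shared variance (r squared) is an assumption based on the values themselves.

```python
# A few (cue, r) pairs copied from Table 1.
coefficients = {
    "Convertible": 0.99,
    "Ravine": 0.89,
    "Teammate": 0.85,
    "Funnel": 1.00,
}

# Shared variance is taken here to be r squared, rounded to four decimals.
shared_variance = {cue: round(r ** 2, 4) for cue, r in coefficients.items()}

for cue, value in shared_variance.items():
    print(f"{cue}: {value:.4f}")
```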

Reading each text, the participants had the option to consult definitions of the target words by placing the mouse pointer over the bold-faced words. All pages had common design features (please refer to Appendix B).


English Language Placement Test

In order to guarantee the close homogeneity of the groups, the Oxford Placement Test was administered to the participants. The test, which is a commercially developed package, is claimed to grade and place students reliably into appropriate levels. It is divided into two sections, listening and grammar, which take about an hour to complete. The results are interpreted by referring to the test manual. By reference to a 12-column table of level specifications, students can be assigned to levels within the OPT Band, OPT Score, OPT Language Level, Common European Framework Level, ALTE & QPT, UK NQF level, IELTS, Cambridge ESOL Main Suite, Cambridge BEC, Cambridge CELS, TOEFL, and TOEIC.

Word Recognition Test

The word recognition test evaluated the participants’ learning of the target words by presenting them with written definitions. The students had to choose a suitable definition for each target word from a list containing the same number of alternatives as target words plus two distracters. The definitions were phrased differently from those used in the reading passages, although they conveyed the same meaning. Likewise, the picture recognition test required the participants to choose a related picture for each target word. The pictures were also different from the ones used in the study even though they conveyed the same meaning. This safeguard was taken to keep the participants from simply memorizing the definitions and pictures encountered in the course of reading (please refer to Appendix A).

The use of the picture recognition test was primarily based on the studies by Kost et al. (1999) and Yoshii and Flaitz (2002). Since this study replicates these studies in some respects, I felt it necessary to include the test as accurately as possible. It might be interesting to see how pictures can convey the meaning of words, or how pictures can help tap participants’ understanding of the meaning of the words, but one wonders how valid this type of test is, since learners might not normally take such a test in real-life situations.


One week before the study, a standardized English placement test was administered to the volunteers. Once the researcher made certain that the participants formed a homogeneous sample, a pretest examining the knowledge of the target words was administered. The participants were presented with a list of 32 words and were instructed to put a check mark by each word they knew and write down a short definition or synonym in English or Farsi for the checked words. Subsequently, the words which were defined correctly by the participants were discarded from the initial pool of target words, resulting in the elimination of seven words. When the final participants as well as the target words were identified, the participants were randomly divided into three groups of 30 and the texts encompassing the target words were glossed and put online. Three versions of the same texts were designed, each displaying one type of gloss, textual, pictorial or textual-pictorial (combination), as definitions for the target words. During three sessions of instruction, 100 minutes each, the participants worked through the texts on the website. For each group, the first session was allocated to the demonstration of the learning medium. The participants were introduced to the different parts and components of the website, including the entry for reading texts, glosses, and test pages, and to operating the website.
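
The random division into three equal groups described above can be sketched as follows. This is a minimal illustration only: the function name, the numeric participant IDs, and the seed are hypothetical, and the study does not report which randomization procedure was actually used.

```python
import random


def assign_groups(participant_ids, conditions, seed=None):
    """Shuffle participants and split them evenly across conditions."""
    ids = list(participant_ids)
    if len(ids) % len(conditions) != 0:
        raise ValueError("participants cannot be split evenly across conditions")
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    rng.shuffle(ids)
    size = len(ids) // len(conditions)
    return {
        condition: ids[i * size:(i + 1) * size]
        for i, condition in enumerate(conditions)
    }


# Hypothetical IDs 1..90 for the 90 selected participants.
groups = assign_groups(
    range(1, 91),
    ["textual", "pictorial", "combination"],
    seed=2009,
)
for condition, members in groups.items():
    print(condition, len(members))
```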

In the second session, the reading of materials under each condition followed. Thirty minutes were devoted to each reading passage and its accompanying test, 15 minutes per activity, timed through a countdown function on the website. The redirection behavior assigned to each page did not interrupt the participants throughout the two activities; therefore, the participants managed to finish each activity within the allocated time. The glosses could be consulted by placing the mouse pointer over the colored boldface words. The following snapshot shows the first reading text in the combination mode of research as the word “pony” has been consulted:

Figure 1. Consulting the Word “pony” in the Combination Mode of Research

When the reading task finished, the participants were redirected to the test page where they were presented with the two main testing instruments, word and picture recognition tests, and, as a safety measure, two reading comprehension items to keep the participants from guessing the main concern of the research. Although the test items were displayed on the screen, the participants were to answer the questions on the answer sheets, which were distributed towards the end of the reading task. The students were not allowed to look at the text while they worked on the vocabulary tests.


Figure 2. The Test Page

Once the answer sheets were collected, the participants proceeded to the next reading text. Accordingly, by the end of the second session, the participants had studied three texts. The third session followed the same procedure and the two remaining texts were studied.


The data were analyzed using the one-way ANOVA statistical analysis as performed in the environment of the software SPSS 15.0 for Windows. For all the analyses, the alpha level was set at .05.

English Language Placement Test Results

Even though the 140 volunteers had been chosen from a population of Interchange intro students, they were further given a standardized English placement test. As was stated earlier, the 90 students scoring within the range 105-119 were chosen as the final pool of participants in the study.

Posttest Results

After finishing each reading text, the three groups were tested on the immediate recall of the target words via the two instruments, word and picture recognition tests.

Word Recognition Test

The word recognition tests were evaluated based on the number of correct responses. To each correct choice one point was assigned. Comparing the mean scores revealed a contrast in the performance of the three groups, with the combination group (M = 24.17) outperforming both the pictorial group (M = 20.37) and the textual group (M = 17.07).

In order to further investigate whether the differences among the means were statistically significant, a one-way ANOVA was performed on the data. The results indicated that significant differences existed in the performance of the groups, F(2, 87) = 91.77, p < .05. The results of a post hoc Scheffé test indicated that group means significantly differed for the three conditions in the study, that is, the combination group outperformed the other two groups on the word recognition test (M = 24.17, SD = 1.11), and the pictorial group (M = 20.37, SD = 2.37) outperformed the textual group (M = 17.07, SD = 2.34).

Picture Recognition Test

The same scoring procedure and statistical analyses were employed in evaluating the picture recognition test. The performance of the groups on this test also revealed a clear difference. The results indicated that significant differences existed in the performance of the groups, F(2, 87) = 335.99, p < .05. Likewise, the combination group outperformed the other two groups on the picture recognition test (M = 24.73, SD = .52), and the pictorial group (M = 23.53, SD = 2.06) outperformed the textual group (M = 13.83, SD = 2.24) on this measure.
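
For readers who wish to check the reported F ratios, a one-way ANOVA can be recomputed from the summary statistics alone. The sketch below is not the SPSS analysis used in the study; it assumes 30 participants per group, as stated in the procedure, and works from the rounded means and standard deviations reported above, so its results should land close to, but not exactly on, the published values.

```python
def anova_from_summary(means, sds, ns):
    """One-way ANOVA F ratio from group means, SDs, and group sizes."""
    k = len(means)
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-groups sum of squares: pooled from each group's sample variance.
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between = k - 1
    df_within = total_n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


group_sizes = [30, 30, 30]  # combination, pictorial, textual

# Word recognition test (reported F(2, 87) = 91.77).
f_word, df1, df2 = anova_from_summary(
    means=[24.17, 20.37, 17.07], sds=[1.11, 2.37, 2.34], ns=group_sizes
)

# Picture recognition test (reported F(2, 87) = 335.99).
f_picture, _, _ = anova_from_summary(
    means=[24.73, 23.53, 13.83], sds=[0.52, 2.06, 2.24], ns=group_sizes
)

print(f"word: F({df1}, {df2}) = {f_word:.2f}")
print(f"picture: F({df1}, {df2}) = {f_picture:.2f}")
```

Small discrepancies from the published F values are expected, because the inputs are rounded to two decimals.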

Posttest Results by Research Question

Is there any significant difference in the incidental vocabulary learning of the participants when exposed to three different modes of multimedia annotations in the course of reading?

As was demonstrated earlier, the pictorial group outperformed the textual group on both measures. Significant differences were also found in the performance of the two groups on word recognition as well as picture recognition tests. Moreover, the textual-pictorial group outperformed the textual group on both measures and significant differences were also found in the performance of the two groups. Furthermore, it was found that the textual-pictorial group still outperformed the pictorial group on the two measures, although the mean differences were comparatively less than those of the textual group, with the picture recognition test marking the least difference.

Moreover, as a measure of the dispersion of a statistical population, the standard deviation of scores in the three groups was indicative of the effectiveness of the combination gloss type as well.

Table 2. The Standard Deviation of Scores Obtained by the Three Groups

Group                              Word Recognition Test    Picture Recognition Test
Textual                            2.34                     2.24
Pictorial                          2.37                     2.06
Textual Pictorial (Combination)    1.11                     .52

Table 2 shows the cross-tabulation of the standard deviations of the scores obtained by the three groups on the word and picture recognition tests. As can be seen, the standard deviation of scores obtained by the combination group was, by comparison, smaller on both measures, indicating that scores tended to be closer to the mean. Also, the variability of scores was smaller on the picture recognition test.


The findings of this study confirmed the previous findings (Al-Seghayer, 2001; Chun & Plass, 1996; Yeh & Wang, 2003; Yoshii & Flaitz, 2002). The results suggested that a combination of textual and pictorial glosses was more beneficial to the learners, possibly due to the fact that they received two modes of input (Ellis, 1994), namely verbal and visual. The results of this study are similar to those of Yoshii and Flaitz (2002) in that the combination group outperformed the other two groups. In their study, though, the differences between the pictorial group and the textual group were not so significant except as regards the picture recognition test, in which the pictorial group had an advantage over the text-only group. This could be due to the fact that Yoshii and Flaitz (2002) were basically examining incidental vocabulary learning via pictorial glosses as “simple line drawings designed to be as culturally and linguistically neutral as possible for foreign language instruction” (p. 38) that were attached to 14 verbs. Interestingly, the two variables in these studies, that is, 14 verbs vs. 25 concrete nouns and simple line drawings vs. high-quality images, could have potentially resulted in the relatively larger difference found in the performance of the textual and pictorial groups in this study. Research suggests that learning concrete nouns is easier than learning abstract nouns (Kess, 1992; Whitney, 1998) and that learning nouns is easier than learning verbs (Ellis, 1994). Besides, the quality of pictures, which could be simply the use of color pictures or the use of real-life shots, may well have been more effective in triggering the memory and resulting in sounder incidental vocabulary learning. Therefore, the results seem reasonable.

An interesting finding regarding the performance of the pictorial group was that this group performed virtually the same (M=23.53) as the combination group (M=24.73) on the picture recognition test, while the textual group had a significantly lower mean (M=13.83). This finding seems logical in that the pictorial group was exposed to pictorial glosses, even though the pictures in the test were different from those the students observed in the glosses attached to the target words. Though it was expected that the textual group would, in turn, outperform the pictorial group on the word recognition test, the reverse turned out to be the case and the pictorial group still outperformed the textual group. It is worth mentioning that incidental vocabulary learning is, by comparison, more effective with the use of pictures.

Regarding the variability of scores, it was determined that the combination group had an advantage over the other two groups on both word and picture recognition tests. This is further evidence to support the idea that pictures help foster incidental vocabulary learning.

On the whole, the two instruments indicated that the combination of the two glossing techniques, namely textual and pictorial, was most influential in helping the participants with learning incidental vocabulary.

Implications and Applications

The rationale for using glosses as reading aids is that they free up learners’ working memory by providing the bottom-up function of processing unknown words (Chun, 2006). Furthermore, the provision of such learning aids will make unnecessary continual dictionary searches and the resultant interruption in the course of learner reading.

Some studies (see Knight, 1994; Krashen, 1993) corroborate the idea that reading a text with the purpose of comprehension will help learners retain vocabulary incidentally. The study reported here confirmed the previous findings. Furthermore, it also revealed that learners will indeed learn significantly better when they are provided with more input presentation modes. This is in line with the Dual-coding Theory (Paivio, 1971, 1990), which states that information coded both verbally and visually is more effective for learning than information coded in either form alone.

The use of marginal glosses has been deemed influential in removing the potential risks of learning words incidentally (Hulstijn et al., 1996) as shown in printed materials. The present study indicated that online glosses can, indeed, alleviate the problems linked with incidental vocabulary learning in online materials as well. This study has some implications for both syllabus designers and decision makers. Since one of the characteristics of vocabulary learning is the sheer size of the task, any means that can lighten this burden for students should be appreciated. Not only are intentional means of learning vocabulary needed, but also incidental ones should be depended upon (Nation, 1999).

Utilizing computers, multimedia and IT has proven to be influential in language teaching in general, and incidental vocabulary learning in particular. Therefore, equipped with sound theoretical knowledge, material designers may create appropriate CALL programs which can promote learning, and subsequently those programs can be used in language classrooms. Of course, such programs ought to be designed based on sound theoretical and pedagogical principles (see Lee, Owens, & Benson, 2002; Plass, 1998).

Language teachers might find the results of this study useful in that they provide further evidence for the importance of multimodality of input presentation. Since the added significance of glosses as teaching and learning aids in incidental vocabulary learning was reconfirmed in this study, teachers might rely upon CALL to unify the two and, thus, enhance the learning experience for language learners. For language teachers who lack the training or time to write CALL programs, some reliable applications are available that can be used to gloss reading texts.

Suggestions for Further Research

This study did not distinguish between the learning styles of participants. The rich literature on individual learner differences (IDs) suggests that “there is a particularly wide variation among language learners in terms of their ultimate success in mastering an L2” (Dörnyei, 2005, p. 6). Therefore, there is a need to carry out the same study taking into account the participants’ learning styles. Some learners might be visualizers, who gain more from pictures, while others might be verbalizers, who benefit more from textual materials (Ellis, 1994).

Research suggests that learners with large vocabularies benefit more from marginal glosses (Jacobs et al., 1994). As a result, it could be valuable for another study to examine the incidental vocabulary learning of more proficient language learners through the same procedure. It should be noted, however, that one of the distinctive characteristics of this study was that it was carried out with low-level Iranian learners of English.

This study investigated the immediate incidental vocabulary gains of participants. There is a need to further assess the delayed retention of target words after a one- or two-week period.

This study examined the incidental learning of twenty-five concrete nouns. Aside from the fact that twenty-five words is too small a sample to support firm conclusions, a similar study is needed to investigate abstract nouns. Multimedia software can potentially provide more versatile tools for portraying qualities that cannot be drawn in printed materials.

This study also controlled for gender. A similar study could investigate the effect of the three annotation types on the incidental vocabulary learning of female students.

Students offered a large number of comments, feedback, and gestures in response to events throughout the experiment. However, the task of running this quantitative study did not allow the researcher to examine such valuable pieces of qualitative data. It is suggested that another study focus on qualitative aspects of teaching and learning with multimedia CALL programs in general, and multimedia glosses in particular.


Conclusion

This study investigated the effectiveness of three multimedia annotation types, namely textual, pictorial, and textual pictorial, on the incidental vocabulary learning of 90 adult Iranian EFL learners at the elementary level. In line with previous research in the field (Al-Seghayer, 2001; Chun & Plass, 1996; Yeh & Wang, 2003; Yoshii & Flaitz, 2002), the results indicated that a combination of text and still images led to significantly better incidental vocabulary learning. This study confirms that “electronic dictionaries and software that provide textual, contextual, and/or multimedia annotations” are among the “main technologies” that support specific components of reading, especially incidental vocabulary learning (Chun, 2006, p. 1), and that multimodality (Guichon & McLornan, 2008) in CALL strongly enhances incidental vocabulary learning.


Acknowledgments

I would like to thank the participants who willingly agreed to take part in this study. Also, my special thanks go to the anonymous reviewers who provided insightful and constructive comments on an earlier version of this paper.

About the Author

Seyed Abdollah Shahrokni holds an M.A. in Teaching English as a Foreign Language from Iran University of Science and Technology. His research interests include Computer Assisted Language Learning (CALL) and aspects of Second Language Acquisition (SLA).


References

Al-Seghayer, K. (2001). The effect of multimedia annotation modes on L2 vocabulary acquisition: A comparative study. Language Learning and Technology, 5(1), 202-232.

Bensoussan, M., & Laufer, B. (1984). Lexical guessing in context in EFL reading comprehension. Journal of Research in Reading, 7(1), 15-32.

Chun, D. M. (2006). CALL technologies for L2 reading. In L. Ducate & N. Arnold (Eds.), Calling on CALL: From theory and research to new directions in foreign language teaching (pp. 69-98). San Marcos: CALICO.

Chun, D. M., & Plass, J. L. (1996). Effects of multimedia annotations on vocabulary acquisition. The Modern Language Journal, 80(2), 183-198.

De Ridder, I. (2002). Visible or invisible links: Does the highlighting of hyperlinks affect incidental vocabulary learning, text comprehension, and the reading process? Language Learning and Technology, 6(1), 123-146.

Dörnyei, Z. (2005). The psychology of the language learner: Individual differences in second language acquisition. Mahwah, NJ: Lawrence Erlbaum Associates.

Ellis, R. (1994). The study of second language acquisition. Oxford: Oxford University Press.

Guichon, N., & McLornan, S. (2008). The effects of multimodality on L2 learners: Implications for CALL resource design. System, 36, 85-93.

Huckin, T., & Coady, J. (1999). Incidental vocabulary acquisition in a second language. Studies in Second Language Acquisition, 21(2), 181-193.

Hulstijn, J. H., Hollander, M., & Greidanus, T. (1996). Incidental vocabulary learning by advanced foreign language students: The influence of marginal glosses, dictionary use, and reoccurrence of unknown words. The Modern Language Journal, 80(3), 327-339.

Jacobs, G. M., Dufon, P., & Fong, C. H. (1994). L1 and L2 vocabulary glosses in L2 reading passages: Their effectiveness for increasing comprehension and vocabulary knowledge. Journal of Research in Reading, 17(1), 19-28.

Jones, B. G. (2000). Emerging technologies: Literacies, and technology or trends. Language Learning and Technology, 4(2), 11-18.

Kess, J. (1992). Psycholinguistics: Psychology, linguistics and the study of natural language. Victoria: University of Victoria.

Knight, S. (1994). Dictionary use while reading: The effects on comprehension and vocabulary acquisition for students of different verbal abilities. The Modern Language Journal, 78(3), 285-299.

Kost, C. R., Foss, P., & Lenzini, J. J. (1999). Textual and pictorial glosses: Effectiveness on incidental vocabulary growth when reading in a foreign language. Foreign Language Annals, 32(1), 89-113.

Krashen, S. (1993). The case for free voluntary reading. The Canadian Modern Language Review, 50(1), 72-82.

Lee, W. W., Owens, D. L., & Benson, A. D. (2002, November 4). Design considerations for web-based learning systems. In Advances in developing human resources (Chap. 2). Retrieved November 20, 2007, from

Leloup, W. J., & Ponterio, R. (2000). On the net literacy: Reading on the net. Language Learning and Technology, 4(2), 5-10.

Lomicka, L. L. (1998). To gloss or not to gloss: An investigation of reading comprehension online. Language Learning and Technology, 1(2), 41-50.

Lyman-Hager, M. A., & Davis, J. A. (1996). Une vie de boy (The life of a boy). Journal of the French Review, 69(5), 775-792.

Mirhassani, S. A., & Alavi, S. M. (2004). Communicative Reading Skills (CRS). Tehran: Zabankadeh Publications.

Mondria, J., & Wit-de Boer, M. (1991). The effects of contextual richness on the guessability and the retention of words in a foreign language. Applied Linguistics, 12, 249-267.

Nagata, N. (1999). The effectiveness of computer-assisted interactive glosses. Foreign Language Annals, 32(4), 469-479.

Nagy, W., Anderson, R., & Herman, P. (1987). Learning word meanings from context during normal reading. American Educational Research Journal, 24(2), 237-270.

Nation, I. S. P. (1999). Learning vocabulary in another language. Wellington: Victoria University of Wellington.

Oxford learner’s pocket dictionary (2nd ed.). (1991). Oxford: Oxford University Press.

Oxford placement test (1st ed.). (2004). Oxford: Oxford University Press.

Plass, J. L. (1998). Design and evaluation of user interface of foreign language multimedia software: A cognitive approach. Language Learning and Technology, 2(1), 40-53.

Richards, J. C. (2005). Interchange intro: Student’s book (3rd ed.). Cambridge: Cambridge University Press.

Robinson, P. (1995). Attention, memory and the “noticing” hypothesis. Language Learning, 45(2), 283-331.

Roby, W. B. (1998). What’s in a gloss: A commentary on Lara L. Lomicka’s To gloss or not to gloss: An investigation of reading comprehension online. Language Learning and Technology, 2(2), 94-101.

Root, C., & Blanchard, K. (2004). Get ready to read. NY: Pearson PTR Interactive.

Watanabe, Y. (1997). Input, intake, and retention: Effects of increased processing on incidental learning of foreign language vocabulary. Studies in Second Language Acquisition, 19, 287-307.

Yanguas, I. (2009). Multimedia glosses and their effect on L2 text comprehension and vocabulary learning. Language Learning and Technology, 13(2), 48-67.

Yeh, Y., & Wang, C. (2003). Effects of multimedia vocabulary annotations and learning styles on vocabulary learning. CALICO Journal, 21(1), 131-144.

Yoshii, M. (2006). L1 and L2 glosses: Their effects on incidental vocabulary learning. Language Learning and Technology, 10(3), 85-101.

Yoshii, M., & Flaitz, J. (2002). Second language incidental vocabulary retention: The effect of picture and annotation types. CALICO Journal, 20(1), 33-58.

© Copyright rests with authors. Please cite TESL-EJ appropriately.

Editor’s Note: The HTML version contains no page numbers. Please use the PDF version of this article for citations.