Vol. 7. No. 2 INT September 2003

From the Editor

Concordancing is a venerable old idea whose potential has never been reached, according to its most assiduous proponents. Until recently it was assumed that specialized software was required to do concordancing, but it turns out that a search engine such as Google can generate queries into almost limitless corpora (using the Advanced Search feature from the main portal page, for example). This paper by Tom Robb addresses more refined issues regarding the integrity of the data thus derived, and how we might improve on the integrity of that data through more defined searches, as explained here.

Vance Stevens, Editor
On the Internet


Google as a Quick 'n Dirty Corpus Tool

by Thomas Robb
<trobb@cc.kyoto-su.ac.jp>

The Problem

This message appeared on August 6, 2003 on the JALTTALK mailing list, a list for language teachers in Japan.

Date: Wed, 06 Aug 2003 10:28:22 +0900
From: Russell Willis <russe11@eigotown.com>
Subject: [jalttalk 25027] Eiken Question
This from the Eiken test:

Her wedding dress was very a)beautiful b)gorgeous c)wonderful

The correct answer is beautiful -- but can anyone explain exactly why?

The first respondent supplied the standard explanation that "very" cannot be used with non-gradable adjectives.

Date: Wed, 06 Aug 2003 10:40:07 +0900
From: <app1epie@minos.ocn.ne.jp>
Subject: [jalttalk 25028] Re: Eiken Question

The adverb "Very" rarely co-occurs with strong meaning adjectives, such as wonderful, delicious, fantastic, excellent, magnificent etc, unless you want to mean something sarcastic.

Here is the explanation given on one web site concerning non-gradable adjectives. [-1-]

Unique means "one of a kind." Therefore, comparatives, superlatives, and words like very, so, or extremely should not be used to modify it. If it is one of a kind, it cannot be compared!

Incorrect: He is a very unique personality.

Correct: He is a unique personality.

This same logic applies to other words which reflect some kind of absolute: absolute, overwhelmed, straight, opposite, right, dead, entirely, eternal, fatal, final, identical, infinite, mortal, opposite, perfect, immortal, finite, or irrevocable.

http://englishplus.com/grammar/00000269.htm

"Wonderful" might be considered non-gradable since it means "full of wonder" and anything that is "full" cannot be graded further. But then "beautiful" should mean "full of beauty" and, by the same argument, "very beautiful" should also be ungrammatical. So much for logic! What we need is real data.

In fact, the next poster then claimed that both "very gorgeous" and "very wonderful" could be found in a Google search (Petersen: [jalttalk 25029]). While this may be true, anyone can make a web page these days, including non-native speakers and first graders. This would seem to make Google useless as a source for such answers. Or does it?

Possible Approaches

Google can be used to derive credible information in a number of ways:

  1. Instances of the phrase might be found in "impeccable sources," which would authenticate the usage.
  2. We can show, by a careful selection of domains, that the word or phrase under scrutiny occurs with a similar relative frequency even in domains where one would expect educated usage.
  3. Strength of collocation - We can demonstrate that "very" collocates with the word in question, even in domains indicating educated usage.

Procedure

  • Impeccable Sources

    Project Gutenberg is a huge, volunteer-based effort to create e-texts of material that is out of copyright, particularly works of literature and texts of historical value. See http://www.gutenberg.org for further information.

    To search within a specific domain, two separate techniques were used, neither of which was perfect. [-2-]

    Method 1: In addition to the target phrase in quotation marks, "Project Gutenberg" and "Michael S. Hart" were specified in order to restrict the search to texts that were part of the project. All such texts contain header material with this information. The largest problem with this technique is that there are multiple sites containing the texts, some authorized by Project Gutenberg and others not, so the "hit count" is inflated. One could assume, however, that all hit counts will be inflated by approximately the same amount, so the inflation will not affect the frequency ratios to any great extent. (= Guten-1)

    The search string "very beautiful" "Project Gutenberg" "Michael S. Hart" produces something similar to the results shown here: [screenshot of Google search results]

    Method 2: In addition to the target phrase in quotation marks, site:gutenberg.org was specified. This is a Google-specific search parameter that restricts the search to a specific domain. While this would seem to be ideal, there were two problems: a) the site has sub-sites which contain some, but not all, of the same texts, perhaps in differing formats, and b) some legitimate texts discovered by Method 1 ("Main Street" by Sinclair Lewis, for one) failed to be returned. (= Guten-2) [-3-]

    The search string "very beautiful" site:gutenberg.org produces the results partially shown below: [screenshot of Google search results]

    I thus decided to search using both techniques and compare the figures derived by each method.
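The two Gutenberg search strings described above can be assembled programmatically. The following is a minimal Python sketch and not part of the original study: the function names are hypothetical, and the search URL form (a `q` parameter on `https://www.google.com/search`) is an assumption added only to show how the string would be encoded.

```python
from urllib.parse import urlencode


def guten1_query(phrase: str) -> str:
    """Method 1 (Guten-1): exact phrase plus two strings found in the
    header material of Project Gutenberg texts."""
    return f'"{phrase}" "Project Gutenberg" "Michael S. Hart"'


def guten2_query(phrase: str) -> str:
    """Method 2 (Guten-2): exact phrase restricted to the gutenberg.org domain."""
    return f'"{phrase}" site:gutenberg.org'


def search_url(query: str) -> str:
    """Encode a query string into a search URL (assumed URL form)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    for build in (guten1_query, guten2_query):
        query = build("very beautiful")
        print(query)
        print(search_url(query))
```

Running both builders for each target phrase gives the pairs of searches whose hit counts are compared as Guten-1 and Guten-2.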

  • Relative frequency of occurrence

    Using a similar technique to that described above, searches were run on sites ending in ".edu" (mainly U.S.-based educational entities), their British (ac.uk) and Australian (edu.au) counterparts, and ".jp" (Japan-based pages in English), as well as with no additional criteria in order to search the entire Google database. The total number of hits was recorded. In instances where fewer than 100 hits were reported, Google often mentioned that there were more pages with substantially the same content that were not displayed. In such instances, the lower number was used.

    From the data thus derived, the ratio of hits of each phrase to "very beautiful" was calculated, since "very beautiful" is undeniably an acceptable phrase (and the correct answer in the original examination question). Any variance in the ratio can be explained as a greater or lesser tendency to use the phrase in the domains under investigation. "Window pane," included in the data for comparison purposes, shows an interesting pattern, with the three academic domains showing a higher ratio of usage than "All Google" or the other columns. This cannot be interpreted to mean that "window pane" is more acceptable there, only that for some odd reason the academic domains have a greater need to refer to "window panes." The ratios presented merely mean that there is a greater or lesser likelihood of a term being needed for the genre distribution of that particular domain. [-4-]

    "Very unique" was included in the searches since it is a standard example of a usage looked upon with disapprobation, and one that educated speakers are often aware of and will try to avoid.
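The domain-restricted searches described above could be generated as a grid of phrase and domain combinations. This is a hedged Python sketch rather than the procedure actually used: the `site:` values are assumed from the column labels, the names `DOMAIN_FILTERS`, `build_queries` and `recorded_hits` are hypothetical, and the way the .jp search was limited to English pages is not stated in the text and is not modelled.

```python
# Column labels from the tables mapped to an assumed "site:" restriction.
# An empty string means no restriction (the whole Google database).
DOMAIN_FILTERS = {
    "All Google": "",
    ".edu": "site:edu",
    "ac.uk": "site:ac.uk",
    "edu.au": "site:edu.au",
    ".jp": "site:jp",
}

PHRASES = ["very beautiful", "very gorgeous", "very wonderful",
           "very unique", "window pane"]


def build_queries(phrase):
    """Return one exact-phrase query string per domain column."""
    quoted = f'"{phrase}"'
    return {label: f"{quoted} {site}".strip()
            for label, site in DOMAIN_FILTERS.items()}


def recorded_hits(reported, without_near_duplicates=None):
    """Apply the rule from the text: when Google also gives a lower count
    that omits pages with substantially the same content, record the lower
    number; otherwise record the reported count."""
    if without_near_duplicates is None:
        return reported
    return min(reported, without_near_duplicates)


if __name__ == "__main__":
    for phrase in PHRASES:
        for label, query in build_queries(phrase).items():
            print(f"{label:10s} {query}")
```

The hit counts themselves still have to be read off the results pages by hand; the sketch only keeps the query strings and the recording rule consistent across phrases and domains.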

    Results

    There were approximately 90 instances of "very wonderful," including one by A.A. Milne and another by Herman Melville. For "very gorgeous," I found 11 instances, including one by Mark Twain. See the examples at the end of the paper.

    Frequency data:
                    All Google  .edu    ac.uk  edu.au  .jp    Guten-1  Guten-2
    very beautiful  430,000     13,800  1,490  1,250   5,670  2,750    357
    very gorgeous   10,100      78      5      10      106    11       0
    very wonderful  56,000      2,210   89     55      961    90       71
    very unique     316,000     11,700  323    303     2,130  11       1
    window pane     53,700      3,150   295    289     144    84       54
    Table 1 – Frequency data

    Ratio of other phrases to "very beautiful":
                    All Google  .edu    ac.uk  edu.au  .jp    Guten-1  Guten-2
    very gorgeous   0.023       0.006   0.003  0.008   0.019  0.004    0.000
    very wonderful  0.130       0.160   0.060  0.044   0.169  0.033    0.199
    very unique     0.735       0.848   0.217  0.242   0.376  0.004    0.003
    window pane     0.125       0.228   0.198  0.231   0.025  0.031    0.151
    Table 2 – Ratio of other phrases to "very beautiful"
    [-5-]
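The ratios in Table 2 follow directly from the hit counts in Table 1: each count is divided by the "very beautiful" count in the same column. The short Python sketch below recomputes them; the data are the published figures, while the names `HITS` and `ratios_to_baseline` are hypothetical and introduced only for illustration.

```python
COLUMNS = ["All Google", ".edu", "ac.uk", "edu.au", ".jp", "Guten-1", "Guten-2"]

# Hit counts from Table 1, one list per phrase in COLUMNS order.
HITS = {
    "very beautiful": [430_000, 13_800, 1_490, 1_250, 5_670, 2_750, 357],
    "very gorgeous":  [10_100, 78, 5, 10, 106, 11, 0],
    "very wonderful": [56_000, 2_210, 89, 55, 961, 90, 71],
    "very unique":    [316_000, 11_700, 323, 303, 2_130, 11, 1],
    "window pane":    [53_700, 3_150, 295, 289, 144, 84, 54],
}


def ratios_to_baseline(hits, baseline="very beautiful"):
    """Divide each phrase's hits by the baseline phrase's hits, column by column."""
    base = hits[baseline]
    return {
        phrase: [round(n / b, 3) for n, b in zip(counts, base)]
        for phrase, counts in hits.items()
        if phrase != baseline
    }


if __name__ == "__main__":
    for phrase, row in ratios_to_baseline(HITS).items():
        print(phrase, dict(zip(COLUMNS, row)))
```

Because every column is normalised by its own "very beautiful" count, the ratios can be compared across domains of very different sizes.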

    Comparing the other phrases with the acceptable form "very beautiful," in both raw hits and frequency ratios, we can see that "very unique" is almost as frequent as "very beautiful" in the overall count, and not far behind in the .edu and .jp domains. The ac.uk and edu.au domains are strikingly similar. Surprisingly, the .jp domain tends to conform closely to the .edu domain, but the cause will have to wait for another paper. Gutenberg authors apparently disdain "very unique," although examples such as the Sherlock Holmes instance (". . . some very unique features") indicate that, when suitably modified, the phrase appears to be more acceptable. A true corpus-based study might be able to reveal the exact conditions that allow it to be construed as "grammatical."

    In the case of "very wonderful," the frequencies are similar across all domains, although ac.uk and edu.au show approximately one-third the ratio of the other domains. "Very gorgeous" reveals few instances overall. Many of the tokens in the "general Google" search are from porno sites advertising their wares (non-wears? :-). A second search with the "safe search" filter on yielded 5,220 hits, approximately half the original number.

    Strength of Collocation

    The following two tables show the total number of occurrences in Google of each word in question, and the proportion of those occurrences that collocate with "very."

    Totals     All Google  .edu       ac.uk    edu.au   .jp      Guten-1  Guten-2
    beautiful  5,920,000   1,150,000  88,500   44,300   243,000  39,700   2,080
    gorgeous   1,820,000   101,000    3,270    2,500    40,900   3,750    417
    wonderful  8,830,000   1,030,000  44,800   42,000   95,800   14,600   1,600
    unique     6,740,000   2,640,000  439,000  145,000  298,000  3,600    482
    Table 3 – Total hits for content words

    Strength of collocation  All Google  .edu   ac.uk  edu.au  .jp    Guten-1  Guten-2
    beautiful                0.073       0.012  0.017  0.028   0.023  0.069    0.172
    gorgeous                 0.006       0.001  0.002  0.004   0.003  0.003    0.000
    wonderful                0.006       0.002  0.002  0.001   0.010  0.006    0.044
    unique                   0.047       0.004  0.001  0.002   0.007  0.003    0.002
    Table 4 – Strength of collocation
    [-6-]

    From these data we can see that "wonderful" collocates as strongly with "very" in the Gutenberg texts as it does in the overall Google data, which might indicate that educated speakers are being more careful with this phrase than is warranted by past usage as revealed in the historical literature. On the other hand, the strength of collocation of "very unique" is roughly an order of magnitude greater in the total Google corpus than in any of the domains we have investigated, indicating that this usage is indeed being spurned by current educated users. "Very gorgeous" shows a similarly low strength of collocation across all domains studied.
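The strength-of-collocation figures discussed above are the hits for "very X" (Table 1) divided by the hits for "X" alone (Table 3). The following Python sketch recomputes Table 4 from the published counts; the variable and function names are hypothetical and used only for illustration.

```python
COLUMNS = ["All Google", ".edu", "ac.uk", "edu.au", ".jp", "Guten-1", "Guten-2"]

# Hits for "very X" (Table 1), keyed by X, in COLUMNS order.
VERY_HITS = {
    "beautiful": [430_000, 13_800, 1_490, 1_250, 5_670, 2_750, 357],
    "gorgeous":  [10_100, 78, 5, 10, 106, 11, 0],
    "wonderful": [56_000, 2_210, 89, 55, 961, 90, 71],
    "unique":    [316_000, 11_700, 323, 303, 2_130, 11, 1],
}

# Hits for X on its own (Table 3), in COLUMNS order.
TOTAL_HITS = {
    "beautiful": [5_920_000, 1_150_000, 88_500, 44_300, 243_000, 39_700, 2_080],
    "gorgeous":  [1_820_000, 101_000, 3_270, 2_500, 40_900, 3_750, 417],
    "wonderful": [8_830_000, 1_030_000, 44_800, 42_000, 95_800, 14_600, 1_600],
    "unique":    [6_740_000, 2_640_000, 439_000, 145_000, 298_000, 3_600, 482],
}


def collocation_strength(word):
    """Proportion of a word's occurrences that are preceded by "very", per column."""
    return [round(v / t, 3) for v, t in zip(VERY_HITS[word], TOTAL_HITS[word])]


if __name__ == "__main__":
    for word in VERY_HITS:
        print(word, dict(zip(COLUMNS, collocation_strength(word))))
```

Rounded to three decimal places, the output should agree with Table 4, which makes it easy to check a retyped figure or to add a further adjective to the comparison.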

    Discussion

    As a corpus tool, using Google in this way has these drawbacks:

    On the positive side,

    The foregoing indicates that there are instances when Google can take the place of a specialized corpus when the main object is to identify whether a particular phrase is used or not and perhaps to indicate to what extent it is used by educated speakers or writers compared to the "general masses" of public web pages.

    References

    Robb, T. (2003). "Google as a Corpus Tool?", ETJ Journal. 4:1; Spring 2003. Available: http://www.kyoto-su.ac.jp/~trobb/googleAsConc.html

    Rundell, M. (2000). "The biggest corpus of all", Humanising Language Teaching. 2:3; May 2000. Available: http://www.hltmag.co.uk/may00/idea.htm [-7-]


    Examples from a full search of Google

    I couldn't help staring at the picture with your 2 very gorgeous friends. wow! www.geocities.com/vienna/opera/5193/geobook.html

    I was very excited to take a picture together with a Japanese girl dressed in a very gorgeous kimono in the Sensoji Temple. www-db.stanford.edu/~qluo/html/winter-pict.html

    "As You Like It is a very wonderful comedy," says new Festival Theatre artistic director Jack Hickey. www.performink.com/5.9.03/_html/SummerShakes.html

    Venice is a very wonderful city—possibly the most wonderful in the world depts.washington.edu/engl/venice2001.html

    ... oh my god!!! It is so so very gorgeous - I loove Gold Dust and I love everything about the new layout. iridescentglow.com/blog/archives/00000312.html

    After Vanessa's Fashion Walk, Ben Stein gave her 4 stars, commenting, "Very, very gorgeous; a little bit of a jerky walk." "You have a great look," noted Carol Leifer, ... www.cbs.com/primetime/star_search/model/011503.shtml


    Examples from Project Gutenberg ("Impeccable Sources")

    Instances of very gorgeous

    Love-at-Arms by Rafael Sabatini

    For however much Fanfulla's raiment might have suffered in yesternight's affray, it was very gorgeous still, and in the velvet cap upon his head a string of jewels was entwined.

    An Old-fashioned Girl by Louisa M. Alcott

    Never mind what its name was, it was very gorgeous, very vulgar, and very fashionable; so, of course, it was much admired, and every one went to see it.

    The Innocents Abroad by Mark Twain

    In the great Zoological Gardens we found specimens of all the animals the world produces, I think, including a dromedary, a monkey ornamented with tufts of brilliant blue and carmine hair—a very gorgeous monkey he was-- a hippopotamus from the Nile, and a sort of tall, long-legged bird with a beak like a powder horn and close-fitting wings like the tails of a dress coat. [-8-]

    Daddy-Long-Legs by Jean Webster

    Sallie and Julia and I went shopping together Saturday morning. Julia went into the very most gorgeous place I ever saw, white and gold walls and blue carpets and blue silk curtains and gilt chairs.

    Instances of very wonderful

    Mr. Pim Passes By by Alan Alexander Milne

    Dinah (coming to back of table L.C.). Of course, something very, very wonderful did happen last night. (Backing away.) No, no! I'm not sure if I know you well enough? (She looks at him hesitatingly.)

    Redburn. His First Voyage by Herman Melville

    But I meant to speak about the fort. It was a beautiful place, as I remembered it, and very wonderful and romantic, too, as it appeared to me, when I went there with my uncle.

    Tom Swift in the Land of Wonders by Victor Appleton

    "But what has he got to do with a wonderful story? Has he written more about the lost city of Pelone? If he has I don't see anything so very wonderful in that."

    Instances of very unique

    Venus in Furs by Leopold von Sacher-Masoch, translated by Fernanda Savage

    Perhaps, after all, there isn't anything so very unique or strange in all your passions

    Main Street by Sinclair Lewis

    They've made the city plant ever so many trees, and they run the restroom for farmers' wives. And they do take such an interest in refinement and culture. So--in fact, so very unique.

    The Return of Sherlock Holmes by Arthur Conan Doyle

    "There are really some very unique features about this case, Watson," said he. [-9-]


    Instances from the Cobuild Corpus (online sampler)

    While the Cobuild Corpus might provide a similar number of instances for each type examined, the sources do not have the same cachet of authority as those found in Gutenberg, and thus would probably not be as convincing to other teachers and, in particular, examination makers.

    Very Beautiful - 40 (Maximum allowed for free)

    Man: [f] They told me you were very beautiful. [f] Woman: [f] Then they are

    to say?' [p] Well, that you're very beautiful [p] Karen: `Mm, what else?' [p]

    wrong. And its ideas are very beautiful. [p] Anti-science comes in a

    five minutes that she's really very beautiful. [p] In this recreation of her

    knows her craft inside out and very beautiful [p] And, er, could she be the

    [p] I could see Will - he's very beautiful , a much more beautiful man than I

    here, there's something very beautiful about it. Chekhov, dramatist in

    Very Gorgeous - 0

    Very Wonderful - 7

    These are letters to a very wonderful actor and a very handsome man

    something very serious and very wonderful, Bob. And some wonderful people

    wonderful?' [p] It is indeed very wonderful, but one would expect no less

    in such a case it was not so very wonderful- -certainly not disproportionate

    said, `I want to welcome these very wonderful deluxe people." And I was very

    Some of these things were very wonderful , enabling the red haired savages

    Jimi Dean, Tasty Tim and the very wonderful Vicky Edwards. [p] [h] SUNDAY 11

    Very Unique - 7

    to do with it, and this is a very unique and specific acoustic, and you have

    If you want to work with some very unique effects the VINTAGE algorithm

    think it takes, again, a very, very unique individual who can handle the

    damaged. It's a very, very unique institution that I think needs to

    actor; he was an artist. And a very unique one," he says. `He somehow embodied

    playing but also for their very unique style and repertoire. Jean-Marie

    of reality with ideality, is a very unique thought, not found in other

    © Copyright rests with authors. Please cite TESL-EJ appropriately.

    Editor's Note: Dashed numbers in square brackets indicate the end of each page for purposes of citation.


    [-10-]