In this case, the analysis began with a linguistic description of Diamond’s quotes (i.e., the quotations attributed to Daniel Wemp as his spoken words in the 4/21/08 New Yorker article). Those quotes were compared to an independent record of Daniel Wemp’s actual speech, based on verbatim transcripts of spoken interviews collected by Rhonda Roland Shearer. Quantitative analyses of the New Yorker quotes and the Daniel Wemp (DW) transcripts were carried out, to identify grammatical features that were frequent or rare.
Then, those quantitative findings were compared to previous large-scale corpus analyses of conversation and academic writing, to determine whether the New Yorker quotes were typical of the language normally used in conversation. The results were surprising: in many respects, the New Yorker quotes are much more similar to the language typically used in academic writing than to normal conversation.
Most of the corpus analyses used for this comparison were taken from the 1,200-page Longman Grammar of Spoken and Written English (LGSWE; Biber et al., 1999). The research for the LGSWE was based on analysis of a very large corpus that represents four major varieties: conversation, fiction writing, newspaper writing, and academic writing. For example, the sub-corpus for conversation includes approximately 6.4 million words, produced by thousands of speakers. The sub-corpus for academic writing includes 5.3 million words from 408 different texts. Computational and quantitative analyses of these corpora allow us to make strong generalizations about the grammatical characteristics that are frequent or rare in conversation, contrasted with the features that are frequent or rare in academic writing. The detailed research findings in the LGSWE can be applied to characterize the linguistic style of an individual text, describing the extent to which the language of that text is typical of conversation or of academic writing.
This linguistic analysis shows that the Diamond quotes (language attributed to Daniel Wemp in the 4/21/08 New Yorker article) are atypical of speech. Rather, these claimed quotes contain numerous grammatical constructions that are common in formal academic writing but very rarely used in normal speech. Further, those same grammatical constructions are not used in the verbatim transcripts of actual speech produced by Daniel Wemp (referred to as DW below).
Taken together, the linguistic analyses indicate that it is extremely unlikely that the New Yorker quotations are accurate verbatim representations of language that originated in speech. To put it simply, normal people do not talk using the grammatical structures represented in these quotations. However, these quotations do include several grammatical structures found commonly in academic writing, suggesting that the quotations were produced in writing rather than being transcribed from speech.
Grammatical characteristics of the quotations in the New Yorker article
Certain characteristics of conversation are easy to notice, and so almost any portrayal of speech will include these features. For example, even unskilled novelists are certain to include these stereotypical features in their fictional dialogue. Contractions are probably the most noticeable feature of speech (e.g., it’s, he’s, I’m), and the quotations in the New Yorker article are typical of speech (and fictional dialogue) in that they incorporate numerous contractions.
The use of simple coordinators (especially and) is also a salient characteristic of conversation, and the Diamond quotes incorporate frequent use of that feature. One major function of and is to connect clauses, and this use occurs frequently both in the Diamond quotes and in the actual transcribed speech of DW. For example:
If you die in a fight, you will be considered a hero, and people will remember you for a long time.
I have given all these story and those stories are very true and those names are not fake.
However, many other grammatical characteristics are less apparent to the casual observer. This is where corpus analysis can be useful: to identify the grammatical features that are actually common or rare in conversation, especially features that would go unnoticed otherwise.
For example, Jared Diamond’s quotes frequently employ the coordinator and (as well as but) in two different ways: 1) to connect clauses, and 2) to connect two adjectives. As noted above, corpus research shows that the first use is in fact very common in conversation. However, the second grammatical pattern is rare in actual conversation, although it is common in formal writing. Examples from the Diamond quotes are:
my father was felt to be too old and weak
quick but correct decisions
my tall and handsome uncle
This grammatical pattern is rare both in the actual transcribed speech of DW and in the corpus of conversation generally. Thus, even in the use of the coordinator and (and but), the Diamond quotes are more similar to written language than to actual speech.
The noun phrase structures found in the Diamond quotes are especially atypical of normal speech. Corpus research shows that many of these structures are extremely rare in normal conversation, while they are quite common in academic writing.
In normal conversation, a majority of noun phrases are realized as pronouns. The Diamond quotes (and the transcribed speech of DW) are typical of conversation in that they use numerous pronouns.
However, the Diamond quotes are atypical of conversation in that they also include numerous noun phrase structures that are extremely rare in conversation. Most of these structures are also rare (or unattested) in the actual transcribed speech of DW. One structure of this type is noun phrases that have adjectives as modifiers, referred to as ‘attributive adjectives’. These adjectives are very common in the New Yorker quotes; for example:
biological father
lower left leg
hot pieces of wood
public battle
unexpected words
experienced fighters
bare hands
constant suffering
Corpus research shows that attributive adjectives are 3 to 4 times more common in formal writing than in conversation. In the Diamond quotes, however, these adjectives are common, occurring about 30 times per 1,000 words. That rate is about 2 to 3 times the rate found in normal conversation, and it is similar to the normal rate in written fiction. Attributive adjectives are also considerably more frequent in the Diamond quotes than in the transcribed speech of DW.
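Rates such as "30 times per 1,000 words" are normalized frequencies: a raw count is scaled to a common basis of 1,000 words so that texts of different lengths can be compared directly. A minimal sketch of that arithmetic in Python follows. The function names and the counts in the usage lines are hypothetical, chosen only to illustrate the calculation; they are not the actual counts from the New Yorker quotes or the DW transcripts.

```python
def rate_per_thousand(count: int, total_words: int) -> float:
    """Return occurrences per 1,000 words of running text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count * 1000 / total_words


def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Return how many times larger rate_a is than rate_b."""
    if rate_b <= 0:
        raise ValueError("rate_b must be positive")
    return rate_a / rate_b


# Hypothetical counts, for illustration only (not figures from the analysis).
sample_rate = rate_per_thousand(count=60, total_words=2000)
reference_rate = rate_per_thousand(count=24, total_words=2000)
print(sample_rate)                              # 30.0
print(rate_ratio(sample_rate, reference_rate))  # 2.5
```

Normalizing before comparing matters because a short set of quotations and a multi-million-word reference corpus cannot be compared on raw counts.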
A second structure of this type is the relative clause with a fronted preposition. Such clauses are extremely rare in everyday conversation, although they are relatively common in formal writing. Surprisingly, the Diamond quotes include three examples of this structure:
a stone quarry from which the Ombal enemy was throwing stones
a night raid in which we sneak into an enemy village
each battle in which we succeeded in killing an Ombal
There are no examples of this structure in the formal statement of DW. (There might be one example of this type in the verbatim interview transcripts, but the actual structure is difficult to interpret.)
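Candidate instances of this structure can be located with a simple surface search for a preposition immediately followed by which or whom. The Python sketch below is an illustration only, not the procedure used for the analysis reported here: the preposition list and the function name are assumptions, and every hit would still need to be checked by hand, since the pattern also matches other sequences (for example, "in which case" or a question beginning "In which ...").

```python
import re

# Assumed, non-exhaustive list of prepositions that can be fronted.
PREPOSITIONS = (
    "about", "at", "by", "for", "from", "in",
    "of", "on", "through", "to", "with",
)

# A preposition, whitespace, then the relativizer "which" or "whom".
FRONTED_PREPOSITION = re.compile(
    r"\b(?:" + "|".join(PREPOSITIONS) + r")\s+(?:which|whom)\b",
    re.IGNORECASE,
)


def find_fronted_prepositions(text: str) -> list[str]:
    """Return each 'preposition + which/whom' sequence found in text."""
    return [m.group(0).lower() for m in FRONTED_PREPOSITION.finditer(text)]


examples = [
    "a stone quarry from which the Ombal enemy was throwing stones",
    "a night raid in which we sneak into an enemy village",
    "each battle in which we succeeded in killing an Ombal",
]
for phrase in examples:
    print(find_fronted_prepositions(phrase))
```

Applied to the three examples above, the search returns one candidate per phrase ("from which", "in which", "in which"). A clause with a stranded preposition, such as "the village which we came from", is not matched.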
A third unusual noun phrase structure in the New Yorker quotes is the dense use of prepositional phrases as noun modifiers. Similar to the previous two structures, prepositional phrases as noun modifiers are extremely common in formal writing but rare in normal conversation. There are numerous examples of this structure in the New Yorker quotes; for example:
a strong young man in his prime
The original cause of the wars between the Handa and Ombal clans
real enemies of your target
grievances of their own
the mistake of hiring a man who actually does not consider your target to be his own enemy
feeling of anger
Both men and women on the other side
our endless cycles of revenge killings