The Craft of Writing: Lexical Density and You

Posted On November 21, 2015

According to the study “Success with Style: Using Writing Style to Predict the Success of Novels,” by Stony Brook University’s Vikas Ganjigunte Ashok, Song Feng, and Yejin Choi, whether a book will be successful can be predicted from several quantifiable factors.

The researchers downloaded classic literature from the Project Gutenberg archive, added more recent award-winning novels, and analyzed low-ranking books on Amazon, covering genres from science fiction to classic literature and even poetry.

Successful books used a higher percentage of nouns, adjectives, conjunctions, prepositions, pronouns, and determiners. In particular, successful books made greater use of conjunctions to join sentences (“and” or “but”) and of prepositions than less successful books did.

Less successful books had a higher percentage of verbs, adverbs, and foreign words. They also relied heavily on clichés and on extreme and negative words. Less successful books also leaned on dull verbs that describe direct action, such as “took,” “promised,” and “cried,” while more successful books used more verbs that describe thought processes, such as “recognized” and “remembered.”

The least-known books describe actions and emotions more; conversely, the most renowned have a vocabulary associated with reflection, thought, and memories…

What the researchers found, but didn’t identify by name, is that lexical density contributes to books that readers consistently find engaging, whether they were written by masters of the past or by current bestselling authors.

But What Is Lexical Density?

Lexical density is defined as the number of lexical words (or content words) divided by the total number of words. Lexical words give a text its meaning and provide information regarding what the text is about. More precisely, lexical words are simply nouns, adjectives, verbs, and adverbs.
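To make that definition concrete, here is a minimal Python sketch that computes lexical density from words that have already been tagged by part of speech. The tag names (NOUN, ADJ, VERB, ADV, DET, CONJ) and the hand-tagged sample sentence are assumptions for illustration; in practice the tags would come from a part-of-speech tagger, and taggers can disagree on borderline words.

```python
# Simplified, assumed tag names; a real tagger's tag set may differ.
LEXICAL_TAGS = {"NOUN", "ADJ", "VERB", "ADV"}

def lexical_density(tagged_tokens):
    """Share of tokens tagged as lexical (content) words, as a percentage."""
    if not tagged_tokens:
        return 0.0
    lexical = sum(1 for _word, tag in tagged_tokens if tag in LEXICAL_TAGS)
    return 100.0 * lexical / len(tagged_tokens)

# An invented sentence, tagged by hand for illustration.
sample = [
    ("The", "DET"), ("old", "ADJ"), ("man", "NOUN"), ("slowly", "ADV"),
    ("remembered", "VERB"), ("the", "DET"), ("storm", "NOUN"),
    ("and", "CONJ"), ("the", "DET"), ("ship", "NOUN"),
]

print(f"{lexical_density(sample):.0f}%")  # 6 lexical words out of 10 -> 60%
```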

And Grammarly says this:

Lexical Density is a term used in text analysis. It measures the ratio of content words to grammatical words. Content words are nouns, adjectives, most verbs, and most adverbs. Grammatical (sometimes called functional) words are pronouns, prepositions, conjunctions, auxiliary verbs, some adverbs, determiners, and interjections.
Lexical density also considers the number of unique words. If you’ve re-used words, you’ve reduced your lexical density.
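The Grammarly description gives no formula, so the sketch below is only one possible reading of it: count each distinct content word once and divide by the total number of words. The short GRAMMATICAL_WORDS list is hypothetical and far from complete, and the regular-expression tokenizer is deliberately crude. The point is simply to show how reusing words lowers the score.

```python
import re

# Hypothetical, deliberately incomplete list of grammatical (function) words.
GRAMMATICAL_WORDS = {
    "a", "an", "the", "and", "but", "or", "of", "in", "on", "to", "at", "by",
    "i", "you", "he", "she", "it", "we", "they", "is", "was", "were", "has", "had",
}

def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def unique_content_density(text):
    """Distinct content words divided by total words, as a percentage."""
    words = tokenize(text)
    if not words:
        return 0.0
    distinct_content = {w for w in words if w not in GRAMMATICAL_WORDS}
    return 100.0 * len(distinct_content) / len(words)

varied = "The captain watched the harbor and the sailors cursed the fog"
repetitive = "The captain watched the harbor and the captain watched the fog"

print(f"varied:     {unique_content_density(varied):.0f}%")      # 6 of 11 words
print(f"repetitive: {unique_content_density(repetitive):.0f}%")  # 4 of 11 words
```

Both sentences are eleven words long; the second simply repeats “captain” and “watched,” and its score drops accordingly.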

Well, What Is A Common Lexical Density?

Analyze My Writing says:

Fiction on average tends to score between 49% and 51%. The reader may verify this by trying this experiment.
More general prose tends to have slightly lower lexical densities, between 48% and 50%, as observed in this experiment.

However, another website says:

Unfortunately, there is no reference for lexical density as such. It is a well-known measure of lexical variation which is used in many linguistic analyses. If you search the internet for ‘lexical density’ you will find several of these. I do not know who was the first person to use a measure of lexical density in a study but it is now well-known and, as it is in the public domain, no one really references its use anymore in articles, reports, and so on.

As we will see, the lexical densities of some current bestsellers differ considerably from the 48% to 51% range reported for lexical density. I suspect this is because the amount of text used for these calculations is quite small, while an analysis of a whole work of fiction would show a lower lexical density, partly because of dialogue, which tends to rely more heavily on grammatical words such as pronouns, auxiliary verbs, and short adverbs.

Case in point: I began my comparisons with the first twenty-five hundred words of John Grisham’s Gray Mountain, which showed a lexical density of 37%. For the other books, however, I used the first thousand words, so to compare apples to apples I reran Gray Mountain on its first thousand words as well. Sure enough, its lexical density shot up to 48%. I then rechecked the samples used by the Analyze My Writing website and found that they used either the first twenty sentences or the first paragraph, which is where their reported lexical densities of 48% to 51% come from.

This makes sense. One parameter of lexical density, at least as Grammarly describes it, is the number of words used more than once. The larger the writing sample, the more often words are reused; smaller samples contain fewer repeated words, hence the reported lexical densities of 48% to 51%.
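The repetition effect is easy to see with a distinct-words ratio, the “number of different words / total number of words” measure that some readability sites call lexical density. In this minimal sketch the 34-word passage is invented for illustration; measuring its first 10, 20, and 34 words shows the ratio falling as the sample grows, because later words increasingly repeat earlier ones.

```python
import re

def distinct_word_ratio(words):
    """Distinct words divided by total words, as a percentage (a type-token ratio)."""
    return 100.0 * len(set(words)) / len(words) if words else 0.0

def ratio_by_sample_size(text, sizes):
    """Compute the ratio on the first n words of the text for each n in sizes."""
    words = re.findall(r"[a-z']+", text.lower())
    return {n: distinct_word_ratio(words[:n]) for n in sizes if n <= len(words)}

# An invented passage; longer real texts typically show the same downward trend.
passage = (
    "She opened the door and looked out at the rain. "
    "The rain fell on the road and the road shone. "
    "She closed the door and sat by the fire, and the fire was warm."
)

for n, ratio in ratio_by_sample_size(passage, (10, 20, 34)).items():
    print(f"first {n} words: {ratio:.0f}%")
# first 10 words: 90%
# first 20 words: 65%
# first 34 words: 56%
```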

Where Did This Sucker Come From Anyway?

Researching the exact whys and wherefores of lexical density runs a little far afield from the original intent of this post. There are a few clues out there on the web, but I’m not going to spend a week figuring out the intricacies of the history of the inception of lexical density.

From what I can find, the term lexical density was coined in 1971 by J. Ure, who wrote a paper called “Lexical density: a computational technique and some findings,” published in M. Coulthard (Ed.), Talking about Text (pp. 27–48), Birmingham: English Language Research, University of Birmingham. This paper is referenced in many research papers. In that context, however, lexical density was used to measure how English as a Second Language students used English in their daily speech. Along the way, someone (and no, I don’t know who) grabbed onto lexical density and applied it as a measure of readability, so we now have one website reporting this definition of lexical density:

The Lexical Density Test is a Readability Test designed to show how easy or difficult a text is to read. The Lexical Density Test uses the following formula:
Lexical Density = (Number of different words / Total number of words) x 100
The lexical density of a text tries to measure the proportion of the content (lexical) words over the total words. Texts with a lower density are more easily understood.
As a guide, lexically dense text has a lexical density of around 60-70% and those which are not dense have a lower lexical density measure of around 40-50%.
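Note that this formula counts different (distinct) words, which measures vocabulary variety, while the definition at the start of this post counts content words. A worked example on one invented sentence, “The old man slowly remembered the storm and the ship” (10 words, 6 content words, and 8 distinct words once “The” and “the” are treated as the same word), shows how far apart the two can land:

```latex
\begin{aligned}
\text{content-word density} &= \frac{N_{\text{content}}}{N_{\text{total}}} \times 100 = \frac{6}{10} \times 100 = 60\% \\
\text{distinct-word ratio}  &= \frac{N_{\text{distinct}}}{N_{\text{total}}} \times 100 = \frac{8}{10} \times 100 = 80\%
\end{aligned}
```

The same sentence scores twenty points apart, so it matters which definition a website uses before comparing its percentages with anyone else’s.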

Personally, I have some problems with the formulas and methods for determining lexical density. As we’ve seen, the computations are taken from very small samples of larger works. That method may work for spoken English, because people don’t vary how they speak very often, but it doesn’t hold true for prose. Writers may not keep to the pattern of word use set in their first paragraphs. We add dialogue, which has an entirely different lexical density, or simply vary how we use our words. As I’ve found in analyzing different writers’ works, lexical density goes down the more words you look at, most likely because repeated words count against it. The more you write, the more often you will use certain words. If you analyze a whole work, the results may differ so widely from those for the first hundred words that you might doubt it was the same piece of writing.

So we have a readability test that morphed into a measurement of writing skill. For lack of evidence, it might have remained just an interesting topic of discussion, except that the good people at Stony Brook stumbled onto, and affirmed, a key element of lexical density. That element is what they said in their paper, and it bears repeating:

Successful books utilized a high percentage of nouns and adjectives, conjunctions, prepositions, pronouns, determiners and adjectives.

And:

The more dense and complicated is the novel, the more likely it will stand out.

And that is worth looking into. I’ll be writing more on this in the following posts:

The Craft of #Writing: Lexical Density Compared to Writing Rules
The Craft of Writing: 200 Most Common Words As Parts of Speech
The #Writing Craft: #Write Like A Best Selling #Author

References:
http://www.upi.com/Science_News/2014/01/09/Scientists-devise-algorithm-to-predict-success-of-novels/2021389295431/#ixzz3OweZqXXX

Sources:
Success with Style: Using Writing Style to Predict the Success of Novels, http://aclweb.org/anthology/D/D13/D13-1181.pdf

Photo published under a Creative Commons License issued by Flickr user Nina Jean.

Tags: Lexical Density, Song Feng and Yejin Choi, Stony Brook University, text analysis, The Craft of Writing, Vikas Ganjigunte Ashok

6 Comments

Genghis

That’s an interesting idea. I wonder if there isn’t more going on than just that, because if that study found more nouns & adjectives were associated with success, and fewer verbs & adverbs, maybe there are different ways of presenting the lexical density?
November 23, 2015
Greg

This morning I’ve been looking at different texts, novels I enjoyed versus those that did not engage me. Trying to put my finger on what it is about the good engaging books that makes the difference. I stumbled first onto an interview with Justin Cronin where he is talking about a book of novellas by Stephen King and says, “the long stories have the density of short fiction. That’s one thing I hope people like about The Passage, that the writing in it is obedient to that principle of having density.” This sent me to google, because the idea of density in short fiction grabbed me – yes, good short stories do have a very obvious sort of density to them, a preciseness, each word punched out clean and neat and purposefully, right? I then found a blog talking about a book by William Sloane (The Craft of Writing) – which is actually a book of his notes put together by his wife after his death but anyway – and the blog (http://paulettealden.com/blog/william-sloane-on-density-in-writing) quotes at length Sloane’s thoughts on Density in fiction, which to put it simply is the overall union of a work, every word, sentence, paragraph humming along in unity, nothing wasted, nothing trivial. Then I googled on to your site. Lexical Density does seem to point us very near the ‘thing’ as well. Perhaps it is flawed as you say. But I do think it gets us somewhat nearer to answering the question. If you crack open some poorly written book in your least favorite genre and start reading, the words do seem almost meaninglessly strung together, very lazy, very breezy, you could skip a paragraph and be no worse off in comprehension or experience, you know? But in a good book, you are hanging onto each word and sentence, the very reading of it pleasurable and instantly changing and directing the reader’s dream you’re living while reading it. I don’t know. I am still unsure what I’m thinking and even what I’m searching to understand. 
But I am very much looking forward to hearing your thoughts in the next posts in this series! Thanks!
November 28, 2015
Leith Lindenstrauss

Thank you, Beth, for the post. And you too, Greg, for your comment with its added insight. This is my first visit to the bethturnage site because I was looking for this exact analysis of lexical density. My first chapter’s first 1000-word density was 50% +, yet the 4000 words for chapter overall tested around 36%. You’ve provided a thought feast. Provocative too is “The more dense and complicated is the novel, the more likely it will stand out” because my critique group constantly dings me for density. I carefully, almost obsessively, focus my paragraphs, am equally conscious of “function versus emphasis” yet garner comments of “too much information” and “it doesn’t breathe.” My critics are good readers and writers and I believe their perceptions are accurate. Am discovering, I do so sincerely hope, the solution to making these passages more “breathable” is not to remove information but to further develop it. Will stay tuned to see what other gems you post/posted here. Thanks!
January 12, 2017
