Last year I wrote an article showing the opening paragraphs of top-selling books. A while ago I read an article about the most common words in top-selling books. I've lost track of that article, so I decided to use my data mining skills and do some similar research of my own. The books I've included are a few of the top-selling books from last year's article:
- A Tale of Two Cities
- And Then There Were None
- The Hobbit
But since not all of the top-selling books are available in a format suitable for analyzing the words, I've also included:
- The Bible
- Alice in Wonderland
- The Engineer (my own biography of the entrepreneur Elon Musk)
These are the ten most frequent words in each book:
Alice | Two cities | Hobbit | And then | Bible | Engineer |
the | the | the | the | the | the |
and | and | and | a | and | to |
to | of | of | of | of | a |
a | to | to | to | to | and |
she | a | a | and | that | of |
of | in | he | he | in | in |
it | his | in | was | he | was |
said | it | was | said | shall | he |
it | that | they | I | unto | Elon |
in | I | it | it | for | that |
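For anyone who wants to try this at home, here is a minimal sketch of how such a count could be done in Python. It is not the script behind the table above: the function name `top_words` and the sample sentence are made up for illustration, and the sketch lowercases everything, which the table evidently does not (note "I" and "Elon").

```python
import re
from collections import Counter

def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Assumption: a "word" is a run of letters, optionally with inner
    # apostrophes (so "don't" stays one word). Case is ignored.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words).most_common(n)

# Hypothetical sample text; a real run would read the whole book from a file.
sample = "The cat and the dog saw the bird. The bird saw the cat."
print(top_words(sample, 3))
```

With a whole book loaded into one string, the same call with `n=10` would give one column of a table like the one above.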
But, as you can see, most of the words in the results above are so-called stop words (very common function words such as "the" and "of"), so I've also tried to see what happens if these words are removed. This is the result:
Alice | Two cities | Hobbit | And then | Bible | Engineer |
said | said | said | said | shall | elon |
alice | mr | bilbo | lombard | unto | said |
little | lorry | dwarves | blore | lord | car |
know | man | came | vera | thou | tesla |
like | defarge | like | armstrong | thy | like |
went | little | long | rogers | god | company |
thought | time | thorin | mr | said | electric |
queen | hand | time | know | ye | rocket |
time | miss | great | went | thee | space |
did | know | did | little | man | didnt |
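Removing stop words can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the table: `STOP_WORDS` is a tiny hand-picked list (real stop lists hold a few hundred words), `top_content_words` is a made-up name, and apostrophes are dropped before splitting, which would explain an entry like "didnt" above.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop list, not the one used for the table.
STOP_WORDS = {"the", "and", "to", "a", "of", "in", "it", "he", "she",
              "was", "that", "his", "i", "they", "for"}

def top_content_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words that are not in stop_words."""
    # Drop apostrophes first so "didn't" is counted as "didnt".
    cleaned = text.lower().replace("'", "")
    words = re.findall(r"[a-z]+", cleaned)
    kept = [w for w in words if w not in stop_words]
    return Counter(kept).most_common(n)

# Hypothetical sample sentence, only for demonstration.
sample = "Alice said it was late, and the Queen said she didn't know."
print(top_content_words(sample, 3))
```

The choice of stop list matters a lot here: a longer list would also drop words such as "said" or "did", and the table would look different again.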
If I remember that old article on the top words in top-selling books correctly, I think the words were "and" and maybe "but." Anyway, it was interesting to see that the top words in my own book matched almost all of the top words in the other top-selling books. But I think we will need more books to draw any real conclusions.
Bonus!
As a bonus, I've also calculated the average number of words per sentence in each book. This is the result:
- Alice: 27
- Two cities: 18
- Hobbit: 20
- And then: 9
- Bible: 28
- Engineer: 20
...and the average word length, in characters:
- Alice: 4.09
- Two cities: 4.33
- Hobbit: 4.17
- And then: 4.21
- Bible: 3.94
- Engineer: 4.46
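Both averages can be computed along the following lines. Again, this is only a sketch with assumptions of my choosing: the name `text_stats` is made up, a sentence is taken to end at any run of ".", "!" or "?" (so abbreviations like "Mr." will be miscounted), and apostrophes count toward a word's length.

```python
import re

def text_stats(text):
    """Return (average words per sentence, average characters per word)."""
    # Naive sentence split: break at runs of ., ! or ? and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # A word is a run of letters, optionally with inner apostrophes.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not sentences or not words:
        return 0.0, 0.0
    words_per_sentence = len(words) / len(sentences)
    chars_per_word = sum(len(w) for w in words) / len(words)
    return words_per_sentence, chars_per_word

# Hypothetical sample text, only for demonstration.
sample = "The cat sat. The dog ran away!"
wps, cpw = text_stats(sample)
print(round(wps, 2), round(cpw, 2))
```

Different rules for splitting sentences can shift the words-per-sentence figure noticeably, especially in dialogue-heavy books, so numbers from different tools are not directly comparable.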