September marks the 20th anniversary of the American arrival of Harry Potter and the Sorcerer’s Stone, following J. K. Rowling’s enchanting debut in the U.K. the previous summer. The book was dutifully translated, lest American readers determine that “sellotape” (that is, “Scotch tape”) must be one of those Wizarding neologisms alongside “Muggle” and “Quidditch.”
For last year’s British anniversary, TIME introduced the union of magic and science in our research-based Sorting Hat quiz, developed in partnership with personality experts at the University of Cambridge. In that same spirit, we now present a trivia contest based on our in-house data analysis of the complete text of the series, which is north of a million words long. Needless to say, the answers and following commentary contain copious spoilers.
To develop these questions, I extracted the text and images from e-book editions of all seven volumes. (Fear not for TIME’s expenses. I already had them all.) After carefully cleaning the output to exclude all text outside the story itself, like the publisher information, I ran a series of analyses on the corpus, from basic word frequencies to the role of colors and which adjectives are most associated with which characters. (I love treating text as data almost as much as I love the Harry Potter books. Words are both the “most inexhaustible source of magic” and an inextinguishable font of information beyond just the narrative meaning of the sentences.)
The analysis begins with a simple tally of the words themselves over the span of the 199 chapters in the seven books. While this is not a sophisticated calculation, it turns up plenty of curiosities. For example, the prevalence of the word “magic” spikes highest in two places: when Harry discovers he is a wizard in the first book, and when Dumbledore recounts the moment that Tom Riddle learned he was a wizard in the sixth.
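A tally of this kind can be sketched in a few lines of Python. This is only an illustration, not the code behind the analysis: the function names, the tokenizing pattern, and the two sample "chapters" (invented sentences, not text from the books) are all assumptions. It reports how often a word appears per 1,000 words in each chapter, so that long and short chapters can be compared fairly.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, optionally joined by apostrophes or hyphens.
TOKEN = re.compile(r"[a-z]+(?:['’-][a-z]+)*")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())


def rate_per_thousand(chapters, word):
    """Return, for each chapter, occurrences of `word` per 1,000 tokens."""
    rates = []
    for text in chapters:
        tokens = tokenize(text)
        count = Counter(tokens)[word]
        rates.append(1000 * count / len(tokens) if tokens else 0.0)
    return rates


# Invented stand-in chapters, used only to exercise the function.
chapters = [
    "Harry had never seen magic before. The magic was real.",
    "The train left the station at eleven.",
]
rates = rate_per_thousand(chapters, "magic")

# Index of the chapter where the word is most prevalent.
peak = max(range(len(rates)), key=rates.__getitem__)
```

Run over all 199 chapters, the index in `peak` would point at the chapter where the word is densest, which is the kind of spike described above.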
Another fun game is to look at which adjectives are most associated with a particular character. If you come across the word “girlish,” for example, look over your shoulder, because Dolores Umbridge is almost certainly nearby.
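One rough way to approximate this kind of association is to count which adjectives fall within a few words of a character's name. The sketch below is an assumption-laden illustration rather than the method used here: the adjective list is supplied by hand (a fuller analysis would more likely rely on a part-of-speech tagger), the window size is arbitrary, and the sample sentences are invented.

```python
import re
from collections import Counter


def nearby_adjectives(text, name, adjectives, window=10):
    """Count adjectives appearing within `window` tokens of each mention of `name`.

    Crude by design: overlapping windows can count the same adjective twice.
    """
    tokens = re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())
    target = name.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            start, stop = max(0, i - window), i + window + 1
            counts.update(t for t in tokens[start:stop] if t in adjectives)
    return counts


# Hypothetical hand-picked adjective list and invented sample text.
ADJECTIVES = {"girlish", "tall", "fluffy", "pink"}
sample = (
    "Umbridge gave a girlish laugh. "
    "The tall man by the door said nothing at all. "
    "Umbridge wore a fluffy pink cardigan."
)
counts = nearby_adjectives(sample, "Umbridge", ADJECTIVES, window=4)
```

Comparing these counts with how often each adjective appears in the whole text would show which words cling to one character in particular.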
The adjective analysis turns up one habit of Rowling’s that I’ve never liked, which is her penchant for sizeism, and I’m not referring to Hagrid. The early chapters of several books are replete with jokes about Dudley’s weight, and Umbridge is described as “pouchy” four times. Draco Malfoy’s cronies, Vincent Crabbe and Gregory Goyle, are routinely described as lumbering hulks. (From Sorcerer’s Stone: “There was of course nothing at all little about Crabbe and Goyle, but as the High Table was full of teachers, neither of them could do more than crack their knuckles and scowl.”)
The plumpest character of no ill repute is probably Mrs. Weasley, though she’s model-thin compared to the less endearing Horace Slughorn.
Meanwhile, Harry is never anything but too skinny.
On a lighter note, here’s a funny measure: For all the persistent crises he endures at Hogwarts, poor Harry never has an easy time merely reaching the school. Putting aside the last book, his longest struggle is in the fifth book, where it takes him more than 57,000 words to get through that nasty business with the dementors. (The Great Gatsby is 47,000 words.) Then again, Harry Potter and the Order of the Phoenix is by far the longest volume at 255,000 words, about 20 percent longer than Moby Dick. Speaking of sizeism.
All analyses were run on the American editions of the books. My word counts may be slightly lower than other tabulations since we can’t all agree on what a word is — I count hyphenations like “half-blood” and “You-Know-Who” as one word while others split them up. The full text of the books is not contained in the source code or anywhere else that is publicly available.
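The effect of that hyphenation choice is easy to see in a small sketch. The two patterns and the `count_words` helper below are assumptions made for illustration, not the tokenizer actually used, and the sample sentence is invented.

```python
import re

# Assumed patterns: one keeps hyphenated compounds whole, the other splits them.
HYPHENATED = re.compile(r"[A-Za-z]+(?:['’-][A-Za-z]+)*")
SPLIT = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")


def count_words(text, keep_hyphens=True):
    """Count words, treating hyphenated compounds as one word or as several."""
    pattern = HYPHENATED if keep_hyphens else SPLIT
    return len(pattern.findall(text))


sample = "The half-blood feared You-Know-Who."
joined = count_words(sample)                      # hyphenations count once
split = count_words(sample, keep_hyphens=False)   # each part counts separately
```

On this one sentence the two conventions give four words and seven, which is why tallies for the full series can differ from one source to another.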