In November of 1864, a woman named Lydia Bixby received a letter from President Abraham Lincoln. He had been told that she had lost five sons to the then-ongoing Civil War. A Massachusetts state official had, learning of her plight, passed along her story. His request eventually made it to the White House.
Though Bixby's original copy of the letter was quickly destroyed or lost, the state official had also shared the text with the Boston Evening Transcript, which published it.
"I pray that our Heavenly Father may assuage the anguish of your bereavement," the letter noted, "and leave you only the cherished memory of the loved and lost, and the solemn pride that must be yours to have laid so costly a sacrifice upon the altar of freedom."
The brief but eloquent missive struck a chord with many in the war-torn nation, and it has since become famous as one of the best letters written in the English language.
But it's also one of the most controversial documents in Lincoln's large body of writings. The letter's history has other complications (it wasn't true that Bixby had lost five sons, for example, and despite her Boston address, her family said she was a Confederate sympathizer), but the main point of contention has been whether Lincoln actually wrote it. Many historians have wondered whether it was instead written by his secretary, John Hay.
Now, a team of forensic linguistics researchers believes it has settled the question once and for all. In a paper that has been submitted to the journal Digital Scholarship in the Humanities and will be presented at a linguistics conference that begins next week, the researchers explain why they believe the numbers show that the letter was written by Hay.
"We’d never heard of Hay but we’d heard of Lincoln, obviously, and there’s loads of data," says Jack Grieve, who worked on the project with colleagues Emily Carmody, Isobelle Clarke, Hannah Gideon, Annina Heini, Andrea Nini and Emily Waibel, as part of a working group at the Center for Forensic Linguistics at Aston University. Such teams have worked on everything from trying to use forensic linguistics to identify the creator of Bitcoin to using the same techniques to trace cybersecurity breaches by identifying hackers' native languages.
Using a technique they developed called n-gram tracing, they arrived at the conclusion that the Bixby letter was "almost certainly" written by Hay.
But why was there ever any doubt?
As Michael Burlingame has recounted in the Journal of the Abraham Lincoln Association, the doubt wasn't just a matter of the lack of an original. Hay, who lived for decades after Lincoln's assassination, apparently told several people privately during his lifetime that he had written it. The rumor spread, and by about a century ago, it had made its way to history books. On the other hand, Lincoln's famed eloquence and the fact that the rumor only spread after both men's deaths have made others reluctant to give anyone else credit for the missive, preferring instead to believe that if Hay said he had written it, he merely meant he had transcribed his boss' dictation or copied it from a draft.
With both parties long dead and the original missing for more than a century and a half, there seemed to be little hope of settling the question with any certainty.
Enter forensic linguistics. This field — perhaps most famous in recent years for helping to out J.K. Rowling as Robert Galbraith — relies on the theory that, just as people from different regions may speak different dialects of the same language, each individual speaks and writes an even more subtle personal version of their language, known as an idiolect. "We pick up these idiolects over our lifetimes, not just because of where we grew up, but where we went to school, what kind of job we do, our personal history," Grieve explains. Though the naked ear can't often pick them up, computers can find and compare them by picking apart details such as the frequency of use of words as common as "the" or "and."
But while efforts have been made to analyze the Bixby letter to see whether it matches Lincoln's or Hay's writing style more, those attempts have never been conclusive, in particular because — though both Lincoln and Hay left countless examples of their writing styles — the letter itself is so short, containing only 139 words. “To make an analogy, if you take 10 people walking down the street in some American city and you look at their demographics, how many men you saw and how many women, different age groups and ethnicities, you wouldn’t get a very good estimate of the U.S. population," Grieve says. "It’s so little data that you’re not getting a good measure of the real features of the population."
A solution to this problem, Grieve and his colleagues posit, can be found by breaking the language down into components that are inherently more numerous: n-grams. An n-gram is a "sequence of one or more linguistic forms," as they phrase it in the paper. How to understand that? Well, on the word level, that last sentence contains one 4-gram (how-to-understand-that), two 3-grams (how-to-understand and to-understand-that), three 2-grams (how-to, to-understand, understand-that) and four 1-grams (each word). The same principle can also be applied to the individual characters in words. In n-gram tracing, the computer checks whether each individual n-gram that shows up in the sample text shows up at all in the comparison text. In this case, the Bixby letter is the disputed sample. The researchers tested 500 texts by Hay and a comparably sized random sample drawn from the much larger body of work known to be by Lincoln.
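The word-level breakdown described above can be reproduced in a few lines of Python. This is only an illustrative sketch: the function name `word_ngrams` and the simple punctuation stripping are assumptions made for the example, not anything taken from the researchers' paper.

```python
def word_ngrams(text, n):
    """Return the word-level n-grams of `text`, each joined with hyphens.

    Hypothetical helper for illustration: words are lowercased and
    surrounding punctuation is stripped before the n-grams are built.
    """
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    return ["-".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "How to understand that?"
for n in range(4, 0, -1):
    grams = word_ngrams(sentence, n)
    print(f"{len(grams)} {n}-gram(s): {grams}")
```

A four-word sentence yields one 4-gram, two 3-grams, three 2-grams and four 1-grams, which matches the count given in the text.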
Because the question is whether something shows up or not — rather than asking how frequently it shows up — a short text can still be broken down by this algorithm and yield a good result. The authors of the paper claim their method is able to distinguish between the two writers "with a very high degree of accuracy," even for very short texts.
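To make the presence-or-absence idea concrete, here is a rough Python sketch of how such a comparison might work. It is a simplified reading of the description above, not the researchers' actual code: the function names (`ngram_set`, `overlap`, `trace`), the choice of word 2-grams, and the toy texts for "author_a" and "author_b" are all invented for illustration.

```python
def ngram_set(text, n):
    """Set of distinct word n-grams in `text` (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(disputed, candidate_texts, n):
    """Share of the disputed text's distinct n-grams that appear at least
    once anywhere in the candidate's texts. Frequency is ignored."""
    disputed_grams = ngram_set(disputed, n)
    if not disputed_grams:
        return 0.0
    seen = set()
    for text in candidate_texts:
        seen |= ngram_set(text, n)
    return len(disputed_grams & seen) / len(disputed_grams)


def trace(disputed, corpora, n):
    """Return (winner, scores); winner is None when the top score is tied,
    which this sketch treats as an inconclusive result."""
    scores = {author: overlap(disputed, texts, n) for author, texts in corpora.items()}
    best = max(scores.values())
    winners = [author for author, score in scores.items() if score == best]
    return (winners[0] if len(winners) == 1 else None), scores


# Toy data, invented purely for the example.
disputed = "the quick brown fox jumps"
corpora = {
    "author_a": ["the quick brown dog", "a fox jumps high"],
    "author_b": ["a slow brown cat"],
}
winner, scores = trace(disputed, corpora, n=2)
print(winner, scores)
```

Because each n-gram only counts as seen or not seen, a short disputed text still produces a usable score, which is the property the paragraph above describes.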
And their result?
Nearly 90% of the time, the n-gram tracing method identified Hay as the author of the Bixby letter. The other roughly 10% of the time, the analysis was inconclusive. (Those times were when the researchers used groupings of just 1 or 2 letters at a time, rather than whole words, and those combinations proved extremely common overall.) That means, they believe, that the century of wondering about the Bixby letter can come to an end — leaving history buffs free to appreciate the letter's beauty, as well as the uncontested writing skills of both men, without the distraction of this lingering question.
It also leaves Grieve and his colleagues free to apply their techniques to other problems. One of his colleagues wants to tackle letters supposedly written by Jack the Ripper, for example. And what's Grieve's dream project in his field? "This," he says, "was kind of it."