June 20, 2012

Mapping the Oz genome


The Royal Book of Oz, originally credited to L. Frank Baum, but in all likelihood written by Ruth Plumly Thompson.

Earlier this week, Slate’s Lexicon Valley podcast examined whether writers’ individual styles are distinctive enough to be recognized. Intuition tells us that they should be, at least among writers who have worked to develop a particular voice that their devotees could identify. But, as hosts Mike Vuolo and Bob Garfield explain, a mathematician has studied the words authors use to determine whether they leave identifiable “fingerprints.” Using L. Frank Baum and his Wizard of Oz series as a jumping-off point, they describe how, after Baum’s death, his publisher had another writer, Ruth Plumly Thompson, take over the series. The fifteenth book, the first published after Baum died, still credited him as the author, with the note that it was “enlarged and edited by Ruth Plumly Thompson.” There has long been a question over whether that book, The Royal Book of Oz, was Baum’s last or Thompson’s first in the series (she went on to write and receive full credit for twenty additional Oz books). The question is further complicated by the fact that Thompson was likely emulating Baum’s style to keep the series consistent.

Vuolo and Garfield go on to explain the various ways people have tried to find identifying aspects of authors’ work. They start with sentence and word length, both of which turn out to be unreliable; the averages are so consistent from one author to the next that they don’t tell us anything. But they move on to discuss José Binongo, a mathematician who scanned the books known to have been written by Baum and those by Thompson, and identified their fifty most-used “function words”: words such as “the,” “and,” and “of” that serve a grammatical purpose. That excludes things like proper nouns, which are frequently dictated by circumstance and necessity, not an author’s personal tendencies. At this point, the math gets truly baffling. He conceived of a fifty-dimensional space with each axis representing a different word, then collapsed those fifty dimensions into a two-dimensional image that, while it loses some information, is still a “good approximation” according to Binongo. He ended up with two distinct sets of data, with points representing Baum’s works in one cluster and Thompson’s in another, effectively creating a mathematical signature for each of their writing styles.
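For readers who want to see the idea in miniature, here is a rough sketch in Python. It is not Binongo's code. The ten-word list and the sample sentences are invented placeholders, and the post does not name the technique used to collapse the dimensions, so principal component analysis (computed here with NumPy's SVD) is an assumption.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical short list for illustration; the study used the fifty
# most frequent function words found in the two authors' books.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "was", "but"]


def frequency_vector(text, words=FUNCTION_WORDS):
    """Return each function word's share of all word tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid dividing by zero on empty text
    return np.array([counts[w] / total for w in words])


def project_to_2d(matrix):
    """Project the rows of a matrix onto their first two principal components."""
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal axes, ordered by variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Placeholder snippets, NOT text from the actual books.
samples = {
    "author_a_1": "The girl and the dog ran to the house, and it was the end of the day.",
    "author_a_2": "The road to the city was long, and the sun was in the sky.",
    "author_b_1": "It was a thing that a man in a hurry might miss, but it was there.",
    "author_b_2": "But that was a day in a year that it rained, a sad day.",
}

# One row per text, one column per function word.
matrix = np.vstack([frequency_vector(text) for text in samples.values()])

# One (x, y) point per text, ready to scatter-plot and inspect for clusters.
points = project_to_2d(matrix)
for name, (x, y) in zip(samples, points):
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

With whole books in place of these toy sentences and fifty words in place of ten, each book becomes one point in the plane, and the question is whether the points for each author bunch together.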

To check his work, Binongo also applied the method to five of Baum’s works outside the series and found that they perfectly matched the pattern of his fourteen known Oz books. The Royal Book of Oz, however, overlapped with Thompson’s work, providing strong evidence that she was the sole author, and that the publisher probably put Baum’s name on the book to keep his fans interested in the series. Many people had already come to that conclusion before Binongo’s experiment (as early as the 1980s, a number of editions were changed to credit Thompson as the author), but his statistical evidence certainly strengthens the claim, and it has terrific potential for future application.
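The last step, deciding which cluster a disputed book falls into, can be sketched as well. The coordinates below are made up for illustration and are not Binongo's data, and comparing distances to each cluster's center is a simple stand-in for judging overlap by eye, not a description of his procedure.

```python
import numpy as np

# Made-up 2-D coordinates for illustration only; not Binongo's data.
known_points = {
    "Baum": np.array([[-1.0, 0.2], [-0.8, -0.1], [-1.2, 0.0]]),
    "Thompson": np.array([[0.9, 0.1], [1.1, -0.2], [1.0, 0.3]]),
}


def nearest_author(point, clusters):
    """Return the author whose cluster centroid is closest to the point."""
    distances = {
        author: float(np.linalg.norm(point - pts.mean(axis=0)))
        for author, pts in clusters.items()
    }
    return min(distances, key=distances.get)


# Hypothetical position of a disputed book in the same 2-D picture.
disputed = np.array([0.95, 0.0])
attributed_to = nearest_author(disputed, known_points)
print(attributed_to)
```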


