Quantitative Stylistics Resources

Published
23. February 2009

Here are resources I've found useful for quantitative stylistics work. I like the term "quantitative stylistics"—it seems to bridge in one breath the considerable distance between, say, computational linguistics and comparative literature. Franco Moretti (below) paints an interesting picture of what it might involve:

Looking at prose style from below . . . With digital databases, this is now easy to imagine: a few years, and we’ll be able to search just about all novels that have ever been published, and look for patterns among billions of sentences. Personally, I am fascinated by this encounter of the formal and the quantitative. Let me give you an example: all literary scholars analyse stylistic structures—free indirect style, the stream of consciousness, melodramatic excess, whatever. But it’s striking how little we actually know about the genesis of these forms. Once they’re there, we know what to do; but how did they get there in the first place? How does the ‘confused thought’ (Michel Vovelle) of mentalité, which is the substratum for almost all that happens in a culture—how does messiness crystallize into the elegance of free indirect style? Concretely: what are the steps? No one really knows. By sifting through thousands of variations and permutations and approximations, a quantitative stylistics of the digital archive may find some answers. It will be difficult, no doubt, because one cannot study a large archive in the same way one studies a text: texts are designed to ‘speak’ to us, and so, provided we know how to listen, they always end up telling us something; but archives are not messages that were meant to address us, and so they say absolutely nothing until one asks the right question. (Franco Moretti, NLR 52)

Articles

  • Franco Moretti, Graphs, Maps, Trees: Abstract Models for a Literary History (London: Verso, 2005). (Collection of three NLR articles)
  • Cosma Shalizi, "Graphs, Trees, Materialism, Fishing," The Valve - A Literary Organ, January 24, 2006.
  • Franco Moretti, “The Novel: History and Theory,” New Left Review, no. 52 (August 2008).
  • Tanya E. Clement, “'A thing not beginning and not ending': using digital tools to distant-read Gertrude Stein's The Making of Americans,” Literary & Linguistic Computing 23, no. 3 (September 1, 2008): 361-381, doi:10.1093/llc/fqn020.
  • Bei Yu, “An evaluation of text classification methods for literary study,” Literary & Linguistic Computing 23, no. 3 (September 1, 2008): 327-343, doi:10.1093/llc/fqn015.
  • Tanya Clement, Sara Steger, John Unsworth, and Kirsten Uszkalo, "How Not to Read a Million Books," October 2008.

Software

More on Moretti

More technical

  • Bei Yu, “An Evaluation of Text-Classification Methods for Literary Study” (Dissertation, UIUC), http://www.noraproject.org/publications.php.
  • Moshe Koppel, Navot Akiva, and Ido Dagan, “A Corpus-Independent Feature Set for Style-Based Text Categorization” (2003), doi:10.1.1.2.2.
  • Fabrizio Sebastiani, “Machine learning in automated text categorization,” ACM Computing Surveys 34 (2002): 1-47, doi:10.1.1.17.6513 (helpful summary of methods).
  • Matthew Wilkens's blog posts on Evaluating POS Taggers

Textbooks and courses
