Here are resources I've found useful for quantitative stylistics work. I like the term "quantitative stylistics"—it seems to bridge in one breath the considerable distance between, say, computational linguistics and comparative literature. Franco Moretti (below) paints an interesting picture of what it might involve:
Looking at prose style from below . . . With digital databases, this is now easy to imagine: a few years, and we’ll be able to search just about all novels that have ever been published, and look for patterns among billions of sentences. Personally, I am fascinated by this encounter of the formal and the quantitative. Let me give you an example: all literary scholars analyse stylistic structures—free indirect style, the stream of consciousness, melodramatic excess, whatever. But it’s striking how little we actually know about the genesis of these forms. Once they’re there, we know what to do; but how did they get there in the first place? How does the ‘confused thought’ (Michel Vovelle) of mentalité, which is the substratum for almost all that happens in a culture—how does messiness crystallize into the elegance of free indirect style? Concretely: what are the steps? No one really knows. By sifting through thousands of variations and permutations and approximations, a quantitative stylistics of the digital archive may find some answers. It will be difficult, no doubt, because one cannot study a large archive in the same way one studies a text: texts are designed to ‘speak’ to us, and so, provided we know how to listen, they always end up telling us something; but archives are not messages that were meant to address us, and so they say absolutely nothing until one asks the right question. (Franco Moretti, NLR 52)
Articles
Software
More on Moretti
More technical
Textbooks and courses
Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, 1st ed. (The MIT Press, 1999).
CS229 Machine Learning Autumn 2008
Videos from Stanford's Machine Learning Course (CS229 Autumn 2008). A bit over my head for now. I'm hoping to return to this once I grasp SVMs.