SubtlexUS is a database of word frequencies based on the subtitles of English and American movies and TV series (51 million words in total). Two measures are provided:
- The frequency per million words, called SUBTLEXWF (Subtitle frequency: word form frequency)
- The percentage of films in which a word occurs, called SUBTLEXCD (Subtitle frequency: contextual diversity; see Adelman, Brown, & Quesada (2006) for the qualities of this measure).
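As a rough illustration of how the two measures are defined, the following sketch computes a per-million frequency and a percentage-of-films value on a tiny made-up corpus. The function name `subtitle_measures`, the dictionary keys, and the toy data are assumptions made for this example; this is not the authors' actual processing pipeline, and it ignores tokenization and case handling.

```python
from collections import Counter


def subtitle_measures(films):
    """Compute both measures for every word in a corpus.

    `films` is a non-empty list of token lists, one list per film
    (a hypothetical input format chosen for this sketch).
    """
    total_tokens = sum(len(tokens) for tokens in films)
    if not films or total_tokens == 0:
        raise ValueError("corpus must contain at least one token")

    word_counts = Counter()  # occurrences of each word across all films
    film_counts = Counter()  # number of films in which each word appears
    for tokens in films:
        word_counts.update(tokens)
        film_counts.update(set(tokens))  # count each film at most once per word

    return {
        word: {
            # frequency per million words (the SUBTLEXWF idea)
            "wf_per_million": word_counts[word] / total_tokens * 1_000_000,
            # percentage of films containing the word (the SUBTLEXCD idea)
            "cd_percent": film_counts[word] / len(films) * 100,
        }
        for word in word_counts
    }


# Toy data: three "films" with 11 tokens in total.
toy_films = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran", "the", "race"],
    ["a", "cat", "ran"],
]
measures = subtitle_measures(toy_films)
print(measures["the"])
```

In this toy corpus, "the" occurs 3 times out of 11 tokens and appears in 2 of the 3 films, so its word form frequency is high while its contextual diversity is about two thirds. The two measures can diverge: a word repeated many times in a single film gets a high frequency but a low contextual diversity.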
Authors: Boris New and Marc Brysbaert
Brysbaert, Marc, and Boris New. 2009. “Moving beyond Kučera and Francis: A Critical Evaluation of Current Word Frequency Norms and the Introduction of a New and Improved Word Frequency Measure for American English.” Behavior Research Methods 41 (4): 977–990.
Brysbaert, Marc, Boris New, and Emmanuel Keuleers. 2012. “Adding Part-of-Speech Information to the SUBTLEX-US Word Frequencies.” Behavior Research Methods 44 (4): 991–997.