The target word, key, and distractors for each item were evaluated along several dimensions. Lexical, semantic, orthographic, and phonological features of each word were coded. The databases and other external sources used in the item feature coding are listed below:

Lexical Features

  • Word Frequency of individual words: The Educator’s Word Frequency Guide (Zeno et al., 1995; [no link available]); Wikipedia Corpus (Davies, 2015); Brown Corpus [no link available]; and Hyperspace Analogue to Language (HAL) Corpus (Lund & Burgess, 1996; [no link available]).
  • Word Age: the number of years between the first recorded use of the word and the year 2000, from Google Ngrams

Semantic Features

  • Number of Morphemes present in the word, from the English Lexicon Project (Balota et al., 2007)
  • Number of Meanings and Senses associated with the word, from WordNet (Fellbaum, 1998; Miller et al., 1990)
  • Semantic Precision: depth of the hypernym chain for a word, from WordNet (Fellbaum, 1998; Miller et al., 1990)
  • Dispersion: the number of different subject areas in which a word appears (scaled from 0 to 1), from the Educator’s Word Frequency Guide (Zeno et al., 1995; [no link available])
  • Semantic Diversity: the degree of semantic similarity among all contexts in which a word appears (Hoffman et al., 2013)
  • Contextual Diversity: the number of different texts in which a word appears, from Adelman et al. (2006; [no link available]) and from SUBTLEX-UK (van Heuven et al., 2014)
  • Semantic Similarity: the similarity between the target word and the key word within a constructed semantic space, from the LSA Project, CU Boulder
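
Three of the semantic features above can be illustrated with a minimal Python sketch. It is not the procedure used by the cited sources: the toy hypernym map, the toy documents, and the function names are hypothetical, and the use of cosine similarity for the semantic space is an assumption (the cosine is the usual similarity measure in LSA).

```python
import math


def hypernym_depth(word, parent_of):
    """Count the steps from a word up its hypernym chain to the root.

    parent_of is a hypothetical {word: hypernym} map standing in for WordNet.
    """
    depth = 0
    seen = {word}
    while word in parent_of:
        word = parent_of[word]
        if word in seen:
            raise ValueError("cycle in hypernym map")
        seen.add(word)
        depth += 1
    return depth


def contextual_diversity(word, documents):
    """Count the documents (token lists) containing the word at least once."""
    return sum(1 for doc in documents if word in doc)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Toy data, invented for illustration only.
toy_hypernyms = {"poodle": "dog", "dog": "animal", "animal": "entity"}
toy_documents = [
    ["the", "cat", "sat"],
    ["a", "cat", "and", "a", "dog"],
    ["the", "dog", "ran"],
]

print(hypernym_depth("poodle", toy_hypernyms))     # 3
print(contextual_diversity("cat", toy_documents))  # 2
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
```

In this sketch a more specific word sits deeper in the chain ("poodle" is deeper than "animal"), and a word repeated within a single text still counts only once toward contextual diversity.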

Orthographic Features

  • Word Length: number of letters in the word
  • Mean Bigram Frequency: the average frequency of all bigrams (two-letter strings) in a word, from the English Lexicon Project (Balota et al., 2007)
  • Number of Orthographic Neighbors: Coltheart’s N, the number of words that can be made by substituting one letter in the word, from the English Lexicon Project (Balota et al., 2007)
  • Levenshtein Distance (LD): the minimum number of operations (substitution, insertion, deletion) needed to turn one letter string into another. The distance between target and key was calculated using the R package vwr (Keuleers, 2013).
  • Orthographic Levenshtein Distance 20 (OLD20): the mean LD from a word to its 20 closest orthographic neighbors (Yarkoni et al., 2013), calculated using the R package vwr (Keuleers, 2013)
  • Decodability: a measure of the ease with which the word can be decoded, from Saha et al. (under review)
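
The orthographic measures defined above can be sketched in a few lines of Python. This is an illustrative approximation, not the vwr or English Lexicon Project implementation: the function names, the toy lexicon, and the toy bigram counts are hypothetical, and the n parameter is set to 3 in the example only because the toy lexicon is small.

```python
def levenshtein(a, b):
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def coltheart_n(word, lexicon):
    """Count same-length lexicon words that differ from word in exactly one letter."""
    return sum(
        1 for other in lexicon
        if len(other) == len(word)
        and sum(x != y for x, y in zip(word, other)) == 1
    )


def mean_closest_distance(word, lexicon, n=20):
    """Mean Levenshtein distance from word to its n closest lexicon entries."""
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    closest = distances[:n]
    if not closest:
        raise ValueError("lexicon has no other words")
    return sum(closest) / len(closest)


def mean_bigram_frequency(word, bigram_counts):
    """Average count of the word's adjacent letter pairs (0 for unseen bigrams)."""
    bigrams = [word[i:i + 2] for i in range(len(word) - 1)]
    if not bigrams:
        raise ValueError("word must have at least two letters")
    return sum(bigram_counts.get(bg, 0) for bg in bigrams) / len(bigrams)


# Toy data, invented for illustration only.
toy_lexicon = ["cat", "bat", "cot", "cab", "cart", "dog"]
toy_bigram_counts = {"ca": 10, "at": 30}

print(levenshtein("kitten", "sitting"))                 # 3
print(coltheart_n("cat", toy_lexicon))                  # 3
print(mean_closest_distance("cat", toy_lexicon, n=3))   # 1.0
print(mean_bigram_frequency("cat", toy_bigram_counts))  # 20.0
```

Note that "cart" is one edit away from "cat" and so contributes to the mean closest distance, but it is not counted by Coltheart's N, which allows substitutions only.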

Phonological Features

  • Number of Phonemes present in the word, from the English Lexicon Project (Balota et al., 2007)
  • Number of Syllables present in the word, from the English Lexicon Project (Balota et al., 2007)
  • Number of Phonological Neighbors: number of words that can be formed by substituting one phoneme in the word, from the English Lexicon Project (Balota et al., 2007)
  • Number of Phonographic Neighbors: number of words that are both orthographic and phonological neighbors of the word, from the English Lexicon Project (Balota et al., 2007)
  • Phonological Levenshtein Distance 20 (PLD20): the mean LD from a word to its 20 closest phonological neighbors, from the English Lexicon Project (Balota et al., 2007)