The target word, key, and distractors for each item were evaluated along several dimensions, and lexical, semantic, orthographic, and phonological features were coded for each word. The databases and other external sources used in coding these item features are listed below, grouped by feature type:
Lexical Features
- Word Frequency of individual words: from The Educator’s Word Frequency Guide (Zeno et al., 1995), the Wikipedia Corpus (Davies, 2015), the Brown Corpus, and the Hyperspace Analogue to Language (HAL) corpus (Lund & Burgess, 1996)
- Word Age: number of years between the word’s first recorded use and the year 2000, from the Google Books Ngram data
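Once a word’s yearly usage counts are available, the Word Age definition reduces to simple arithmetic. The sketch below assumes those counts arrive as a year-to-count mapping (for example, exported from the Google Books Ngram data) and treats the first year with a nonzero count as the first recorded use. That operationalization, the function names, and the counts are hypothetical illustrations, not the procedure used to code the items.

```python
from __future__ import annotations

REFERENCE_YEAR = 2000  # fixed end point from the Word Age definition


def first_recorded_year(yearly_counts: dict[int, int]) -> int | None:
    """Earliest year with a nonzero count, or None if the word never occurs."""
    years = [year for year, count in yearly_counts.items() if count > 0]
    return min(years) if years else None


def word_age(yearly_counts: dict[int, int], reference_year: int = REFERENCE_YEAR) -> int | None:
    """Years between a word's first recorded use and the reference year."""
    first = first_recorded_year(yearly_counts)
    return None if first is None else reference_year - first


# Made-up yearly counts for one word (not real Google Books Ngram data).
counts = {1795: 0, 1810: 3, 1850: 12, 1900: 40}
print(word_age(counts))  # 190
```

Returning None for a word with no recorded use keeps "no data" distinct from an age of zero.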
Semantic Features
- Number of Morphemes present in the word, from the English Lexicon Project (Balota et al., 2007)
- Number of Meanings and Senses associated with the word, from WordNet (Fellbaum, 1998; Miller et al., 1990)
- Semantic Precision: depth of the word’s hypernym chain (the number of successively more general terms above it, so more specific words are deeper), from WordNet (Fellbaum, 1998; Miller et al., 1990)
- Dispersion: number of different subject areas in which a word appears (scaled from 0 to 1), from The Educator’s Word Frequency Guide (Zeno et al., 1995)
- Semantic Diversity: the degree to which the contexts in which a word appears differ in meaning, based on the semantic similarity among all of those contexts, from Hoffman et al. (2013)
- Contextual Diversity: the number of different texts in which a word appears, from Adelman et al. (2006) and from SUBTLEX-UK (van Heuven et al., 2014)
- Semantic Similarity: similarity between the target word and the key within a constructed semantic space, from the Latent Semantic Analysis (LSA) Project at the University of Colorado Boulder
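The two similarity-based measures above can be illustrated with toy vectors. The sketch below assumes, as is conventional for LSA, that similarity is the cosine between two vectors. It computes semantic diversity as the negative logarithm of the mean pairwise cosine among a word’s context vectors, which is our reading of Hoffman et al. (2013). The base-10 logarithm, the function names, and the three-dimensional vectors are assumptions for illustration; real LSA spaces have hundreds of dimensions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def semantic_diversity(context_vectors, base=10):
    """Negative log of the mean pairwise cosine among a word's contexts.

    Higher values mean the contexts are less similar to one another.
    The log base is an assumption of this sketch.
    """
    pairs = list(combinations(context_vectors, 2))
    if not pairs:
        raise ValueError("need at least two contexts")
    mean_cos = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    if mean_cos <= 0:
        raise ValueError("mean cosine must be positive to take its logarithm")
    return -math.log(mean_cos, base)


# Hypothetical vectors for a target word and its key.
target, key = [1, 0, 1], [1, 1, 0]
print(round(cosine(target, key), 3))  # 0.5

# Hypothetical context vectors for two words.
similar_contexts = [[1, 0, 0], [0.9, 0.1, 0], [1, 0.1, 0]]
varied_contexts = [[1, 0, 0], [0.5, 1, 0], [0.2, 0.1, 1]]
print(semantic_diversity(varied_contexts) > semantic_diversity(similar_contexts))  # True
```

A word whose contexts all point the same way gets a diversity near 0. Contexts that diverge push the mean cosine down and the diversity up.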
Orthographic Features
- Word Length: number of letters in the word
- Mean Bigram Frequency: the average frequency of all bigrams (two-letter strings) in the word, from the English Lexicon Project (Balota et al., 2007)
- Number of Orthographic Neighbors: Coltheart’s N, the number of words that can be formed by substituting a single letter in the word, from the English Lexicon Project (Balota et al., 2007)
- Levenshtein Distance (LD): the minimum number of single-letter substitutions, insertions, and deletions needed to turn one letter string into another. Here, LD is the distance between the target and the key, calculated using the R package vwr (Keuleers, 2013).
- Orthographic Levenshtein Distance 20 (OLD20): the mean LD from a word to its 20 closest orthographic neighbors (Yarkoni et al., 2008), calculated using the R package vwr (Keuleers, 2013)
- Decodability: a measure of the ease with which the word can be decoded, from Saha et al. (under review)
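The letter-based measures above (Levenshtein Distance, Coltheart’s N, and OLD20) can be sketched in a few lines. This is a plain-Python illustration of the definitions, not the vwr package’s R interface. The toy lexicon and function names are hypothetical, and OLD20 is generalized to an `n` parameter so that a small lexicon can be used. The actual values were computed from the full sources cited above.

```python
from typing import Iterable, Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free when x == y)
            ))
        prev = curr
    return prev[-1]


def coltheart_n(word: str, lexicon: Iterable[str]) -> int:
    """Count same-length words that differ from `word` in exactly one letter."""
    return sum(
        1
        for other in lexicon
        if other != word
        and len(other) == len(word)
        and sum(c1 != c2 for c1, c2 in zip(word, other)) == 1
    )


def old_n(word: str, lexicon: Iterable[str], n: int = 20) -> float:
    """Mean LD from `word` to its n closest other lexicon entries (OLD20 when n=20)."""
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    if not distances:
        raise ValueError("lexicon must contain at least one other word")
    closest = distances[:n]
    return sum(closest) / len(closest)


toy_lexicon = ["cat", "hat", "cot", "car", "cast", "dog"]  # hypothetical
print(levenshtein("kitten", "sitting"))  # 3
print(coltheart_n("cat", toy_lexicon))   # 3 (hat, cot, car)
print(old_n("cat", toy_lexicon, n=3))    # 1.0
```

The example also shows the difference between the two neighborhood measures. Coltheart’s N ignores "cast", because it differs from "cat" by an insertion. OLD20 counts "cast" at distance 1 and still registers distant words such as "dog" when few close neighbors exist.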
Phonological Features
- Number of Phonemes present in the word, from the English Lexicon Project (Balota et al., 2007)
- Number of Syllables present in the word, from the English Lexicon Project (Balota et al., 2007)
- Number of Phonological Neighbors: number of words that can be formed by substituting one phoneme in the word, from the English Lexicon Project (Balota et al., 2007)
- Number of Phonographic Neighbors: number of words that are both orthographic and phonological neighbors of the word, from the English Lexicon Project (Balota et al., 2007)
- Phonological Levenshtein Distance 20 (PLD20): the mean LD, computed over phonemes rather than letters, from a word to its 20 closest phonological neighbors, from the English Lexicon Project (Balota et al., 2007)
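The phonological measures follow the same pattern, with pronunciations represented as phoneme sequences instead of letter strings. The sketch below uses a hypothetical seven-word lexicon with made-up ARPAbet-style transcriptions, not English Lexicon Project data. It shows phonological neighbors (one phoneme substituted), phonographic neighbors (the intersection of orthographic and phonological neighbors), and PLD20, generalized to an `n` parameter.

```python
# Hypothetical toy lexicon: spelling -> phoneme sequence (ARPAbet-style symbols).
LEXICON = {
    "cat": ("k", "ae", "t"),
    "hat": ("h", "ae", "t"),
    "cot": ("k", "aa", "t"),
    "kit": ("k", "ih", "t"),
    "cap": ("k", "ae", "p"),
    "kite": ("k", "ay", "t"),
    "dog": ("d", "ao", "g"),
}


def differs_by_one_substitution(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def phonological_neighbors(word, lexicon):
    """Words formed by substituting one phoneme in the word's pronunciation."""
    target = lexicon[word]
    return {w for w, phon in lexicon.items()
            if w != word and differs_by_one_substitution(target, phon)}


def orthographic_neighbors(word, lexicon):
    """Words formed by substituting one letter in the word's spelling."""
    return {w for w in lexicon if w != word and differs_by_one_substitution(word, w)}


def phonographic_neighbors(word, lexicon):
    """Words that are both orthographic and phonological neighbors of the word."""
    return orthographic_neighbors(word, lexicon) & phonological_neighbors(word, lexicon)


def levenshtein(a, b):
    """Edit distance between two sequences (here, phoneme tuples)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def pld_n(word, lexicon, n=20):
    """Mean phoneme-level LD from the word to its n closest other entries (PLD20 when n=20)."""
    target = lexicon[word]
    distances = sorted(levenshtein(target, phon) for w, phon in lexicon.items() if w != word)
    if not distances:
        raise ValueError("lexicon must contain at least one other word")
    closest = distances[:n]
    return sum(closest) / len(closest)


print(sorted(phonographic_neighbors("cat", LEXICON)))  # ['cap', 'cot', 'hat']
print(pld_n("cat", LEXICON, n=5))  # 1.0
```

In this toy lexicon, "kit" is a phonological neighbor of "cat" but not an orthographic one, because the spellings differ in two letters. It therefore drops out of the phonographic set.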