This repository provides supplemental information for the following open-access paper:

Knoph, R. E., Lawrence, J. F., & Francis, D. J. (2023). The Dimensionality of lexical features in general, academic, and disciplinary vocabulary. Scientific Studies of Reading.

This study analyzed 22 word features across three established word lists to determine whether latent lexical dimensions exist. We found five stable factors for high-frequency words and general academic words, whereas disciplinary vocabulary was best represented with the last two factors (Polysemy and Diversity) combined:

  • Frequency – composed of frequency measures from a variety of corpora, along with large-grain measures of contextual diversity
  • Complexity – composed of the number of letters, syllables, phonemes, etc., plus Levenshtein distances
  • Proximity – composed of neighborhood densities
  • Polysemy – composed of the number of senses and meanings
  • Diversity – composed of semantic dispersion and (negatively) precision


Explore the Estimated Scores

Download estimated scores for original words


View the Exploratory Factor Analyses Results

These are the results of the exploratory factor analyses across the three reference groups.  Only loadings greater than 0.3 are included in the figures to aid readability.

  • GSL-Reference Model
  • AWL-Reference Model
  • AVL-DS-Reference Model
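The 0.3 display cutoff described above can be illustrated with a minimal Python sketch. The loading values and feature names below are hypothetical and are not taken from the paper, and the sketch assumes the cutoff applies to the absolute value of a loading, so that negative loadings (such as precision on Diversity) remain visible.

```python
# Hypothetical loadings for illustration only; these are NOT values from the paper.
loadings = {
    "n_letters": {"Complexity": 0.78, "Diversity": 0.10},
    "semantic_dispersion": {"Complexity": -0.05, "Diversity": 0.72},
    "precision": {"Complexity": 0.21, "Diversity": -0.41},
}


def suppress_small_loadings(loadings, threshold=0.3):
    """Keep only loadings whose absolute value exceeds the threshold.

    Assumption: the cutoff is applied to |loading|, so strong negative
    loadings are kept alongside strong positive ones.
    """
    return {
        feature: {factor: value for factor, value in row.items() if abs(value) > threshold}
        for feature, row in loadings.items()
    }


visible = suppress_small_loadings(loadings)
for feature, row in visible.items():
    print(feature, row)
```

Hiding small loadings this way only affects what is displayed; the underlying factor solution is unchanged.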




Watch the Density Plots

Here you can view the density plots of each lexical dimension by reference model and word sample.

  • Columns represent each of the five lexical dimensions: Frequency, Complexity, Proximity, Polysemy, and Diversity.
  • Rows represent each reference model: the model estimated using words from the General Service List (GSL; West, 1953), words from the Academic Word List (AWL; Coxhead, 2000), and words from the domain-specific section of the Academic Vocabulary List (AVL-DS; Gardner & Davies, 2014).
  • Animations loop through the density plots of four different word sets. The first shows estimates for the GSL words, the second for the AWL words, the third for the AVL-DS words, and the last for all three sets of words combined.

As an example, consider this still image of the top-left animation:

  • The plot furthest to the right shows estimates on Frequency for GSL words using the GSL-reference model.
  • The middle plot shows estimates on Frequency for AWL words using the GSL-reference model.
  • The plot furthest to the left shows estimates on Frequency for AVL-DS words using the GSL-reference model.
  • The flat plot that overlaps all of the others shows estimates on Frequency for all three word sets combined, using the GSL-reference model.

Still image of the density plot animation (Frequency, GSL-reference model)


Animated Representations of Figure 5. Word Set 1 = GSL, Word Set 2 = AWL, Word Set 3 = AVL-DS, Word Set 4 = All Words
Columns: Frequency, Complexity, Proximity, Polysemy, Diversity
Rows: GSL-Reference Model, AWL-Reference Model, AVL-DS-Reference Model
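To make the panels more concrete, here is a minimal Python sketch of how a density curve could be computed for each word set and why the combined set looks flat. It uses a plain Gaussian kernel density estimate with a hand-picked bandwidth, which is an assumption and not necessarily the method used for the published figures. All factor scores below are hypothetical.

```python
import math


def gaussian_kde(scores, grid, bandwidth):
    """Return the Gaussian kernel density estimate of `scores` at each grid point."""
    n = len(scores)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in scores)
        for x in grid
    ]


# Hypothetical Frequency scores for illustration only; NOT data from the paper.
word_sets = {
    "GSL": [0.9, 1.0, 1.1, 1.2, 1.3],
    "AWL": [-0.2, -0.1, 0.0, 0.1, 0.2],
    "AVL-DS": [-1.3, -1.2, -1.1, -1.0, -0.9],
}
# Word Set 4 pools the scores of the three lists.
word_sets["All"] = [s for scores in list(word_sets.values()) for s in scores]

step = 0.1
grid = [i * step for i in range(-30, 31)]  # evaluation points from -3.0 to 3.0
densities = {
    name: gaussian_kde(scores, grid, bandwidth=0.3)
    for name, scores in word_sets.items()
}
peaks = {name: max(curve) for name, curve in densities.items()}

for name, peak in peaks.items():
    print(f"{name}: peak density = {peak:.3f}")
```

Because each curve integrates to one, pooling three well-separated groups spreads the same total area over a wider range, so the combined curve has a lower peak than any single word set.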


Related Work

See related work that uses and discusses these latent dimensions:

Lawrence, J. F., Knoph, R. E., McIlraith, A., Kulesz, P. A., & Francis, D. J. (2022). Reading comprehension and academic vocabulary: Exploring relations of item features and reading proficiency. Reading Research Quarterly, 57(2), 669-690.

This study analyzed academic vocabulary assessment data from monolingual English students in middle school to understand the relationship between various word characteristics and students’ word knowledge across reading abilities. Results showed that words with multiple meanings (high Polysemy scores) were generally easier for all students. They also showed that strong readers were more sensitive to word frequency, whereas struggling readers were more sensitive to word complexity.


Lawrence, J. (2022, May 13). Teaching Academic Vocabulary Words. Reading Ways.

This blog post discusses the importance of teaching academic vocabulary so that students can understand texts across disciplines in school. It provides an overview of the two research papers above for practitioners.