Choosing a Metric

readscore supports nine readability metrics. They are not interchangeable. Each formula was built for a specific context, makes different assumptions, and targets a different kind of reader. Picking the wrong metric gives you a number that means nothing.

This guide helps you choose.

Find your situation in the left column. The right column tells you which metric to start with.

| My text is… | Use this metric |
| --- | --- |
| For young readers (grades 1–3) | Spache |
| For children (grades 4+) | Dale-Chall |
| For a general adult audience | Flesch Reading Ease |
| For health communications or patient materials | SMOG |
| For technical documentation or manuals | ARI |
| For military or government documents | ARI or Linsear Write |
| I need a US grade level number directly | Flesch-Kincaid |
| I need the most widely recognized single metric | Flesch Reading Ease |
| My text has fewer than 30 sentences | Anything except SMOG |
| I want to avoid syllable counting | ARI or Coleman-Liau |
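
The table above can also be written as a plain lookup, which may be useful when a pipeline picks a metric from document metadata. This is only a sketch: the `SITUATIONS` keys and the `suggest_metrics` helper are hypothetical names chosen for illustration and are not part of readscore.

```python
from typing import List, Optional

# Hypothetical lookup mirroring the table above; not part of readscore.
SITUATIONS = {
    "young_readers": ["Spache"],                # grades 1-3
    "children": ["Dale-Chall"],                 # grades 4+
    "general_adult": ["Flesch Reading Ease"],
    "health": ["SMOG"],
    "technical_docs": ["ARI"],
    "military_government": ["ARI", "Linsear Write"],
    "need_grade_level": ["Flesch-Kincaid"],
    "no_syllable_counting": ["ARI", "Coleman-Liau"],
}


def suggest_metrics(situation: str, sentence_count: Optional[int] = None) -> List[str]:
    """Return the starting metrics for a situation, dropping SMOG for short texts."""
    metrics = list(SITUATIONS[situation])
    if sentence_count is not None and sentence_count < 30:
        # SMOG needs at least 30 sentences, so it is not a usable starting point here.
        metrics = [m for m in metrics if m != "SMOG"]
    return metrics


print(suggest_metrics("technical_docs"))
print(suggest_metrics("health", sentence_count=12))
```

An empty result, as in the second call, means the usual recommendation does not apply to a text that short, and another row of the table has to guide the choice.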

SMOG and Flesch-Kincaid Are Not Comparable

The inputs a metric uses tell you a lot about what it can and cannot detect.

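| Metric | Sentence length | Syllables per word | Character count | Word list |
| --- | --- | --- | --- | --- |
| Flesch Reading Ease | Yes | Average | | |
| Flesch-Kincaid | Yes | Average | | |
| Gunning Fog | Yes | Count (3+) | | |
| SMOG | | Count (3+) | | |
| ARI | Yes | | Letters + digits | |
| Coleman-Liau | Yes | | Letters only | |
| Dale-Chall | Yes | | | 3,000 familiar words |
| Spache | Yes | | | Primary-grade word list |
| Linsear Write | Yes | Weighted | | |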

Character-based metrics (ARI, Coleman-Liau) count characters instead of syllables. Character counts are deterministic, so two implementations that tokenize the text the same way produce the same score. These metrics also handle technical jargon better, because syllable counting is unreliable on such terms.
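
To make the character-based approach concrete, here is a minimal sketch of the published ARI and Coleman-Liau formulas. The tokenization is deliberately naive (whitespace for words, runs of `.`, `!`, `?` for sentence ends) and is an assumption for illustration; readscore's own tokenizer may count differently, so scores will not necessarily match.

```python
import re


def _counts(text: str):
    """Return (letters, letters_and_digits, words, sentences) using naive rules."""
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    letters = sum(ch.isalpha() for ch in text)
    letters_and_digits = sum(ch.isalnum() for ch in text)
    return letters, letters_and_digits, len(words), len(sentences)


def ari(text: str) -> float:
    """Automated Readability Index from characters per word and words per sentence."""
    _, chars, words, sentences = _counts(text)
    return 4.71 * chars / words + 0.5 * words / sentences - 21.43


def coleman_liau(text: str) -> float:
    """Coleman-Liau index from letters and sentences per 100 words."""
    letters, _, words, sentences = _counts(text)
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


# Assumes non-empty text with at least one word and one sentence.
sample = "The parser reads each token. It then builds a syntax tree."
print(round(ari(sample), 2), round(coleman_liau(sample), 2))
```

Neither function needs a pronunciation rule or dictionary, which is why the same input always yields the same counts.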

Word-list metrics (Dale-Chall, Spache) check each word against a list of familiar words. This captures vocabulary difficulty more directly but requires the word list to match your audience and era. Both lists were built in the mid-20th century, which affects modern vocabulary coverage.
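
The effect of an outdated word list can be seen in a small sketch. The `FAMILIAR` set below is a tiny made-up stand-in, not the real Dale-Chall or Spache list, and `unfamiliar_share` is a hypothetical helper rather than a readscore function.

```python
import re

# Tiny illustrative stand-in for a familiar-word list (the real lists are far longer).
FAMILIAR = {"the", "a", "dog", "ran", "to", "park", "and", "played", "with", "ball"}


def unfamiliar_share(text: str, familiar: set) -> float:
    """Return the fraction of words that do not appear in the familiar-word list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hard = [w for w in words if w not in familiar]
    return len(hard) / len(words)


print(unfamiliar_share("The dog ran to the park.", FAMILIAR))
print(unfamiliar_share("The dog downloaded a podcast.", FAMILIAR))
```

Words such as "downloaded" and "podcast" are everyday vocabulary for many modern readers, yet any list that predates them counts them as difficult, which pushes the score up.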

Syllable-based metrics (Flesch, Flesch-Kincaid, Gunning Fog, SMOG) treat longer words as harder. This holds up well across general prose but breaks down for text with many short technical terms or long common words.
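
A naive syllable counter shows where this assumption fails. The heuristic below simply counts vowel groups; it is a rough illustration, not how readscore counts syllables, and `naive_syllables` and `is_complex` are hypothetical names.

```python
import re


def naive_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (rough; overcounts silent 'e')."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def is_complex(word: str) -> bool:
    """Apply the 3+ syllable threshold used by Gunning Fog and SMOG."""
    return naive_syllables(word) >= 3


for word in ["yesterday", "family", "chmod", "grep"]:
    print(word, naive_syllables(word), is_complex(word))
```

Under this rule the common words "yesterday" and "family" count as complex, while the shell commands "chmod" and "grep" count as easy, even though most general readers would find the reverse.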

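Most metrics return a US grade level or an estimate of years of education. Flesch Reading Ease instead returns an ease score that nominally runs from 0 to 100, where higher means easier, and Dale-Chall returns a raw score that maps to grade bands.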

| Metric | Score type | Extra fields |
| --- | --- | --- |
| Flesch Reading Ease | 0–100 ease score (higher = easier) | ease (text label) |
| Flesch-Kincaid | US grade level | |
| SMOG | US grade level | |
| ARI | US grade level | ages (age range) |
| Coleman-Liau | US grade level | |
| Gunning Fog | Years of formal education needed | |
| Dale-Chall | Raw score mapped to grade bands | |
| Spache | US grade level | |
| Linsear Write | US grade level | |
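
Because the Flesch score is an ease value rather than a grade, it is usually read through bands. The sketch below uses the conventional Flesch bands. Apart from "fairly_difficult", which appears in this guide's example, the label strings are assumptions and may not match what readscore returns in its `ease` field.

```python
# Conventional Flesch Reading Ease bands as (lower bound, label).
# Label strings are assumed for illustration.
BANDS = [
    (90, "very_easy"),
    (80, "easy"),
    (70, "fairly_easy"),
    (60, "standard"),
    (50, "fairly_difficult"),
    (30, "difficult"),
]


def ease_label(score: float) -> str:
    """Map a Flesch Reading Ease score to a text label."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    # The formula can produce values below 30, including negatives.
    return "very_confusing"


print(ease_label(52.4))
```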

| Metric | Minimum | What happens if too short |
| --- | --- | --- |
| Most metrics | 100 words (default) | ValueError on construction |
| SMOG | 30 sentences | ValueError by default; pass ignore_length=True for a warning |

You can lower the 100-word default with Readability(text, min_words=N), but scores from short texts are less reliable.
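
Checking length before scoring avoids handling these errors after the fact. The `preflight` helper below is a hypothetical sketch that does not call readscore. It uses naive word and sentence splitting, so its counts may differ slightly from the library's.

```python
import re


def preflight(text: str, min_words: int = 100, min_sentences: int = 30) -> dict:
    """Report whether a text meets the word and sentence minimums described above."""
    word_count = len(text.split())
    sentence_count = len([s for s in re.split(r"[.!?]+", text) if s.strip()])
    return {
        "words": word_count,
        "sentences": sentence_count,
        "enough_words": word_count >= min_words,
        "enough_sentences_for_smog": sentence_count >= min_sentences,
    }


report = preflight("One two three. Four five.")
print(report)
```

If `enough_words` is false, the options are to gather more text or to lower `min_words` and treat the scores as less reliable.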

Running all nine metrics on the same paragraph shows how much the scores can vary. The paragraph below is from this library’s README — a description of the Gunning Fog index, written at roughly a 10th–12th grade level.

from readscore import Readability
text = """
In linguistics, the Gunning fog index is a readability test for English writing.
The index estimates the years of formal education a person needs to understand
the text on the first reading. For instance, a fog index of 12 requires the
reading level of a United States high school senior (around 18 years old).
The test was developed in 1952 by Robert Gunning, an American businessman
who had been involved in newspaper and textbook publishing.
"""
# The sample is about 75 words, below the 100-word default, so lower the minimum.
r = Readability(text, min_words=50)
print(r.flesch().score) # ~52.4 → "fairly_difficult" → grades 10–12
print(r.flesch_kincaid().score) # ~11.8 → grade 12
print(r.ari().score) # ~11.9 → grade 12, ages [17, 18]
print(r.coleman_liau().score) # ~12.1 → grade 12
print(r.gunning_fog().score) # ~13.2 → college level
print(r.dale_chall().score) # ~7.4 → grades 9–10
print(r.linsear_write().score) # ~11.0 → grade 11
# SMOG raises ValueError — this paragraph has fewer than 30 sentences
# r.smog() → ValueError: SMOG requires at least 30 sentences. 6 found.
# Spache runs but is outside its valid range — designed for grades 1–3
print(r.spache().score) # ~4.1 → grade 4 (not meaningful for adult text)

The Spache result illustrates why metric selection matters. Spache was designed for primary school text. Applying it to adult prose produces a score that is technically computed but practically meaningless.
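
One way to catch this kind of misuse is to compare a result against the grade range the metric was designed for. The ranges below come from this guide (Spache for grades 1–3, Dale-Chall for grade 4 and up); the `INTENDED_RANGE` table and `outside_intended_range` helper are hypothetical and not part of readscore.

```python
# Intended grade ranges from the guidance on this page; None means no bound.
INTENDED_RANGE = {
    "spache": (1, 3),
    "dale_chall": (4, None),
}


def outside_intended_range(metric: str, grade: float) -> bool:
    """Return True when a grade estimate falls outside the metric's design range."""
    low, high = INTENDED_RANGE.get(metric, (None, None))
    if low is not None and grade < low:
        return True
    if high is not None and grade > high:
        return True
    return False


# The Spache score from the example lands above its intended range.
print(outside_intended_range("spache", 4.1))
```

Metrics without an entry in the table are never flagged, so the check only covers the ranges this guide states explicitly.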

Running several metrics and comparing the results is a valid approach when:

  • You want to reduce the effect of any single formula’s quirks on your assessment
  • You want to flag text where metrics disagree significantly, which can indicate unusual sentence structure, heavy jargon, or formatting that confuses the parsers
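
# text must meet the minimum word count, or pass min_words as described earlier
r = Readability(text)

# Collect grade-level estimates from several formulas
grade_estimates = [
    r.flesch_kincaid().grade_level,
    r.ari().grade_level,
    r.coleman_liau().grade_level,
]

# Average smooths out single-formula quirks; spread shows how much they disagree
average = sum(grade_estimates) / len(grade_estimates)
spread = max(grade_estimates) - min(grade_estimates)
print(f"Average grade estimate: {average:.1f}")
print(f"Spread across metrics: {spread} grades")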

A large spread (4+ grade levels) between metrics on the same text is a signal to look more closely. It often means the text has features that one formula handles differently — heavy technical terms, very short sentences with long words, or non-prose formatting.
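
The 4-grade rule of thumb can be turned into a small check. This sketch works on plain numbers, so it does not depend on readscore; `needs_review` is a hypothetical helper name and the threshold is just the figure suggested above.

```python
def needs_review(grade_estimates: list, threshold: float = 4.0) -> bool:
    """Return True when grade estimates disagree by at least `threshold` grades."""
    if len(grade_estimates) < 2:
        # A single estimate has no spread to measure.
        return False
    return max(grade_estimates) - min(grade_estimates) >= threshold


print(needs_review([11.8, 11.9, 12.1]))
print(needs_review([7.0, 11.5, 12.0]))
```

The first call uses the closely grouped Flesch-Kincaid, ARI, and Coleman-Liau scores from the earlier example; the second uses made-up values to show a text that would be flagged.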

| Metric | Particularly strong when… |
| --- | --- |
| Flesch Reading Ease | Communicating ease to non-technical audiences; legal compliance |
| Flesch-Kincaid | Reporting US grade level; government and education contexts |
| SMOG | Health communications; when 100% comprehension matters |
| ARI | Technical manuals; avoiding syllable-counting errors on jargon |
| Coleman-Liau | Batch processing; deterministic results at scale |
| Gunning Fog | Business and journalism contexts; identifying “foggy” prose |
| Dale-Chall | Educational materials; validating vocabulary for grade 4+ audiences |
| Spache | Primary school materials; grades 1–3 only |
| Linsear Write | Military and government technical documents |