Skip to content

Coleman-Liau Index

Meri Coleman and T.L. Liau published this formula in 1975 in the Journal of Applied Psychology. They designed it for optical scanning devices that could count how many characters were in a word without knowing which letters they were. That hardware constraint is long gone, but the formula remains useful: it is fully deterministic, requires no phonological analysis, and scores consistently across processing environments.

OutputUS grade level
Best forEducational content, curriculum materials, legal and health literacy analysis
Methodr.coleman_liau()
What it countsLetters per word (no digits), sentences per 100 words
Minimum text100 words (default)
  • You are evaluating educational or curriculum materials. Coleman and Liau developed the formula specifically for assessing US public school textbooks. It is well calibrated for grade 4 through 12 content.
  • You want fully deterministic scores. Coleman-Liau counts only letters and sentence boundaries. No syllable heuristics, no phoneme rules. Two tools always agree on the same text.
  • You are running readability checks at scale. The formula is a single pass over the text with no dictionary lookups. For batch processing of large document collections, this matters.
  • You want to cross-check ARI. The two formulas use similar approaches but handle numerals differently. Running both and comparing is a quick way to detect whether digit-heavy content is driving a score in either direction.

Coleman-Liau counts letters in each word and sentence boundaries across the text. It then normalizes both counts to a per-100-words rate before applying the formula. This normalization makes the score the same whether you analyze 200 words or 2,000.

score = 0.0588 × L - 0.296 × S - 15.8

Where:

  • L = average letters per 100 words (letters only — digits do not count)
  • S = average sentences per 100 words

A higher L means longer words on average, which raises the score. A higher S means more (shorter) sentences per 100 words, which lowers the score. The constant -15.8 anchors the output to US grade levels.

The result is rounded to the nearest integer for the grade level.

CLI ScoreGrade LevelReading Context
1–2Grades 1–2Early primary
3–4Grades 3–4Primary school
5–6Grades 5–6Late primary
7–8Grades 7–8Middle school
9–10Grades 9–10Early high school
11–12Grades 11–12High school
13+College+Post-secondary

General-audience materials typically target grade 8 to 10. Patient-facing health materials generally target grade 6 to 8. Legal documents often score at grade 12 to 15.

r.coleman_liau() returns a ColemanLiauResult object:

FieldTypeDescription
scorefloatRaw CLI score before rounding
grade_levelslist[str]Grade level as a single-element list, e.g. ["10"]
grade_levelstrThe first item in grade_levels (property on the result)

Coleman-Liau has no additional fields. The result is a single grade level estimate with no supplementary data.

The library defaults to a 100-word minimum. Passing fewer words raises a ValueError.

# Fewer than 100 words raises ValueError by default
r = Readability(short_text)
r.coleman_liau() # ValueError: 100 words required.
# Lower the threshold with min_words — but scores become less reliable below 100 words
r = Readability(short_text, min_words=50)
r.coleman_liau()

Coleman and Liau’s original paper recommended samples of at least 300 words for reliable results. The 100-word minimum is a practical floor, not a guarantee of accuracy.

  • Letters-only counting creates edge cases with alphanumeric content. Coleman-Liau treats “H2O”, “CO2”, and “3D” differently from ARI because digits are excluded. Documents with chemical formulas, version numbers, or identifier codes will produce different character-to-word ratios than documents where the same concepts are written out in words.
  • Common long words score as hard. Like ARI, Coleman-Liau cannot tell “interesting” from “exsanguination.” Both count the same number of letters. The formula has no knowledge of word frequency or familiarity.
  • Prose only. The formula assumes running sentences with clear sentence boundaries. Bullet lists, tables, code blocks, and structured forms produce unreliable scores because sentence detection fails on non-prose content.
  • English only. The formula was developed and validated on English-language US school textbooks. Letter-length distributions differ across languages, and applying Coleman-Liau to non-English text is not appropriate.
from readscore import Readability
text = """
The judicial opinion in this matter turns on two questions. First, whether the contract
language unambiguously establishes the parties' intent. Second, whether the subsequent
conduct of the parties confirms or contradicts that reading. The court finds that the
contract is ambiguous on its face. The term "reasonable notice" is not defined anywhere
in the agreement, and the parties' course of dealing does not resolve the ambiguity.
Accordingly, the matter is remanded for further factual findings on the parties' intent
at the time of contracting. Each party shall bear its own costs on this appeal.
"""
r = Readability(text)
result = r.coleman_liau()
print(f"Score: {result.score:.1f}") # Score: ~13.4
print(f"Grade: {result.grade_levels}") # ['13']
  • ARI — same character-based approach, but counts digits too; includes a reader age range in the result
  • Flesch-Kincaid Grade Level — uses syllables instead of characters; better for content where word familiarity matters more than word length
  • Choosing a Metric — decision guide for all nine metrics