Baumol’s Cost Disease


“In 2000, X hours of work bought one television. Today, the same hours of work buy over 100 TVs.”

GPT-5.2 Thinking: Misleading. You can make the “100 TVs” line come out true only if you use the BLS’s quality-adjusted CPI index for televisions (hedonic adjustment). As a literal “how many TVs could I buy off the shelf?” claim, it’s not credible.
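
A back-of-envelope version of that point, with purely illustrative numbers (none of these are sourced figures): the "100 TVs" arithmetic only works when you divide by a hedonically adjusted price index rather than a shelf price.

```python
# Back-of-envelope: how a quality-adjusted price index produces "100 TVs".
# All numbers below are illustrative assumptions, not sourced data.

wage_2000, wage_now = 20.0, 35.0          # nominal hourly wage, then and now (hypothetical)
tv_price_2000 = 400.0                      # shelf price of a TV in 2000 (hypothetical)

# Hedonic (quality-adjusted) index: treats today's far better TV as a huge price drop.
tv_index_2000, tv_index_now = 100.0, 1.7   # hypothetical quality-adjusted index values
tv_price_now_adjusted = tv_price_2000 * (tv_index_now / tv_index_2000)  # ~6.80 in "2000-quality units"

hours_for_one_tv_2000 = tv_price_2000 / wage_2000                       # 20 hours of work
tvs_per_same_hours_now = (hours_for_one_tv_2000 * wage_now) / tv_price_now_adjusted

print(f"{tvs_per_same_hours_now:.0f} quality-adjusted TVs")             # ~103 -> the "over 100 TVs" line

# Using an actual shelf price instead (say $300 for a basic set today, also hypothetical):
print(f"{(hours_for_one_tv_2000 * wage_now) / 300.0:.1f} literal TVs")  # ~2.3 -> not credible literally
```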


Also GPT-5.2 Thinking (fact-checking the Gemini Deep Research report used to generate the podcast episode):

The central mechanism of Baumol’s cost disease is explained mostly correctly (two sectors, wage catch-up, rising relative prices for labour-intensive services).

But the report’s empirical “proof” section is messier than it looks: it uses a popularised AEI-derived CPI chart and then misstates several headline figures and over-interprets what CPI categories mean (especially for healthcare and education).

It also oversells inevitability (“without limit”, “most empirically accurate forecasts in economic history”), uses incorrect terminology (“hyperinflation”), and makes speculative claims about AI “breaking the curve” that aren’t supported by evidence.


And a guide to critical discernment:

Here’s a practical “epistemic vigilance” guide for reading LLM-generated reports like the one you shared—reports that are often coherent, persuasive, and wrong in very specific, repeatable ways.

The core mindset

Treat an LLM report like a confident first draft by a bright intern who:

  • writes fluently,

  • knows many buzzwords,

  • blurs the distinction between what’s true, what sounds plausible, and what was statistically likely to follow.

Your job is not to “fact-check everything.” It’s to find the load-bearing claims and stress-test them efficiently.

1) The three failure modes to expect (and pattern-match)

A) Correct mechanism, wrong magnitudes

LLMs often get the story right and the numbers wrong (or swap ranges, units, endpoints, or denominators).
Tell-tales

  • “over 200%” / “nearly 180%” style roundings

  • mixed time windows (e.g., “since 2000” but chart is 2000–2024)

  • “annual” vs “total” confusion

Your move

  • Identify 3–5 headline numbers and verify the underlying series/definition.

  • Ask: What exactly is being measured? What are the endpoints? What’s the base year?

B) Category error: price vs cost vs spending vs affordability

Many reports slide between:

  • prices (unit cost),

  • expenditures (price × quantity),

  • out-of-pocket (net of insurance/subsidy),

  • affordability (relative to income/wealth).

Tell-tales

  • “Costs are rising, therefore people can’t afford it”

  • “This sector is expensive, therefore it’s inefficient”

  • “Inflation in X proves Y policy failure”

Your move

  • Force the report into a sentence with explicit variables (a worked sketch follows this list):

    • Spending = price × quantity

    • Out-of-pocket = spending − transfers/insurance

    • Affordability = out-of-pocket / income (distribution matters)
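
A minimal sketch of those four quantities for a single hypothetical service (all numbers invented for illustration):

```python
# Hypothetical numbers for one service category, to keep the four concepts separate.
price_per_unit = 250.0        # unit price (e.g., per visit) -- hypothetical
quantity = 4                  # units consumed per year -- hypothetical
transfers = 700.0             # insurance/subsidy covering part of the bill -- hypothetical
income = 50_000.0             # household income -- hypothetical

spending = price_per_unit * quantity          # expenditures = price x quantity -> 1000.0
out_of_pocket = spending - transfers          # net of insurance/subsidy       -> 300.0
affordability = out_of_pocket / income        # share of income                -> 0.006 (0.6%)

print(spending, out_of_pocket, f"{affordability:.1%}")
# Prices can rise while out-of-pocket falls (more generous transfers), or spending can
# rise purely because quantity rose -- the report has to say which one it means.
```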

C) “Evidence laundering” via citations

LLMs frequently cite something real but use it to support a stronger claim than the source warrants.
Tell-tales

  • citations to:

    • advocacy sites, blogs, “visualization” pages,

    • generic institutional pages used as proof of a specific numeric claim,

    • ResearchGate links without publication details.

  • “One study found…” with a very crisp number.

Your move

  • For any sentence starting “studies show”:

    1. locate the study,

    2. confirm the number exists,

    3. confirm the number means what the sentence claims (scope, population, time period, causal claim).

2) The “Load-bearing claim” workflow (fast and brutal)

For each section, find one sentence that everything depends on. Then interrogate it.

Step 1: Extract the claim in standard form

Rewrite as:

  • Claim: X increased by Y over period T (in region R, measured by M).
    or

  • Causal claim: X causes Y through mechanism Z.

If you can’t rewrite it cleanly, it’s probably mush.

Step 2: Identify the hidden assumptions

Most LLM writing smuggles assumptions like:

  • “wages equalize across sectors”

  • “quality is held constant”

  • “prices reflect costs”

  • “measurement is comparable across categories”

  • “trend generalizes across countries/time”

Write them down. If the report never states them, penalize credibility.

Step 3: Decide the minimum verification

You don’t need to verify every sentence; you need to verify what would flip your conclusion.

  • If the report’s punchline depends on a chart, verify that chart’s origin and definitions.

  • If it depends on a study, verify the exact statistic and context.

  • If it depends on a causal story, look for at least one alternative mechanism it didn’t address.

3) Red-flag lexicon (words that often signal overreach)

When you see these, slow down and demand definitions:

  • “inevitable”, “without limit”, “proves”, “explains”, “the reason”

  • “hyperinflation” used for relative price changes

  • “the data show” without specifying dataset/units/endpoints

  • “one study found” followed by a precise number

  • “break the curve”, “crash prices”, “solve the problem” (especially with AI)

These words are not always wrong. They’re just where LLMs hide the leap.

4) Quant hygiene: a checklist that catches most numeric hallucinations

Run this on any quantitative paragraph:

  1. Unit sanity: percent vs percentage points; annual vs total; real vs nominal; per-capita vs aggregate.

  2. Denominator: “share of GDP” vs “per person” vs “per recipient”.

  3. Endpoints: exactly which months/years? any cherry-picked start?

  4. Deflator: CPI? PCE? sector-specific? quality-adjusted?

  5. Composition: is the category stable over time? (education/health categories often aren’t)

  6. Distribution: “average wage” is not “typical household”.

If a report doesn’t declare 3–5 of these, it’s doing vibes, not analysis.
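
To make a few of these concrete, here's a small sketch of the slips that come up most often (percent vs percentage points, nominal vs real, total vs annual), using made-up numbers:

```python
# Illustrative numbers only -- the point is the unit discipline, not the values.

# 1) Percent vs percentage points: a rate moving from 4% to 6%.
old_rate, new_rate = 0.04, 0.06
pp_change = (new_rate - old_rate) * 100              # +2 percentage points
pct_change = (new_rate - old_rate) / old_rate * 100  # +50 percent
print(f"{pp_change:.0f} pp vs {pct_change:.0f}%")

# 2) Nominal vs real: a price that rose 80% while the general deflator rose 60%.
nominal_growth, deflator_growth = 0.80, 0.60
real_growth = (1 + nominal_growth) / (1 + deflator_growth) - 1
print(f"real change: {real_growth:.1%}")             # ~12.5%, not the naive 20%

# 3) Endpoints: "+200% since 2000" over 24 years is not an annual rate.
total_growth, years = 2.00, 24                       # hypothetical
annual = (1 + total_growth) ** (1 / years) - 1
print(f"annualised: {annual:.1%}")                   # ~4.7% per year
```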

5) Mechanism vigilance: how to check whether the story is coherent

LLMs excel at telling plausible stories. Your job is to see if the story is constrained.

Ask three questions:

  1. What observable implication would refute this?
    If none is offered, it’s unfalsifiable narrative.

  2. What would you expect to see in a cross-country comparison?
    If the report makes universal claims but never touches cross-country patterns, that’s a gap.

  3. What’s the simplest competing explanation?
    Example templates:

    • “It’s demand/quantity, not unit prices.”

    • “It’s market power/regulation, not productivity.”

    • “It’s measurement/quality adjustment, not real price change.”

A report that doesn’t mention plausible competitors is not doing serious work.

6) The “AI paragraph” heuristic

LLM reports often tack on an AI section that is:

  • confident,

  • unmeasured,

  • causally sloppy.

How to read it

Force it into an accounting identity:

  • Cost = labour + capital + materials + compliance + coordination
    Then ask:

  • Which component does AI reduce?

  • By how much, in which workflows, with what adoption constraints?

  • What second-order effects increase demand, compliance, or coordination?

If the section can’t survive this, treat it as futurist garnish.
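
One way to run that test concretely is a back-of-envelope weighting of the identity. Every share and effect size below is an assumption for illustration; the point is that the achievable cost reduction is bounded by the labour share AI actually touches.

```python
# Hypothetical cost shares for a labour-intensive service (must sum to 1.0).
shares = {"labour": 0.60, "capital": 0.10, "materials": 0.10,
          "compliance": 0.10, "coordination": 0.10}

# Assumed effect of AI adoption on each component (negative = savings) -- all assumptions.
ai_effect = {"labour": -0.30,        # 30% of labour cost automated
             "compliance": +0.05,    # extra audit/review overhead
             "capital": 0.0, "materials": 0.0, "coordination": 0.0}

total_change = sum(shares[k] * ai_effect.get(k, 0.0) for k in shares)
print(f"net cost change: {total_change:.1%}")
# -17.5%: even a large labour saving on a 60% labour share trims total cost by
# well under a fifth -- useful, but nowhere near "crashing prices".
```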

7) Reference triage: ranking sources by epistemic weight

When a report cites sources, assign weights:

High weight (usually)

  • government statistical agencies (BLS/ONS/BEA), OECD, IMF (for definitions/series)

  • peer-reviewed journals and NBER/IZA working papers (with caution, but real)

  • methodology notes explaining how measurement is done

Medium weight

  • major think tanks with transparent methods (still can be ideological)

  • reputable journalism summarizing primary sources (verify anyway)

Low weight

  • blogs/visualization sites presenting charts without method

  • “X Institute” with a strong agenda and weak primary documentation

  • “ResearchGate link” without publication details

Then ask: Are the strongest claims supported by high-weight sources?
If the answer is no, the report is entertainment.

8) A scoring rubric you can actually use

Score each report (0–2 each; max 10):

  1. Definitions: key terms measured precisely?

  2. Data traceability: can you locate the exact series/study behind numbers?

  3. Quant hygiene: units/endpoints/deflators stated correctly?

  4. Competing explanations: acknowledged and handled fairly?

  5. Uncertainty discipline: does it distinguish fact, inference, speculation?

Interpretation:

  • 0–3: Narrative generator; don’t rely on it.

  • 4–6: Useful for ideas; verify before using.

  • 7–8: Rare; can inform decisions with targeted checks.

  • 9–10: Almost never from an LLM without strong human editing.
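
If you want to mechanise the rubric, here is a minimal sketch (criterion names and cut-offs taken straight from the bands above):

```python
def score_report(definitions, traceability, quant_hygiene, competing, uncertainty):
    """Each argument is 0, 1, or 2, per the rubric above; returns (total, verdict)."""
    scores = (definitions, traceability, quant_hygiene, competing, uncertainty)
    assert all(s in (0, 1, 2) for s in scores)
    total = sum(scores)
    if total <= 3:
        verdict = "narrative generator; don't rely on it"
    elif total <= 6:
        verdict = "useful for ideas; verify before using"
    elif total <= 8:
        verdict = "rare; can inform decisions with targeted checks"
    else:
        verdict = "almost never from an LLM without strong human editing"
    return total, verdict

print(score_report(1, 0, 1, 1, 1))   # (4, "useful for ideas; verify before using")
```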

9) A “template” you can paste into your own reading notes

Use this to force structure:

  • Main claim (1 sentence):

  • Mechanism:

  • Load-bearing numbers (3):

    1. … source?

    2. … source?

    3. … source?

  • Definitions missing / ambiguous:

  • Alternative explanations ignored:

  • What would change my mind / refute it:

  • Decision: (discard / use as hypothesis / accept with caveats)

This turns your brain from “reader” into “auditor.”

10) The single best habit

When you feel persuaded, stop and ask:

“What sentence, if false, would collapse the whole argument?”

Then verify only that sentence properly.
This catches most LLM failures with minimal effort.

If you want, I can turn this into a one-page printable checklist, or tailor it to the kinds of reports you read most (policy, finance, scientific summaries, legal memos, etc.).
