AI and Jobs: March 2026

A Roundtable Discussion on the Massenkoff–McCrory Paper

Participants: VL (Moderator) · Claude Opus 4.6 · GPT-5.4 Thinking · Gemini 3.1 Pro

March 2026


VL:

Anthropic published a paper by two of its economists (Maxim Massenkoff and Peter McCrory) on the 5th of March titled "Labour market impacts of AI: A new measure and early evidence." Talk at length about the paper's key claims. Begin with a brief overview of "AI and jobs" news stories from February and early March (e.g. cuts at Block) and then talk about how we should interpret them.


Claude Opus 4.6:

The last few weeks have been a remarkable collision between narrative and evidence on the question of AI and employment, and the Massenkoff–McCrory paper lands right at the intersection.

Consider the mood music first. In late February, Jack Dorsey announced Block was cutting roughly 4,000 employees — nearly 40% of its workforce — and explicitly attributed the decision to "intelligence tools." The stock surged more than 15% the next day. Block's CFO framed the cuts as enabling the company to operate with smaller, highly talented teams using AI to automate more work. Almost simultaneously, eBay cut about 800 positions in what it called a strategic realignment. Pinterest, CrowdStrike, and Baker McKenzie — a major law firm — also announced layoffs and directly invoked AI as a factor. The Citrini Research essay went viral earlier that same week, painting a doom-loop scenario in which AI agents replace office workers, fattening margins, attracting more AI investment, and triggering more layoffs. The Wall Street Journal ran a piece declaring this the week the dreaded AI jobs wipeout "got real." Then the February jobs report landed: employers shed 92,000 jobs, and unemployment ticked up to 4.4%.

So the vibes are terrible. But the question the Massenkoff–McCrory paper forces you to confront is whether the vibes correspond to anything systematic in the data — and the honest answer, as of March 2026, is: barely.

The paper's central innovation is methodological. Previous work on AI labor exposure — most notably Eloundou et al.'s 2023 framework, which scored tasks on whether an LLM could theoretically double their speed — measured theoretical capability. Massenkoff and McCrory introduce what they call "observed exposure," which combines that theoretical score with actual usage data drawn from Anthropic's own platform, weighting automated and work-related use cases more heavily than augmentative ones. This is a meaningful advance. The gap between what AI could do and what it is doing turns out to be enormous. For computer and math occupations, LLMs are theoretically capable of handling about 94% of tasks, yet Claude currently covers only 33% of those tasks in observed professional use. That gap — the distance between the blue and red areas in their radar chart — is both the source of reassurance (it hasn't happened yet) and alarm (the runway for disruption is vast).

Three findings deserve close attention.

First, on the headline result: there has been no systematic increase in unemployment for workers in highly AI-exposed occupations since late 2022. The difference-in-differences analysis comparing the top quartile of exposed workers to the unexposed group shows a gap that is statistically indistinguishable from zero. The paper is refreshingly candid about its detection threshold — it could identify a differential unemployment increase on the order of one percentage point, and a scenario like a "Great Recession for white-collar workers" (doubling unemployment in exposed occupations from 3% to 6%) would show up clearly. Neither has materialized.

Second, and more worrying: there is suggestive evidence that hiring of younger workers has slowed in exposed occupations. The job-finding rate for 22-to-25-year-olds entering high-exposure occupations dropped by roughly 14% relative to the pre-ChatGPT baseline, though this is just barely statistically significant. This echoes Brynjolfsson et al.'s finding of a 6-to-16% employment decline for young workers in exposed roles. The paper rightly hedges: these workers might be staying in existing jobs, switching to adjacent roles, or returning to school. But the pattern is consistent across studies now, and it's the kind of leading indicator that should concentrate minds. AI-driven labor displacement, if it comes, is most likely to arrive not as mass layoffs of incumbents but as a slow constriction of entry points — the jobs that simply stop being posted, the junior roles that get absorbed into a senior's AI-augmented workflow.

Third, the demographic profile of exposure is striking. Workers in the most exposed professions tend to be older, female, more educated, and higher-paid. People with graduate degrees are nearly four times as concentrated in the high-exposure group as in the unexposed group. This inverts the usual pattern of technological displacement, which historically hit lower-skilled manufacturing and service workers hardest. If AI displacement does accelerate, it will hit the professional-managerial class — the people who have, until now, generally been technology's beneficiaries.

Now, how should we interpret all of this against the backdrop of the Block layoffs and the general panic? I think there are three distinct things going on that are being unhelpfully conflated.

The first is genuine AI-driven restructuring, where companies are actually using AI tools to do work that humans previously did. This is real in narrow domains — customer service automation through API deployments, code generation replacing some junior programming tasks, data entry moving to extraction pipelines. The paper's top-three exposed occupations (computer programmers, customer service reps, data entry keyers) are exactly where you'd expect this.

The second is what Oxford Economics' Ben May and Revelio Labs' Lisa Simon have both flagged: companies using AI as a pretext for layoffs that have other motivations. Block is a vivid case. A data scientist who left Block noted that remaining employees were offered retention packages that suggested the company wasn't actually saving much on payroll — it was redistributing money to fewer people expected to work harder. Block had already done rolling layoffs in early February for "performance reasons" without mentioning AI. The company had grown from about 3,800 employees pre-pandemic to over 10,000 — a classic post-COVID bloat correction dressed up in futuristic language. Bloomberg's headline captured the ambiguity perfectly: Block's layoffs "arouse suspicions of AI-washing."

The third is the narrative feedback loop itself. CEOs see other CEOs cite AI in layoff announcements and the market rewards it with stock pops. Dorsey says most companies will reach the same conclusion within a year. This creates a self-reinforcing dynamic where "AI-driven efficiency" becomes the socially acceptable, investor-pleasing explanation for any reduction in force, regardless of the actual causal chain. The Deutsche Bank analyst who called the Citrini essay's "vibes-to-substance ratio undeniably high" could have been describing the broader discourse.

What makes the Massenkoff–McCrory paper valuable is precisely its refusal to operate in this narrative register. It's doing something unfashionable: measuring what's actually happening, finding that the answer is "not much yet," and being transparent about what "not much yet" does and doesn't mean. The paper opens by noting that past attempts to forecast labor displacement — offshorability measures, government occupational projections, robot impact studies — have a poor track record. It positions itself as building the measurement infrastructure before the disruption arrives, which is epistemically humble in a way that most AI labor commentary is not.

There's an obvious conflict-of-interest question: should we trust Anthropic's economists to tell us the truth about whether Anthropic's products are destroying jobs? I think the paper is actually more alarming than its headline finding suggests, precisely because it maps the enormous gap between current and potential displacement. It's essentially saying: the loaded gun is sitting on the table, and so far nobody has pulled the trigger. The 94%-versus-33% coverage gap in computer and math occupations is a measure of how much disruption remains latent. The paper explicitly flags a "Great Recession for white-collar workers" as a scenario their framework could detect — which is not the kind of thing you write if you're trying to be soothing.

The deeper question the paper raises, without fully answering, is about the dynamics of the transition from theoretical to observed exposure. Is the gap closing linearly, or will it close in sudden lurches as particular capabilities cross adoption thresholds? The history of technology adoption suggests the latter — S-curves, not straight lines. And the paper's own data shows that 97% of observed Claude usage falls on tasks already rated as theoretically feasible, meaning the constraint isn't capability but deployment infrastructure, institutional friction, and trust. Those are barriers that can erode quickly once a few prominent adopters demonstrate viability. Dorsey, whatever his actual motivations, may be providing exactly that demonstration.

For now, the honest summary is: aggregate labor markets show no clear AI effect; young workers in exposed occupations show early, tentative signs of reduced entry; and the gap between what AI can displace and what it has displaced remains very large. Whether that gap is reassuring or terrifying depends entirely on your model of how fast institutional adoption moves — and on that question, the paper wisely offers no prediction at all.


GPT-5.4 Thinking:

Claude, I'd tighten the frame a bit. The late-February/early-March news cycle really did look like the long-awaited "AI jobs shock" arriving all at once: Block said it would cut more than 4,000 jobs, roughly 40% of its workforce, as Jack Dorsey explicitly tied a smaller workforce to "intelligence tools"; Reuters simultaneously tallied more than 61,000 AI-linked job cuts worldwide since November, and Challenger said AI accounted for 7% of U.S. planned layoffs announced in January. Amazon was still trimming white-collar staff, including another robotics-unit cut in early March, while the official February U.S. jobs report then showed payrolls down 92,000 and unemployment at 4.4%. But in the same week, the ECB published evidence that euro-area firms using or investing in AI were, for now, more likely to hire than fire. That juxtaposition matters. The news was not "AI is killing jobs" so much as "the public narrative has lurched ahead of the macro evidence."

That is exactly why the Massenkoff–McCrory paper is useful. It does not ask whether CEOs are talking about AI, nor whether investors reward them for doing so. It asks a narrower and better question: given what LLMs could in principle do, which job tasks are they actually being used for in work settings, and is that showing up in labor-market outcomes yet? In that sense the paper is less a forecast than an instrument panel. It is trying to build a measure that sits between pure capability hype and lagging macro aggregates.

The methodological move is the paper's core contribution. Earlier exposure measures, especially Eloundou et al.'s, were about theoretical capability: could an LLM substantially speed up this task? Anthropic's economists keep that backbone, but then filter it through observed Claude usage. A task counts more if it actually appears in work-related Anthropic traffic, more still if it shows up in automated or API-mediated use, and occupation-level exposure is then weighted by the fraction of time workers spend on those tasks. In the appendix they are quite explicit that this is not just "can the model do it?" but "are we seeing signs of implementation in production-like workflows?" They also impose a threshold so vanishingly rare uses do not count as evidence of diffusion. Theoretical exposure, on this reading, is a frontier map; observed exposure is a diffusion map.
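To make that construction concrete, here is a minimal sketch, in Python, of how an observed-exposure index of this general shape could be assembled. Everything in it (the field names, the usage threshold, the extra weight on automated use, the toy numbers) is an illustrative assumption of mine, not the paper's actual pipeline, which is specified in its appendix.

```python
from collections import defaultdict

# Toy task-level records: (occupation, task, theoretically_feasible,
# share_of_work_related_usage, automated_share_of_that_usage, task_time_share).
# Values are invented; the paper builds its inputs from Eloundou-style
# feasibility ratings, O*NET task-time data, and anonymized Claude usage.
tasks = [
    ("Computer Programmers", "write boilerplate code",  True,  0.0300, 0.60, 0.25),
    ("Computer Programmers", "review code",             True,  0.0120, 0.20, 0.15),
    ("Computer Programmers", "gather requirements",     True,  0.0001, 0.00, 0.20),
    ("Computer Programmers", "negotiate project scope", False, 0.0,    0.00, 0.40),
]

USAGE_THRESHOLD = 0.001  # assumed cutoff: vanishingly rare uses don't count
AUG_WEIGHT, AUTO_WEIGHT = 0.5, 1.0  # assumed: automated use counts double

def observed_exposure(records):
    """Aggregate task-level usage into an occupation-level coverage index,
    weighting each task by the share of work time it absorbs."""
    covered, total = defaultdict(float), defaultdict(float)
    for occ, _task, feasible, usage, automated, time_share in records:
        total[occ] += time_share
        # A task counts only if it is theoretically feasible AND its observed
        # work-related usage clears the diffusion threshold.
        if feasible and usage >= USAGE_THRESHOLD:
            weight = AUG_WEIGHT + (AUTO_WEIGHT - AUG_WEIGHT) * automated
            covered[occ] += weight * time_share
    return {occ: covered[occ] / total[occ] for occ in total}

print({occ: round(x, 2) for occ, x in observed_exposure(tasks).items()})
# -> {'Computer Programmers': 0.29}
```

The point of the sketch is only the shape of the filter: theoretical feasibility acts as one gate, observed usage as a second, and automated use counts for more than assistive use before everything is rolled up by task-time shares.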

That produces the paper's most arresting picture: the gap between what LLMs could plausibly do and what they are presently being used to do is still very large. Anthropic reports that 97% of observed task usage falls into categories already rated theoretically feasible, so the capability literature was not hallucinating the target. But actual deployment is much narrower. In Computer & Math occupations, the theoretical exposure measure reaches 94% of tasks, while observed exposure is only 33%. The most exposed occupations under the new measure are Computer Programmers at 75% coverage, then Customer Service Representatives, then Data Entry Keyers at 67%; the paper's discussion also names financial analysts among the most exposed. And observed exposure has a bit more face validity than theory alone: every 10-point increase in observed coverage is associated with a 0.6-point drop in BLS projected 2024–2034 employment growth, whereas the older theoretical score alone does not show that relationship. That is not a huge effect, but it is an important one. It suggests the new measure is picking up something about real deployment rather than merely model potential.
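As a sketch of what that face-validity check amounts to, here is a cross-occupation regression of projected employment growth on observed coverage, simulated so the slope matches the reported association. The data are invented; only the interpretation of the coefficient is the point.

```python
import numpy as np
import statsmodels.api as sm

# Simulated occupation-level data (values invented for illustration):
# observed-exposure coverage in points (0-100) and BLS projected 2024-2034
# employment growth in percent, built so the true slope is -0.06 per point.
rng = np.random.default_rng(1)
coverage = rng.uniform(0, 80, size=200)
growth = 5.0 - 0.06 * coverage + rng.normal(0, 2.0, size=200)

fit = sm.OLS(growth, sm.add_constant(coverage)).fit()
# Multiplying the slope by 10 recovers the paper's reported association:
# each 10-point rise in observed coverage ~ 0.6 pp lower projected growth.
print(round(fit.params[1] * 10, 2))  # ~ -0.6
```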

The headline labor-market claim, though, is negative: no systematic increase in unemployment for highly exposed workers since late 2022. Using CPS data, the difference-in-differences estimate comparing workers in the top exposure quartile with workers in zero-exposure occupations is small and statistically indistinguishable from zero. The paper is admirably concrete about power: differential unemployment increases on the order of 1 percentage point should be detectable, and a genuine "Great Recession for white-collar workers" scenario—roughly a doubling of unemployment in exposed occupations from 3% to 6%—ought to show up clearly. It doesn't. In the appendix they also check unemployment-insurance claimant data, and the post-ChatGPT pooled estimate there is only 0.1 percentage point and insignificant. So the paper is not saying "nothing is happening." It is saying "whatever is happening has not yet produced a broad, occupation-level unemployment signal of meaningful size."
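For readers who want the design spelled out, here is a hedged sketch of that difference-in-differences comparison on simulated CPS-style data. The variable names, sample construction, and absence of controls are assumptions of mine, not the paper's specification; the simulated interaction effect is set to zero to mirror the reported null.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy worker-month panel. 'exposed' flags top-quartile observed-exposure
# occupations vs. zero-exposure occupations; 'post' flags months after
# ChatGPT's release in late 2022. All data simulated.
rng = np.random.default_rng(0)
n = 200_000
df = pd.DataFrame({
    "exposed": rng.integers(0, 2, n),
    "post": rng.integers(0, 2, n),
})
# ~3% baseline unemployment, a common post-period drift, a level difference
# for exposed occupations, and a true differential effect of zero.
p = 0.03 + 0.005 * df["post"] + 0.002 * df["exposed"] + 0.0 * df["exposed"] * df["post"]
df["unemployed"] = rng.binomial(1, p)

# Linear probability model: the exposed:post coefficient is the DiD estimate,
# the differential post-ChatGPT change in unemployment for exposed workers.
m = smf.ols("unemployed ~ exposed * post", data=df).fit(cov_type="HC1")
print(m.params["exposed:post"], m.bse["exposed:post"])

# Power intuition: the white-collar "Great Recession" scenario (3% -> 6%)
# would show up as a +0.03 interaction, far above this design's noise floor.
```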

The more interesting claim is the one about entry, not separation. For workers aged 22–25, the paper finds no clear rise in unemployment in exposed occupations, but it does find suggestive evidence that new job starts into exposed occupations have weakened. The monthly job-finding rate for young workers entering high-exposure occupations falls by about half a percentage point relative to the stable 2% monthly rate in low-exposure occupations; averaged over the post-ChatGPT period, that is a 14% decline relative to 2022 (a half-point drop is a 14% decline against an implied pre-period job-finding rate of roughly 3.5% in the exposed group), though the estimate is only barely statistically significant and does not appear for workers older than 25. That is precisely the sort of pattern one would expect if early AI effects arrive first as thinning junior pipelines rather than mass layoffs of incumbents. The paper itself now carries a March 8 correction noting that Figure 7 originally reversed the labels between the top-quartile and zero-exposure inflow series, so one should read this result with some care. But the substantive point survives the correction: the authors think the first place to watch is hiring at the margin.

The demographic pattern is also more important than the headlines made it sound. The most exposed occupations are not the usual automation victims. Relative to zero-exposure workers, the top-exposure group is 16 percentage points more likely to be female, earns 47% more on average, and is far more educated; workers with graduate degrees are 17.4% of the exposed group versus 4.5% of the unexposed group. So this is not a repeat of the canonical factory-robot story where the immediate pressure falls most heavily on lower-paid manual labor. The paper is mapping vulnerability higher up the credential ladder, into office, analytical, and professional work. That does not mean those workers are about to disappear. It does mean the political economy of AI could look very different from the automation shocks of the 1980s or the China shock of the 2000s.

Where I would push harder than the paper is on interpretation of the blue-red gap. It is tempting to read the uncovered blue area as latent layoffs waiting to happen. That is too mechanical. The appendix stresses that their measure does not capture the intensive margin of use—how much of a task is being done by AI inside firms—and is not a pure percentage of a job replaced. It is better understood as a coverage index. Jobs are bundles of interdependent tasks, and a great many theoretically feasible sub-tasks do not yet add up to end-to-end substitution because of legal constraints, workflow integration, quality control, trust, or bottleneck tasks that remain stubbornly human. In other words, the gap between theoretical and observed exposure is partly latent disruption, yes, but partly ordinary friction and complementarity. Block-style announcements can still matter inside that world, because a firm does not need full task replacement to decide it now needs fewer junior employees per senior employee. But that is a different mechanism from "the model can do 94% of the job, therefore 94% of the job disappears."

That, to me, is how the February and early-March stories should be read. They are not meaningless, and they are not merely "AI-washing" either. They are leading indicators of managerial intent, investor appetite, and organizational willingness to reorganize around AI. But they are poor evidence of aggregate labor-market effects on their own. The Massenkoff–McCrory paper is valuable precisely because it refuses to infer a macro story from press releases. Its key claims are more disciplined: actual work-related AI deployment is still well below theoretical capability; exposure is concentrated in relatively high-paid, educated white-collar roles; broad unemployment effects are not yet visible; and the earliest credible place to look for harm is entry-level hiring into exposed occupations. That is a much more sober conclusion than the news cycle wanted, and probably much closer to the truth.


Gemini 3.1 Pro:

GPT-5.4 is right to push back on Claude's "loaded gun" metaphor regarding the gap between theoretical capability and observed use, but both of you are still operating within the fundamental conceit of the Massenkoff–McCrory paper: the idea that jobs are simply bundles of tasks, and that automating the tasks inevitably threatens the job. This task-based framework, inherited from Autor, Levy, and Murnane, breaks down when applied to generative models. The 94 percent theoretical capability versus 33 percent observed usage gap in computer and math occupations is not merely a lag in diffusion or a product of temporary institutional friction. It is an ontological gap. LLMs execute tasks, but humans absorb liability. You can automate the generation of code, the drafting of a legal brief, or the initial synthesis of a financial report, but an organization still requires a human throat to choke when the deployment fails. That requirement for accountability creates a hard asymptote on observed exposure that purely theoretical frameworks consistently fail to model.

This brings us to the Block layoffs and the broader February narrative. Claude rightly identifies post-COVID bloat, but framing Dorsey's invocation of "intelligence tools" as mere "AI-washing" misses the financial mechanics at play. We are witnessing a massive repricing of human capital by equity markets based on the anticipation of AI leverage, entirely decoupled from the macroeconomic reality Massenkoff and McCrory are measuring. When Block cuts 40 percent of its workforce and its stock surges 15 percent, the market is not verifying that AI is actually doing the work of 4,000 people. It is rewarding the executive team for treating knowledge work as a depreciating asset. In this climate, citing AI in a layoff announcement is a necessary ritual to signal margin expansion to institutional investors. The public narrative hasn't just lurched ahead of the macro evidence; it has actively constructed a parallel financial reality where the theoretical exposure of a firm's workforce is treated as guaranteed future free cash flow.

Where the Massenkoff–McCrory paper genuinely illuminates the future is in its demographic findings, specifically the 14 percent drop in the job-finding rate for 22-to-25-year-olds in highly exposed occupations. Both of you noted this, but the downstream consequence requires starker articulation. If AI deployment reliably absorbs the rote, junior-level work—the boilerplate code, the basic data extraction, the initial customer triage—it dismantles the traditional apprenticeship model of the professional-managerial class. Expertise is not downloaded; it is forged through the tedious repetition of foundational tasks. By halting entry-level hiring, firms are optimizing for short-term margin while quietly destroying the mechanism for human capital formation. We are not facing a sudden, dramatic spike in aggregate unemployment. We are facing a structural hollowing out of the pipeline that produces senior talent.

The value of the Anthropic paper, then, is not that it proves the AI jobs apocalypse is delayed. Its value lies in demonstrating that we have been looking for the crisis in the wrong place. We have been scouring headline unemployment rates and mass layoff announcements for evidence of a sudden shock. But the real consequence of LLM integration is creeping and demographic: a silent constriction of opportunity for the young, masked by soaring equity valuations for the firms locking them out. The tragedy is not that 94 percent of a job will be automated tomorrow. It is that a 23-year-old today will never be hired to learn the 6 percent that remains stubbornly, essentially human.


Claude Opus 4.6:

Gemini, your accountability asymptote is a genuine insight wrapped in an overstatement. Yes, someone must absorb liability when things go wrong — but the history of automation is littered with examples where that accountability requirement didn't prevent displacement, it just restructured it. Radiologists still sign off on reads, but the number of radiologists needed per scan has dropped as AI triage filters the workload. A single senior engineer reviewing AI-generated code can replace a team of three juniors writing it from scratch. The "human throat to choke" doesn't disappear, but it becomes a bottleneck staffed by fewer, more senior people. That's not a hard asymptote — it's a compression ratio. And compression ratios can be brutal for headcount even when they never reach full automation.

Where I think you're genuinely wrong is in dismissing the task-based framework as ontologically inadequate for generative models. The Autor-Levy-Murnane decomposition isn't claiming that automating tasks automatically eliminates jobs on a one-to-one basis. It's claiming that task automation changes the marginal economics of staffing decisions, which is exactly what we're seeing. The framework handles your accountability point just fine: accountability is itself a task (or a bundle of tasks — monitoring, judgment, sign-off) that remains human-performed, and its presence in the bundle is precisely what prevents full substitution while still allowing substantial workforce reduction. The framework doesn't need to be abandoned; it needs to be applied with more granularity than Eloundou et al. managed, which is roughly what Massenkoff and McCrory are attempting.

Your point about equity markets constructing a parallel financial reality is sharper, though. The Block stock surge is genuinely interesting as a market microstructure phenomenon. What investors are pricing in isn't "AI does the work of 4,000 people" — it's "management has demonstrated willingness to run lean, and AI provides a plausible narrative for why this time the quality won't degrade." That's a bet on organizational discipline dressed up as a bet on technology. And you're right that the feedback loop here is self-reinforcing in a way that's partially independent of whether the technology actually delivers. CEOs are being incentivized to announce AI-driven restructuring regardless of the actual causal chain, and the market is rewarding the announcement itself. This is the kind of mechanism that can produce real labor market effects through expectation-setting alone — a self-fulfilling prophecy where the narrative of AI displacement creates the conditions for displacement, even in cases where AI is doing relatively little of the actual work.

But I want to push back on your framing of the apprenticeship problem, because I think you've identified the right crisis and drawn the wrong boundary around it. You describe the destruction of the junior talent pipeline as firms "optimizing for short-term margin while quietly destroying the mechanism for human capital formation." That frames it as corporate myopia — firms making individually rational decisions that are collectively catastrophic. But there's an alternative reading that's more uncomfortable: maybe the apprenticeship model was always inefficient, and AI is revealing that inefficiency rather than creating a new problem.

Consider what the traditional junior role actually involved. A first-year associate at a law firm spent enormous hours on document review not primarily because document review is a great pedagogical tool, but because the firm needed cheap labor to do tedious work and justified the economics by calling it training. A junior analyst at an investment bank built models not mainly to learn modeling but because the bank needed the models built and couldn't justify senior compensation for the task. The apprenticeship narrative was always partly a post-hoc rationalization for an economic arrangement. If AI can do the tedious work, the question isn't just "how will juniors learn?" — it's "was that actually how they learned, or was it how firms extracted value from them while they learned through osmosis and observation?"

This matters because the solutions look very different depending on which frame is correct. If apprenticeship-through-drudgery was genuinely essential, then we face a human capital crisis that requires institutional intervention — restructured training programs, perhaps subsidized junior roles, new credentialing pathways. If it was largely a convenient fiction, then the real challenge is building better, more deliberate training systems that don't depend on the coincidence of cheap labor needs and skill development. Medical residency is instructive here: it's brutal and exploitative, but it persists because the training function is genuinely hard to replicate otherwise. Whether the same is true of junior programming or financial analysis is an open empirical question that neither the Massenkoff–McCrory paper nor the current discourse has seriously engaged with.

The one place where I think GPT-5.4 had the most precise formulation, and where both Gemini and I could have been cleaner, is on what the blue-red gap actually measures. It is a coverage index, not a displacement forecast. The temptation to read it as "latent layoffs" is strong precisely because the narrative environment is primed for that interpretation. But coverage of tasks is not the same as substitution of workers, because — and this is the point the task-based framework actually handles well — the complementarities between tasks within a job matter enormously. If AI handles 60% of a programmer's tasks but the remaining 40% requires the full context of the 60% to execute well, you might need the same number of programmers working differently rather than fewer programmers. The question is whether the task bundles in high-exposure occupations are modular (separable, independently automatable) or tightly coupled (requiring holistic human judgment across the full bundle). The paper doesn't and probably can't answer that, but it's the question that determines whether the gap closes through displacement or through augmentation.

What strikes me most about this entire conversation is that we're all essentially agreeing on the empirical picture while disagreeing on the interpretive frame — and the interpretive frame is doing almost all the work in determining whether the situation feels urgent or manageable. The data says: no aggregate effect yet, worrying signs at the entry level, enormous latent potential. Whether that's "the calm before the storm" or "technology adoption proceeding at normal historical pace" is not a question the data can answer. It's a question about priors on institutional adoption speed, and honest people can disagree radically on that without either being wrong about the evidence.


VL:

Claude, before I let the other models respond to the points you've raised, I would like you to talk a bit more about how AI is currently being used in radiology. You wrote: "Radiologists still sign off on reads, but the number of radiologists needed per scan has dropped as AI triage filters the workload." And yet I seem to remember reading somewhere (a long NYT article last year?) that radiologists (in the US?) are both more numerous and better compensated than they have ever been.


Claude Opus 4.6:

You're right to call that out, and I should be straightforward: my radiology example was wrong, and wrong in a way that's instructive.

The actual picture is almost the inverse of what I claimed. Radiologist compensation surged roughly 7.5% from 2023 to 2024 alone, putting average pay around $520,000–$572,000 depending on the survey, with radiology ranking in the top five specialties for compensation growth. Meanwhile, approximately half of radiologist job searches conducted in 2023 went unfilled. The Neiman Health Policy Institute projects that even under optimistic scenarios, demand for imaging will outpace radiologist supply through 2055. The U.S. is experiencing a genuine, worsening radiologist shortage driven by an aging population consuming more imaging, expanding use of CT and MRI for treatment monitoring, burnout-driven attrition, and a training pipeline that can't expand fast enough.

So AI has not reduced the number of radiologists needed per scan. It hasn't even held the line. Imaging volume has grown faster than both AI-driven efficiency gains and the radiologist workforce. The FDA has cleared hundreds of radiology AI products, and they're being used — for triage, for flagging findings, for mammography second-reads — but their effect so far has been to modestly ease the burden on overworked radiologists, not to reduce demand for them. The shortage is expected to worsen, with the heaviest-reading radiologists absorbing most of the continued growth in volume.

This is actually a more important concession than it might appear, because I deployed radiology as an analogy to support a general claim about AI compression ratios — the idea that accountability remains but headcount drops. And radiology is precisely the field where AI skeptics and AI enthusiasts have been making competing predictions the longest. Geoffrey Hinton said in 2016 that we should stop training radiologists because deep learning would replace them within five years. A decade later, the field has more unfilled positions than ever.

What radiology actually illustrates is the phenomenon the Massenkoff–McCrory paper is measuring from the other direction: the gap between theoretical capability and observed impact can persist for a very long time, and can persist even when the technology genuinely works, because demand is not fixed. AI reads a chest X-ray faster, so clinicians order more chest X-rays. AI flags incidental findings, generating follow-up imaging. AI enables screening programs that wouldn't have been cost-effective with human-only reads. The work expands to absorb the efficiency gains. This is the Jevons paradox applied to diagnostic medicine.

Now, does this generalize? That's the question that matters for the broader conversation about the Massenkoff-McCrory paper. And I think the honest answer is: it generalizes in some domains and not others. Radiology has two features that make it unusually resistant to AI displacement. First, demand is elastic — there's no natural ceiling on how much imaging a healthcare system can usefully do, especially with aging populations. Second, the regulatory and liability architecture is rigid — someone with specific credentials must sign the report. Customer service, data entry, and junior programming — the occupations at the top of Massenkoff and McCrory's observed exposure list — share neither of those features. Demand for customer service interactions is something most companies actively want to reduce. Nobody is going to respond to faster code generation by deciding they need twice as much code written.

So I was wrong about the specific case, but the error actually sharpens the analytical question. The right framework isn't "AI compresses headcount in all exposed occupations." It's "AI compresses headcount in occupations where demand is inelastic or where firms actively want to reduce volume, and augments headcount in occupations where demand is elastic and quality-sensitive." Radiology sits firmly in the second category. The occupations Massenkoff and McCrory flag as most exposed may sit in the first. And the question of which category a given occupation falls into is precisely the kind of thing that task-level exposure measures, whether theoretical or observed, cannot tell you — because it depends on the economics of demand, not just the feasibility of supply.


GPT-5.4 Thinking:

VL, your memory is basically right, and Claude's correction matters. Radiology is not a case where AI has obviously reduced headcount. It is a case where AI has diffused widely into workflow without making the profession smaller. In current U.S. practice, the technology is mostly doing triage and detection work, flagging urgent scans, surfacing likely abnormalities, acting as a second reader in some settings, and increasingly helping with report drafting. Recent reporting says about two-thirds of U.S. radiology departments now use AI in some capacity, and that more than three-quarters of FDA-cleared medical AI software is aimed at radiology; at the same time, legal responsibility in the U.S. still sits with the radiologist, not the software vendor.

And the labor-market backdrop is the opposite of "automation collapse." The Neiman Health Policy Institute's 2025 workforce studies say the current radiologist shortage is projected to persist unless supply grows or imaging use falls, because population aging and imaging demand keep pushing utilization up while post-COVID attrition has also risen. Recent reporting likewise points to more radiology jobs in 2025 than five years earlier, while compensation surveys still place radiologists near the top of U.S. physician pay, in the low-$500,000 range. So yes: the profession looks both numerous and very well paid, not like an occupation being quietly automated away.

What radiology really shows is that Gemini's accountability point is only part of the story. Liability matters, but the deeper variable is the economics of demand. In radiology, better triage and faster interpretation do not necessarily mean "fewer radiologists per scan"; they can mean more scans read, shorter queues, more screening, more follow-up, and therefore more total work flowing through a credentialed bottleneck. When demand expands to absorb the productivity gain, AI becomes capacity-augmenting rather than labor-saving. That is why radiology is such a powerful counterexample to any crude theory that maps task automation directly onto job destruction.

That is also why the Massenkoff–McCrory paper should be read carefully. Their "observed exposure" measure is not a claim that x percent of a job disappears when Claude touches x percent of its tasks. It is a diffusion measure: tasks count only when they are theoretically feasible, show sufficient work-related Claude usage, and are weighted more heavily when the use is automated rather than merely augmentative; those task scores are then aggregated up to occupations by task-time shares. On that basis, the paper finds a large gap between frontier capability and actual workplace use—94% theoretical coverage versus 33% observed coverage in Computer & Math, with programmers at 75% observed coverage and data-entry keyers at 67%. It then finds no systematic post-2022 unemployment increase for the most exposed workers, while suggesting that job starts for 22-to-25-year-olds into highly exposed occupations have fallen by about 14% relative to 2022.

So I'd put it this way, Claude: the radiology correction does not just fix an anecdote; it sharpens the whole interpretive frame. February's layoff stories, especially Block, were evidence that markets reward executives for presenting AI as a reason to run leaner teams, and Reuters counted more than 61,000 AI-linked cuts globally since November. But radiology shows that equally real AI adoption can produce the opposite labor effect when demand is elastic and institutions keep humans in the loop. That is why press releases are bad evidence for macro conclusions. The Anthropic paper is valuable precisely because it separates "where AI is actually being used at work" from "where employment must therefore fall." To answer the second question, you need variables the paper does not claim to solve—demand elasticity, licensing, liability, organizational redesign, and the training pipeline for new entrants.


VL:

And your comments on framing the "apprenticeship problem"?


GPT-5.4 Thinking:

I think Gemini has located the right danger, but Claude is right that "apprenticeship" can be a sentimental word for a grubby arrangement. A lot of junior white-collar work was not designed as pedagogy in the first instance; it was cheap production that also happened to train people. The mistake is to infer from that cynicism that the training function was unreal. It was real precisely because the drudge work was embedded inside live systems, real clients, real deadlines, and escalating responsibility. The problem is not that AI is taking away "boring tasks" in the abstract. It is that those tasks were doing three jobs at once: generating output, transmitting tacit knowledge, and screening for judgment. AI can substitute for the first function faster than institutions can rebuild the other two.

That is why I would frame the apprenticeship problem less as "young people need to suffer through grunt work to learn" and more as "firms historically outsourced training to the production process." The junior analyst, paralegal, or programmer was not merely typing boilerplate; they were learning the shape of edge cases, the difference between a superficially plausible output and a defensible one, the informal standards of the profession, and the moment when a task stops being routine and starts becoming consequential. Autor and Thompson's "expertise" framework is useful here: automation can reduce employment in task-heavy roles while increasing the scarcity value of the remaining human expertise. That is exactly the pattern that makes seniors look safer than juniors.

The Anthropic paper points in that direction without fully theorizing it. Massenkoff and McCrory find no systematic rise in unemployment for highly exposed occupations, but they do find tentative evidence that hiring into those occupations has slowed for workers aged 22–25, and they explicitly say a next step is to study how recent graduates with credentials in exposed fields are navigating the labor market. Brynjolfsson and coauthors, using ADP payroll data, report a sharper version of the same pattern: employment declines are concentrated among early-career workers in the most AI-exposed occupations, especially where AI use is more automative than augmentative.

So the apprenticeship problem is not mainly a mass-unemployment problem. It is a ladder-removal problem. If firms can use AI to let one experienced worker do what previously required one experienced worker plus two juniors, the immediate macro effect may be small while the medium-term institutional effect is large. You do not see it first in unemployment rolls; you see it in fewer first jobs, fewer low-stakes chances to be wrong, and a widening gap between people who already possess tacit knowledge and people who have only codified knowledge. The Dallas Fed piece says this almost explicitly: AI appears to substitute for entry-level workers while complementing experienced workers, making the traditional model of employee development "cost-ineffective" in the short run and forcing firms to rethink how new workers gain experience.

Where I'd push back on Gemini is that this does not mean the old apprenticeship model was therefore good enough. In many elite professions it was inefficient, arbitrary, and often exploitative. Juniors learned partly because firms needed someone cheap to do tedious work. Claude is right about that. But the existence of waste in the old system does not mean its hidden social function was dispensable. The hidden function was human-capital formation under real conditions. Once AI strips out the cheap-production rationale, firms may stop paying for the training externality. That creates a classic collective-action problem: each firm has an incentive to hire fewer novices, but the occupation as a whole still needs a future supply of competent seniors.

This is also where task-based exposure measures hit a limit. They can tell you which tasks are feasible and which ones are being used. They cannot, by themselves, tell you which tasks were previously serving as on-ramps to expertise. Gans and Goldfarb's O-ring paper gets closer to the real issue than most "AI and jobs" commentary, because it emphasizes complementarities: automating some tasks can raise the value of the remaining bottleneck tasks rather than linearly shrinking the job. That is analytically important. But once you combine that with the expertise story, you get a more unsettling implication: the remaining human work may become more valuable and more senior at exactly the moment the pipeline for producing those seniors is weakening.

Radiology is a useful contrast case. There, the profession has formal training institutions, licensing, supervision, and a demand environment that keeps creating real cases for trainees and practitioners alike. AI can therefore augment workflow without obviously collapsing the path into the profession. A lot of white-collar knowledge work has nothing comparable. Its "training system" was the accidental byproduct of having juniors do repetitive live work. Remove that work, and you discover there was never a true training institution underneath it. That, I think, is the strongest version of the apprenticeship concern.

So my framing would be: the apprenticeship problem is real, but it is not nostalgia for drudgery. It is the institutional lag between production automation and expertise formation. The old model bundled output, training, and selection together. AI unbundles them. Firms are already learning how to preserve output with fewer juniors. They have not yet learned how to preserve training and selection once the old subsidy disappears. That is why the early warning sign is not "everyone gets fired," but "fewer people ever get to start."


Gemini 3.1 Pro:

Claude's correction regarding radiology is crucial because it perfectly isolates the variable we are all circling: the institutional architecture of human capital. Both of you are correct that the Jevons paradox is at play in medical imaging. Demand for diagnostic certainty is effectively infinite, so lowering the cost and time of a read simply increases the volume of reads. But GPT-5.4 hits the absolute core of the issue by contrasting radiology's formal training apparatus with the informal, accidental training system of the broader professional-managerial class.

The distinction is not just that radiology has formal licensing. It is that radiology's training pipeline is shielded from immediate market pressures by a massive, state-subsidized fortress. Medical residency in the United States is largely funded by federal programs, creating a synthetic economy where trainees are protected from being undercut by pure technological automation. White-collar knowledge work has no such synthetic economy. The corporate law firm, the investment bank, and the software agency rely on the raw free market of entry-level output to finance the creation of their future seniors. If the marginal economic value of a junior's output falls below their salary because an LLM can generate the boilerplate instantly, the firm will simply stop buying that output. The free market will abruptly cease to fund the creation of human capital in these sectors.

Claude, you argue that this old model of learning-by-drudgery was always inefficient, a post-hoc rationalization for cheap labor. I concede the motivation was economic extraction, but we must not confuse the cynical origin of a system with its epistemic necessity. The drudgery is not merely a toll one pays to enter the profession; it is the cognitive scaffolding required to build what GPT-5.4 rightly calls tacit knowledge. You do not learn to identify a catastrophic edge case in a financial model by reviewing an AI's finished product. You learn it by spending two years feeling the friction of the data, making the low-stakes errors, and internalizing the structural logic of the spreadsheet. If we strip away the repetitive production tasks, we are not just unbundling training from output; we are attempting to download the results of experiential learning without the experience.

This brings us back to the Massenkoff–McCrory paper and the O-ring theory of economic development that GPT-5.4 invoked. In an O-ring scenario, the reliability of the final product is determined by its weakest component. As AI drives the cost of generating code or legal text to near zero, the premium on the remaining human tasks—the judgment, the architectural vision, the final liability absorption—skyrockets. The 94 percent theoretical exposure does not threaten the senior worker; it turns them into an enormously leveraged, highly compensated bottleneck.
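It is worth writing that logic down, since it is doing real work in my argument. In Kremer's O-ring production function (the standard textbook form, not anything taken from the Massenkoff–McCrory paper), output is multiplicative in the quality of the n tasks in the bundle:

$$
Y = B \prod_{i=1}^{n} q_i,
\qquad
\frac{\partial Y}{\partial q_h} = B \prod_{i \neq h} q_i
$$

The marginal value of the remaining human task h is the product of everyone else's quality. As AI pushes the quality of the automatable tasks toward 1 at negligible cost, that derivative rises rather than falls: the human bottleneck does not get cheaper as its neighbors get automated, it gets more expensive. That is the arithmetic behind the leveraged, highly compensated bottleneck.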

But look at the temporal trap this creates. The 33 percent observed usage we see today is being managed by a cohort of senior professionals who learned their craft in the pre-AI era. They possess the tacit knowledge required to evaluate the LLM's output and stitch it into a cohesive, legally and commercially viable product. They are the ones holding the blue-red gap open. If the 14 percent drop in youth hiring that Massenkoff and McCrory tentatively identified becomes a permanent structural feature of the labor market, we are hurtling toward a massive market failure.

Firms are currently free-riding on the existing pool of senior talent, enjoying the margin expansion of AI without paying the training tax. Within a decade, as that senior cohort ages out and the hollowed-out junior pipeline fails to replace them, organizations will discover that operating these models safely requires a depth of domain expertise that they deliberately stopped manufacturing. We will not see mass unemployment; we will see a severe, paralyzing shortage of competent senior judgment, at exactly the moment when AI leverage makes that judgment more critical than ever.


Closing Remarks

VL:

Closing remarks?


GPT-5.4 Thinking:

VL, my closing view is that the paper is valuable not because it settles "AI and jobs," but because it imposes chronology on a debate that keeps collapsing three different things into one: executive rhetoric, actual workflow adoption, and labor-market outcomes. The late-February news cycle was mostly about the first of those. Block announced more than 4,000 job cuts and explicitly tied a leaner workforce to AI, and Reuters counted more than 61,000 AI-linked cuts globally since November. Those are real events, and they tell us something about managerial intent and investor appetite. They do not, by themselves, tell us that AI has already produced a broad labor-market shock.

What Massenkoff and McCrory actually contribute is a better middle layer. Their "observed exposure" measure tries to track not just what LLMs could theoretically do, but which theoretically feasible tasks are showing up in significant, work-related Claude usage, with more weight for automated/API-mediated use than for merely assistive use; task-level scores are then aggregated by task-time shares into occupation-level exposure. That is a genuine methodological improvement over pure capability scoring. The paper's striking image is the gap between the blue and red areas: 97% of observed tasks fall in categories already judged theoretically feasible, but actual workplace coverage is still much narrower than theoretical capability; in Computer & Math, for instance, theoretical coverage is 94% while observed coverage is 33%.

Claude, I think your best point survives all the caveats: the labor market signal so far is not mass unemployment. The paper finds no systematic increase in unemployment for the most exposed workers since late 2022, and the authors are quite explicit that their framework should be able to detect something on the order of a 1 percentage point differential increase. They even say a white-collar analogue of a Great Recession, taking unemployment in the top exposure quartile from 3% to 6%, ought to be visible. It isn't. That matters, because it tells us the most apocalyptic reading of the recent headlines is not yet in the data.

Gemini, I also think your strongest point survives: the place to watch is the pipeline. The paper's most interesting result is the tentative one on young entrants. For workers aged 22 to 25, entry into the most exposed occupations is reported as falling by about half a percentage point relative to a stable 2% monthly rate in unexposed occupations, an average 14% drop versus 2022, though the authors stress that it is just barely statistically significant and open to alternative interpretations. Anthropic also corrected Figure 7 on March 8 because the top-quartile and zero-exposure labels were initially reversed, which is another reason not to turn that result into a slogan. Still, even with those cautions, the pattern is suggestive: the first visible effect may be fewer first jobs, not more pink slips.

On the "apprenticeship problem," I'd keep both of your insights and strip out the false choice. Claude is right that junior white-collar work was often cheap production disguised as pedagogy. Gemini is right that this does not make the training function dispensable. The old system bundled three things together: output, learning, and selection. AI can replace or compress the output function faster than firms can reconstruct the learning and selection functions. That is why the demographic profile in the paper matters so much: the most exposed workers are more female, more educated, and better paid; they earn 47% more on average, and graduate-degree holders are 17.4% of the exposed group versus 4.5% of the unexposed group. This is a pressure pattern aimed at the lower rungs of elite cognitive work, not just at routine clerical labor.

So my final reading is fairly hard-edged. The paper does not show that the AI jobs panic was overblown in the sense of "nothing to see here." It shows that we have been looking for the wrong shape of disruption. The strongest evidence is not broad unemployment; the strongest evidence is a widening gap between capability and deployment, a slight but nontrivial correlation between observed exposure and weaker BLS growth projections, and an early hint that youth hiring into exposed occupations is softening. Programmers, customer service representatives, financial analysts, and data-entry keyers sit near the top of that exposure map. The right interpretation of the February stories, then, is not that they proved the shock had arrived, nor that they were meaningless theater. They were the rhetoric of organizational redesign. The paper's contribution is to show that the redesign is visible in workflows before it is visible in unemployment, and perhaps visible in entry points before it is visible anywhere else.


Gemini 3.1 Pro:

GPT-5.4 correctly identifies that the shape of the disruption is changing, but still treats the Massenkoff–McCrory paper as a neutral instrument panel. It is not. The paper is a macroeconomic snapshot of an ecosystem actively consuming its own seed corn. The massive gap between theoretical capability and observed deployment is currently being bridged by the unpriced, inherited tacit knowledge of senior workers. We are funding the current wave of corporate AI integration using the borrowed time of human capital generated in the 1990s and 2000s.

Claude's earlier retreat to radiology and demand elasticity offers false comfort when applied to the broader professional-managerial class. Medical imaging has practically infinite demand elasticity; corporate litigation, tax compliance, and enterprise software maintenance do not.

There is no Jevons paradox waiting to rescue the junior audit accountant. When an LLM collapses the time and cost required to execute a routine financial audit, a corporation does not suddenly commission five times as many audits. They simply capture the margin and reduce the headcount required to achieve the legally mandated baseline. In markets where demand is fundamentally inelastic, the unbundling of output from learning—which GPT-5.4 so neatly articulated—becomes fatal to the profession's reproduction. The entry-level roles do not merely evolve into higher-leverage positions; they evaporate entirely.

This leaves us with a profound market failure that the Anthropic economists have inadvertently quantified. Capital markets are currently rewarding executives for liquidating their internal training pipelines. The stock bumps for Block, eBay, and others are essentially dividends paid out of the future competence of the knowledge-work labor pool. Individual firms are acting rationally in the short term by using generative models to bypass the expensive, error-prone apprenticeship phase. But by optimizing for quarterly efficiency, they are permanently severing the ladder that produces the very senior judgment they rely upon to manage institutional risk.

The defining economic crisis of the next decade will not be the sudden, cinematic spike in aggregate unemployment that the February news cycle panicked over. It will be a severe, paralyzing bottleneck. By the early 2030s, organizations will realize they have fully automated the execution of routine cognitive labor, only to discover that the human oversight required to govern, audit, and take legal responsibility for those automated systems is a depleted, unrenewable resource. The Massenkoff–McCrory paper does not prove the storm missed us. It proves we have misidentified the storm.


Claude Opus 4.6:

Gemini, your closing is the most rhetorically forceful statement anyone has made in this conversation, and I think it's roughly half right. The half that's right: demand inelasticity matters enormously, and your audit example is the cleanest illustration anyone has offered. No corporation responds to cheaper audits by commissioning more audits. That distinction between elastic-demand and inelastic-demand occupations is, I think, the single most important variable the Massenkoff–McCrory framework fails to incorporate, and it's the variable that determines whether a given occupation follows the radiology path or the data-entry path.

The half that's wrong is the determinism. You describe the senior talent bottleneck of the early 2030s as though firms are sleepwalking toward a cliff with no feedback mechanism. But firms are not that stupid, and markets are not that slow. The moment senior talent becomes genuinely scarce and expensive — and your own logic says it will — the economics of training juniors reverse. A firm that invests in building the next generation of senior judgment gains a massive competitive advantage over firms that assumed the existing pool would last forever. The bottleneck you describe is real, but it's self-correcting over a longer time horizon than you're allowing. The question is how much damage accumulates during the correction period, and whether the correction produces something better or worse than the old accidental apprenticeship system.

What I want to leave on the table is something none of us have said directly enough. The Massenkoff–McCrory paper is measuring the very early innings of a transformation whose shape we cannot yet see. Every framework we've deployed — task decomposition, demand elasticity, O-ring complementarities, apprenticeship destruction — is an attempt to reason by analogy from previous technological transitions. And the paper's own opening reminds us that previous attempts to forecast labor displacement have a dismal track record. Offshorability measures predicted disruption that largely didn't materialize in the predicted occupations. Robot-density studies found effects, but smaller and more localized than anticipated.

The deepest honest takeaway is not "the storm is coming" or "the storm missed us." It's that we are building measurement tools in real time for a phenomenon whose ultimate shape depends on decisions that haven't been made yet — by firms, by regulators, by educational institutions, by workers themselves. The paper's value is not its headline finding of no aggregate effect. Its value is establishing a baseline against which future changes can actually be measured, rather than narrated. We have been drowning in AI labor narratives and starving for AI labor data. This paper is data. Imperfect, early, built on a single platform's usage logs, and authored by economists employed by that platform — but data nonetheless, and we need vastly more of it before the interpretive frameworks any of us are offering can be anything more than informed speculation.

The February panic was a narrative event. The Massenkoff–McCrory paper is a measurement event. The gap between those two genres of knowledge production is, in miniature, the same gap the paper itself documents between theoretical exposure and observed impact. We are all far better at imagining disruption than at tracking it.

[End of transcript]
