Rubrics

Prompt:

Devise a detailed multifactorial rubric for reducing the question X to something objectively answerable - or at least (if absolute objectivity here is impossible) highlight the independent (or semi-independent) axes of evaluation.



Claude 4.8 on ‘two-tier’ policing:


"Two-tier policing" resists an objective answer for three separate reasons, and a good rubric works by prising those reasons apart rather than trying to dissolve them all at once. First, the term is essentially contested: it is used to assert nearly opposite empirical claims by different speakers. Second, the question fuses a descriptive sub-question ("how prevalent") with a normative one ("is this a problem"), and the second cannot inherit objectivity from the first. Third, the core allegation — that like cases are treated unalike — depends on a counterfactual ("how would the other group have been treated?") that is usually unobservable. The rubric below is organised so that each axis isolates one of these difficulties, so that two people who disagree can at least locate where they disagree (definition, evidence, mechanism, or value).


Step 0: Disambiguate the referent (the gate everything else passes through)

Before any axis applies, you have to fix which claim is on the table, because "two-tier policing" currently names at least six distinct propositions, several mutually contradictory:

  1. Differential treatment by group identity of the policed: e.g. the claim, amplified by Reform UK and others, that white or right-coded protesters are handled more harshly than ethnic-minority or left-coded ones. The term gained traction after the disorder that followed the 2024 Southport attack, with figures such as Nigel Farage and Suella Braverman alleging that police are more heavy-handed with right-wing groups and softer on protests such as Black Lives Matter and pro-Palestinian marches.

  2. The mirror claim (older, left-coded) — that ethnic minorities are over-policed (stop-and-search disparities, the Macpherson/"institutional racism" lineage).

  3. Differential treatment by cause/ideology of protest — pro-Palestine vs. other demonstrations, climate protest, etc.

  4. Differential treatment by crime type or victim — e.g. allegations that some offences are under-investigated for fear of community tension, or that non-crime hate incidents are recorded asymmetrically.

  5. Resource/geographic two-tier — different service levels by area or wealth (the banal, non-ideological sense).

  6. Differential speed/severity of justice — fast-tracked riot sentencing versus slow processing elsewhere.

Claims 1 and 2 point in opposite directions and cannot both be the dominant systemic pattern without further specification; treating them as one question is the single most common source of confusion. A well-formed question names exactly one proposition, its alleged favoured tier, and its alleged disfavoured tier. Everything below is to be applied to one such proposition at a time.
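One way to make the "well-formed question" requirement concrete is to treat a claim as a small record that cannot be created unless it names exactly one proposition and both tiers. The sketch below is illustrative only: the names Claim and PROPOSITIONS are hypothetical, and the short labels are paraphrases of the six senses listed above, not official categories.

```python
from dataclasses import dataclass

# Paraphrased labels for the six senses in Step 0 (assumed wording).
PROPOSITIONS = {
    1: "differential treatment by group identity of the policed",
    2: "over-policing of ethnic minorities (the mirror claim)",
    3: "differential treatment by cause or ideology of protest",
    4: "differential treatment by crime type or victim",
    5: "resource or geographic two-tier",
    6: "differential speed or severity of justice",
}


@dataclass(frozen=True)
class Claim:
    """A single disambiguated two-tier claim (hypothetical structure)."""
    proposition: int        # exactly one key of PROPOSITIONS
    favoured_tier: str      # who is alleged to be treated better
    disfavoured_tier: str   # who is alleged to be treated worse

    def __post_init__(self):
        if self.proposition not in PROPOSITIONS:
            raise ValueError("name exactly one of the six propositions")
        favoured = self.favoured_tier.strip().lower()
        disfavoured = self.disfavoured_tier.strip().lower()
        if not favoured or not disfavoured:
            raise ValueError("both tiers must be stated")
        if favoured == disfavoured:
            raise ValueError("favoured and disfavoured tiers must differ")
```

A claim that leaves either tier blank, or that tries to cover "two-tier policing" in general, fails at construction, which is the point of the gate.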


Descriptive axes — operationalising "how prevalent"

Axis A — Locus in the justice pipeline. Decide whether the claim is about policing at all. Charging is the CPS; sentencing is the judiciary and Sentencing Council; only deployment, arrest, use of force, and recording are the police. Much of the public "two-tier" discourse about swift, severe riot outcomes is actually about charging and sentencing, not policing — and conflating them lets a claim survive refutation by sliding between institutions. Score the claim for which stage(s) it actually implicates.
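The stage-to-institution mapping in Axis A is simple enough to write down as a lookup. The following is a minimal sketch, assuming the six stages named above are the only ones in play; the function names are hypothetical.

```python
# Mapping taken from the Axis A text: who owns which stage.
STAGE_TO_INSTITUTION = {
    "deployment": "police",
    "arrest": "police",
    "use of force": "police",
    "recording": "police",
    "charging": "CPS",
    "sentencing": "judiciary and Sentencing Council",
}


def institutions_implicated(stages):
    """Return the set of institutions a claim actually implicates."""
    unknown = [s for s in stages if s not in STAGE_TO_INSTITUTION]
    if unknown:
        raise KeyError(f"unrecognised stage(s): {unknown}")
    return {STAGE_TO_INSTITUTION[s] for s in stages}


def is_policing_claim(stages):
    """True only if every implicated stage belongs to the police."""
    return institutions_implicated(stages) == {"police"}
```

A claim about swift riot sentencing would pass ["charging", "sentencing"] and come back as not a policing claim, which makes the slide between institutions visible.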

Axis B — Unit and level of analysis. Disparities can arise at (i) individual officer discretion, (ii) force-level operational command, (iii) national guidance (College of Policing, NPCC), or (iv) statute. An anecdote about one officer is a different proposition from a claim about systemic policy, and evidence for one is weak evidence for another.

Axis C — Routine vs. exceptional context. Steady-state neighbourhood policing and large-scale public-order events run on different legal tools (Public Order Act powers), command structures, and threat assessments. Most two-tier allegations concern exceptional events; generalising from them to "policing" as such is a scope error to be flagged.

Axis D — Direction, explicitly stated. Which tier is favoured and which disfavoured? Requiring this prevents the claim from quietly reversing when challenged, and makes it falsifiable.

Axis E — The comparator / counterfactual. This is the crux. A two-tier claim is unfalsifiable without a matched comparison: the same conduct, comparable circumstances (crowd size, prior violence at similar events, available evidence, threat intelligence), differing only in the tier variable. Score how well the claim specifies its comparator and whether a real comparator exists or is merely imagined. Vivid asymmetric anecdotes (two incidents that "look" alike on camera) are not matched comparisons.
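The matched-comparison test can be sketched as a check that two incidents agree on every listed circumstance and differ only in the tier variable. The field names and the coarse string categories below are assumptions made for illustration; a real analysis would need far richer covariates and some tolerance for near-matches.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Incident:
    """Hypothetical record of one policed event."""
    tier: str                 # the variable the claim says drives treatment
    conduct: str              # what the participants actually did
    crowd_size_band: str      # coarse band, e.g. "under 100" (assumed banding)
    prior_violence: bool      # violence at similar earlier events
    evidence_available: str   # e.g. "video", "witness only"
    threat_intelligence: str  # e.g. "low", "high"


def mismatched_covariates(a, b):
    """List every non-tier field on which the two incidents differ."""
    return [
        f.name
        for f in fields(Incident)
        if f.name != "tier" and getattr(a, f.name) != getattr(b, f.name)
    ]


def is_matched_comparison(a, b):
    """True if the incidents differ in tier and in nothing else recorded."""
    return a.tier != b.tier and not mismatched_covariates(a, b)
```

Two incidents that "look" alike on camera but differ in prior_violence would fail the check, and mismatched_covariates names the reason.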

Axis F — Treatment vs. impact vs. outcome. Distinguish three things that "disparity" can mean: equal treatment producing different outcomes because base rates differ; different treatment with non-biased intent (a proportionate response to a genuinely different threat); and different treatment with biased intent (the actual two-tier claim). Outcome disparities (e.g. stop-and-search rates by ethnicity) are well documented but are evidence of the explanandum, not of its cause.

Axis G — Magnitude and threshold. "Prevalent" needs a number behind it. Specify: frequency (share of relevant decisions affected), effect size (how large the differential after controls), and systematicity (isolated incidents vs. patterned across forces/time). Set the threshold in advance — otherwise any nonzero count "proves" prevalence and any imperfect dataset "disproves" it.
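Setting the threshold in advance can be illustrated by fixing the three cut-offs in an immutable object before any data is examined. This is a sketch under stated assumptions: the field names are hypothetical, systematicity is reduced to a count of forces showing the pattern, and any numbers supplied to it are the analyst's own choices, not empirical values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Cut-offs fixed before looking at the data (hypothetical fields)."""
    min_frequency: float    # share of relevant decisions affected, 0..1
    min_effect_size: float  # differential remaining after controls
    min_forces: int         # forces in which the pattern must appear


@dataclass(frozen=True)
class Finding:
    """What the data showed for one disambiguated claim."""
    frequency: float
    effect_size: float
    forces_showing_pattern: int


def meets_prevalence_threshold(finding, thresholds):
    """All three criteria must be met; one strong number is not enough."""
    return (
        finding.frequency >= thresholds.min_frequency
        and finding.effect_size >= thresholds.min_effect_size
        and finding.forces_showing_pattern >= thresholds.min_forces
    )
```

Because Thresholds is frozen, the bar cannot be moved after the finding comes in, which guards against both failure modes named above.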

Axis H — Evidence type and its native strength. Rate what evidence each sub-claim could in principle rest on and how good it is:

  • Strong/available: Home Office stop-and-search and use-of-force statistics, arrest-to-charge ratios, MoJ sentencing data, HMICFRS inspections. HMICFRS's "Activism and impartiality" inspection, commissioned in 2023, reviewed over 4,000 documents, examined 120 non-crime hate incident records, and analysed over 857,000 police social media posts.

  • Weak/scarce: the protest-comparison version, where n is tiny, events are non-comparable, and confounds dominate. The Commons Home Affairs Committee found the policing response to the 2024 riots "entirely appropriate" given the violence, with no evidence of two-tier policing — illustrating that the official-inquiry evidence base exists but reaches conclusions contested by claimants.


Causal axis — the mechanism (matters for "prevalent" and "problem")

Axis I — Mechanism, if a disparity is established. Sort the cause into: explicit policy; structural/institutional process (e.g. intelligence-led deployment that tracks where past disorder occurred); implicit bias; rational risk-differentiation (treating a crowd with a history of violence differently is a difference in inputs, not in standards); or measurement artifact (e.g. video-heavy events yield faster charges because evidence is abundant — a confound for the "fast-tracked unfairly" claim). The normative verdict turns heavily on which of these is operating, so this axis bridges the descriptive and normative halves.


Normative axes — operationalising "is this a problem"

These are irreducibly value-laden; the goal is to make the value premise explicit, not to hide it inside the data.

Axis J — Which conception of fairness? Formal equality (any treatment difference for like conduct is wrong) will score borderline cases as problems; proportionality/operational-necessity (differences justified by genuine threat differences are legitimate) will not; an equity view (differential policing to redress historic over-policing is acceptable) reverses the sign for some claims. State which you are applying.

Axis K — Perception and legitimacy (semi-independent of truth). Because British policing rests on consent, perceived two-tier treatment damages confidence even where the empirical disparity is absent — a genuinely distinct problem with its own evidence base (public-confidence surveys). HMICFRS found that police involvement in certain public engagements can create a perception of bias even when none is intended, undermining public confidence — making it crucial that police are not only impartial but seen to be. A complete answer scores the reality and the perception axes separately; they can diverge.

Axis L — Severity, scope, and reversibility. A problem's weight scales with how many people are affected, how grave the consequence (a slower response vs. a wrongful liberty deprivation), and whether it is correctable. This prevents treating a recording inconsistency and a systematic charging bias as equivalent "problems."

Axis M — Baseline and tractability. "A problem" relative to what — a frictionless ideal, comparable jurisdictions, or the system's own prior state? And is the residual disparity one policy can fix, or an artifact of deeper base-rate differences no policing reform touches?


Turning the axes into a scorable rubric

For any single, disambiguated claim, the answerable composite is roughly:

Existence × Magnitude × Mechanism-culpability × (Reality-weight + Perception-weight) → Problem-rating

Confidence in the rating rises when the claim specifies a real matched comparator (E), isolates the police stage (A), controls the obvious confounds (F, I), rests on systematic rather than anecdotal evidence (H), and survives the threshold set in advance (G), and falls when any of these is missing.

Concretely, a claim earns a high "this is a real and serious problem" score only if: a disparity is demonstrated against a genuine comparator; it is patterned not isolated; the mechanism is biased intent or unjustified structure rather than rational risk-differentiation or artifact; and it is located at the police stage actually accused. A claim can simultaneously score low on reality and high on perception/legitimacy — which is, on the current evidence, where several versions of the UK debate actually sit.
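The composite can be sketched in a few lines. Everything numeric here is an assumption: the rubric does not fix a scale, so the sketch takes each factor to lie between 0 and 1, and it treats confidence as the plain fraction of the five checks passed, which is only one possible weighting.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Scores:
    """Factor scores for one disambiguated claim, each assumed in [0, 1]."""
    existence: float
    magnitude: float
    mechanism_culpability: float
    reality_weight: float
    perception_weight: float


def problem_rating(s):
    """Literal reading of the composite; the result lies in [0, 2]."""
    for name, value in asdict(s).items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return (
        s.existence
        * s.magnitude
        * s.mechanism_culpability
        * (s.reality_weight + s.perception_weight)
    )


# The five confidence conditions, keyed by the axes they refer to.
CONFIDENCE_CHECKS = ("E", "A", "F_I", "H", "G")


def confidence(passed):
    """Fraction of the five checks passed (equal weighting is an assumption)."""
    missing = [k for k in CONFIDENCE_CHECKS if k not in passed]
    if missing:
        raise KeyError(f"missing check(s): {missing}")
    return sum(bool(passed[k]) for k in CONFIDENCE_CHECKS) / len(CONFIDENCE_CHECKS)
```

Read literally, the product goes to zero whenever existence is zero, however large the perception weight. A claim that is low on reality but high on perception is therefore better reported as two numbers, with the Axis K score kept alongside the composite, than folded into the single rating.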


What remains irreducible

Three residues survive every refinement, and honesty requires naming them rather than papering over them. The choice of fairness conception (Axis J) is a value commitment, not a finding. The counterfactual (Axis E) is frequently unobservable, so confidence is capped below certainty even with good data. And the disambiguation step (Step 0) means there is no single fact of the matter about "two-tier policing" tout court — only verdicts on specific propositions, which can come out differently. The rubric's payoff is not a single objective answer but legibility: applied honestly, it forces any disputant to say which proposition, which comparator, which mechanism, and which value they are invoking — at which point most apparent disagreements turn out to be disagreements about different questions.

If it would help, I can apply this rubric end-to-end to one specific version of the claim (say, the protest-disparity version or the stop-and-search version) and show how each axis scores against the available UK evidence.
