
Viral BridgeBench Post Claims Claude Opus 4.6 Was ‘Nerfed,’ Critics Call It Bad Science

A viral X post claimed that Claude Opus 4.6's hallucination rate had surged 98%. Critics found the comparison used different test sizes rather than equal benchmarks. A same-task analysis shows minimal change, within the normal variability of AI model outputs.

BridgeMind AI claimed Anthropic’s Claude Opus 4.6 was secretly degraded after a hallucination benchmark retest. The viral post has since drawn sharp criticism for flawed methodology. The claim triggered widespread debate over whether AI companies are quietly downgrading paid models to reduce costs.

BridgeMind Claims a 98% Surge in Hallucinations

BridgeMind, the team behind the BridgeBench coding benchmark, posted that Claude Opus 4.6 had fallen from second to tenth place on its hallucination leaderboard. Accuracy reportedly dropped from 83.3% to 68.3%.

“CLAUDE OPUS 4.6 IS NERFED. BridgeBench just proved it. Last week Claude Opus 4.6 ranked #2 on the Hallucination benchmark with an accuracy of 83.3%. Today Claude Opus 4.6 was retested and it fell to #10 on the leaderboard with an accuracy of only 68.3%,” they wrote.

The post framed this as proof of “reduced reasoning levels.” However, a closer look at the underlying data tells a different story.
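The statistical point behind the criticism can be illustrated with a short sketch. The test sizes below are hypothetical, since BridgeBench has not published its question counts; assuming runs on the order of 60 to 120 questions, 95% Wilson confidence intervals around the two reported accuracies overlap, meaning a swing of this size can plausibly arise from sampling noise alone.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical test sizes: BridgeBench has not disclosed the question counts,
# and the criticism is precisely that the two runs used different test sizes.
first_run = wilson_interval(successes=50, n=60)    # 50/60  ~= 83.3%
second_run = wilson_interval(successes=82, n=120)  # 82/120 ~= 68.3%

print(f"First run:  {first_run[0]:.1%} - {first_run[1]:.1%}")   # ~72.0% - 90.7%
print(f"Second run: {second_run[0]:.1%} - {second_run[1]:.1%}") # ~59.6% - 76.0%
# The intervals overlap, so a drop of this size can come from sampling
# noise alone and is not, by itself, evidence of a degraded model.
```

At these sample sizes, a single model answering a question wrong or right shifts accuracy by a full percentage point or more, which is why same-task, same-size comparisons matter when alleging a model change.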

Source: BeinCrypto