Study across 18 countries finds systemic attribution failures -- and users blame both the AI and the outlet it cites
AI assistants are gaining ground as a gateway to the news while failing the basics of accuracy and attribution. A BBC-coordinated study with the European Broadcasting Union found that 45% of answers about current events contained at least one significant problem, and Google's Gemini showed serious sourcing flaws in 72% of cases. The audit examined more than 3,000 answers from ChatGPT, Copilot, Gemini, and Perplexity across 14 languages and 18 countries, with journalists at 22 public-service media organizations reviewing each response for accuracy, sourcing, and context.
The errors aren't edge cases. Nearly a third of all answers had broken or misleading attribution, and one in five contained factual mistakes or outdated information. Gemini recorded significant issues in 76% of responses -- more than double the rate of other assistants -- driven largely by faulty sourcing. Reviewers also flagged a pattern of "ceremonial citations": references that look rigorous but don't actually support the claims when checked. It looks thorough. It isn't.
This is the largest, most multilingual check of AI-news behavior to date. Earlier BBC work in February focused on English-language markets; the new study applied common prompts across Europe and North America, plus Ukraine and Georgia, and used native-language evaluators. The failures showed up regardless of language or market. That matters.
The audience is already there. According to separate survey work released alongside the audit, 7% of all online news consumers -- and 15% of under-25s -- now use AI assistants to get news summaries. Adoption is outpacing reliability. That's the tension.
When AI misreads or misquotes a story, users blame the assistant and the news brand it cites. The newsroom did the reporting, the assistant scrambled it, and the byline eats the fallout anyway. That asymmetry punishes publishers trying to maintain trust.
It's also expensive to fix. Newsrooms cannot monitor or correct every AI summary in the wild, and "please verify" labels don't undo a confident, wrong answer. The AI now sits between reporting and reader, and it adds noise where journalism needs clarity. The brand loses twice.
The attribution gap between assistants is stark. Gemini showed significant sourcing problems in 72% of tested answers; ChatGPT, Copilot, and Perplexity stayed below 25%. That suggests divergent approaches to citation construction and to fallback behavior under uncertainty.
The factual errors are not subtle. Examples in the study include false claims about surrogacy law in Czechia and incorrect summaries of changes to UK disposable vape rules. Confident tone made the answers more misleading, not less. Tone isn't truth.
Good attribution lets readers verify claims and weigh credibility. AI assistants often supply citations that look legitimate but, on inspection, don't support the text or never made the claim attributed to them. Some point to real sources that don't contain the cited fact. Others wave at "reports" that can't be traced.
This is worse than no citation. A missing reference signals uncertainty; a spurious reference manufactures confidence. It also wastes reader time and makes independent verification harder. That's a trust drain, not a trust bridge.
Model builders acknowledge hallucinations as a known failure mode and say they are working on it. But "working on it" doesn't match the current adoption curve. Millions already use assistants for news, especially younger audiences, and they often assume the summaries are accurate.
The study's recommendations are pragmatic: improve response scaffolding for news questions, fix sourcing logic, and publish regular quality results by market and language. It also calls for ongoing independent monitoring and stronger media-literacy cues inside assistants. Progress will show up in measurements, not demos. The proof is a falling error rate.
Trust falls faster than it is rebuilt. An assistant earns credibility through convenience and repetition; a handful of high-visibility mistakes can sour that trust for both the tool and the news brands it cites. Once users feel they've been misled, they don't carefully apportion blame. They bounce.
Three signals to watch: do sourcing failures fall meaningfully in follow-up audits; do assistants add conspicuous verification prompts for hard news; and do regulators treat AI misattribution as an information-integrity problem, not just a product bug. If those needles don't move, the audience will. Count on it.