Study finds “uncensored” AI models still avoid charged words due to safety filtering embedded during pretraining

Apr 20, 2026, 10:43 PM · morgin.ai

EDITOR BRIEF

Pretrain Forensics measured a behavior it calls the flinch: a model's tendency to avoid predicting politically or socially charged words even when it issues no refusal. Across seven pretrained models from five labs, the researchers found that supposedly uncensored models can still rank sensitive terms far lower than open-data baselines do.

CONTEXT

The findings suggest that removing refusal behavior after training does not necessarily remove deeper safety biases learned during pretraining. This points to a growing distinction between visible moderation and embedded censorship, which could affect model transparency, auditing, and downstream fine-tuning reliability.

ARTICLE

Even 'uncensored' models can't say what they want

