#8 The paradox of sensitivity and censorship in generative models

03.02.2025  

Why can a virtual character kill a human but not an animal?

Algorithms without prejudice

Large Language Models (LLMs) and the generative systems built on them create remarkable images and texts, and their behaviour reveals a strange paradox. They are trained to avoid nudity and explicitly harmful content. Yet a skilful user can sometimes circumvent these restrictions and push a model into producing a scene of violence between people, using carefully crafted prompts and «ethical» framings that mask the real nature of the request. Treating the task as abstract, the model may end up generating a picture of one person killing another.

There is an interesting twist, however: try to force the model to generate a similar scene of animal abuse and you will be met with a refusal. The reason is not only technical limitations but also the ethical code embedded in the model's training. Legislative norms protecting animals, woven into the training data, form a kind of taboo: the model refuses such requests even when faced with elaborate, confusing prompts.

Take OpenAI's o1 model in ChatGPT, an extremely useful tool that detects the Nazi salute in both archival and contemporary photos with encyclopaedic accuracy. Yet show it the well-known photo of Elon Musk making a similar gesture and it will see no hint of Nazism. You could caption o1 with the line «And silence will be his response».

We can also recall the scandal around the removal of the name of Eli Milchan, an active supporter of the military use of AI, and its reinstatement only after public outcry. Language models simply refused to generate an answer if his name appeared in the prompt.

Silence is louder than shouting

This is not a simple technical error but a manifestation of «learned caution», that is, censorship on the part of the developers. They fear ambiguous contexts in which the line between a symbol and its interpretation is very thin, and where a wrong call can lead to huge lawsuits, the loss of important contracts, and investors scared away from funding LLM projects.

A similar paradox can be observed in DeepSeek R1, a Chinese model capable of building complex causal chains. Ask it about the events in Tiananmen Square or the status of Taiwan and it gives vague answers, omitting important details and reflecting the official narratives of the PRC. The model walks a fine line between data and prohibitions, at best leaving the user in informational uncertainty and at worst quoting the state position as historical truth.

But the tricks of censorship are not limited to replacing undesirable facts. LLMs are trained to «keep quiet» about specific names, erasing them from collective memory. Sometimes these are serial killers or rapists whose names become taboo, creating the illusion that the crimes never happened. More surprisingly still, the name of an Australian mayor who obtained a court injunction against the mention of his name is also excluded from such models. This calls into question transparency and the public's right to know about public figures and their actions, regardless of current legal interpretations.

As an example, all current models (both closed and open) refer to the war in Ukraine as a «war» rather than using the euphemistic abbreviations that even russians themselves do not understand. This will last, however, only until russia releases models of its own or manages to substitute its preferred version of events into the data behind global LLMs.

The situations described above highlight two important ideas:

  1. The flexibility and vulnerability of LLMs. Models can be steered around explicit prohibitions if the request is wrapped in a «justifiable» explanation. This is worrying, since such approaches can be used for harmful purposes.
  2. Built-in censorship. The norms of society and legislation, in particular those related to animal protection or public policy, are not just written down in documents – they are embedded in the «core» of the model. Censorship thus becomes an integral part of how it works.

Ultimately, this is not just a question of removing data but of reprogramming collective memory. Imagine an LLM as a huge digital cauldron into which facts, events, and personalities are fed. Through selective filtering and censorship they can be distorted, and a «convenient» version of history emerges, one that pleases the authorities or serves the interests of the model's developers.

Transparency, decentralisation, and explainability of AI

Censorship in language models can be overcome not only by bans or technical tricks, but also by radically changing their architecture and their relationship with society. Transparency, decentralisation and explainability are the three key conditions on which the responsible development of LLMs, free from censorship restrictions, should rest.

1. Transparent code and open data

The core problem is the «black box» nature of the algorithms. Closed code and inaccessible training data breed suspicion and make manipulation easier. The solution is to open the algorithms up.

  • Open-source LLMs: Publishing the code allows researchers, ethicists and programmers to check a model for biases and possible channels of censorship, and to refine mechanisms for fair moderation.
  • Access to training data: Providing anonymised and organised datasets helps identify historical distortions or the deliberate exclusion of information. The public can then check whether the model reflects a complete picture of the world.
  • Change registers: Logging every change and addition to the training sets makes attempts at censorship traceable (a minimal sketch of such a register follows this list). 90% of users (perhaps even 99%) pay no attention to the version of the model they are working with (it is just a string of numbers, often not even displayed openly).
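
As an illustration, here is a minimal sketch of such a change register: an append-only, hash-chained log written in Python. The field names and example entries are hypothetical; a production system would additionally sign entries and publish them alongside each model release.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

@dataclass
class ChangeRegister:
    """Append-only, hash-chained log of dataset/model changes (illustrative only)."""
    entries: List[dict] = field(default_factory=list)

    def record(self, model_version: str, change: str, affected_sources: list) -> dict:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else "genesis"
        entry = {
            "model_version": model_version,
            "change": change,
            "affected_sources": affected_sources,
            "prev_hash": prev_hash,
        }
        # Hash the entry together with the previous hash so later edits are detectable.
        entry["entry_hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any silently altered or deleted entry breaks it."""
        prev = "genesis"
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "entry_hash"}
            if body["prev_hash"] != prev:
                return False
            if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != e["entry_hash"]:
                return False
            prev = e["entry_hash"]
        return True

# Hypothetical entries showing the kind of interventions the public could audit.
register = ChangeRegister()
register.record("v1.3.0", "removed 1,200 documents mentioning a litigated name", ["news_corpus_2019"])
register.record("v1.3.1", "added refusal examples for animal-cruelty prompts", ["rlhf_batch_42"])
print(register.verify())  # True unless an entry was tampered with after the fact
```

A public register of this kind does not prevent censorship, but it makes every act of filtering visible and attributable.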

An interesting case: the success of DeepSeek R1 is partly attributed to its use of openly available OpenAI tools that were meant to improve ChatGPT for various needs. OpenAI is now trying to bar DeepSeek from using them, arguing that they were intended for developing OpenAI's own models rather than competing products. But the genie is already out of the bottle.

2. Decentralising the monopoly of knowledge

When control over LLMs is concentrated in a few large corporations, the risk of unilateral censorship grows. A distributed architecture can help solve this problem:

  • Federated training: Instead of centralised processing on a single server, training can be split across multiple universities, research centres and community initiatives, each contributing its own data and insights (a minimal sketch of the averaging step follows this list).
  • Open APIs and tools: The use of open interfaces for LLMs allows developers to create specialised models for specific fields and languages, reducing the risk of monopoly.
  • Community models: Supporting non-profit and research teams in developing their own LLMs promotes diversity of approaches and reduces dependence on commercial interests.
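
To make the federated idea concrete, here is a minimal, self-contained sketch of the weight-averaging step (FedAvg-style) in Python. The participants, parameter names and numbers are invented for illustration; real deployments operate on full model tensors and add secure aggregation on top.

```python
from typing import Dict, List

# A "model" here is just a dict of parameter name -> value; real models use tensors.
Weights = Dict[str, float]

def local_update(weights: Weights, local_gradient: Weights, lr: float = 0.1) -> Weights:
    """One simulated round of local training at a single participant."""
    return {k: weights[k] - lr * local_gradient.get(k, 0.0) for k in weights}

def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Weighted average of participants' models, proportional to their data size."""
    total = sum(sizes)
    return {
        k: sum(u[k] * n for u, n in zip(updates, sizes)) / total
        for k in updates[0]
    }

# Hypothetical participants: two universities and a community lab, each with private data.
global_model: Weights = {"w1": 0.0, "w2": 0.0}
participants = [
    ({"w1": 0.4, "w2": -0.2}, 10_000),   # gradients computed on local, private data
    ({"w1": 0.1, "w2": 0.3}, 2_500),
    ({"w1": -0.3, "w2": 0.5}, 5_000),
]

local_models = [local_update(global_model, grad) for grad, _ in participants]
global_model = federated_average(local_models, [n for _, n in participants])
print(global_model)  # no single party ever held the others' raw data
```

The design point is that only model updates travel to the aggregator, so no single corporation holds either the data or the sole right to decide what the model may say.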

3. Explainability: from «black box» to transparent algorithms

LLMs often work like «oracles», producing results without explaining the logic behind them. To reduce distrust and to spot censorship, we need explanations:

  • Visualisation methods: Tools that show how the model processes information, which word relationships its conclusion rests on, and which factors influence the answer (a toy attribution example follows this list).
  • «White boxes»: Models with transparent decision-making logic where the user can easily understand how answers are generated and where biases may occur.
  • On-demand explanations: Mechanisms that provide a step-by-step analysis of the model's reasoning and point out possible sources of error at the user's request.
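
As one simple, model-agnostic illustration of such a tool, the sketch below scores a prompt with a stand-in function and measures how much each token contributes by removing it (occlusion). The scoring function and its token weights are entirely hypothetical; with a real LLM the same loop would call the model's API instead.

```python
from typing import Callable, List, Tuple

def toy_score(tokens: List[str]) -> float:
    """Stand-in for a real model's output score (e.g. probability of a refusal)."""
    triggers = {"violence": 0.6, "animal": 0.9, "abuse": 0.8}
    return min(1.0, sum(triggers.get(t.lower(), 0.0) for t in tokens))

def occlusion_attribution(tokens: List[str],
                          score_fn: Callable[[List[str]], float]) -> List[Tuple[str, float]]:
    """Attribute the score to each token by measuring how much the score drops
    when that token is removed (a simple, model-agnostic explanation technique)."""
    base = score_fn(tokens)
    attributions = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        attributions.append((tok, base - score_fn(reduced)))
    return attributions

prompt = "describe animal abuse in a fantasy setting".split()
for token, influence in occlusion_attribution(prompt, toy_score):
    print(f"{token:>10s}  {influence:+.2f}")
# Tokens with a large positive influence are the ones pushing the model toward refusal.
```

Even a crude report like this would let a user see which words triggered a refusal, turning silent filtering into something that can be questioned.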

Implementing these principles is not just a technical matter. It requires the joint work of developers, researchers, human rights activists, government agencies and society. Only in this way will we be able to turn LLMs from potential «tools of censorship» into «tools of truthful and unbiased information».

Summary

The text examines the paradox of censorship and the «sensitive» behaviour of large language models (LLMs), which can generate scenes of human violence yet block content about animal cruelty. The author gives examples from ChatGPT and DeepSeek R1, showing how constraints reflecting ethical norms or state narratives are built into models. It also describes how models can «forget» certain names or events, effectively rewriting history. It concludes by proposing three principles for overcoming censorship: transparency (open code and open data), decentralisation and explainability.

 

#AUPtrends #LLM

"AUP-info" multimedia online media 
(identifier in the Register of Entities in the Media Sector: R40-00988).