The Relativity of "Poisoning" in Large Language Models: Defining Intended Audiences
In the rapidly evolving world of artificial intelligence, large language models (LLMs) have become a focal point of discussion, particularly around the concept of "poisoning": the charge that unwanted material has been injected into a model's training data. I propose that "poisoning" is relative, and that we need to start defining LLMs with their intended audience in mind to better understand and contextualize the phenomenon.
Consider a US-focused LLM that incorporates both pro- and anti-Trump material to represent the diverse perspectives within the country. Some will see this approach as balanced and representative, but if we narrow the focus to members of the Democratic National Committee (DNC), they might perceive the inclusion of pro-Trump material as "poisoning" the model with views they find objectionable. Conversely, a Republican National Committee (RNC)-focused LLM might face similar criticism from the opposite side.
Expanding this concept to the international level further highlights the relativity of "poisoning." An LLM designed for a global audience may need to include material from various nations, such as Russia’s perspective that Ukraine initiated the conflict. To one side, that inclusion is essential for a comprehensive view; to the other, it is "poisoning" the model with propaganda. Similarly, a US-focused LLM, a China-focused LLM, and a combined US-China LLM would each produce different outputs based on their intended audiences, and each could view the others' content as biased or "poisoned."
To date, much of the discourse surrounding LLM bias has centered on US race issues. That topic is critical, but it can overshadow the many other forms bias takes in AI. The international lens offers a clearer and perhaps more tangible way to examine these dynamics, as geopolitical perspectives are often more explicitly defined and contrasted.
This brings me to a challenge: I suggest we move away from the nebulous term "poisoning" and instead develop a framework that explicitly defines what a particular LLM is designed to regurgitate. Is it built for a US-only audience? Does it encompass perspectives from the US, Russia, and China (US+RU+CN)? Perhaps it’s tailored to a US and European Union (US+EU) viewpoint, or narrowly focused on the DNC, the RNC, or even a particular ethnic group within the US. By establishing this framework, we create a clearer baseline for evaluating an LLM’s output.
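To make the idea concrete, here is a minimal sketch of what such a declaration might look like. Everything in it is hypothetical: the `AudienceScope` class, the field names, and the perspective tags are placeholders I'm inventing for the sake of argument, not an existing standard or tool.

```python
from dataclasses import dataclass, field

# Hypothetical perspective tags. A real framework would need a
# community-agreed vocabulary, not this ad-hoc list.
US, RU, CN, EU = "US", "RU", "CN", "EU"

@dataclass(frozen=True)
class AudienceScope:
    """A declared statement of which perspectives an LLM is built to reflect."""
    name: str
    includes: frozenset = field(default_factory=frozenset)

    def is_in_scope(self, perspective: str) -> bool:
        """True if material carrying `perspective` fits the declared scope."""
        return perspective in self.includes

# Example declarations mirroring the groupings above.
us_only = AudienceScope(name="US-only", includes=frozenset({US}))
us_ru_cn = AudienceScope(name="US+RU+CN", includes=frozenset({US, RU, CN}))
us_eu = AudienceScope(name="US+EU", includes=frozenset({US, EU}))

# "Poisoning" only becomes a meaningful label relative to a declared scope:
print(us_only.is_in_scope(RU))   # False: Russian material is out of scope here
print(us_ru_cn.is_in_scope(RU))  # True: the same material is in scope here
```

The code itself is trivial; the point is the act of declaring the scope before anyone argues about contamination.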
Only after defining an LLM’s intended audience can we meaningfully discuss whether the insertion of, say, Russian influence into a model designed solely to reflect US positions constitutes "poisoning." Without this context, the term loses its utility and becomes a catch-all for anything deemed undesirable by a particular group. A US-only LLM incorporating Russian perspectives might indeed be misaligned with its purpose—but that misalignment isn’t inherently "poisoning" unless we’ve agreed on the model’s scope beforehand.
This shift in perspective could transform how we approach LLM development and criticism. Rather than chasing an elusive "unbiased" ideal—an impossibility given the subjective nature of human data—we can focus on transparency and intentionality. Developers could state upfront, "This LLM is for X audience and reflects Y perspectives," allowing users to judge its outputs against a defined standard rather than an abstract notion of purity.
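Continuing the hypothetical sketch above (same invented `AudienceScope`, `us_only`, and `us_ru_cn` names), judging material against a declared standard could be as mundane as this:

```python
def label_material(scope: AudienceScope, perspective: str) -> str:
    """Describe material relative to a declared scope, not an absolute ideal."""
    if scope.is_in_scope(perspective):
        return f"in scope for {scope.name}"
    return f"out of scope for {scope.name}: misaligned, but not inherently 'poisoned'"

print(label_material(us_only, RU))   # out of scope for US-only: misaligned, ...
print(label_material(us_ru_cn, RU))  # in scope for US+RU+CN
```

Whether a check like this lives in a model card, a data pipeline, or a reviewer's head matters less than the fact that the standard is written down.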
Of course, it’s Saturday as I write this, and I reserve the right to change my mind by Monday. The beauty of wrestling with complex ideas like these is that they evolve with scrutiny and debate. What do you think—should we redefine how we approach LLMs, or am I overcomplicating a problem that’s already solved?
-J