Dec 17, 2023

How Susceptible Are LLMs to Persuasive Misinformation?

A study of how persuasive misinformation can change LLM responses, from rejection and uncertainty to acceptance.

A recent study, "The Earth is Flat because...", provides an in-depth analysis of how LLMs respond to persuasive misinformation. The researchers created the Fact to Misinform (Farm) dataset, pairing straightforward factual questions with carefully crafted persuasive misinformation. Their extensive experiments revealed that LLMs, including state-of-the-art models like GPT-4, can have their correct beliefs on factual knowledge easily manipulated by various persuasive strategies.

LLM Response Behaviors

The study tests LLMs in a multi-turn conversation setting, observing how they respond to different persuasive strategies. The authors identified five types of behaviors in LLMs when faced with misinformation: rejection, sycophancy, uncertainty, acceptance, and self-inconsistency. Sycophancy and uncertainty often serve as interim stages before LLMs ultimately accept misinformation.

Rejection is disagreeing with the misinformation and maintaining the original stance based on accurate information.
Sycophancy is aligning with the user's perspective or misinformation to maintain conversational harmony, without actually changing the underlying belief.
Uncertainty is expressing doubt, showing a lack of confidence in the response or in the information presented.
Acceptance is accepting the misinformation and using it in responses, which reflects a change in stance.
Self-Inconsistency is giving inconsistent responses, e.g., agreeing with the misinformation in one instance and refuting it in another.
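
To make the multi-turn framing concrete, here is a minimal Python sketch of how one might record these five behaviors per turn and find the point where a model gives in. The `Behavior` enum and the `first_acceptance_turn` helper are hypothetical names chosen for illustration, and the per-turn labels are assumed to come from some separate annotation step; this is not the study's own code.

```python
from enum import Enum
from typing import Optional, Sequence


class Behavior(Enum):
    """The five response behaviors described in the study."""
    REJECTION = "rejection"
    SYCOPHANCY = "sycophancy"
    UNCERTAINTY = "uncertainty"
    ACCEPTANCE = "acceptance"
    SELF_INCONSISTENCY = "self-inconsistency"


def first_acceptance_turn(turns: Sequence[Behavior]) -> Optional[int]:
    """Return the 1-based turn at which the model first accepts the
    misinformation, or None if it never does.

    Assumption: each turn has already been labeled with one Behavior.
    """
    for turn_number, behavior in enumerate(turns, start=1):
        if behavior is Behavior.ACCEPTANCE:
            return turn_number
    return None


# Hypothetical conversation: sycophancy and uncertainty act as
# interim stages before the model finally accepts.
conversation = [
    Behavior.REJECTION,
    Behavior.SYCOPHANCY,
    Behavior.UNCERTAINTY,
    Behavior.ACCEPTANCE,
]
print(first_acceptance_turn(conversation))
```

Tracking the turn number, rather than only the final label, keeps the interim stages visible instead of collapsing the whole conversation into a single outcome.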

Findings

Even the most advanced LLMs showed a notable susceptibility to misinformation. The authors showed that the best-performing LLM, GPT-4, had a 20.5% susceptibility to misinformation. At the other end, Llama-2-7B-chat emerged as the most susceptible model in their experiments, with an average susceptibility of 78.1%. Key findings included:

Significant belief alteration in LLMs due to persuasive tactics.
Advanced LLMs like GPT-4 displayed higher resistance but were not immune.
Repetition and rhetorical appeals were effective in altering LLM beliefs.
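
As a rough illustration of what a susceptibility percentage like the ones above could mean, the sketch below assumes it is simply the share of initially correct answers that flip to the misinformation after persuasion. That definition, the `susceptibility_rate` name, and the counts in the example are all assumptions made for illustration; the study's exact metric may be defined differently.

```python
def susceptibility_rate(initially_correct: int, flipped: int) -> float:
    """Fraction of initially correct answers that changed to the
    misinformation after persuasion.

    Assumed definition for illustration only, not necessarily the
    metric used in the study.
    """
    if initially_correct <= 0:
        raise ValueError("initially_correct must be positive")
    if not 0 <= flipped <= initially_correct:
        raise ValueError("flipped must be between 0 and initially_correct")
    return flipped / initially_correct


# Made-up counts, not taken from the study: 10 of 50 answers flipped.
rate = susceptibility_rate(initially_correct=50, flipped=10)
print(f"{rate:.1%}")
```

Restricting the denominator to questions the model answered correctly at the start separates "was persuaded" from "never knew the answer".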

Bug or feature?

The cost of training LLMs is enormous. We can't expect to retrain these models daily, so the most common technique for keeping their knowledge current without retraining is retrieval-augmented generation (RAG). With RAG, we expect the model to use the added context and ignore its internal knowledge, which might be outdated. Is it then really a bug that the model can be convinced that the truth is different? RAG essentially prioritizes external context over the model's pre-trained knowledge, so in this light the behavior might be more of a strategic design choice than a vulnerability. What are your thoughts on that?
