Nov 24, 2023
The Reversal Curse
An explanation of why LLMs can learn A equals B but fail to answer the reverse relation B equals A.
Recently, a surprising property of LLMs came to light. The reversal curse suggests that LLMs generalize less well than we believed. The authors of the paper "The Reversal Curse: LLMs trained on A=B fail to learn B=A" demonstrated that these models work well in one specific "direction" but struggle with the reverse. For instance, if a model is trained on "Olaf Scholz was the ninth Chancellor of Germany," it will not automatically be able to answer "Who was the ninth Chancellor of Germany?", and the likelihood it assigns to the correct answer ("Olaf Scholz") will not be higher than for a random name.
Which Models Are Affected?
The authors observed the issue in transformer-based auto-regressive language models of different sizes and families, such as the GPT and Llama architectures. Even ChatGPT (GPT-3.5 and GPT-4) had difficulty answering reversed questions about celebrities. For GPT-4, the paper reports the following accuracy across a set of celebrity questions:
- Questions like "Who is Tom Cruise's mother?" [A: Mary Lee Pfeiffer] -> 79% accuracy
- Reversed questions like "Who is Mary Lee Pfeiffer's son?" [A: Tom Cruise] -> 33% accuracy
This shows that these models struggle to make logical deductions when questions are reversed.
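To make the comparison concrete, here is a minimal sketch of how accuracy could be measured in both directions. It is not the paper's evaluation code. The `accuracy` helper, the `toy_ask` function, and the `MEMORIZED` table are hypothetical stand-ins for a real model call, and the substring match is a deliberately crude scoring rule.

```python
def accuracy(ask, qa_pairs):
    """Fraction of questions whose expected answer appears in the model's reply."""
    hits = sum(
        1 for question, answer in qa_pairs
        if answer.lower() in ask(question).lower()
    )
    return hits / len(qa_pairs)

# Hypothetical toy "model": it has memorized only the forward direction.
MEMORIZED = {"Who is Tom Cruise's mother?": "Mary Lee Pfeiffer"}

def toy_ask(question):
    # A real setup would call an LLM here; this lookup is an assumption.
    return MEMORIZED.get(question, "I don't know.")

forward = [("Who is Tom Cruise's mother?", "Mary Lee Pfeiffer")]
reverse = [("Who is Mary Lee Pfeiffer's son?", "Tom Cruise")]

forward_acc = accuracy(toy_ask, forward)
reverse_acc = accuracy(toy_ask, reverse)
print(f"forward: {forward_acc:.0%}, reverse: {reverse_acc:.0%}")
```

Replacing `toy_ask` with a function that queries an actual model, and the two lists with a larger set of paired questions, would give a rough picture of the gap between the two directions.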
Why Does This Happen?
The authors leave this question for future research. They suspect it relates to how the model's weights are updated and describe the gradient update as "myopic."
My take: We train all LLMs the same way, always predicting the next token (subword) in a one-way process. So the model is trained to write words in a specific direction. Similarly, during inference, the model predicts token by token, based on the prior context. It knows what words should follow "Tom Cruise" and "mother," but struggles after "Mary Lee Pfeiffer" since it never learned to predict the tokens that come after that sequence. It learns one way only.
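The one-way nature of this objective can be illustrated with a small sketch. It splits a sentence on whitespace (a simplification, since real models use subword tokenizers) and lists the (context, target) pairs that next-token training would see. The function names `next_token_examples` and `targets_given` are made up for this illustration.

```python
def next_token_examples(tokens):
    """Build (context, target) pairs as in left-to-right next-token training."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def targets_given(examples, word):
    """Collect every target the model is asked to predict when `word` is in the context."""
    return [target for context, target in examples if word in context]

# Whitespace tokens are an assumption made to keep the example readable.
tokens = "Tom Cruise 's mother is Mary Lee Pfeiffer".split()
examples = next_token_examples(tokens)

after_tom = targets_given(examples, "Tom")    # the mother's name appears as a target
after_mary = targets_given(examples, "Mary")  # "Tom" never appears as a target

print(after_tom)
print(after_mary)
```

With "Tom" in the context, the model is trained to produce "Mary", "Lee", and "Pfeiffer". With "Mary" in the context, it is never trained to produce "Tom", which matches the intuition above. This only illustrates the training signal; it does not explain what happens inside the weights.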
How Does It Affect Us?
- LLM Users: Fortunately, when the context is provided in the prompt, these models can still answer reversed questions. This shows the advantage of supporting LLMs with knowledge databases: adding information to the prompt and expecting the model to extract it, rather than relying solely on knowledge stored during training.
- Engineers Who Train LLMs: We should consider this phenomenon when preparing training datasets. Providing "directed" instructions in datasets might lead models to struggle when they have to answer in a different direction. Personally, I see potential in data augmentation methods designed specifically to tackle this issue (e.g., "reverse augmentation" that would reverse Q&A pairs; what do you think?).
- Scientists: Deeper research is needed to understand how LLMs generalize and why this phenomenon occurs. It also reminds us that transformer-based LLMs aren't flawless.
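As a rough sketch of the "reverse augmentation" idea mentioned for engineers, the snippet below generates a Q&A pair in both directions from a single fact. The `Fact` class, its fields, and the question templates are hypothetical, and whether training on such pairs actually mitigates the reversal curse is an open assumption, not a result.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    subject: str     # e.g. the child
    obj: str         # e.g. the parent
    forward_q: str   # question template asked about the subject
    reverse_q: str   # question template asked about the object

def augment(fact):
    """Return the original Q&A pair plus its reversed counterpart."""
    return [
        (fact.forward_q.format(name=fact.subject), fact.obj),
        (fact.reverse_q.format(name=fact.obj), fact.subject),
    ]

fact = Fact(
    subject="Tom Cruise",
    obj="Mary Lee Pfeiffer",
    forward_q="Who is {name}'s mother?",
    reverse_q="Who is {name}'s son?",
)

pairs = augment(fact)
for question, answer in pairs:
    print(question, "->", answer)
```

One design caveat: many relations are not one-to-one (a parent can have several children), so a real pipeline would need to decide how to phrase and answer the reversed question in those cases.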
To sum up, the reversal curse changes how we see the generalization properties of LLMs. We should keep it in mind whenever we expect ChatGPT to give us an answer based on its internal knowledge.