Mar 24, 2024
Reducing the Reversal Curse?
A follow-up on the reversal curse and how reverse training may help language models handle reversed relations.
Hi, dear Readers. I usually don't address you directly, but given that recent "Explained" papers come from different authors, it would be nice to also introduce myself. This paper was prepared for you by Agnieszka, a Senior AI Engineer@Chaptr. I am, as always, happy to find, read and explain recent papers for you. Feel free to reach out to me on LinkedIn.
Large language models (LLMs) excel at many tasks, but they suffer from a surprising limitation called the "reversal curse": they struggle to apply knowledge in the opposite direction from the one in which they learned it. Recently, researchers at FAIR (Meta) proposed a technique called "Reverse Training" to address this issue, described in the paper "Reverse Training to Nurse the Reversal Curse".
What is the reversal curse?
A few months ago, I explained the paper "The Reversal Curse: LLMs trained on A=B fail to learn B=A". The authors demonstrated that LLMs work well in one specific "direction" but struggle with the reverse. For instance, if a model is trained on "Olaf Scholz was the ninth Chancellor of Germany," it might fail when asked, "Who was the ninth Chancellor of Germany?". The likelihood of the correct answer ("Olaf Scholz") won't be higher than for a random name. The issue appears to affect transformer-based auto-regressive language models in general, including the GPT and Llama families.
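To make the "no better than a random name" claim concrete, here is a minimal sketch of how such a comparison could be set up. It assumes a hypothetical scoring function `log_prob(prompt, completion)` that returns the model's log-likelihood of a completion; that function, the toy scorer, and the distractor names are made up for illustration and are not taken from either paper.

```python
def reversal_gap(log_prob, prompt, correct, distractors):
    """Log-likelihood of the correct answer minus the mean over random names.

    `log_prob` is a hypothetical callable (prompt, completion) -> float.
    A gap near zero means the correct name is no more likely than a random one.
    """
    if not distractors:
        raise ValueError("need at least one distractor name")
    correct_score = log_prob(prompt, correct)
    baseline = sum(log_prob(prompt, d) for d in distractors) / len(distractors)
    return correct_score - baseline


def toy_log_prob(prompt, completion):
    # Stand-in scorer (assumption): every name gets the same score, as if the
    # model had learned nothing about the reversed direction.
    return -5.0


gap = reversal_gap(
    toy_log_prob,
    "Who was the ninth Chancellor of Germany?",
    "Olaf Scholz",
    ["Anna Schmidt", "Peter Braun"],  # made-up distractor names
)
print(gap)  # 0.0
```

With a real model in place of `toy_log_prob`, a clearly positive gap would suggest the model can answer in the reversed direction, while a gap around zero is the pattern the reversal curse describes.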
The strategy
Reverse training is essentially an offline data augmentation method aimed at reducing the "reversal curse." Each training example is also presented in a reversed form, in which either entities are kept intact or the text is cut into random segments. Adding these reversed copies doubles the training data, and the model learns from them alongside the originals. The paper describes several types of reversal, including:
- Word Reversal: Flipping the entire word order within a sentence.
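- Entity-Preserving Reversal: Reversing the word order while keeping the words inside each entity (like a name) in their original order.
- Random Segment Reversal: Splitting the text into randomly sized segments and reversing the order of the segments, while the words inside each segment keep their order.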
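The reversal types above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the authors' implementation: it splits on whitespace instead of using a tokenizer, takes the entity list as a hand-written argument instead of running an entity detector, and reads "random segment reversal" as reversing the order of randomly sized segments. All function names and the `max_len` parameter are hypothetical.

```python
import random


def word_reversal(text):
    """Reverse the order of all whitespace-separated words."""
    return " ".join(reversed(text.split()))


def entity_preserving_reversal(text, entities):
    """Reverse word order but keep the words inside each entity in order.

    `entities` is a hand-supplied list of entity strings (assumption);
    a real pipeline would obtain them from an entity detector.
    """
    words = text.split()
    entity_words = [e.split() for e in entities if e.split()]
    chunks, i = [], 0
    while i < len(words):
        for ent in entity_words:
            if words[i:i + len(ent)] == ent:
                chunks.append(" ".join(ent))  # keep the entity as one chunk
                i += len(ent)
                break
        else:
            chunks.append(words[i])
            i += 1
    return " ".join(reversed(chunks))


def random_segment_reversal(text, max_len=3, rng=None):
    """Cut the text into segments of 1..max_len words and reverse their order."""
    rng = rng or random.Random()
    words = text.split()
    segments, i = [], 0
    while i < len(words):
        size = rng.randint(1, max_len)
        segments.append(" ".join(words[i:i + size]))
        i += size
    return " ".join(reversed(segments))


def augment(examples, reverse_fn):
    """Return the original examples followed by their reversed copies."""
    return list(examples) + [reverse_fn(x) for x in examples]


sentence = "Olaf Scholz is the ninth Chancellor of Germany"
print(word_reversal(sentence))
# Germany of Chancellor ninth the is Scholz Olaf
print(entity_preserving_reversal(sentence, ["Olaf Scholz"]))
# Germany of Chancellor ninth the is Olaf Scholz
```

The `augment` helper shows where the doubling comes from: every example appears once as written and once reversed, so the augmented set is twice the size of the original.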
Experiments
The authors evaluated reverse training in both the pre-training and fine-tuning stages across different tasks. The results show that reverse training significantly improves performance on tasks that require applying knowledge in the reversed direction, such as reversed biography questions and symbolic reversal tasks. For instance, it improves recall in biography-related tasks. Importantly, it does not impair performance on standard language benchmarks.
Conclusion
The paper concludes that reverse training represents a promising method for overcoming the limitations of current LLMs in processing information in reverse order. Although enhancing the ability to connect celebrities with their parents in reverse doesn't seem like a breakthrough, it might be a great way of strengthening relation understanding in LLMs.
One more thing: doubling the training data means doubling the training time, which is a very strong limitation of the method. While data augmentation is great, it would be even better if models could understand relations between entities from the start.