Feb 18, 2024

Chain-of-Thought: Do LLMs Need Prompts to Think?

A look at CoT decoding, an approach that explores alternative decoding paths to reveal reasoning without explicit chain-of-thought prompts.

The recent paper "Chain-of-Thought Reasoning Without Prompting" by Google DeepMind poses a thought-provoking question: can large language models (LLMs) perform complex reasoning without advanced prompting? After all, prompting has become the go-to technique for getting LLMs to solve problems step by step.

Imagine if LLMs weren't limited to a specific prompting format. Could they surprise us with their problem-solving approaches? The authors of this paper think so. Their "CoT-decoding" technique looks for reasoning paths in the decoding step itself, that is, while the text is being generated. It suggests that complex reasoning paths don't depend on prompts: they're already built into LLMs, waiting to be used.

Methodology: token branching + weighted aggregation

The most common decoding technique, greedy decoding, picks the highest-probability token at every step (sampling instead draws randomly from the likely candidates). In contrast, CoT-decoding considers the top k most probable tokens at the first decoding step only. Let's say k equals 5: instead of committing to the single most likely first token, the model starts five separate paths, one from each of the top 5 tokens, and continues each of them greedily. This branching helps reveal reasoning or calculation chains that greedy decoding would never reach.
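To make the branching concrete, here is a minimal sketch in Python. It is not the authors' code: `toy_model` is a made-up lookup table standing in for an LLM, and the function names, tokens, and probabilities are all assumptions chosen for illustration. The only idea it tries to capture is "take the top k tokens at the first step, then continue each path greedily".

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM: maps a token prefix to next-token probabilities.
NextTokenFn = Callable[[List[str]], Dict[str, float]]

def toy_model(tokens: List[str]) -> Dict[str, float]:
    """Made-up next-token table keyed on the last token only (illustration only)."""
    table = {
        "A:": {"5": 0.5, "I": 0.3, "We": 0.2},
        "5": {"apples": 0.9, "<eos>": 0.1},
        "I": {"have": 1.0},
        "We": {"have": 1.0},
        "have": {"3,": 1.0},
        "3,": {"dad": 1.0},
        "dad": {"has": 1.0},
        "has": {"5,": 1.0},
        "5,": {"total": 1.0},
        "total": {"8": 1.0},
    }
    # Any token without an entry ends the sequence.
    return table.get(tokens[-1], {"<eos>": 1.0})

def greedy_continue(next_probs: NextTokenFn, prefix: List[str], max_new: int) -> List[str]:
    """Extend the prefix by always taking the most probable next token."""
    tokens = list(prefix)
    for _ in range(max_new):
        probs = next_probs(tokens)
        best = max(probs, key=probs.get)
        if best == "<eos>":
            break
        tokens.append(best)
    return tokens

def branch_first_step(next_probs: NextTokenFn, prompt: List[str],
                      k: int = 5, max_new: int = 20) -> List[List[str]]:
    """Start one path per top-k first token, then decode each path greedily."""
    first = next_probs(prompt)
    top_k = sorted(first, key=first.get, reverse=True)[:k]
    paths = []
    for tok in top_k:
        full = greedy_continue(next_probs, prompt + [tok], max_new)
        paths.append(full[len(prompt):])  # keep only the generated tokens
    return paths

paths = branch_first_step(toy_model, ["Q:", "...", "A:"], k=3)
for p in paths:
    print(" ".join(p))
```

In this toy setup the greedy path answers immediately, while the second and third branches walk through intermediate steps before reaching a different answer, which is the kind of behavior the branching is meant to surface.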

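One problem with considering multiple top predictions is that the resulting paths can contradict each other or lead to inaccurate final answers. The authors address this in two steps. First, each path gets a confidence score: the gap between the probabilities of the top two candidate tokens, averaged over the tokens of the final answer. Paths that contain a chain of thought tend to show a larger gap, meaning the model is more certain about its answer. Second, a weighted aggregation sums the scores of all paths that arrive at the same answer and picks the answer with the highest total. Unlike a plain majority vote, a few confident paths can outweigh many uncertain ones, which helps improve accuracy and lowers the chance of returning a wrong or hallucinated answer.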
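The scoring and aggregation can be sketched in a few lines of Python. This follows my reading of the paper, where a path's confidence is the gap between the top two token probabilities averaged over the answer tokens, and the scores of paths with the same answer are summed. The function names and all the numbers below are invented for illustration; in practice the probabilities would come from the model.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# One decoded path: (final answer, [(top-1 prob, top-2 prob) for each answer token]).
Path = Tuple[str, List[Tuple[float, float]]]

def answer_confidence(top2_probs: List[Tuple[float, float]]) -> float:
    """Average gap between the top-1 and top-2 probabilities over the answer tokens."""
    return sum(p1 - p2 for p1, p2 in top2_probs) / len(top2_probs)

def aggregate(paths: List[Path]) -> Dict[str, float]:
    """Sum the confidence of every path that ends in the same answer."""
    totals: Dict[str, float] = defaultdict(float)
    for answer, top2_probs in paths:
        totals[answer] += answer_confidence(top2_probs)
    return dict(totals)

# Made-up example: three hesitant paths say "5", two confident paths say "8".
example_paths: List[Path] = [
    ("5", [(0.55, 0.40)]),
    ("5", [(0.50, 0.45)]),
    ("5", [(0.50, 0.40)]),
    ("8", [(0.95, 0.02)]),
    ("8", [(0.90, 0.05)]),
]

scores = aggregate(example_paths)
weighted_winner = max(scores, key=scores.get)
majority_winner = Counter(answer for answer, _ in example_paths).most_common(1)[0][0]

print(scores)
print("weighted:", weighted_winner, "| majority vote:", majority_winner)
```

With these invented numbers a simple majority vote would pick "5", while the confidence-weighted sum picks "8", which is the difference the weighting is designed to make.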

Experiments

The authors tested CoT-decoding on a variety of tasks, including mathematical reasoning (think multi-step word problems), natural language reasoning (e.g., determining whether a person was born in an even or odd year), and symbolic reasoning puzzles. They ran it on pre-trained LLMs of different sizes as well as on instruction-tuned models, to check whether CoT-decoding provides improvements across the board.

For all experiments, they used the standard QA format (Q: [question] A:) and asked the model to continue the generation given that prefix. During decoding, they used k = 10 as default for the alternative top-k tokens at the first decoding position.
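As a small illustration of that setup, the prompt can be built like this. The helper name is hypothetical, the line break between the question and "A:" is my assumption about the exact layout, and the example question is only an illustration; the point is that the prefix carries no few-shot examples and no "think step by step" cue.

```python
DEFAULT_TOP_K = 10  # default number of first-step alternatives mentioned above

def build_qa_prompt(question: str) -> str:
    """Wrap a question in the plain Q/A format, with no examples or CoT cue."""
    return f"Q: {question.strip()}\nA:"

prompt = build_qa_prompt(
    "I have 3 apples, my dad has 2 more apples than me, "
    "how many apples do we have in total?"
)
print(prompt)
```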

The results?

CoT-decoding delivers higher accuracy than typical greedy decoding on many reasoning tasks. This suggests the LLMs already held those "chains of thought" internally, but greedy decoding never surfaced them. Even when LLMs have been exposed to examples demonstrating step-by-step reasoning during training, CoT-decoding still improves their performance, so it may act as a supplement to standard reasoning training techniques.

For highly synthetic tasks (those rarely found in the training data), CoT-decoding struggles because the LLMs simply haven't learned the patterns related to those tasks. Unsurprisingly, the authors observed higher accuracy with larger, more powerful LLMs when using CoT-decoding, as larger models usually generalize better.

Similar research

This isn't the first approach to target implicit reasoning in the model. In November I wrote about "Implicit Chain of Thought Reasoning via Knowledge Distillation", which aimed to enable complex reasoning in LLMs by using the model's internal representations, rather than decoded text, to perform the reasoning. There, the reasoning happens implicitly, "vertically" between layers rather than "horizontally" through decoded text.

I wonder what would come of merging those two approaches.

Conclusions

The paper challenges the assumption that LLMs need explicit prompting to tackle complex problems, and it offers a promising way to study how LLMs reason internally by exploring alternative decoding paths. CoT-decoding isn't just about the final answer: it also exposes the different approaches the LLM can take to a problem. This contrasts with typical prompting techniques, which might bias the LLM towards a specific type of reasoning designed by the engineer.

However, exploring multiple decoding paths naturally increases computation: with k branches, the model generates roughly k completions instead of one. Future research will likely look at optimizations for efficiency. Finally, the results suggest that this implicit CoT is tied to what the model saw during training: on tasks unlike anything in the training data, accuracy was much lower.
