Chain-of-Draft Prompting: A More Efficient Alternative to Chain of Thought

Yusuf Ishola · March 5, 2025

With Chain-of-Thought (CoT) prompting, Large Language Models (LLMs) can now solve complex reasoning tasks with impressive accuracy, but often at high computational cost.

Chain-of-Draft (CoD) prompting, introduced by researchers at Zoom Communications, provides a solution by encouraging LLMs to generate minimal yet informative reasoning steps—similar to how humans jot down concise notes rather than elaborate explanations.

Let's take a look at this fascinating new method and see how you can start benefiting from it today.

What is Chain-of-Draft Prompting?

Chain-of-Draft is a new prompting technique that aims to reduce token usage while maintaining reasoning quality.

CoD takes inspiration from human problem-solving behavior, where we capture only essential information in our notes. Unlike Chain-of-Thought (CoT) prompting, which generates verbose step-by-step reasoning, CoD limits each reasoning step to the absolute minimum—typically 5 words or fewer.

This approach preserves the reasoning pathway while drastically cutting computational overhead.

Chain-of-Draft vs. Chain-of-Thought Outputs

Let's compare approaches using a simple math problem:

Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?

Chain-of-Thought Process:

Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?
A: Let's think through this step by step:
1. Initially, Jason had 20 lollipops.
2. After giving some to Denny, Jason now has 12 lollipops.
3. To find out how many lollipops Jason gave to Denny, we need to calculate the difference between the initial number of lollipops and the remaining number.
4. We can set up a simple subtraction problem: Initial number of lollipops - Remaining number of lollipops = Lollipops given to Denny
5. Putting in the numbers: 20 - 12 = Lollipops given to Denny
6. Solving the subtraction: 20 - 12 = 8
Therefore, Jason gave 8 lollipops to Denny.

[Response]: 8 lollipops

Chain-of-Draft Process:

Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?
A: 20 - x = 12; x = 20 - 12 = 8. 
 
[Response]: 8

CoD provides the key reasoning steps without unnecessary text.

How to Implement Chain-of-Draft

Implementing CoD requires only a simple prompt modification.

The beauty of CoD is its simplicity—no model changes, fine-tuning, or complex infrastructure required. You simply adjust your system prompt to instruct the model to provide concise reasoning steps.

Here's a basic implementation using the OpenAI API:

from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment by default.
client = OpenAI()

# The CoD system prompt: cap each reasoning step at roughly five words
# and ask for the final answer after a #### separator.
cod_prompt = """Think step by step, but only keep a minimum draft for each thinking step, with 5 words at most. Return the answer at the end of the response after a separator ####."""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": cod_prompt},
        {"role": "user", "content": "If a train travels at 60 miles per hour for 2.5 hours, how far does it go?"}
    ]
)

print(response.choices[0].message.content)

The key is instructing the model to limit each thinking step to just a few words.
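Because the prompt asks for the final answer after a #### separator, you can split the short draft from the answer in post-processing. Here's a minimal sketch continuing from the snippet above (the model may occasionally skip the separator, so it's worth guarding for that):

# Split the short draft from the final answer on the "####"
# separator requested in the system prompt above.
content = response.choices[0].message.content
draft, sep, answer = content.partition("####")

print("Draft:", draft.strip())
print("Answer:", answer.strip() if sep else "(no separator found)")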

Performance Benchmarks: Chain-of-Draft vs Chain-of-Thought

CoD delivers remarkable efficiency gains without sacrificing accuracy.

When tested across multiple reasoning tasks, CoD consistently achieves similar accuracy to CoT while using a fraction of the tokens.

On GSM8k (a math reasoning benchmark), CoD achieved 91.1% accuracy compared to CoT's 95.4%, while using only about 20% of the tokens and reducing latency by up to 76%.

For symbolic reasoning tasks like the coin flip problem, CoD matched CoT's perfect 100% accuracy while using only 14-32% of the tokens.

| Model | Task | CoT Accuracy | CoD Accuracy | Token Reduction | Latency Reduction |
|-------|------|--------------|--------------|-----------------|-------------------|
| GPT-4o | GSM8k | 95.4% | 91.1% | 80% | 76.2% |
| Claude 3.5 | Sports | 93.2% | 97.3% | 92.4% | 72.2% |
| GPT-4o | Coin Flip | 100% | 100% | 68% | 42.9% |

These results demonstrate CoD's practical value for production applications.
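To get a feel for what token reduction means for your bill, here's a rough back-of-the-envelope sketch. The request volume, token counts, and per-token price below are hypothetical placeholders, not benchmark figures; substitute your own numbers:

# Hypothetical cost comparison. Every number here is a placeholder;
# plug in your real volume, token counts, and model pricing.
PRICE_PER_1K_OUTPUT_TOKENS = 0.01      # USD, hypothetical
REQUESTS_PER_DAY = 100_000             # hypothetical volume

cot_tokens = 200                       # assumed verbose CoT completion
cod_tokens = int(cot_tokens * 0.2)     # ~80% token reduction (GSM8k row above)

def daily_cost(tokens_per_request):
    return REQUESTS_PER_DAY * tokens_per_request / 1000 * PRICE_PER_1K_OUTPUT_TOKENS

print(f"CoT: ${daily_cost(cot_tokens):,.2f}/day")
print(f"CoD: ${daily_cost(cod_tokens):,.2f}/day")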

When to Use Chain-of-Draft

CoD excels in high-volume, cost-sensitive applications.

Consider using Chain-of-Draft when:

  • ✅ You want to utilize reasoning capabilities but token costs are a significant concern
  • ✅ Response latency is critical
  • ✅ You need transparent reasoning but want to minimize overhead
  • ✅ Processing high volumes of inference requests
  • ✅ Working with reasoning-heavy applications like math problem-solving

CoD provides the best balance between reasoning quality and efficiency.

Monitor and Compare CoD Performance with Helicone ⚡️

With Helicone, you can easily track and compare token usage, response times, and costs between your CoT and CoD prompts, so you can quantify exactly how much you're saving.
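As a sketch, here's how the earlier OpenAI call can be routed through Helicone's OpenAI proxy so CoT and CoD runs show up side by side in the dashboard. The base URL and Helicone-Auth header follow Helicone's documented OpenAI integration; the prompt-style custom property is just a label we chose for filtering:

import os
from openai import OpenAI

# Route requests through Helicone's OpenAI proxy so each call is logged.
client = OpenAI(
    base_url="https://oai.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}",
        # Custom property (our own label) to filter CoT vs CoD runs.
        "Helicone-Property-Prompt-Style": "cod",
    },
)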

Limitations to Consider

CoD has specific constraints worth noting.

CoD performs best in few-shot settings (where you seed the prompt with examples), with accuracy dropping significantly in zero-shot (no examples) scenarios. This is likely due to the absence of CoD-style reasoning patterns in training data, which makes it harder for models to 'understand' the approach.
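In practice, this means seeding the prompt with one or two CoD-style exemplars. Here's a sketch that reuses the lollipop example from earlier as a few-shot demonstration (cod_prompt and client come from the implementation snippet above):

# Few-shot CoD: show the model one worked example in the drafting
# style before asking the real question.
messages = [
    {"role": "system", "content": cod_prompt},
    {"role": "user", "content": "Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?"},
    {"role": "assistant", "content": "20 - x = 12; x = 20 - 12 = 8. #### 8"},
    {"role": "user", "content": "If a train travels at 60 miles per hour for 2.5 hours, how far does it go?"},
]

response = client.chat.completions.create(model="gpt-4", messages=messages)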

Smaller models (under 3B parameters) also show a more pronounced performance gap between CoT and CoD. Additionally, tasks requiring extensive contextual understanding may benefit from more verbose reasoning.

Conclusion

Chain-of-Draft presents a practical, immediately applicable approach to more efficient LLM reasoning.

By using the minimalist reasoning approach of CoD, you can achieve comparable accuracy to Chain of Thought while significantly reducing costs and latency. This makes advanced reasoning capabilities more accessible for production applications with strict budgets and performance requirements.

Go ahead and play around with CoD and be sure to use Helicone to measure the improvements!

Frequently Asked Questions

What is Chain of Draft (CoD) prompting?

Chain of Draft is a prompting technique that encourages LLMs to generate minimal, concise reasoning steps. It limits each step to about 5 words, reducing token usage, costs, and latency while maintaining reasoning quality.

How does CoD differ from Chain of Thought (CoT)?

CoD produces much more concise reasoning than CoT while maintaining similar accuracy. It removes non-essential details, often using equations or shorthand notation, leading to 68-92% token reduction with minimal accuracy impact.

How do I implement Chain of Draft in my prompts?

You can add a directive such as this to the system prompt: 'Think step by step, but only keep a minimum draft for each thinking step, with 5 words at most.'

What tasks work best with Chain of Draft?

CoD is ideal for structured reasoning tasks like mathematical problem-solving, symbolic reasoning, and logic problems. It performs well on benchmarks like GSM8k and tasks requiring step-by-step logical progressions.

What are CoD's limitations?

CoD struggles in zero-shot settings and with smaller models (under 3B parameters). Highly complex problems that need detailed explanations may still benefit from traditional Chain of Thought approaches.

How much can CoD reduce costs compared to CoT?

CoD can cut token usage by 68-92%. For high-volume applications, this can mean substantial reductions in LLM API expenses while keeping accuracy intact.

How does CoD affect response latency?

CoD speeds up response generation by 40-76% by reducing token output. This improvement benefits real-time applications like chatbots or analytics tools, enhancing user experience and system efficiency.

Can CoD be combined with other prompting techniques?

Yes, CoD integrates well with few-shot learning, self-consistency sampling, and potentially tree-of-thought methods. Experimenting with combined approaches can further optimize performance.
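For instance, here's a self-consistency sketch: sample several independent CoD drafts at a nonzero temperature and majority-vote the final answers. It reuses the CoD prompt and #### separator from earlier in this post:

# CoD + self-consistency: sample several short drafts, then take a
# majority vote over the final answers.
from collections import Counter
from openai import OpenAI

client = OpenAI()
cod_prompt = """Think step by step, but only keep a minimum draft for each thinking step, with 5 words at most. Return the answer at the end of the response after a separator ####."""

response = client.chat.completions.create(
    model="gpt-4",
    n=5,              # five independent CoD samples
    temperature=0.7,  # enough randomness for diverse drafts
    messages=[
        {"role": "system", "content": cod_prompt},
        {"role": "user", "content": "If a train travels at 60 miles per hour for 2.5 hours, how far does it go?"},
    ],
)

# Majority vote over the text after the #### separator.
answers = [c.message.content.partition("####")[2].strip() for c in response.choices]
print(Counter(answers).most_common(1)[0][0])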


Questions or feedback?

Is the information out of date? Please raise an issue or contact us; we'd love to hear from you!