The Dumbest Prompting Trick That Actually Works
Google Research found a technique that wins 47 out of 70 model-benchmark matchups, with zero losses and zero downsides. It's embarrassingly simple.
I keep seeing people on X sharing increasingly complex prompting techniques. Chain-of-thought. Tree-of-thought. Mega-prompts with 47 carefully crafted instructions.
Meanwhile, Google researchers just published a paper on prompt repetition - a technique so simple it feels like a bug in the matrix.
47 wins. 0 losses. 23 ties.
Across GPT-4o, Claude, Gemini, and DeepSeek. Across 7 different benchmarks. Statistically significant improvements with literally no downsides.
The technique? Repeat your entire prompt twice before sending it. That’s it.
Why Do LLMs Miss Context In Your Prompts?
LLMs process your prompt left-to-right. Early tokens can’t see later tokens. This is baked into how transformers work—they’re trained as causal language models, meaning past tokens can’t attend to future tokens. The first sentence of your prompt gets analyzed before the model even knows what question you’re asking at the end.
Think about it: you’re giving someone instructions while they’re already walking away. They hear the beginning, but they’re halfway down the hall before you finish.
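If you'd rather see the walking-away problem in code, here is a minimal sketch of causal visibility. It is a toy illustration, not how any real model is implemented: whole words stand in for subword tokens, and `causal_visibility` is a hypothetical helper name.

```python
def causal_visibility(tokens):
    """Map each position to the tokens it can attend to under a causal mask.

    Position i sees positions 0..i (itself and everything earlier), never later ones.
    """
    return [tokens[: i + 1] for i in range(len(tokens))]


# Toy "tokens": whole words stand in for real subword tokens (an assumption).
prompt = ["Paris", "is", "lovely", ".", "Which", "city", "?"]
visibility = causal_visibility(prompt)

for token, seen in zip(prompt, visibility):
    print(f"{token!r:10} sees {seen}")
```

The first word never sees the question, and only the final position sees the whole prompt.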
Google’s research team looked at this limitation and asked a simple question: what if we let every token see every other token?
Their answer won 47 out of 70 model-benchmark matchups.
How Effective Is Prompt Repetition?
Google tested this across seven popular models from four providers:
- Gemini 2.0 Flash & Flash-Lite
- GPT-4o & GPT-4o-mini
- Claude 3 Haiku & Claude 3.7 Sonnet
- DeepSeek V3
Seven benchmarks: ARC, OpenBookQA, GSM8K, MMLU-Pro, MATH, and two custom tests (NameIndex and MiddleMatch).
The results:
- 47 statistically significant wins
- 0 losses (never performed worse than baseline)
- 23 ties
On some tasks, the improvement was absurd. Gemini 2.0 Flash-Lite jumped from 21.33% to 97.33% accuracy on NameIndex, one of the custom tests. That is roughly a 4.5x improvement.
And here’s the kicker:
No latency increase. The extra copy is handled while the model reads your prompt, not while it writes the answer, so your response comes back just as fast.
No longer outputs. The model doesn’t get verbose.
No format changes. JSON, structured outputs—everything stays consistent.
It’s the rare technique that’s all upside, no downside.
So What Is It?
Alright, let’s dive in.
The technique exploits how attention works in transformer models. By restructuring your prompt in a specific way, you let every token “see” every other token before the model starts generating.
It’s not a complex framework. It’s not prompt engineering wizardry. When I first read the paper, I thought there had to be a catch.
There isn’t.
I’ll show you:
- The exact technique (it takes 5 seconds to implement)
- Why it works (the transformer attention explanation)
- When to use it vs. skip it (there’s one scenario where it doesn’t help)
- Copy-paste examples you can test right now
What Is Prompt Repetition?
Let’s get back into it.
Ready for it?
You repeat your prompt. Twice.
That’s it.
The technique transforms your prompt from:
<QUERY>
To:
<QUERY><QUERY>
You literally copy-paste your entire prompt so it appears twice, back to back, in a single message.
I told you it was embarrassingly simple.
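In code, the whole trick fits in one line. Here is a minimal sketch; `repeat_prompt` is a hypothetical helper name, and the `copies` and `separator` parameters are my additions for illustration. The default empty separator mirrors the back-to-back <QUERY><QUERY> pattern shown earlier.

```python
def repeat_prompt(query: str, copies: int = 2, separator: str = "") -> str:
    """Return the prompt repeated `copies` times, joined by `separator`.

    The defaults reproduce <QUERY><QUERY>: two copies, nothing in between.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return separator.join([query] * copies)


question = "Which planet is closest to the sun? Answer with one word."
# A newline separator is an assumption for readability, not part of the pattern above.
doubled = repeat_prompt(question, separator="\n")
print(doubled)
```

You would send `doubled` as a single user message in place of the original prompt.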
Why Does Repeating Your Prompt Improve LLM Results?
When you repeat the prompt, each token in the second copy can attend to every token in the first copy. The model essentially gets a “preview” of the full request before generating its response. This bypasses the causal limitation where past tokens can’t see future tokens - because now the “future” tokens from your first copy become “past” tokens for your second copy.
The Google team describes this as enabling each prompt token to “attend to every other prompt token.” It’s a simple fix to a fundamental architectural constraint.
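To make the “future becomes past” idea concrete, here is a toy check using the same causal rule, where position i sees positions 0 through i. It is a sketch under simplifying assumptions: it only tracks which positions are visible, not real attention weights, and `sees_full_first_copy` is a hypothetical helper name.

```python
def sees_full_first_copy(position: int, prompt_len: int) -> bool:
    """True if a token at `position` can attend to every token of the first prompt copy.

    Under a causal mask, a token sees positions 0..position inclusive.
    """
    visible = set(range(position + 1))
    return set(range(prompt_len)) <= visible


prompt_len = 7  # toy prompt length in tokens (an arbitrary choice)

# Single prompt: positions 0..6.
single = [sees_full_first_copy(i, prompt_len) for i in range(prompt_len)]

# Repeated prompt: the second copy occupies positions 7..13.
second_copy = [
    sees_full_first_copy(i, prompt_len) for i in range(prompt_len, 2 * prompt_len)
]

print("single prompt:", single)
print("second copy:  ", second_copy)
```

In the single prompt, only the last position has the full picture. In the repeated prompt, every position in the second copy does.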





