If you use Claude Code daily, you have probably noticed how quickly token costs add up. The good news? A few simple habits and configuration changes can significantly cut your spending without affecting output quality. Here are three techniques, plus two bonuses, that actually work.
1. Keep CLAUDE.md Under 200 Lines
CLAUDE.md is the file Claude Code reads automatically at the start of every session. It stores your project instructions, coding conventions, and context. The problem: the larger this file, the more tokens are consumed on every single turn.
What to do:
- Keep only your core project rules in CLAUDE.md and aim for under 200 lines
- Move specialized workflows such as deploy processes, test commands, and module-specific instructions into separate Skills files
- Claude Code loads these Skills only when relevant, not on every turn
Result: Fewer unnecessary tokens loaded per conversation.
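A quick way to keep yourself honest is a small check you can run locally or in CI. The sketch below is only an illustration: the 200-line ceiling comes from the advice above, while the four-characters-per-token ratio is a rough rule of thumb rather than a real tokenizer, and the function name `audit_instructions` is made up for this example.

```python
from pathlib import Path

MAX_LINES = 200        # the ceiling suggested in this article
CHARS_PER_TOKEN = 4    # assumption: rough heuristic, not a real tokenizer


def audit_instructions(text: str, max_lines: int = MAX_LINES) -> dict:
    """Return the line count, a rough token estimate, and an over-budget flag."""
    lines = text.splitlines()
    return {
        "lines": len(lines),
        "approx_tokens": len(text) // CHARS_PER_TOKEN,
        "over_budget": len(lines) > max_lines,
    }


if __name__ == "__main__":
    path = Path("CLAUDE.md")
    if path.exists():
        report = audit_instructions(path.read_text(encoding="utf-8"))
        status = "over budget, consider moving sections into Skills" if report["over_budget"] else "ok"
        print(f"{path}: {report['lines']} lines, ~{report['approx_tokens']} tokens ({status})")
    else:
        print("No CLAUDE.md found in the current directory")
```

Run it from the project root. If the file is flagged, the workflow-specific sections are usually the first candidates to move out.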
2. Use /clear and /compact Between Unrelated Tasks
Claude Code's context window fills up over a long session, and old, irrelevant conversation history keeps adding to your token bill.
Two commands you should use regularly:
- /clear resets the entire context. Use this when one task is done and you are starting something completely unrelated.
- /compact summarizes the conversation without losing the thread. Use this when continuing on the same project but you want to lighten the history.
Tip: Never run a frontend bug fix and a database migration in the same session without clearing context in between. Carrying both contexts the entire time costs you double.
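To see why carrying old history is expensive, here is a back-of-the-envelope model. It assumes every turn resends the full history as input and that each turn adds about 2,000 tokens. Both numbers are hypothetical, and the model ignores prompt caching (which lowers the price of repeated tokens but does not make them free) as well as the cost of the summarization step itself.

```python
def total_input_tokens(turn_sizes: list[int], carried_over: int = 0) -> int:
    """Sum input tokens across turns when every turn resends all prior history."""
    history = carried_over
    total = 0
    for size in turn_sizes:
        history += size   # the new turn joins the history
        total += history  # the whole history is sent as input on this turn
    return total


task_a = [2_000] * 10  # hypothetical: ten turns of ~2,000 tokens each
task_b = [2_000] * 10

one_session = total_input_tokens(task_a + task_b)
with_clear = total_input_tokens(task_a) + total_input_tokens(task_b)
# /compact modeled as starting task B with a ~1,500-token summary (assumed size)
with_compact = total_input_tokens(task_a) + total_input_tokens(task_b, carried_over=1_500)

print(one_session)   # 420000
print(with_clear)    # 220000
print(with_compact)  # 235000
```

Under these assumptions, running both tasks in one session sends almost twice as many input tokens as clearing in between.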
3. Turn Off Extended Thinking for Simple Tasks
Extended Thinking lets Claude reason more deeply on complex problems, but the reasoning itself is billed as output tokens, so it costs significantly more.
Simple rule:
- Routine code generation, bug fixes, documentation: Extended Thinking OFF
- Architecture decisions, complex algorithms, tricky multi-step debugging: Extended Thinking ON
Leaving Extended Thinking on by default for every task is one of the easiest ways to overspend on Claude Code.
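If you also call Claude from your own scripts through the Messages API, the same rule can be written down as code. This is a sketch under assumptions: the task labels and the `thinking_config` helper are invented for illustration, and the 4,000-token budget is an arbitrary starting point. Claude Code has its own toggle for thinking; this example only applies to direct API calls.

```python
# Hypothetical task labels that mirror the "ON" row of the rule above
COMPLEX_TASKS = {"architecture", "algorithm", "multi_step_debugging"}


def thinking_config(task_kind: str, budget_tokens: int = 4_000) -> dict:
    """Return extra Messages API arguments: thinking enabled only for complex tasks."""
    if task_kind in COMPLEX_TASKS:
        # budget_tokens must be at least 1024 and smaller than max_tokens
        return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    return {}  # omit the parameter so no thinking tokens are spent


print(thinking_config("bug_fix"))
print(thinking_config("architecture"))
```

The returned dictionary can be unpacked into the keyword arguments of a `messages.create` call, so routine requests never carry a thinking budget at all.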
Bonus 1: Never Break Your Cache
This is a silent budget killer that most developers do not notice.
Claude Code caches your system prompt so that the full prompt is not reprocessed on every turn; only new messages are. But caching works on the prompt prefix: if you place any dynamic content at the top of your system prompt, such as a current timestamp, session ID, or any variable that changes each time, everything after it has to be reprocessed and the cache breaks on every single turn.
That means you are paying full input price on every turn instead of the cheaper cached rate.
Fix:
- Never put dynamic or changing content at the top of your system prompt
- Structure it as static instructions first, dynamic context last
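For prompts you assemble yourself, the ordering might look like the sketch below. It builds system blocks in the shape the Anthropic Messages API accepts, with a `cache_control` marker on the static block. The instruction text, the `build_system_blocks` helper, and the session fields are all hypothetical, and a real static block must also meet the API's minimum cacheable length, which this short string would not.

```python
from datetime import datetime, timezone

# Hypothetical static rules; in practice this is the long, unchanging part
STATIC_INSTRUCTIONS = (
    "You are a coding assistant for this repository.\n"
    "Follow the project style guide and write tests for new code."
)


def build_system_blocks(now: datetime, session_id: str) -> list[dict]:
    """Static instructions first (cacheable), per-turn details last."""
    return [
        {
            "type": "text",
            "text": STATIC_INSTRUCTIONS,
            # cache breakpoint: the prefix up to and including this block can be reused
            "cache_control": {"type": "ephemeral"},
        },
        {
            "type": "text",
            "text": f"Current time: {now.isoformat()}\nSession: {session_id}",
        },
    ]


turn_1 = build_system_blocks(datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc), "abc")
turn_2 = build_system_blocks(datetime(2025, 1, 1, 9, 5, tzinfo=timezone.utc), "abc")
print(turn_1[0] == turn_2[0])  # True: the static prefix is identical across turns
print(turn_1[1] == turn_2[1])  # False: only the trailing block changes
```

Had the timestamp been placed in the first block, the two turns would share no common prefix and nothing could be served from the cache.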
Bonus 2: 1M Token Context Window Carries No Surcharge on Sonnet 4.6
Anthropic made the 1-million-token context window generally available in March 2026, and on Claude Sonnet 4.6 there is no surcharge for using it.
However, if you are still on the Sonnet 4.5 beta, any request that exceeds 200K input tokens is billed with a 2x input surcharge.
Action: If you work with large codebases, switch to Sonnet 4.6 now.
Quick Reference
| Technique | Benefit |
|---|---|
| CLAUDE.md under 200 lines | Fewer tokens per session |
| /clear and /compact | Prevent context bloat |
| Control Extended Thinking | Save on simple tasks |
| Static content at top of system prompt | Keep cache intact every turn |
| Use Sonnet 4.6 | 1M context window at no extra cost |
Add these 5 habits to your Claude Code workflow and you will see both lower costs and better performance across every project.
This article is part of the AIInsider.in Claude Code optimization series. A Hindi version of this article is also available.

