GPT-5.4 Is Coming: 1M Token Context and Hours-Long Extreme Reasoning Mode

# AI News
Tags: GPT-5.4, OpenAI, reasoning AI, context window, AI models

OpenAI signaled yet another rapid release cycle. On March 3, 2026, the very day it shipped GPT-5.3 Instant, the official OpenAI account posted a brief message on X (formerly Twitter): “5.4 sooner than you think.”1 A few days later, The Information, a U.S. subscription-based technology outlet, published an exclusive citing people familiar with the matter that laid out the key specifications of GPT-5.4.2

What the Report Said: Three Core Points

The Information’s reporting boiled down to three main things about GPT-5.4.

First, the context window will expand to 1 million (1M) tokens — more than double that of the current GPT-5.2.
Second, a brand-new Extreme Reasoning Mode will be introduced, designed to draw on far more compute and reason continuously for hours at a time.
Third, reliability for long-horizon multi-step workflows will be significantly improved.

Context Window: From 400K to 1M

The current flagship, GPT-5.2, supports a context window of 400,000 tokens.3 That figure covers both input and output — with a maximum output of 128K tokens, the effective input capacity works out to roughly 272K tokens. If GPT-5.4 hits 1M tokens, that’s a 2.5× increase.
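The budget arithmetic above can be checked in a few lines; the 400K and 128K figures come from the cited documentation, and the 1M figure is the reported one:

```python
# Context-budget arithmetic using the figures cited above.
TOTAL_CONTEXT = 400_000      # GPT-5.2 window, input + output combined
MAX_OUTPUT = 128_000         # GPT-5.2 maximum output tokens
RUMORED_CONTEXT = 1_000_000  # GPT-5.4's reported window

effective_input = TOTAL_CONTEXT - MAX_OUTPUT  # room left for the prompt
growth = RUMORED_CONTEXT / TOTAL_CONTEXT      # headline increase

print(f"Effective input today: ~{effective_input:,} tokens")  # ~272,000
print(f"1M window vs. 400K:   {growth:.1f}x")                 # 2.5x
```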

On the surface, this might look like a straightforward spec race. But context length is one of the most consequential variables in how an AI model can actually be used. One million tokens translates to roughly 750,000 words in English, or around 500,000–600,000 words in Korean. That’s the equivalent of three or four full-length novels, or an entire mid-sized codebase, loaded in a single request.

Until now, the 1M token territory was largely claimed by Google’s Gemini 1.5 Pro and Anthropic’s Claude 3 series.4 In fact, GPT-4.1 had supported 1M tokens, but the transition to the GPT-5 family left OpenAI trailing its competitors on context for a while. GPT-5.4 was shaping up to close that gap.

Extreme Reasoning Mode: How Is It Different from the o-Series?

If your first reaction to “reasoning for hours” was to wonder how this differed from the existing o-series (o1, o3, etc.), that was a fair question. OpenAI already ran high-performance reasoning models like o3, so what exactly would GPT-5.4’s Extreme Reasoning Mode add?

The o-series was designed to boost reasoning quality by extending internal “thinking time.” GPT-5.2 already offered this via a reasoning.effort parameter with four levels — low, medium, high, and xhigh.5 The limitation, though, was structural: the model was focused on responding to a single prompt, and if reasoning time ballooned during a complex agentic task, the full context could saturate quickly.
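The article doesn't show how those four levels are passed on the wire. As a rough sketch, the payload below follows the shape of OpenAI's current Responses API (`reasoning` object with an `effort` field); the gpt-5.2 model name and the xhigh level are taken from the article, not from verified API documentation:

```python
# Illustrative request payload only -- the model name and the "xhigh"
# level come from the article, not from confirmed API docs.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh")

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Assemble a Responses-API-style payload with a reasoning effort level."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    return {
        "model": "gpt-5.2",
        "input": prompt,
        "reasoning": {"effort": effort},
    }

payload = build_request("Prove the lemma step by step.", effort="xhigh")
print(payload["reasoning"])  # {'effort': 'xhigh'}
```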

Extreme Reasoning Mode pushed further than that. The Decoder described it as a feature “designed to use significantly more compute on tough questions, intended for researchers rather than everyday users.”6 The real distinction wasn’t just depth of reasoning — it was sustained task continuity. Where the o-series was built to dig deep into a single problem, Extreme Reasoning Mode was aimed at executing continuous, multi-step tasks over hours without losing the thread.

The table below compares GPT-5.2’s existing reasoning modes against GPT-5.4’s Extreme Reasoning Mode.

|  | GPT-5.2 Thinking (xhigh) | GPT-5.4 Extreme Reasoning Mode |
| --- | --- | --- |
| Primary purpose | Maximize reasoning quality for a single query | Sustain multi-step tasks over long durations |
| Expected runtime | Within minutes | Hours |
| Target users | General users + developers | Researchers, engineers |
| Context integration | Within the 400K limit | Can leverage the full 1M token context |
| Compute cost | High | Extremely high (estimated) |

Long-Horizon Workflows and Codex: What Actually Changes

GPT-5.4 wasn’t just about raw numbers and a flashy marketing mode. Behind it lay OpenAI’s broader agentic strategy.

Since the second half of 2025, OpenAI had been pushing Codex — its coding agent — to the forefront of its agentic AI offerings. GPT-5.2 Codex introduced context compaction techniques to handle large codebases within a 400K window,7 but the fundamental context ceiling remained. Truly massive repositories, or multi-day planning tasks handled in one continuous session, were still out of reach.
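OpenAI hasn't published how that compaction works; a common technique it likely resembles is summarize-and-drop, where older turns are collapsed into a summary once the transcript nears the window limit. A minimal sketch, with word counts standing in for real tokenization:

```python
def compact(history: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Summarize-and-drop sketch: if the transcript exceeds `budget`
    (measured in words as a stand-in for tokens), collapse all but the
    most recent turns into a single placeholder summary line."""
    def words(turns: list[str]) -> int:
        return sum(len(t.split()) for t in turns)

    if words(history) <= budget:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # A real system would ask the model to summarize `old`; a marker
    # stands in for that summary here.
    summary = f"[summary of {len(old)} earlier turns]"
    return [summary] + recent

history = [f"turn {i}: " + "detail " * 50 for i in range(10)]
print(len(compact(history, budget=100)))  # 3: summary + 2 recent turns
```

The trade-off is exactly the one the article describes: the summary loses detail, which is why a larger raw window still matters.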

Combining a 1M token context with Codex would eliminate that bottleneck. A scenario where the model ingests an entire codebase of tens of thousands of lines in a single pass — then handles refactoring, migration, and test writing as one continuous workflow — was becoming a real possibility. That was exactly why The Information noted that “GPT-5.4 will be particularly important for programming agents like Codex.”

For enterprises, this carried significant weight as well. Analyzing hundreds of contracts, dozens of quarterly reports, or an entire body of internal policy documents in a single query — without additional chunking — was now within reach. That wasn’t just a performance upgrade; it was a shift in how AI-powered architectures could be designed from the ground up.
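The chunking step that a 1M window would make unnecessary for many corpora looks roughly like this today (the greedy packing and the budget numbers are illustrative, not any specific product's pipeline):

```python
def chunk_documents(docs: list[str], budget: int) -> list[list[str]]:
    """Greedy chunker: pack whole documents into batches that each fit
    within `budget` words (a stand-in for tokens), so each batch can be
    sent as one request. With a large enough window, the whole corpus
    fits in a single batch and this step disappears."""
    batches, current, used = [], [], 0
    for doc in docs:
        n = len(doc.split())
        if current and used + n > budget:
            batches.append(current)
            current, used = [], 0
        current.append(doc)
        used += n
    if current:
        batches.append(current)
    return batches

contracts = ["clause " * 300 for _ in range(10)]  # ten ~300-word documents
print(len(chunk_documents(contracts, budget=1_000)))    # 4 batches
print(len(chunk_documents(contracts, budget=100_000)))  # 1 batch
```

Fewer batches also means fewer cross-chunk stitching errors, which is where much of the "architectural shift" framing comes from.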

Faster Release Cycles: A Strategy for Managing Expectations

There was an interesting backdrop to all of this. The Decoder reported that OpenAI was deliberately accelerating its release cadence. When GPT-5 launched in the summer of 2025, expectations had run too high — and when the model landed, some users were disappointed. ChatGPT’s user growth reportedly fell short of internal targets.8

In response, OpenAI moved away from concentrating hype around a single big announcement and shifted toward frequent incremental updates. From GPT-5.1 to GPT-5.2, GPT-5.3 Instant, and the upcoming GPT-5.4, several releases had been rolled out over just a few months since late 2025.

For users, this meant a steady stream of improvements. For developers and enterprises, it also meant repeatedly cycling through API version management and prompt re-optimization. To ease the transition burden, OpenAI announced that GPT-5.2 Instant would remain available in the paid model selector for three months after the GPT-5.3 Instant rollout.

OpenAI’s Hints and Community Reactions

OpenAI had dropped the “sooner than you think” line but hadn’t committed to a specific launch date. The community was already speculating about a release within the week. A thread on r/singularity titled “There’s a good chance GPT-5.4 will release this week” drew substantial attention.9

The market interest was unsurprising. A 1M token context window addressed the gap with competitors while pushing agentic use cases to a new level. And if Extreme Reasoning Mode could genuinely sustain researcher-grade complex reasoning stably over hours, that would be more than a numerical upgrade — it would be a qualitative leap in what AI models were capable of.

Questions Still Unanswered

Several things remained unconfirmed at this point. Pricing for Extreme Reasoning Mode hadn’t been disclosed. Hours of continuous reasoning would consume a massive number of tokens, meaning the cost structure would heavily determine how widely the feature could actually be used. It also wasn’t yet clear whether GPT-5.4 had improved general reasoning performance relative to GPT-5.2 — that would have to wait for benchmarks.

The moment the launch announcement came, the industry would dive back into another round of performance comparisons. Until then, “sooner than you think” remained the only promise on the table.


Footnotes

  1. OpenAI. (2026, March 3). “5.4 sooner than you think.” [Tweet]. X. https://x.com/OpenAI/status/2028909019977703752

  2. The Information. (2026, March 4). OpenAI’s Next AI Model Will Have ‘Extreme’ Reasoning. https://www.theinformation.com/newsletters/ai-agenda/openais-next-ai-model-will-extreme-reasoning

  3. OpenAI Developers. (2025). GPT-5.2 Model. OpenAI API Documentation. https://developers.openai.com/api/docs/models/gpt-5.2

  4. Investing.com via The Information. (2026, March 4). OpenAI to release GPT-5.4 model with expanded context window. https://www.investing.com/news/economy-news/openai-to-release-gpt54-model-with-expanded-context-window—the-information-93CH-4541516

  5. OpenAI. (2025). Introducing GPT-5.2. https://openai.com/index/introducing-gpt-5-2/

  6. The Decoder. (2026, March 4). GPT-5.4 reportedly brings a million-token context window and an extreme reasoning mode. https://the-decoder.com/gpt-5-4-reportedly-brings-a-million-token-context-window-and-an-extreme-reasoning-mode/

  7. OpenAI. (2025). Introducing GPT-5.2-Codex. https://openai.com/index/introducing-gpt-5-2-codex/

  8. The Decoder. (2026, March 4). GPT-5.4 reportedly brings a million-token context window and an extreme reasoning mode. https://the-decoder.com/gpt-5-4-reportedly-brings-a-million-token-context-window-and-an-extreme-reasoning-mode/

  9. r/singularity. (2026, March 4). There’s a good chance GPT-5.4 will release this week. Reddit. https://www.reddit.com/r/singularity/comments/1rjycke/theres_a_good_chance_gpt54_will_release_this_week/
