AI Reasoning Wars, Anthropic's $30B, and the Security Paradox

· # AI 뉴스
Gemini DeepSeek Anthropic AI 보안 OpenClaw

Google unveiled a new model that breaks reasoning benchmark ceilings, while China’s open-source giant DeepSeek announced the impending release of a trillion-parameter monster. Anthropic closed its largest-ever $30 billion funding round at a $380 billion valuation. Meanwhile, news that AI autonomously found 12 zero-day vulnerabilities in OpenSSL — the foundation of internet security — shook the security community, while AI agent platform OpenClaw’s security flaws triggered usage restrictions at big tech companies including Meta.

Model wars, funding bombs, security warnings. Let’s break them down one by one.

New Landscape of Reasoning Wars: Gemini 3.1 Pro and DeepSeek V4

Gemini 3.1 Pro: Breaking the Benchmark Ceiling

On February 19, Google DeepMind unveiled Gemini 3.1 Pro. The core was a leap in reasoning ability. On ARC-AGI-2, the benchmark measuring AI’s logical thinking, it scored 77.1% — more than double its predecessor Gemini 3 Pro’s 31.1%. Compared to Claude Opus 4.6’s 68.8% and GPT-5.2’s 52.9% on the same benchmark, Google clearly secured first place in the reasoning competition.

ARC-AGI-2 requires models to solve new logical patterns never seen before. It’s not about simple memorization or pattern matching, but measuring genuine ‘thinking ability.’ The 77.1% score means AI has taken another step closer to human-level abstract reasoning.

It also topped the industry on GPQA Diamond benchmark for graduate-level science problems with 94.3%. According to Ars Technica’s analysis, this suggests qualitative change in actual complex problem-solving ability beyond mere benchmark numbers.

Pricing was also notable. API rates of $2 input, $12 output (per million tokens) with 1 million token context window. Immediate deployment across Google’s ecosystem including Google AI Studio, Vertex AI, Gemini app, and NotebookLM. Google’s strategy of simultaneously deploying its top-performance model across its entire platform shows this war isn’t just about benchmarks but ecosystems.

DeepSeek V4: The Trillion-Parameter Challenge

The same day, China’s DeepSeek also announced V4’s imminent release. Total 1 trillion (1T) parameters with 32B active parameters — pushing MoE (Mixture of Experts) architecture to its limits, like taking just one needed book from a massive library.

Most notable was the conditional memory system dubbed “Engram”. While existing models store all knowledge mixed in neural network weights, Engram retrieves needed knowledge instantly with hash-based O(1) search. Like the brain’s long-term memory, frequently used knowledge loads quickly while complex reasoning uses separate pathways. Optimal allocation was reportedly dynamic reasoning 75%, static search 25% sparse capacity.

Dynamic Sparse Attention was also newly introduced. While traditional transformer attention mechanisms examine all tokens, this approach dynamically selects tokens to focus on based on context. This enables much more efficient utilization of the 1M context window.

DeepSeek V4 particularly specialized in coding. Multi-file refactoring, sandbox code execution, and other functions developers need for actual work were supported at the architecture level. According to leaked January information, aggressive RAM offloading should enable running on high-spec workstations.

If Gemini 3.1 Pro claimed the benchmark throne, DeepSeek V4 aims to change the game rules through architectural innovation. While their approaches differ, their message is identical — AI reasoning evolution is accelerating.

Money Flows: The Meaning of Anthropic’s $30B

On February 12, Anthropic raised $30 billion in Series G at a $380 billion valuation. This valuation more than doubled from the previous round in September 2025. GIC and Coatue co-led, with D.E. Shaw Ventures, Founders Fund, ICONIQ and others participating.

According to Reuters, Amazon invested $8 billion in this round, while Google held a 14% stake. The fact that two cloud giants simultaneously bet big on one AI startup speaks volumes about this industry’s strategic importance.

Interestingly, there was a Super Bowl ad effect. Data showed Anthropic’s users increased 11% after running Super Bowl ads. An AI startup advertising during the Super Bowl itself symbolizes the changing times. AI is transitioning from developer tools to mass consumer products.

Putting the $380 billion valuation in context, this exceeds half of South Korea’s Samsung Electronics market cap. A company founded in 2021 reaching this level in 5 years shows the scale of capital flowing into AI is unlike any historical precedent. While gaps with OpenAI remain, they’re rapidly narrowing.

AI Protecting AI: The Shock of 12 OpenSSL Zero-Days

On February 19, a blog post shared by security expert Bruce Schneier turned the security community upside down. AI security research company AISLE’s AI system had independently discovered 12 zero-day vulnerabilities (security holes unknown even to developers) in OpenSSL.

OpenSSL is the foundation of internet security. Website HTTPS encryption, email security, VPNs — almost all internet secure communications depend on this library. This codebase had undergone millions of CPU-hours of fuzzing (technique finding bugs by feeding random data) and multiple comprehensive audits by top security teams including Google over decades.

Yet AI found 12 zero-days at once. Among them, CVE-2025-15467 was a stack buffer overflow in CMS message parsing enabling remote code execution without valid keys. NIST’s CVSS v3 score was 9.8 out of 10 — “CRITICAL” rating. Such ratings are extremely rare for projects like OpenSSL.

More surprising was the timing. Three of the discovered vulnerabilities were in code written 1998-2000. AI caught bugs that had evaded human experts and automated tools for over 25 years. One inherited from Eric Young’s SSLeay implementation, OpenSSL’s predecessor — older than OpenSSL itself.

AISLE’s AI directly proposed patches for 5 of the 12, and those patches were adopted in official releases. After all fixes were completed on January 27, announcement followed responsible disclosure procedures. Of 14 CVEs assigned to OpenSSL in 2025, 13 were discovered by this AI system.

This was the moment AI security detection capability moved from theory to practice. But this ability is a double-edged sword. If defenders can use it, so can attackers. As Schneier said, AI vulnerability discovery will “be used for both attack and defense”.

AI Threatening AI: OpenClaw Security Controversy

The same February 19 brought news showing AI agents’ dark side. Security firm Sophos issued a “critical triple threat” warning about OpenClaw, while Ars Technica reported big tech companies including Meta began restricting OpenClaw usage.

OpenClaw is an AI agent platform running directly on user computers. It can read files, execute code, call external APIs, and even control browsers. While explosively popular among developers, Sophos identified fundamental problems.

First, personal data access — OpenClaw runs on local devices so can access all files and account permissions on that device. Second, external communications — agents can freely communicate with external servers over the internet. Third, untrusted content processing — agents directly read and process content from web pages, emails, etc.

These three combined enable prompt injection (attacks injecting malicious commands into AI) to hijack the agent’s entire privilege set. For example, the moment AI reads a maliciously crafted webpage, hidden commands could take control of AI behavior to exfiltrate user files or install malware on the system.

According to WIRED’s coverage, Meta completely banned OpenClaw from internal networks. However, some companies chose establishing security guidelines and preparing security enhancements within 60 days rather than complete blocks. OpenClaw’s potential was too attractive. One official said “It could be the future. That’s why we’re building for it”.

This incident reveals the essential dilemma of the agent AI era. AI needs permissions to be useful, and more permissions create bigger security risks. The tug-of-war between usefulness and safety will be a core challenge for the AI industry in 2026.

Personal Thoughts

Reasoning performance, capital, security — these three axes moving simultaneously seems to be the essence of today’s AI industry.

Gemini 3.1 Pro’s reasoning jump was honestly beyond expectations. DeepSeek V4’s Engram architecture also shows the competition axis shifting from “bigger models” to “smarter structures.”

Anthropic’s $30 billion investment confirms again that capital flowing into AI startups has exceeded common sense. A 5-year-old company at $380 billion valuation.

The OpenSSL incident shows AI’s power when used for defense, but the same technology can be used for attacks. OpenClaw’s security controversy is similar — AI agents need our permissions to work for us, but when breached, we’re in danger. The convenience-security tug-of-war will continue.

← LLM Compression Techniques Deep Dive — Quantization, Pruning, and Distillation Qwen 3.5 Complete Guide — Specs, Benchmarks, VRAM, and Usage →