OpenAI Symphony: From Supervising Coding Agents to Managing Work


In August 2025, a three-person team at OpenAI began an experiment under a single constraint: they would not write a single line of application code themselves. Every piece of application logic, every test, every CI configuration, every document, every internal tool — Codex wrote it all. Five months later, the repository had accumulated roughly one million lines of code and 1,500 merged pull requests. The team estimated it would have taken ten times as long if done by hand.1

OpenAI named the methodology that emerged from this experience Harness Engineering. In early March 2026, they released its reference implementation as open source. The project is called Symphony.2

The Core Problem: Agents Without a Manager

Tools like Cursor, GitHub Copilot, and Claude Code are structured around a loop where a developer asks, and an agent responds. The human is always at the center. Agents react. Humans initiate, monitor, and retry.

The limits of this structure emerge at scale. When the number of issues to handle simultaneously grows to five, ten, or more, the developer’s attention becomes the bottleneck. No matter how capable the agent, throughput cannot increase as long as a human must press “go” each time.

This is the problem Symphony set out to solve. Instead of supervising agents, teams manage work.

[!KEY] Symphony’s paradigm shift: “The developer hands work to the agent” → “The issue tracker automatically summons the agent”

What It Does: The Lifecycle of an Implementation Run

Symphony’s core unit of execution is the Implementation Run. It refers to the entire autonomous process an agent handles — from the moment an issue enters a designated state (by default, Todo or In Progress) through to a merged PR or a handoff to human review.

flowchart TD
    A[Linear issue<br/>State: Todo] -->|Poll every 30s| B[Orchestrator<br/>detects eligible issue]
    B --> C[Isolated workspace<br/>created]
    C --> D[WORKFLOW.md<br/>prompt rendered]
    D --> E[Codex App-Server<br/>launched]
    E --> F{Outcome}
    F -->|Success| G[CI passes / PR created<br/>moved to Human Review state]
    F -->|Failure| H[Exponential backoff<br/>retry queue]
    F -->|Issue state changed| I[Agent halted<br/>workspace cleaned up]
    G --> J[Merged after human review]
    H -->|Wait up to 5 min| B

Each issue is assigned its own isolated workspace directory. While agent A processes ABC-42, agent B can work on ABC-43 without any filesystem-level conflicts. This isolation is what allows Symphony to run ten agents concurrently by default, and more with configuration.
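The isolation guarantee amounts to one directory per issue. A minimal sketch (paths and function names here are illustrative, not Symphony's actual API):

```python
# Sketch: per-issue workspace isolation. Names are illustrative, not Symphony's API.
from pathlib import Path
import shutil
import tempfile

WORKSPACE_ROOT = Path(tempfile.mkdtemp(prefix="symphony-workspaces-"))

def create_workspace(issue_id: str) -> Path:
    """Each issue gets its own directory, so concurrent agents never collide."""
    ws = WORKSPACE_ROOT / issue_id
    ws.mkdir(parents=True, exist_ok=False)  # fail loudly if the issue was already claimed
    return ws

def cleanup_workspace(issue_id: str) -> None:
    """Run when an agent finishes or its issue leaves an active state."""
    shutil.rmtree(WORKSPACE_ROOT / issue_id, ignore_errors=True)

# Agent A on ABC-42 and agent B on ABC-43 touch disjoint filesystem paths.
ws_a = create_workspace("ABC-42")
ws_b = create_workspace("ABC-43")
```

Because each workspace is created fresh and removed on completion, scaling from ten concurrent agents to more is a matter of disk and CPU, not coordination.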

Proof of Work is another noteworthy feature. An agent’s job does not end with writing code. Before advancing to the next stage, it must produce CI status reports, PR review feedback, complexity analysis, and a walkthrough recording of its changes. Completion is demonstrated through verifiable artifacts — not just a claim that the agent “did something.”
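The gate can be pictured as a simple artifact check before a run is allowed to advance. The file names below are illustrative, drawn from the artifact kinds listed above, not Symphony's actual schema:

```python
# Sketch of a proof-of-work gate: the run advances only when its verifiable
# artifacts exist on disk. Artifact names are hypothetical.
from pathlib import Path
import tempfile

REQUIRED_ARTIFACTS = ["ci_report.json", "pr_review.md", "complexity.txt", "walkthrough.mp4"]

def proof_of_work_complete(run_dir: Path) -> bool:
    """True only when every required artifact has actually been produced."""
    return all((run_dir / name).is_file() for name in REQUIRED_ARTIFACTS)

run_dir = Path(tempfile.mkdtemp())
incomplete = proof_of_work_complete(run_dir)  # nothing produced yet
for name in REQUIRED_ARTIFACTS:
    (run_dir / name).touch()
complete = proof_of_work_complete(run_dir)    # all artifacts present
```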

Architecture: Eight Layers

SPEC.md defines eight components in Symphony’s architecture.3

| Component | Role |
|---|---|
| Workflow Loader | Parses WORKFLOW.md; extracts YAML front matter and prompt templates |
| Config Layer | Typed getters; environment variable priority handling |
| Issue Tracker Client | Linear GraphQL polling; returns a normalized issue model |
| Orchestrator | Owns the polling loop; manages runtime state; decides dispatch, retry, and halt |
| Workspace Manager | Creates per-issue directories; runs lifecycle hooks; handles cleanup |
| Agent Runner | Spawns the Codex App-Server process; handles the stdio JSON-RPC stream |
| Status Surface (optional) | Phoenix LiveView dashboard; /api/v1/* HTTP API |
| Logging | Routes structured logs to stdout, files, or external services |

Of these, the Orchestrator stands out for managing all runtime state in a single in-memory map. The state — comprising running, claimed, retry_attempts, and completed — was designed to support restart recovery without a persistent database. Failed agents are automatically placed back in the retry queue after an exponential backoff of up to five minutes.
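The state map and backoff can be sketched in a few lines. The field names follow the paragraph above; the 10-second backoff base is an assumption (only the five-minute cap comes from the source):

```python
# Sketch of the orchestrator's in-memory state and retry backoff.
# The backoff base (10s) is an assumption; only the 300s cap is documented.
MAX_BACKOFF_S = 300  # retries wait at most five minutes

state = {
    "running": {},         # issue_id -> agent handle
    "claimed": set(),      # issues picked up but not yet dispatched
    "retry_attempts": {},  # issue_id -> failure count
    "completed": set(),
}

def backoff_seconds(issue_id: str) -> int:
    """Exponential backoff: 10s, 20s, 40s, ... capped at 300s."""
    attempts = state["retry_attempts"].get(issue_id, 0)
    return min(10 * (2 ** attempts), MAX_BACKOFF_S)

def record_failure(issue_id: str) -> int:
    """Move a crashed run back to the retry queue and return its next wait."""
    state["retry_attempts"][issue_id] = state["retry_attempts"].get(issue_id, 0) + 1
    state["running"].pop(issue_id, None)
    return backoff_seconds(issue_id)
```

Because the map is in memory, restart recovery works by re-polling the tracker rather than replaying a database.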

[!KEY] Symphony is not a full-stack workflow engine. As SPEC.md makes clear, it is a “scheduler/runner/tracker reader.” Ticket state transitions, attaching PR links, and posting comments are all handled by the agent itself through its own tools.

WORKFLOW.md: Versioning Agent Behavior Like Code

In Symphony, the file that captures a team’s strategy is WORKFLOW.md. This single file determines the entirety of how an agent operates.

---
tracker:
  kind: linear
  project_slug: "eng-backend"
  active_states: ["Todo", "In Progress"]
  api_key: $LINEAR_API_KEY
polling:
  interval_ms: 30000
workspace:
  root: ~/symphony-workspaces
  hooks:
    after_create: |
      git clone git@github.com:your-org/your-repo.git .
    before_run: |
      git checkout -b symphony/{{ issue.identifier }}
    after_run: |
      npm test && npm run lint
agent:
  max_concurrent_agents: 10
  max_turns: 20
codex:
  command: codex app-server
  approval_mode: auto-edit
---

You are working on {{ issue.identifier }}: {{ issue.title }}

{{ issue.description }}

## Requirements
- Write clean, well-tested code
- Follow existing conventions
- Ensure all CI checks pass

The key insight is that this file lives inside the repository. When a team wants to change how agents behave, they commit to WORKFLOW.md. The change goes through PR review and leaves a history. Agent policy is version-controlled alongside source code.

The 30-second polling interval, the default 60-second hook timeout, and all four lifecycle hooks — after_create, before_run, after_run, and before_remove — take effect immediately upon a change to WORKFLOW.md, without restarting the service.
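The loading step itself is mechanically simple: split the YAML front matter from the prompt body, then substitute issue fields into the template. A minimal sketch (the {{ ... }} substitution mimics the syntax shown above; the real loader's behavior may differ):

```python
# Sketch: split a WORKFLOW.md file into config and prompt, then render the
# prompt for one issue. Illustrative, not Symphony's actual loader.
import re

def load_workflow(text: str) -> tuple[str, str]:
    """Split the YAML front matter (between '---' fences) from the prompt body."""
    _, front_matter, body = text.split("---\n", 2)
    return front_matter.strip(), body.strip()

def render_prompt(template: str, issue: dict) -> str:
    """Replace {{ issue.field }} placeholders with values from the tracker."""
    return re.sub(
        r"\{\{\s*issue\.(\w+)\s*\}\}",
        lambda m: str(issue.get(m.group(1), "")),
        template,
    )

workflow = """---
polling:
  interval_ms: 30000
---
You are working on {{ issue.identifier }}: {{ issue.title }}
"""

config, template = load_workflow(workflow)
prompt = render_prompt(template, {"identifier": "ABC-42", "title": "Fix login bug"})
```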

Why Elixir: The Superpower of BEAM

Most of OpenAI’s technical stack is Python. Yet Symphony’s reference implementation was written in Elixir.4 The reasoning is clear.

graph LR
    subgraph BEAM["Erlang/BEAM runtime"]
        ORC["Orchestrator<br/>process"]
        W1["Worker process<br/>ABC-42"]
        W2["Worker process<br/>ABC-43"]
        W3["Worker process<br/>ABC-44"]
        SUP["Supervisor<br/>crash detection + restart"]
        ORC --> SUP
        SUP --> W1
        SUP --> W2
        SUP --> W3
    end
    C1["Codex<br/>instance 1"] <-->|stdio JSON-RPC| W1
    C2["Codex<br/>instance 2"] <-->|stdio JSON-RPC| W2
    C3["Codex<br/>instance 3"] <-->|stdio JSON-RPC| W3

BEAM’s Supervision Tree isolates each agent process independently. If agent A throws an exception and crashes, B and C are unaffected. The supervisor automatically restarts A. Achieving this level of isolation and recovery in Python’s threading model or asyncio would require substantially more code.

Elixir’s lightweight processes, which are scheduled by the BEAM VM rather than mapped one-to-one onto OS threads, are ideal for supervising hundreds of concurrent agent runs. In an environment where each Implementation Run blocks for several minutes awaiting LLM inference, the BEAM scheduler’s near-zero cost of waiting made it a natural choice.

The core orchestration logic in the reference implementation amounts to roughly 258 lines of Elixir. That compactness is itself evidence of the abstraction level BEAM affords.

Harness Engineering: The Environment Symphony Assumes

Symphony does not work out of the box when dropped into any repository. The README states plainly: “Symphony works best in codebases that have adopted harness engineering.”5

Harness Engineering is the engineering methodology OpenAI published in February 2026. Its essence is structuring a codebase so that agents can read it, run it, and verify it autonomously. Three pillars define it.

  1. Hermetic Testing: Tests that run locally in a deterministic fashion, with no external dependencies. An agent must be able to verify on its own whether a change is correct.

  2. Machine-Readable Docs: AGENTS.md, WORKFLOW.md, and scripted structures that allow an agent to independently discover how to build, test, and deploy the project.

  3. Modular Architecture: A design with minimal side effects, enabling agents to make localized changes with high confidence.

The OpenAI experiment team reflected that early progress was slower than expected because this environment was not yet in place. The bottleneck was not agent capability but what they called an “underspecified environment.” Acceleration came not from asking agents to perform better, but from asking: what infrastructure needs to exist to make this task possible?

How Symphony Differs from Competing Tools

Understanding Symphony’s context requires a look at the current landscape.

| Tool | Type | Agent Trigger | Ecosystem Lock-in | Self-Hostable |
|---|---|---|---|---|
| Symphony | Orchestrator | Issue tracker (automatic) | None (pluggable) | Yes |
| Devin | Autonomous agent | Manual / Slack | Cognition cloud | No |
| GitHub Copilot Coding Agent | Issue → PR | GitHub Issues | GitHub ecosystem | No |
| Cursor | IDE assistant | Developer (manual) | VSCode fork | No |

GitHub Copilot Coding Agent is the closest conceptual parallel — both aim to convert issues into pull requests automatically. The difference lies in ecosystem lock-in. Copilot assumes a GitHub repository, GitHub Actions, and GitHub Issues in combination. Symphony ships with a Linear adapter as the default, but the Integration Layer in SPEC.md was designed to support GitHub Issues, Jira, or any other tracker via a plugin interface. Community work on a GitHub Issues adapter had reportedly already begun at the time of release.
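The pluggability boils down to a small normalized interface: any tracker adapter that can return issues in a common shape can drive the orchestrator. A sketch of what such an interface could look like (the names `TrackerClient`, `Issue`, and `poll_active_issues` are illustrative, not Symphony's actual API):

```python
# Sketch of a pluggable tracker interface around a normalized issue model.
# All names here are hypothetical, not Symphony's real plugin API.
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

@dataclass
class Issue:
    identifier: str   # e.g. "ABC-42"
    title: str
    description: str
    state: str        # e.g. "Todo", "In Progress"

@runtime_checkable
class TrackerClient(Protocol):
    def poll_active_issues(self) -> list[Issue]:
        """Return issues currently in an active state, normalized."""
        ...

class LinearClient:
    """Default adapter; a GitHub Issues or Jira adapter would expose the same shape."""
    def __init__(self, api_key: str):
        self.api_key = api_key

    def poll_active_issues(self) -> list[Issue]:
        # A real implementation would run a Linear GraphQL query here.
        return []
```

The orchestrator only ever sees `Issue` objects, so swapping Linear for another tracker touches the adapter alone.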

The contrast with Devin is more fundamental. Devin is a cloud SaaS product; code and context are transmitted to external servers. Symphony is an on-premises daemon. With only a Codex API key, everything runs inside a company’s firewall.

Hacker News and Community Response

Symphony appeared on Hacker News on March 4, 2026.6 The reception was muted — 20 points and 6 comments. Two points captured most of the community’s attention.

The first was the choice of Elixir. The fact that it was not Python stood out. On the r/elixir subreddit, a thread launched with a single line — “And, yes, it’s in Elixir” — and members responded with “Very cool.” For the Elixir community, OpenAI’s choice read as validation that their language was suited to production AI infrastructure.

The second was the readability of SPEC.md. One HN comment was sharp:

“The specs are inscrutable agent slop. I want it to tell me what it does and instead it just lists database fields.” — HN comment6

The criticism was that SPEC.md referenced a state machine without clearly specifying state transitions. It exposed a tension between OpenAI’s approach of “give the SPEC to Codex and have it build from there” and a document written for human comprehension.

Two Ways to Use It

Symphony offers two entry points depending on how a team wants to proceed.

Option 1: Build your own implementation. Hand SPEC.md to an agent. A single prompt — “implement this spec in my language of choice” — is enough to start. TypeScript, Python, Rust — there is no language constraint. This approach suits teams that want to integrate the orchestration logic into an existing stack without adopting Elixir.

Option 2: Use the Elixir reference implementation. This is the elixir/ directory in the GitHub repository. Manage Elixir and Erlang versions with mise, build with mix setup && mix build, and start the service with ./bin/symphony WORKFLOW.md.

git clone https://github.com/openai/symphony
cd symphony/elixir
mise trust && mise install
mise exec -- mix setup && mix build
LINEAR_API_KEY=your_key mise exec -- ./bin/symphony ./WORKFLOW.md

OpenAI attaches a caveat to the reference implementation: “prototype software intended for evaluation only.” Before deploying to production, teams are advised to build a hardened version or conduct thorough review.

What Is Missing

SPEC.md is equally clear about what Symphony does not address. The Non-Goals section explicitly excludes:

  • A rich web UI or multi-tenant control plane
  • A general-purpose workflow engine or distributed job scheduler
  • Business logic for how tickets, PRs, or comments should be edited
  • Enforcement of uniform sandbox policies across all implementations

The last item is significant. Symphony is a low-level engineering preview that presupposes a trusted environment. It is not suited for contexts where external contributors could submit malicious issues that an agent might execute. The intended use case is internal teams operating over access-controlled repositories.

Why Now

The direct prerequisite for Symphony’s emergence was the release of Codex’s App-Server mode. App-Server mode runs Codex as a JSON-RPC server over stdio. This protocol is what enables an orchestrator like Symphony to spawn Codex as a subprocess, receive its event stream, and track agent state.

→ initialize     (orchestrator sends configuration)
← initialized    (Codex acknowledges)
← thread/start   (work thread begins)
← turn/start     (reasoning turn begins)
← turn/completed (turn result)
← turn/failed    (failure)

An important property of this protocol is that it is not Codex-specific. Claude Code, or any future agent that implements the same interface, could connect to Symphony in the same way. If the interface between orchestrator and agent becomes standardized, the choice of which agent to use becomes a matter of changing a single line — codex.command — in WORKFLOW.md.

So What Can You Actually Do With It

Technical specs alone don’t paint the full picture. Here are concrete scenarios.

Scenario 1: The backlog-clearing machine. A startup backend team of five. Thirty bug tickets piling up on the Linear board because everyone’s focused on high-priority features. Connect Symphony, and every time a Bug-labeled issue moves to Todo, an agent automatically creates a branch, writes the fix, runs the tests, and opens a PR. Developers come in the next morning and just review PRs. They spend time reviewing, not implementing.

Scenario 2: The overnight shift. Before leaving work, move five issues to Todo. By the next morning, five PRs are waiting for review. CI has passed, and each PR comes with a change summary. The agent worked while you slept. Since developer working hours and agent working hours don’t overlap, effective throughput roughly doubles.

Scenario 3: Boilerplate automation. Projects where every new API endpoint requires the same router, controller, DTO, test, and documentation scaffolding. Write an issue saying “Add POST /api/orders endpoint, spec below” and the agent generates the boilerplate by learning from existing patterns. Repetitive labor shrinks; developers focus on business logic.

Scenario 4: Team expansion for solo developers. For a solo developer running a side project, Symphony is roughly equivalent to hiring a junior developer. Write issues, and they get implemented. You still need to review the code, but the most time-consuming step — implementation — is delegated.

One prerequisite is common to all scenarios: the better your test coverage, the greater the benefit. Agents need to verify their own work. Projects with low test coverage or manual QA dependency limit how autonomous Symphony can be.

A Staircase of Prerequisites

The future Symphony points toward is compelling, but the entry barriers are real. Hermetic testing, machine-readable documentation, modular architecture — codebases that possess all three are currently a minority. For teams with legacy monorepos, test suites that depend on external services, and build processes passed down through oral tradition, Symphony remains a distant prospect.

Ironically, the teams best positioned to benefit from Symphony are already the ones making the most sophisticated use of agents. The process of adopting Harness Engineering itself requires substantial upfront investment. Starting fresh from an empty repository is straightforward; the cost of retrofitting an existing project to meet Symphony’s requirements will vary considerably from team to team.

The direction, however, is clear. OpenAI chose not to sell a product for how to use agents, but instead released, as open source, a reference design for how to make agents work on their own. The Apache 2.0 license places no meaningful barrier on enterprise adoption.

Symphony’s real value is not 258 lines of Elixir. It lies in demonstrating, in executable form, the proposition that running agents like teammates — rather than tools — requires an orchestration layer.


Footnotes

  1. OpenAI, “Harness engineering: leveraging Codex in an agent-first world,” OpenAI Blog, February 2026. https://openai.com/index/harness-engineering/

  2. openai/symphony GitHub Repository, published March 2026. Apache License 2.0. https://github.com/openai/symphony

  3. openai/symphony SPEC.md, “Symphony Service Specification, Draft v1 (language-agnostic).” https://github.com/openai/symphony/blob/main/SPEC.md

  4. openai/symphony elixir/README.md, “Elixir/OTP implementation of Symphony.” https://github.com/openai/symphony/blob/main/elixir/README.md

  5. openai/symphony README.md, “Running Symphony - Requirements.” https://github.com/openai/symphony

  6. Hacker News, “OpenAI Symphony,” item #47252045, March 4, 2026. https://news.ycombinator.com/item?id=47252045
