GLM-5.2, the open-weight model Z.ai shipped in June 2026, is the one everyone is lining up against GPT-5.5 and Claude Opus 4.8, and the headline holds up: it beats GPT-5.5 on several coding benchmarks and costs roughly six times less than Opus on output tokens. But 'beats' and 'cheaper' hide the more useful truth. On the hardest long-horizon tasks Opus 4.8 still pulls clearly ahead, GLM is text-only where Opus reads images, and the cheap hosted API routes through China. Here are the actual numbers, side by side and sourced, the price breakdown, and an honest read on when the cheap open model is the right call and when it isn't.
The short answer
GLM-5.2 from Z.ai beats GPT-5.5 on several coding benchmarks and costs about six times less than Claude Opus 4.8 on output, with open weights that are free to download and self-host on your own hardware. The catch: Opus still wins the hardest tasks clearly, and GLM cannot see images. Great value, not a clean knockout.
What GLM-5.2 is, in one breath
Z.ai (the lab formerly known as Zhipu) shipped GLM-5.2 on 13 June 2026 and put the weights up a few days later. It’s a 753-billion-parameter mixture-of-experts model, with only 8 experts active per token, so it punches above what its raw size suggests. It carries a 1 million token context window, two reasoning effort levels, and an MIT license, which is about as permissive as open weights get. One thing it does not do: images. GLM-5.2 is text-only, and that matters more than it sounds, as we’ll see.
The benchmarks, minus the hype
Here’s the honest picture across six coding benchmarks and three models.
The good news for the challenger is real. GLM-5.2 beats GPT-5.5 on SWE-bench Pro (62.1 to 58.6), on FrontierSWE (74.4 to 72.6) and on MCP-Atlas (76.8 to 75.3). And against the reigning model, it gets genuinely close on a few: it’s within 0.7 points of Opus 4.8 on FrontierSWE and within a point on MCP-Atlas. On the Code Arena frontend leaderboard it even edges Opus, sitting at #2 to Opus at #4. For an open-weight model at a fraction of the price, that’s a serious showing.
Now the part the headlines skip. On the brutal long-horizon tests, Opus 4.8 doesn’t just win, it pulls away. NL2Repo: 69.7 against GLM’s 48.9, a gap of almost 21 points. SWE-Marathon: 26 against 13, double. SWE-bench Pro: a clean 7 points. Here’s the full table, copy it if you want it:
| Benchmark | GLM-5.2 | GPT-5.5 | Claude Opus 4.8 |
|---|---|---|---|
| Terminal-Bench 2.1 | 81.0 | 84.0 | 85.0 |
| SWE-bench Pro | 62.1 | 58.6 | 69.2 |
| FrontierSWE | 74.4 | 72.6 | 75.1 |
| MCP-Atlas | 76.8 | 75.3 | 77.8 |
| NL2Repo | 48.9 | 50.7 | 69.7 |
| SWE-Marathon | 13.0 | 12.0 | 26.0 |
So “on Opus’s heels” is true on the mid-tier and misleading as a summary. The closer the task gets to a hard, multi-step, real-codebase problem, the wider Opus’s lead. Numbers from DigitalApplied and Artificial Analysis.
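If you want to check the gaps yourself rather than take the prose on trust, here is a minimal sketch that recomputes them from the table. The scores are copied from the table above; the function name `gaps` and the dictionary layout are just illustrative choices.

```python
# Scores copied from the table above, in the order
# (GLM-5.2, GPT-5.5, Claude Opus 4.8).
SCORES = {
    "Terminal-Bench 2.1": (81.0, 84.0, 85.0),
    "SWE-bench Pro": (62.1, 58.6, 69.2),
    "FrontierSWE": (74.4, 72.6, 75.1),
    "MCP-Atlas": (76.8, 75.3, 77.8),
    "NL2Repo": (48.9, 50.7, 69.7),
    "SWE-Marathon": (13.0, 12.0, 26.0),
}


def gaps(scores):
    """Return {benchmark: (glm_minus_gpt, opus_minus_glm)}, rounded to 1 decimal."""
    return {
        name: (round(glm - gpt, 1), round(opus - glm, 1))
        for name, (glm, gpt, opus) in scores.items()
    }


if __name__ == "__main__":
    # Sort by how far Opus leads GLM, smallest lead first.
    ordered = sorted(gaps(SCORES).items(), key=lambda item: item[1][1])
    for name, (vs_gpt, opus_lead) in ordered:
        print(f"{name:<20} GLM vs GPT-5.5: {vs_gpt:+5.1f}   Opus lead over GLM: {opus_lead:5.1f}")
```

Sorted this way, FrontierSWE and MCP-Atlas land at the top with the smallest Opus lead, and NL2Repo lands at the bottom with the largest, which is the mid-tier versus long-horizon split described above.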
The price, where it actually wins
This is the real story, not the leaderboard. Output tokens, per million:
GLM-5.2 runs $1.40 in / $4.40 out per million tokens, with cached input at $0.26. Opus 4.8 is $5 / $25, GPT-5.5 is $5 / $30. So on output, where the money actually goes, GLM is roughly 5.7x cheaper than Opus and nearly 7x cheaper than GPT-5.5. On input it’s about 3.6x cheaper. And because the weights are MIT, you can skip the per-token bill entirely and self-host, which no closed model lets you do. If your workload is big and your task is mid-tier, the math gets very hard to argue with. (Anthropic and OpenAI pricing: Claude, OpenAI.)
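The ratios are easy to reproduce, and the same few lines let you price your own workload. This is a rough sketch using the per-million-token prices quoted above; the token volumes in the example run (200 million input, 50 million output) are a made-up workload, not a measurement, and cached-input pricing is left out for simplicity.

```python
# USD per million tokens, as quoted in the paragraph above.
PRICES = {
    "GLM-5.2": {"input": 1.40, "output": 4.40},
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},
    "GPT-5.5": {"input": 5.00, "output": 30.00},
}


def price_ratio(other, kind, base="GLM-5.2"):
    """How many times more `other` charges than `base` for input or output tokens."""
    return PRICES[other][kind] / PRICES[base][kind]


def job_cost(model, input_tokens, output_tokens):
    """Total USD for a job, ignoring cached-input discounts."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


if __name__ == "__main__":
    for model in ("Claude Opus 4.8", "GPT-5.5"):
        print(f"{model}: {price_ratio(model, 'output'):.1f}x on output, "
              f"{price_ratio(model, 'input'):.1f}x on input")

    # Hypothetical monthly workload: 200M input tokens, 50M output tokens.
    for model in PRICES:
        print(f"{model}: ${job_cost(model, 200_000_000, 50_000_000):,.2f}")
```

On that hypothetical workload the bill comes to about $500 on GLM-5.2 against about $2,250 on Opus 4.8 and $2,500 on GPT-5.5. Note that the blended saving (4.5x against Opus here) is smaller than the 5.7x output-only figure, because input tokens are only about 3.6x cheaper.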
The catches nobody puts in the headline
Three of them, and they’re not small.
It can’t see. GLM-5.2 is text-only. Opus 4.8 reads screenshots, PDFs and UI state from images. If your agent looks at a screen, GLM is simply out of the running, and that rules it out of a whole category of agentic work today.
The cheap API has a passport. The hosted endpoint runs through a China-based provider, which is a real data-residency question if you’re sending proprietary or regulated code through it. The clean answer is the open weights: self-host and the question disappears. But then you’re paying for GPUs, not API calls, so do that math honestly.
It’s a June 2026 snapshot. This space turns over weekly. Today’s gap is not next month’s. Re-check the leaderboards before you commit a project to any of these.
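On the self-hosting point, "do that math honestly" can be made concrete with a break-even sketch. Everything about the hardware here is an assumption for illustration: the GPU count, the hourly rate and the input-to-output token ratio are hypothetical placeholders, not figures from Z.ai or any provider. Only the API prices come from the section above.

```python
# Hosted GLM-5.2 API prices in USD per million tokens, from the pricing section.
API_INPUT_PER_M = 1.40
API_OUTPUT_PER_M = 4.40

HOURS_PER_MONTH = 730  # average hours in a month


def monthly_selfhost_cost(gpu_count, usd_per_gpu_hour, hours=HOURS_PER_MONTH):
    """Fixed monthly GPU bill for an always-on cluster (hypothetical inputs)."""
    return gpu_count * usd_per_gpu_hour * hours


def monthly_api_cost(input_m, output_m):
    """Monthly API bill for token volumes given in millions of tokens."""
    return input_m * API_INPUT_PER_M + output_m * API_OUTPUT_PER_M


def breakeven_output_m(selfhost_cost, input_per_output=4.0):
    """Millions of output tokens per month at which API spend equals self-host spend.

    Assumes a fixed ratio of input tokens to output tokens (hypothetical default: 4 to 1).
    """
    cost_per_output_m = API_OUTPUT_PER_M + input_per_output * API_INPUT_PER_M
    return selfhost_cost / cost_per_output_m


if __name__ == "__main__":
    # Hypothetical cluster: 8 GPUs at $2.50 per GPU-hour, running all month.
    cluster = monthly_selfhost_cost(gpu_count=8, usd_per_gpu_hour=2.50)
    threshold = breakeven_output_m(cluster)
    print(f"Self-host bill: ${cluster:,.0f}/month")
    print(f"Break-even: about {threshold:,.0f}M output tokens/month")
```

Under those made-up numbers the cluster costs about $14,600 a month, and the hosted API only becomes the more expensive option above roughly 1,460 million output tokens a month. The sketch deliberately ignores whether that cluster could actually serve that volume, plus engineering time and idle capacity, all of which push the real break-even higher.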
So who should actually use it
Reach for GLM-5.2 when cost dominates, when you can self-host, when open weights matter for your stack, or when the work is mid-tier coding and agentic tasks that stay in text. For a lot of teams, that’s most of the work, and the savings are enormous.
Stay on Opus 4.8 for the hardest long-horizon coding, for anything that touches images, and for the cases where the last few points decide whether a multi-step agent finishes or stalls. That ceiling is what you’re paying for.
GPT-5.5 is the awkward one here: it’s pricier than Opus on output and behind GLM on several of these coding benchmarks, so on this particular axis it’s hard to place. It has other strengths, but “best value coder” isn’t the pitch.
Honestly, the headline that ages best isn’t “GLM beats GPT-5.5.” It’s that an open-weight model you can run yourself now lands within about a point of the frontier on mid-tier coding benchmarks, for a fraction of the cost. That’s the thing worth watching.
Sources: VentureBeat, Artificial Analysis, DigitalApplied, plus the Anthropic and OpenAI pricing pages. Benchmarks as published in June 2026.
Frequently asked questions
Is GLM-5.2 really free?
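The weights are, under an MIT license, so you can download them and self-host without paying a license fee or a per-token bill. You still pay for the GPUs you run them on. The hosted API is not free but it is cheap: about $4.40 per million output tokens. So the model itself is free if you run it on your own hardware, and very low cost if you do not.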
Does GLM-5.2 actually beat GPT-5.5?
On several coding benchmarks, yes: SWE-bench Pro, FrontierSWE and MCP-Atlas all put it ahead of GPT-5.5, and it is a point ahead on SWE-Marathon. On a couple of others, like Terminal-Bench and NL2Repo, GPT-5.5 is still in front. So it beats GPT-5.5 on a lot of coding work, not on everything.
Is it as good as Claude Opus 4.8?
Close on some tests, not on the hardest. It trails Opus 4.8 by a point or less on FrontierSWE and MCP-Atlas, but Opus is far ahead on NL2Repo and SWE-Marathon, the tougher long-horizon tasks, and Opus is the only one of the two that can read images. It runs alongside Opus on the mid-tier, but it does not match it overall.
What is the catch with the cheap API?
Two things. It is text-only, so no image or screenshot input. And the hosted API runs through a China-based provider, which raises data-residency questions for sensitive code; self-hosting the open weights sidesteps that.
Will these numbers stay true?
No, and that is the nature of the field. These are launch benchmarks from June 2026. New models and revised scores land constantly, so treat this as a snapshot and re-check the leaderboards before you bet a project on it.