2026/06/28/semgrep-benchmark-finds-zhipu-ai-s-glm-5-2
Semgrep benchmark finds Zhipu AI’s GLM 5.2 outperforms Claude Code on prompt-only IDOR vulnerability detection
EDITOR BRIEF
Semgrep says Zhipu AI’s open-weight GLM 5.2 scored 39% F1 on its insecure direct object reference (IDOR) detection benchmark at about $0.17 per vulnerability found, ahead of Claude Code at 32% F1. Semgrep’s own purpose-built multimodal pipeline still led with 53–61% F1, suggesting that much of the performance comes from the surrounding security-analysis harness rather than from the model alone.
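For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how an F1 score and a cost-per-finding figure are typically computed. The function names and the counts are hypothetical and chosen only for illustration; they are not Semgrep's data. Defining cost per vulnerability as total spend divided by true positives is also an assumption about the benchmark's method.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # share of reported findings that are real
    recall = tp / (tp + fn)     # share of real vulnerabilities that were reported
    return 2 * precision * recall / (precision + recall)


def cost_per_finding(total_cost_usd: float, tp: int) -> float:
    """Total spend divided by the number of true vulnerabilities found (assumed definition)."""
    if tp == 0:
        raise ValueError("no true positives, so cost per finding is undefined")
    return total_cost_usd / tp


# Hypothetical counts for illustration only, not taken from the benchmark.
tp, fp, fn = 30, 50, 45
print(f"F1: {f1_score(tp, fp, fn):.2f}")
print(f"Cost per finding: ${cost_per_finding(6.00, tp):.2f}")
```

Because F1 penalizes both false positives and missed vulnerabilities, a model that flags everything or almost nothing scores poorly on it, which is why it is a common headline metric for detection tasks.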
INSIGHTS
The result points to rising competitiveness for open-weight models in specialized security tasks, especially when cost per finding matters. But the stronger showing from Semgrep’s guided pipeline reinforces that tooling, context selection, and workflow design may be as important as raw model capability for production AI security agents.
