GEEK HAUS
2026/06/28/semgrep-benchmark-finds-zhipu-ai-s-glm-5-2

Semgrep benchmark finds Zhipu AI’s GLM 5.2 outperforms Claude Code on prompt-only IDOR vulnerability detection

Source: semgrep.dev

EDITOR BRIEF

Semgrep says Zhipu AI’s open-weight GLM 5.2 scored 39% F1 on its IDOR detection benchmark, ahead of Claude Code at 32%, at a cost of about $0.17 per vulnerability found. Semgrep’s own purpose-built multimodal pipeline still led with 53–61% F1, which suggests that much of the performance comes from the surrounding security-analysis harness rather than from the model alone.
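
For readers less familiar with the two metrics in the brief, here is a minimal sketch of how an F1 score and a cost-per-finding figure are typically computed. The function names and all counts below are hypothetical illustrations, not Semgrep's benchmark data or scoring code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # share of reported findings that are real
    recall = tp / (tp + fn)     # share of real vulnerabilities that were reported
    return 2 * precision * recall / (precision + recall)


def cost_per_finding(total_cost_usd: float, tp: int) -> float:
    """Total spend divided by the number of true vulnerabilities found."""
    if tp == 0:
        raise ValueError("no true positives, cost per finding is undefined")
    return total_cost_usd / tp


if __name__ == "__main__":
    # Hypothetical run: 30 real IDORs found, 50 false alarms, 70 missed.
    tp, fp, fn = 30, 50, 70
    print(f"F1: {f1_score(tp, fp, fn):.2f}")
    # Hypothetical total spend of $6.00 for the whole run.
    print(f"Cost per finding: ${cost_per_finding(6.00, tp):.2f}")
```

Because F1 penalizes both false alarms and misses, a tool can score low even when most of what it reports is correct, if it misses many real issues.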

INSIGHTS

The result points to rising competitiveness for open-weight models in specialized security tasks, especially when cost per finding matters. But the stronger showing from Semgrep’s guided pipeline reinforces that tooling, context selection, and workflow design may be as important as raw model capability for production AI security agents.
