
Cohere open-sources a coding agent that runs on a single H100

Jun 9, 2026, 09:41 PM·VentureBeat

EDITOR BRIEF

Cohere launched North Mini Code, an open-source 30B-parameter mixture-of-experts model for agentic software engineering that can run on a single H100. The model supports a 256K-token context window, terminal work, code review, architecture mapping, and sub-agent orchestration under an Apache 2.0 license. Independent testing found it produced about three times more output tokens than comparable models, creating a potential verbosity cost in production.

CONTEXT

The release gives engineering teams a self-hosted alternative to managed coding agents, which could appeal to companies with cost, privacy, or customization concerns. Its sparse MoE design reflects a broader trend toward efficient agentic models that deliver specialized capabilities without requiring large inference clusters, though output bloat may limit savings at scale.

ARTICLE

Engineering teams building agentic coding pipelines now have a concrete open-source alternative to managed models like Claude Fable 5, one that runs on a single H100. The tradeoff: Cohere's North Mini Code, which launched Tuesday, generated three times the output tokens of comparable models in independent testing, a verbosity cost that compounds in high-volume production workloads.

The new open-source model is a 30 billion parameter mixture-of-experts (MoE) model with 3 billion parameters active per token, built for agentic software engineering including sub-agent orchestration, architecture mapping, code review and terminal work. The model supports a 256,000 token context window with a 64,000 token maximum generation length, and is available on Hugging Face under an Apache 2.0 license.

What North Mini Code can do

North Mini Code targets the full agentic coding stack. Here is what the model does and what it runs on.

Software engineering. Cohere built North Mini Code specifically for agentic software engineering, not adapted from a general-purpose base. It has integrated tool-use capabilities and supports interleaved thinking, which Cohere says improves performance across multi-step agentic work.

Architecture mapping and code review. North Mini Code can analyze and map systems architecture, surface dependencies and perform code review across large codebases. With a 256,000 token context window, it can hold substantial multi-file projects in a single context pass.

Terminal-based agentic tasks. The model is trained for terminal environments, handling shell interactions, package scripts and command-line tooling. Cohere benchmarked it on Terminal-Bench v2, which tests agents in real terminal environments rather than synthetic code generation tasks.

How it was built

North Mini Code is a sparse mixture-of-experts model with 128 experts, of which 8 activate per token.
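
To make the "8 of 128 experts" figure concrete, here is a minimal sketch of generic top-k expert routing in Python. It is not Cohere's router: the article does not describe the gating function, so the softmax over the selected logits, the random router scores, and the name `route_token` are all assumptions for illustration.

```python
import math
import random

NUM_EXPERTS = 128  # total experts, per the article
TOP_K = 8          # experts activated per token, per the article


def route_token(router_logits, top_k=TOP_K):
    """Hypothetical router: keep the top_k highest-scoring experts
    and softmax-normalize their scores into mixing weights."""
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    # Indices of the top_k largest logits.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Softmax over the chosen logits only (subtract the max for stability).
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return {i: e / total for i, e in zip(chosen, exps)}


# Stand-in router scores for one token (assumed; a real model computes these).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

weights = route_token(logits)
active_fraction = TOP_K / NUM_EXPERTS
print(f"{len(weights)} of {NUM_EXPERTS} experts active ({active_fraction:.2%})")
# prints "8 of 128 experts active (6.25%)"
```

Only the selected experts run for a given token, which is why per-token compute tracks the active parameter count rather than the total.
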
The compute requirement at inference time is closer to that of a 3 billion parameter model despite 30 billion total parameters. Nick Frosst, co-founder of Cohere, demoed it running on a Mac Studio via MLX at around 20 gigabytes of RAM, the same machine he uses for his own local coding work.

Cohere trained the model through two stages of supervised fine-tuning followed by reinforcement learning with verifiable rewards across more than 70,000 verifiable tasks spanning approximately 5,000 repositories, deduplicated against SWE-Bench. Rather than optimizing against a single agent scaffold, Cohere trained across three. SWE-Agent uses a rich CLI with specialized commands. Mini-SWE-Agent uses a single bash tool with raw shell output. OpenCode uses individually typed tools returning structured JSON. Cohere reports a 10 percentage point gain on OpenCode evaluation from the multi-harness approach while maintaining SWE-Agent performance.

Where it fits

North Mini Code enters a market that now includes Mistral Devstral Small 2, GitHub Copilot, Cursor, and Claude Fable 5, each with distinct cost and deployment tradeoffs.

Cohere's primary benchmark comparison is against Mistral Devstral Small 2, a 24 billion parameter dense model. In vendor-reported internal tests under identical hardware configurations, Cohere claims 2.8x higher output throughput and a 30% inter-token latency advantage over Devstral Small 2. Cohere also claims, in its Hugging Face technical post, that North Mini Code outperforms open-source models up to four times its parameter count on its reported benchmarks, including models at 120 billion parameters. Artificial Analysis independently ranks it eighth of 127 comparable open-weight models on output speed at 210 tokens per second, with a time to first token of 0.25 seconds against a class median of 1.95 seconds. It places 18th of 127 on the Artificial Analysis Intelligence Index.
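
The Artificial Analysis speed figures above can be turned into a rough wall-clock estimate. The sketch below assumes a constant decode rate after the first token, which real deployments will not match exactly, since latency depends on hardware, batching and prompt length. The function name `estimated_seconds` is hypothetical.

```python
def estimated_seconds(output_tokens, tokens_per_second=210.0, ttft_seconds=0.25):
    """Rough wall-clock time for one response: time to first token
    plus steady-state decoding at a constant rate (an assumption)."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_seconds + output_tokens / tokens_per_second


# 1,000 tokens is an arbitrary example size; 64,000 is the model's
# maximum generation length as reported in the article.
for n_tokens in (1_000, 64_000):
    print(f"{n_tokens:>6} tokens: ~{estimated_seconds(n_tokens):.1f} s")
```

Under these assumptions a 1,000-token reply takes about 5 seconds and a maximum-length 64,000-token generation about 305 seconds, so the decode rate, not the time to first token, dominates long agentic outputs.
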
One flag from the same data: the model generated 75 million output tokens to complete the Intelligence Index against a class median of 25 million. In high-volume agentic pipelines, that verbosity compounds into inference cost and latency.

"Suddenly people are thinking like hey, am I getting enough economic value out of the tokens from a model?" Frosst said during the launch video. "Local deployment is one way of empowering people and making AI really something that works for them."

GitHub Copilot, Cursor and Claude Code operate on per-usage or subscription pricing with no on-premises option. Anthropic's Claude Fable 5, now the most capable publicly available managed coding model, runs at $50 per million output tokens. For Frosst, the model is the polar opposite of Fable.

"Its small, cost effective, apache 2.0, and locally deployable. This is the way LLMs should go. small, open source, transparent and sovereign, vs large, expensive, proprietary and hegemonic," Frosst wrote in a post on X.

What this means for enterprises

For teams building production agentic coding pipelines, North Mini Code's release clarifies a set of decisions that have been forming for mont