AI agents are entering their rebuild era as enterprises confront the reliability problem
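Editor's summary
As enterprises put AI agents into production, it is becoming clear that LLM performance alone cannot guarantee the stability of long-running workflows. Preeti Somal of Temporal Technologies explained that early agent implementations lacked operational foundations such as failure recovery, state management, cost control, and API coordination, and that demand for "version 2.0" rebuilds is growing as a result.
Context
The spread of agentic AI is less a new problem than a trend that makes the long-standing hard problems of distributed systems more complex. The competitive edge in enterprise AI is shifting beyond model selection toward platform engineering capabilities built on durable execution, observability, and governance.
Article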
As enterprise AI agents move into production, organizations are confronting a growing reliability problem. Many teams are discovering that LLM performance alone does not determine whether agents succeed in production. Long-running AI workflows must survive crashes, preserve state, recover from failures, manage inference costs, and coordinate across APIs, tools, and enterprise systems.
After a first wave focused on rapid deployment, organizations now need to revisit those first-generation implementations and redesign early agent architectures around workflow orchestration, observability, governance, and recovery, said Preeti Somal, Senior VP Engineering at Temporal Technologies, during the latest AI Impact Series event in New York.
“We do have a lot of customers that come to us where they’re building version 2.0 of the same agent,” Somal said. “They had to move really fast, but they didn’t take care of the plumbing. Things crash and burn, and then they’re back to rebuilding with the reliable foundation.”
For workflow orchestration company Temporal, whose infrastructure predates the current wave of agentic AI, the shift reflects a broader enterprise realization: production AI systems require durable execution, state management, visibility into workflows, and mechanisms to recover when models or downstream systems fail.
Agentic AI has supercharged familiar engineering problems
“These patterns aren’t necessarily new,” Somal said. “AI just supercharges them.”
Agentic systems introduce additional complexity because they often involve long-running, multi-step processes spanning multiple services, models, APIs, and tools. A single workflow might call several large language models, access retrieval systems, trigger external applications, and manage state over hours or days. The engineering questions, Somal said, often emerge only after deployment.
“People will write agents but haven’t thought about what happens if the agent crashes,” she said.
“Am I going to need to run the entire agent flow again?” For enterprises operating under cost constraints, the answer matters. Restarting workflows after failures can multiply inference expenses, increase latency, and create poor customer experiences.
Somal compared the current moment to an earlier period in enterprise cloud adoption, when organizations migrated workloads straight to the cloud before considering whether the underlying architectures needed to be redesigned for those workloads to hold up over the long term.
“This rush to do AI in a world where you haven’t even modernized your application reminds me a little bit of that lift-and-shift that happened in the cloud,” she said. “Everybody realized you’re spending more money on cloud and we haven’t gotten value there.”
Why long-running agents force a new architecture
Enterprise workflows increasingly involve agents executing over long windows, sometimes spanning many hours while interacting with tools and systems. Reliability challenges compound when workflows persist over time, and they affect both state and memory, two ideas that are often treated interchangeably in AI conversations.
State concerns workflow execution. It includes where an agent is in a process, which actions have already completed, and where recovery should resume after failure. Memory, or context, captures information an agent carries forward across interactions or tasks.
“The state of the agent is around what step and what actions have been performed, and if something crashes, where do you want to recover from, versus the context and memory piece,” Somal explained. That distinction becomes increasingly important when enterprises begin moving beyond simple chatbot interactions toward longer-running business processes.
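The split between state and memory, and the cost of rerunning a whole flow after a crash, can be illustrated with a small Python sketch. It is an illustration under stated assumptions, not Temporal's API: the step names, `run_workflow`, `make_handler`, and the JSON checkpoint file are all hypothetical, and a production system would need atomic, durable writes rather than a local file.

```python
import json
import tempfile
from pathlib import Path

STEPS = ["fetch", "summarize", "publish"]  # hypothetical step names


def run_workflow(checkpoint: Path, handlers: dict) -> dict:
    """Run STEPS in order, skipping steps already recorded in the checkpoint."""
    if checkpoint.exists():
        saved = json.loads(checkpoint.read_text())
    else:
        saved = {"completed": [], "memory": {}}
    completed = saved["completed"]  # state: which steps have finished
    memory = saved["memory"]        # memory: context carried between steps

    for step in STEPS:
        if step in completed:
            continue  # finished in an earlier attempt, so it is not paid for again
        memory[step] = handlers[step](memory)
        completed.append(step)
        # Persist after every step so a crash loses at most the step in flight.
        checkpoint.write_text(json.dumps(saved))
    return memory


def make_handler(name: str, calls: list, fail_once: bool = False):
    """Build a stand-in for a model or API call that can crash on its first attempt."""
    attempts = {"n": 0}

    def handler(memory: dict) -> str:
        calls.append(name)
        attempts["n"] += 1
        if fail_once and attempts["n"] == 1:
            raise RuntimeError(f"{name} crashed")
        return f"{name}-output"

    return handler


def demo():
    calls = []
    handlers = {
        "fetch": make_handler("fetch", calls),
        "summarize": make_handler("summarize", calls, fail_once=True),
        "publish": make_handler("publish", calls),
    }
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "workflow.json"
        try:
            run_workflow(checkpoint, handlers)  # first attempt crashes in "summarize"
        except RuntimeError:
            pass
        result = run_workflow(checkpoint, handlers)  # second attempt resumes
    return calls, result
```

In this sketch, the second attempt skips `fetch` because the checkpoint records it as completed, so only the failed step and the ones after it run again. The `completed` list plays the role of state, while the `memory` dictionary stands in for the context carried forward between steps.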
Somal pointed to a healthcare example involving customer Abridge, where workflows process physician visits through multiple stages, including audio processing, summarization, model calls, and after-visit generation.
“There’s not just one piece to that flow,” Somal said. “Taking videos and slicing that, taking summaries, calling the LLMs, generating the after-visit summary, all of that is being orchestrated.”
The implication for enterprises is that successful agents increasingly depend on systems that can survive interruptions, coordinate across services, and maintain continuity over time.
The rise of the deterministic spine
A useful framework for enterprise AI design is the deterministic spine, Somal said, which is how Temporal describes its own role.
“It is denoting the path you want to take,” she said. “It is calling the brain, but if the brain doesn’t respond, it will call it again. If the brain responds but the next step is going to fail, it will pick up from where that failure happened.”
In this framing, the language model acts as a probabilistic system producing variable outputs, while orchestration software maintains execution reliability around it. And the concept matters because enterprise systems increasingly require consistency even when


