Software Architecture & Agentic AI – Asynchrony is key! 

Designing software and choosing the right architecture is not a trivial task. Draw on CID’s extensive experience with software architecture.

When people think about agentic AI, they think about how to automate tasks that require autonomous decision-making. They conceive high-level solutions for various problems and imagine how a system of AI agents could improve work efficiency, relieve human workers, increase productivity, etc. Consequently, they begin with prototype development, implement their ideas, test, and evaluate the most suitable models to help them achieve a Proof of Concept (PoC).

However, agentic AI is still in its infancy, and there are many fundamental issues to solve – such as the fact that most AI projects fail in the short term due to unclear scope, overly high expectations, insufficient data volumes, or inadequate data preparation. One of the most fundamental problems, however, is that LLMs’ fallibility and non-deterministic behavior are ignored. An MIT study reports that only 5% of enterprise-grade AI systems make it to production. The main reasons for failure cited in this study are brittle workflows, lack of contextual learning, and misalignment with day-to-day operations – meaning insufficient time is spent properly integrating AI into operational processes. You can mitigate such fundamental issues through proper design and by sensibly embedding your agentic AI solution into your processes. By the way, we at CID can support you at each stage of your agentic AI project.

But let’s imagine agentic AI has grown up and reached a solid, well-working, and well-understood state – and your Proof of Concept works just fine. Transitioning from a PoC to a production-ready implementation involves understanding not only the intrinsic qualities of AI but also the essentials of software development. Otherwise, the AI project will fail in the long term, because you will inevitably face the same situation as with non-AI software: you must decide on the right software architecture.

Typically, the approach for prototyping agentic AI software is to choose a monolithic architecture. Monoliths enable faster progress because they are less complex. Current agent-based AI frameworks further enhance the appeal of a monolithic implementation, making it tempting to opt for a monolith for the production-ready version as well (examples of AI frameworks follow later in this article). Depending on the requirements of your AI project, this may or may not be a valid decision. Choosing the right software architecture is absolutely essential for a production-ready version, though. As explained in detail in the article referenced above, incorrect design decisions may yield rapid initial success for PoCs, yet they often result in unmaintainable, cost-inefficient, and inoperable systems in production over time. This risk underscores the importance of shift-left design, i.e., addressing scalability, observability, and modularity early in development rather than retrofitting them later, when technical debt becomes prohibitively expensive to resolve.

In this article, I discuss the key points to consider when implementing an agentic AI approach in a large-scale, multi-user production environment.

Requirements Engineering

As with all software architectures, the first, most important, and ongoing step is to define, document, and maintain the requirements for a software project. The functional requirements, which determine what your software should do, are typically worked out, exhaustively elaborated during prototyping, and specified accordingly for an initial production version. Yet, the non-functional requirements, which define the quality and performance aspects, are more crucial when designing production-ready software. Agentic AI software is no different in this regard.

When developing agentic AI systems, a fundamental common feature is that multiple specialized, autonomously operating agents are involved in accomplishing a larger task, each contributing a well-defined, typically small, domain-specific subtask. If that were not true, there would be no reason to design a system with multiple agents at all – and indeed, forgoing agents can be the best solution: only use AI agents if it makes sense and benefits you!

AI Use-Cases

A typical example in tutorials for agentic AI systems involves a writing pipeline of specialized AI agents. A corresponding workflow might include:

  • A web-research agent to gather relevant information.
  • A text-summarization or abstract agent to condense findings into key insights.
  • A suggestion agent to propose improvements for clarity, tone, or structure.
  • A tagging agent to categorize content with metadata.
  • A classification agent to organize material by topic or intent.
  • A sentiment-analysis agent to assess emotional tone or bias.
  • A verification, governance, or compliance agent to ensure:
    • The article’s relevance and originality.
    • The functionality of embedded links.
    • The completeness of required elements (e.g., headers, footers, citations).

This modular approach allows each agent to focus on a distinct task, improving writing efficiency and output quality.
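
For illustration, here is a minimal sketch of such a pipeline in Python. The llm() helper and the agent functions are hypothetical stand-ins rather than any specific framework’s API; note that the chain is sequential and synchronous, which is typically how a PoC starts out:

```python
# Minimal sketch of a writing pipeline with one function per agent.
# llm() is a hypothetical stand-in for a real model client.
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with your actual model client."""
    return f"<model output for: {prompt[:40]}...>"

@dataclass
class Draft:
    topic: str
    research: str = ""
    summary: str = ""
    suggestions: str = ""
    tags: list[str] = field(default_factory=list)

def research_agent(draft: Draft) -> Draft:
    draft.research = llm(f"Gather relevant information on: {draft.topic}")
    return draft

def summarization_agent(draft: Draft) -> Draft:
    draft.summary = llm(f"Condense into key insights: {draft.research}")
    return draft

def suggestion_agent(draft: Draft) -> Draft:
    draft.suggestions = llm(f"Propose clarity/tone/structure improvements: {draft.summary}")
    return draft

def tagging_agent(draft: Draft) -> Draft:
    draft.tags = llm(f"Propose metadata tags for: {draft.summary}").split(",")
    return draft

PIPELINE = [research_agent, summarization_agent, suggestion_agent, tagging_agent]

def run(topic: str) -> Draft:
    draft = Draft(topic)
    for agent in PIPELINE:  # each agent contributes its well-defined subtask
        draft = agent(draft)
    return draft

print(run("agentic AI architectures"))
```

Exactly this simplicity is what makes the monolithic, sequential style so tempting; the remainder of this article discusses when and why to move beyond it.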

Another real-life example is agent-driven wealth management prospecting, triggered by real-time life events – like an upcoming wedding – to deliver timely, personalized financial guidance. When a prospect shares a life update (e.g., “I’m getting married next weekend!”), the system transforms raw data into actionable insights for advisors, automating research while preserving the human touch.

In this context, dedicated agents can handle various tasks fully autonomously:

  • Data Collection Agents
    • Prospect profiles are gathered from CRM records, regulatory filings, and advisor notes.
    • Public signals, such as social media announcements or news mentions, are monitored for relevant updates.
  • Knowledge Infrastructure Agents
    • A structured knowledge graph connects disparate data points to create a cohesive view of the prospect.
  • Event Processing Pipeline Agents
    • Public data is scanned to recognize events involving named entities, and an LLM filters for the most relevant candidates.
    • The LLM identifies high-priority events that warrant an advisor’s attention.
    • The system generates tailored action recommendations based on the detected events.

While agents automate the analysis, the advisor who ultimately acts is a human being, ensuring nuanced, trust-based client interactions.
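
A minimal sketch of the event-processing part of this pipeline might look as follows; all names (scan_public_signals, is_high_priority, recommend_action) are hypothetical placeholders for the actual data scanners and LLM calls:

```python
# Minimal sketch of the event-processing pipeline: scan public data,
# let an (here simulated) LLM filter for high-priority events, and
# generate an action recommendation for the human advisor.
from dataclasses import dataclass

@dataclass
class LifeEvent:
    prospect: str
    kind: str    # e.g., "wedding", "job change"
    source: str  # e.g., "social media", "news"

def scan_public_signals() -> list[LifeEvent]:
    """Hypothetical scanner over social media and news feeds."""
    return [LifeEvent("Jane Doe", "wedding", "social media")]

def is_high_priority(event: LifeEvent) -> bool:
    """Stand-in for the LLM relevance filter."""
    return event.kind in {"wedding", "job change", "retirement"}

def recommend_action(event: LifeEvent) -> str:
    """Stand-in for the LLM recommendation step."""
    return f"Contact {event.prospect}: discuss planning around their {event.kind}."

for event in scan_public_signals():
    if is_high_priority(event):         # the LLM filters candidates
        print(recommend_action(event))  # a human advisor acts on this
```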

Agentic Comparability

When you consider such a system of individually working software components, each responsible for contributing a dedicated piece to the big picture, it quickly becomes apparent that an agentic AI system resembles a microservice system.

And if you agree with that thought, many of the same non-functional requirements apply to an agentic AI system, too – scalability, reliability, availability, maintainability, cost-efficiency, etc. – as well as certain performance aspects. Depending on the use case, you want to ensure that the system can handle a given load and allow for a given throughput. How are such non-functional requirements achieved in a microservice system? Right – by decoupling your services.

Decouple your Agents!

Decoupling agents means making them operate as independently as possible from each other by maximizing their cohesion, i.e., each agent conforms to a clear, single purpose and includes only the logic and data necessary to fulfill that purpose.

There are plenty of general-purpose and specialized agentic frameworks being developed, e.g., SmolAgents, LangChain, HayStack, SemanticKernel, Agent Framework, you name it. They are great for quick prototyping, checking out ideas, and PoCs. However, they are not yet ready for all kinds of production scenarios. This is not only because many of the offered yet necessary features have not reached production readiness; it is also because they inherently require you to stick to synchronous, direct in-process or HTTP calls for agent communication, especially regarding agent orchestration and AI workflow design.

Even when considering the Model Context Protocol (MCP) and Google’s Agent-to-Agent Protocol (A2A), communication happens over HTTP (using JSON-RPC, gRPC, or SSE), with the caller actively waiting for a specific agent’s response to a given request. Even when using an agent registry to find an appropriate agent to communicate with, the out-of-the-box communication mechanism in existing frameworks typically remains the same: a direct contact address for the agent. And that means agents must know each other.
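
To make the coupling concrete, here is a minimal sketch of that direct, synchronous call pattern (the endpoint URL and payload shape are hypothetical): the caller must know agent B’s address and blocks until it answers.

```python
# Minimal sketch of direct, synchronous agent-to-agent communication.
# The caller hard-codes agent B's address and blocks on the response.
import requests

AGENT_B_URL = "http://agent-b.internal:8080/task"  # hard-coded coupling

def ask_agent_b(task: str) -> str:
    # If agent B is down, slow, or being redeployed, the caller suffers too.
    response = requests.post(AGENT_B_URL, json={"task": task}, timeout=30)
    response.raise_for_status()
    return response.json()["result"]
```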

Hence, there is always some coupling between agents due to their way of communicating. We’ve learned from modern microservice design that tight coupling can degrade several non-functional requirements. Key challenges include:

  • Scalability: Services become harder to scale up or down independently, as interdependencies require a careful and coordinated approach.
  • Reliability: Error handling and retry mechanisms grow more complex, increasing the risk of cascading failures.
  • Maintainability: Rapid fixes or deployments of new services are hindered by dependencies that require coordinated updates.
  • Throughput: Blocking calls and waiting for downstream services to accept requests introduce latency, reducing overall system efficiency.

The tighter the coupling, the more complex and worse the situation becomes. In the early days of microservices, many designs resembled a (bad) monolith, just with additional inter-service communication. A term was even coined for such a design failure: the monolithic microservice.

When designing an agentic AI system, we should avoid the mistakes of the past and decouple the agents for the same reasons!

Some might argue that agentic use cases and workflows are inherently sequential and strongly coupled by design. For instance, agents are often arranged like a star, with an orchestrator in the middle choosing the next agent to contact (although there are no restrictions on the design of sequential paths). Such a design, by the way, also appears in microservice systems. However, that is just a business or logical point of view, and it neglects that such a use case or workflow is potentially used by many users. Hence, many of the non-functional requirements still hold and must be met.

For instance, you still want to be able to scale a single, long-running agent within a sequential workflow up and down depending on the load scenario. An agentic workflow should handle requests concurrently to support multi-user scenarios. Agent errors should be handled gracefully and not require you to restart the entire workflow from the beginning. Of course, you could restart the workflow from the start or scale the workflow as a whole instead of individual agents. However, that is like scaling entire virtual machines instead of individual containers in a microservice scenario: simply not cost-efficient. And this is even more true for work done by AI!

If you must repeat AI work, this can be very costly, as you pay for each token sent to and received from an LLM. Response caching, storing intermediate results, keeping track of histories, etc., can mitigate such costs but can also become complex to implement reliably. On the plus side, depending on the AI framework used, you will receive better or worse support for restarting a workflow at intermediate checkpoints.
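
As a simple illustration, a response cache keyed by a hash of the prompt avoids paying twice for identical LLM calls. This is a minimal in-memory sketch (llm() is again a hypothetical model client); a production system would use a durable store such as a database or distributed cache:

```python
# Minimal sketch of LLM response caching to avoid re-paying for tokens.
import hashlib

def llm(prompt: str) -> str:
    """Hypothetical (paid) model call."""
    return f"<output for: {prompt[:40]}>"

_cache: dict[str, str] = {}  # in-memory only; use a durable store in production

def cached_llm(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:      # tokens are only paid for on a cache miss
        _cache[key] = llm(prompt)
    return _cache[key]
```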

So, what is the suggested solution? Decouple your agents! Decouple them as much as possible and choose asynchronous communication, e.g., with the help of message brokers. If an agent is unavailable, the request message remains in the queue and is read as soon as the agent returns to work. Need to scale an agent up? Just spawn new agent instances that read from the same queue. Is there an error at an intermediate step of the workflow? The state is kept as a message in a queue, and processing continues once the failing step succeeds. And if a failing step never succeeds? The message can be moved automatically to a dead-letter queue for further (potentially manual) inspection.
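
As a sketch of what this looks like in practice, here is a queue-decoupled agent assuming RabbitMQ and the pika client (any broker with acknowledgements and dead-lettering works similarly); the queue names and do_agent_work() are hypothetical:

```python
# Minimal sketch of a queue-decoupled agent. Callers never need this
# agent's address; they just publish to its queue. Scaling out means
# starting more identical consumers on the same queue.
import pika

def do_agent_work(body: bytes) -> bytes:
    """Hypothetical agent logic, e.g., an LLM-backed summarization."""
    return b"summary of: " + body

def handle(ch, method, properties, body):
    try:
        result = do_agent_work(body)
        ch.basic_publish(exchange="", routing_key="next-agent", body=result)
        ch.basic_ack(delivery_tag=method.delivery_tag)  # done: remove from queue
    except Exception:
        # Reject without requeue: the broker routes it to the dead-letter queue.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.queue_declare(queue="summarize", durable=True,
                      arguments={"x-dead-letter-exchange": "",
                                 "x-dead-letter-routing-key": "summarize.dlq"})
channel.queue_declare(queue="summarize.dlq", durable=True)
channel.queue_declare(queue="next-agent", durable=True)
channel.basic_consume(queue="summarize", on_message_callback=handle)
channel.start_consuming()
```

Note how the workflow state lives in the messages themselves: if this consumer crashes mid-task, the unacknowledged message is redelivered to another instance, and nothing upstream needs to be restarted.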

Challenge your Requirements!

A typical counterargument worth discussing is: Why not simply “modularize” by using AI with tool calls from the same codebase and implementing tools as reusable libraries? While this approach has merits, the decision depends on your specific requirements. It is common sense to use an agentic – or a fully asynchronous agentic – system only if it brings you advantages or is necessary to meet your requirements. Again, the analogy with implementing a distributed system as microservices applies: only do that if your use case benefits from it and your non-functional requirements suggest it.

So, tooling vs. agents is a question of what you want to achieve. At its core, a “stupid agent” could be conceived of as just a tool. However, what makes an agent truly agentic is its agency, i.e., its ability to reason, make decisions, and act autonomously rather than merely execute predefined functions. So, tools and libraries certainly have their place (e.g., as part of an agent!) but fulfill a totally different purpose.
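
To make the distinction concrete, here is a minimal sketch: the tool is a plain, deterministic function, while the agent wraps a decision loop in which llm_decide() (a hypothetical stand-in for an LLM planner call) chooses the next action:

```python
# Minimal sketch of the tool-vs-agent distinction.
def word_count(text: str) -> int:  # a tool: no agency, just a function
    return len(text.split())

TOOLS = {"word_count": word_count}

def llm_decide(goal: str, history: list[str]) -> tuple[str, str]:
    """Hypothetical planner: returns (tool_name, argument) or ('done', answer)."""
    return ("done", history[-1]) if history else ("word_count", goal)

def agent(goal: str) -> str:  # the agent: reasons, decides, acts in a loop
    history: list[str] = []
    while True:
        tool, arg = llm_decide(goal, history)
        if tool == "done":
            return arg
        history.append(str(TOOLS[tool](arg)))

print(agent("count the words in this goal"))
```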

Key advantages of a fully decoupled, asynchronous agent-based architecture include:

  • True Decoupling: Individual agents can scale independently, unlike libraries in a monolith, where scaling one component often forces scaling the entire system. Modularizing via libraries can also introduce hidden dependencies and thus unwanted coupling.
  • Clear Separation of Concerns: Agents operate as self-contained units in a dedicated domain, reducing unintended side effects.
  • Operational Flexibility: Decoupled agents can be deployed, updated, or replaced without disrupting the broader system.

As discussed in detail in the previous section, decoupling is best achieved by using fully asynchronous communication methods.

The Bottom Line

Asynchrony is key for an agentic AI system in a large-scale, multi-user production environment. Don’t repeat the mistakes of the past by sticking to a monolithic design that is hard to scale and maintain.

Designing software and choosing the right architecture is not a trivial task. The same is true (or even more true) for developing AI and agentic AI software. We at CID have 17+ years of experience in AI and almost 30 years of experience with software architecture. We can support you and your company by discussing your AI and agentic AI use cases and how they can benefit your business. We can help nail down the appropriate set of functional and non-functional requirements and accompany you along the way to a fully production-ready implementation of your agentic AI system that – to finally close the circle – will not fail in the short term and will just work in the long term.

Please get in touch with us so we can discuss how to put AI agents to work to your advantage.

Author

Dr.-Ing. Tom Crecelius, a seasoned Software Architect and Technology Evangelist, boasts two decades of .NET C# expertise. His focus lies in Software Analysis & Diagnostics, Cloud Transformation, and Cloud-Native Architecture. He holds a PhD in Computer Science, specializing in Information Retrieval, Big Data & Search. He is eager to learn, share knowledge, and provide support.

