Introduction to Agents

如果无法正常显示,请先停止浏览器的去广告插件。
分享至:
1. Introduction to Agents Authors: Alan Blount, Antonio Gulli, Shubham Saboo, Michael Zimmermann, and Vladimir Vuskovic
2. Introduction to Agents and Agent architectures Acknowledgements Content contributors Enrique Chan Mike Clark Derek Egan Anant Nawalgaria Kanchana Patlolla Julia Wiesinger Curators and editors Anant Nawalgaria Kanchana Patlolla Designer Michael Lanning November 2025 2
3. Table of contents From Predictive AI to Autonomous Agents 6 Introduction to AI Agents 8 The Agentic Problem-Solving Process A Taxonomy of Agentic Systems 10 14 Level 0: The Core Reasoning System 15 Level 1: The Connected Problem-Solver 15 Level 2: The Strategic Problem-Solver 16 Level 3: The Collaborative Multi-Agent System 17 Level 4: The Self-Evolving System 18 Core Agent Architecture: Model, Tools, and Orchestration 19 Model: The “Brain” of your AI Agent 19 Tools: The "Hands" of your AI Agent 20 Retrieving Information: Grounding in Reality 21 Executing Actions: Changing the World 21 Function Calling: Connecting Tools to your Agent 22
4. Table of contents The Orchestration Layer 22 Core Design Choices 23 Instruct with Domain Knowledge and Persona 23 Augment with Context 24 Multi-Agent Systems and Design Patterns 24 Agent Deployment and Services 26 Agent Ops: A Structured Approach to the Unpredictable 27 Measure What Matters: Instrumenting Success Like an A/B Experiment 29 Quality Instead of Pass/Fail: Using a LM Judge 29 Metrics-Driven Development: Your Go/No-Go for Deployment 30 Debug with OpenTelemetry Traces: Answering "Why?" 30 Cherish Human Feedback: Guiding Your Automation Agent Interoperability 31 31 Agents and Humans 32 Agents and Agents 33 Agents and Money 34
5. Table of contents Securing a Single Agent: The Trust Trade-Off 34 Agent Identity: A New Class of Principal 35 Policies to Constrain Access 37 Securing an ADK Agent 37 Scaling Up from a Single Agent to an Enterprise Fleet 39 Security and Privacy: Hardening the Agentic Frontier 40 Agent Governance: A Control Plane instead of Sprawl 40 How agents evolve and learn 42 How agents learn and self evolve 43 Simulation and Agent Gym - the next frontier 46 Examples of advanced agents 47 Google Co-Scientist 47 AlphaEvolve Agent 49 Conclusion 51 Endnotes 52
6. Introduction to Agents and Agent architectures Agents are the natural evolution of Language Models, made useful in software. From Predictive AI to Autonomous Agents Artificial intelligence is changing. For years, the focus has been on models that excel at passive, discrete tasks: answering a question, translating text, or generating an image from a prompt. This paradigm, while powerful, requires constant human direction for every step. We're now seeing a paradigm shift, moving from AI that just predicts or creates content to a new class of software capable of autonomous problem-solving and task execution. This new frontier is built around AI agents. An agent is not simply an AI model in a static workflow; it's a complete application, making plans and taking actions to achieve goals. It combines a Language Model's (LM) ability to reason with the practical ability to act, allowing November 2025 6
7. Introduction to Agents and Agent architectures it to handle complex, multi-step tasks that a model alone cannot. The critical capability is that agents can work on their own, figuring out the next steps needed to reach a goal without a person guiding them at every turn. This document is the first in a five-part series, acting as a formal guide for the developers, architects, and product leaders transitioning from proofs-of-concept to robust, production-grade agentic systems. While building a simple prototype is straightforward, ensuring security, quality and reliability is a significant challenge. This paper provides a comprehensive foundation: • Core Anatomy: Deconstructing an agent into its three essential components: the reasoning Model, actionable Tools, and the governing Orchestration Layer. • A Taxonomy of Capabilities: Classifying agents from simple, connected problem-solvers to complex, collaborative multi-agent systems. • Architectural Design: Diving into the practical design considerations for each component, from model selection to tool implementation. • Building for Production: Establishing the Agent Ops discipline needed to evaluate, debug, secure, and scale agentic systems from a single instance to a fleet with enterprise governance. Building on the previous Agents whitepaper 1 and Agent Companion 2 ; this guide provides the foundational concepts and strategic frameworks you will need to successfully build, deploy, and manage this new generation of intelligent applications which can reason, act and observe to accomplish goals 3 . November 2025 7
8. Introduction to Agents and Agent architectures Words are insufficient to describe how humans interact with AI. We tend to anthropomorphize and use human terms like “think” and “reason” and “know.” We don't yet have words for "know with semantic meaning" vs "know with high probability of maximizing a reward function." Those are two different types of knowing, but the results are the same 99.X% of the time. Introduction to AI Agents In the simplest terms, an AI Agent can be defined as the combination of models, tools, an orchestration layer, and runtime services which uses the LM in a loop to accomplish a goal. These four elements form the essential architecture of any autonomous system. • The Model (The "Brain"): The core language model (LM) or foundation model that serves as the agent's central reasoning engine to process information, evaluate options, and make decisions. The type of model (general-purpose, fine-tuned, or multimodal) dictates the agent's cognitive capabilities. An agentic system is the ultimate curator of the input context window the LM. • Tools (The "Hands"): These mechanisms connect the agent's reasoning to the outside world, enabling actions beyond text generation. They include API extensions, code functions, and data stores (like databases or vector stores) for accessing real-time, factual information. An agentic system allows a LM to plan which tools to use, executes the tool, and puts the tool results into the input context window of the next LM call. • The Orchestration Layer (The "Nervous System"): The governing process that manages the agent's operational loop. It handles planning, memory (state), and reasoning strategy execution. This layer uses prompting frameworks and reasoning techniques (like November 2025 8
9. Introduction to Agents and Agent architectures Chain-of-Thought 4 or ReAct 5 ) to break down complex goals into steps and decide when to think versus use a tool. This layer is also responsible for giving agents the memory to "remember." • Deployment (The "Body and Legs"): While building an agent on a laptop is effective for prototyping, production deployment is what makes it a reliable and accessible service. This involves hosting the agent on a secure, scalable server and integrating it with essential production services for monitoring, logging, and management. Once deployed, the agent can be accessed by users through a graphical interface or programmatically by other agents via an Agent-to-Agent (A2A) API. At the end of the day, building a generative AI agent is a new way to develop solutions to solve tasks. The traditional developer acts as a "bricklayer," precisely defining every logical step. The agent developer, in contrast, is more like a director. Instead of writing explicit code for every action, you set the scene (the guiding instructions and prompts), select the cast (the tools and APIs), and provide the necessary context (the data). The primary task becomes guiding this autonomous "actor" to deliver the intended performance. You'll quickly find that an LM's greatest strength—its incredible flexibility—is also your biggest headache. A large language model's capacity to do anything makes it difficult to compel it to do one specific thing reliably and perfectly. What we used to call “prompt engineering” and now call “context engineering” guides LMs to generate the desired output. For any single call to a LM, we input our instructions, facts, available tools to call, examples, session history, user profile, etc – filling the context window with just the right information to get the outputs we need. Agents are software which manage the inputs of LMs to get work done. Debugging becomes essential when issues arise. "Agent Ops" essentially redefines the familiar cycle of measurement, analysis, and system optimization. Through traces and logs, you can monitor the agent's "thought process" to identify deviations from the intended execution path. As models evolve and frameworks improve, the developer's role is to furnish November 2025 9
10. Introduction to Agents and Agent architectures critical components: domain expertise, a defined personality, and seamless integration with the tools necessary for practical task completion. It's crucial to remember that comprehensive evaluations and assessments often outweigh the initial prompt's influence. When an agent is precisely configured with clear instructions, reliable tools, and an integrated context serving as memory, a great user interface, the ability to plan and problem solve, and general world knowledge, it transcends the notion of mere "workflow automation." It begins to function as a collaborative entity: a highly efficient, uniquely adaptable, and remarkably capable new member of your team. In essence, an agent is a system dedicated to the art of context window curation. It is a relentless loop of assembling context, prompting the model, observing the result, and then re-assembling a context for the next step. The context may include system instructions, user input, session history, long term memories, grounding knowledge from authoritative sources, what tools could be used, and the results of tools already invoked. This sophisticated management of the model's attention allows its reasoning capabilities to problem solve for novel circumstances and accomplish objectives. The Agentic Problem-Solving Process We have defined an AI agent as a complete, goal-oriented application that integrates a reasoning model, actionable tools, and a governing orchestration layer. A short version is “LMs in a loop with tools to accomplish an objective.” But how does this system actually work? What does an agent do from the moment it receives a request to the moment it delivers a result? November 2025 10
11. Introduction to Agents and Agent architectures At its core, an agent operates on a continuous, cyclical process to achieve its objectives. While this loop can become highly complex, it can be broken down into five fundamental steps as discussed in detail in the book Agentic System Design: 6 1. Get the Mission: The process is initiated by a specific, high-level goal. This mission is provided by a user (e.g., "Organize my team's travel for the upcoming conference") or an automated trigger (e.g., "A new high-priority customer ticket has arrived"). 2. Scan the Scene: The agent perceives its environment to gather context. This involves the orchestration layer accessing its available resources: "What does the user's request say?", "What information is in my term memory? Did I already try to do this task? Did the user give me guidance last week?", "What can I access from my tools, like calendars, databases, or APIs?" 3. Think It Through: This is the agent's core "think" loop, driven by the reasoning model. The agent analyzes the Mission (Step 1) against the Scene (Step 2) and devises a plan. This isn't a single thought, but often a chain of reasoning: "To book travel, I first need to know who is on the team. I will use the get_team_roster tool. Then I will need to check their availability via the calendar_api." 4. Take Action: The orchestration layer executes the first concrete step of the plan. It selects and invokes the appropriate tool—calling an API, running a code function, or querying a database. This is the agent acting on the world beyond its own internal reasoning. 5. Observe and Iterate: The agent observes the outcome of its action. The get_ team_roster tool returns a list of five names. This new information is added to the agent's context or "memory." The loop then repeats, returning to Step 3: "Now that I have the roster, my next step is to check the calendar for these five people. I will use the calendar_api." November 2025 11
12. Introduction to Agents and Agent architectures This "Think, Act, Observe" cycle continues - managed by the Orchestration Layer, reasoned by the Model, and executed by the Tools until the agent's internal plan is complete and the initial Mission is achieved. Figure 1: Agentic AI problem-solving process Let’s take a real-world example of how a Customer Support Agent would operate in this 5-step cycle: Imagine a user asks, "Where is my order #12345?" November 2025 12
13. Introduction to Agents and Agent architectures Instead of immediately acting, the agent first enters its "Think It Through" phase to devise a complete strategy. It reasons: "The user wants a delivery status. To provide a complete answer, I need a multi-step plan: 1. Identify: I must first find the order in our internal database to confirm it exists and get its details. 2. Track: From those details, I'll need to extract the shipping carrier's tracking number. Then, I'll query the external carrier's API for the live status. 3. Report: Finally, I must synthesize the information I've gathered into a clear, helpful response for the user." With this multi-step plan in mind, the agent begins execution. In its first "Act" phase, it executes step one of its plan, calling the find_order("12345") tool. It observes the result—a full order record, including the tracking number "ZYX987." The agent's orchestration layer recognizes that the first part of its plan is complete and immediately proceeds to the second. It acts by calling the get_shipping_ status("ZYX987") tool. It observes the new result: "Out for Delivery." Finally, having successfully executed the data-gathering stages of its plan, the agent moves to the "Report" step. It perceives it has all the necessary components, plans the final message, and acts by generating the response: "Your order #12345 is 'Out for Delivery'! November 2025 13
14. Introduction to Agents and Agent architectures A Taxonomy of Agentic Systems Understanding the 5-step operational loop is the first part of the puzzle. The second is recognizing that this loop can be scaled in complexity to create different classes of agents. For an architect or product leader, a key initial decision is scoping what kind of agent to build. We can classify agentic systems into a few broad levels, each building on the capabilities of the last. Figure 2: Agentic system in 5 steps November 2025 14
15. Introduction to Agents and Agent architectures Level 0: The Core Reasoning System Before we can have an agent, we must start with the "Brain" in its most basic form: the reasoning engine itself. In this configuration, a Language Model (LM) operates in isolation, responding solely based on its vast pre-trained knowledge without any tools, memory, or interaction with the live environment. Its strength lies in this extensive training, allowing it to explain established concepts and plan how to approach solving a problem with great depth. The trade-off is a complete lack of real- time awareness; it is functionally "blind" to any event or fact outside its training data. For instance, it can explain the rules of professional baseball and the complete history of the New York Yankees. But if you ask, "What was the final score of the Yankees game last night?", it would be unable to answer. That game is a specific, real-world event that happened after its training data was collected, so the information simply doesn't exist in its knowledge. Level 1: The Connected Problem-Solver At this level, the reasoning engine becomes a functional agent by connecting to and utilizing external tools - the "Hands" component of our architecture. Its problem-solving is no longer confined to its static, pre-trained knowledge. Using the 5-step loop, the agent can now answer our previous question. Given the "Mission": "What was the final score of the Yankees game last night?", its "Think" step recognizes this as a real-time data need. Its "Act" step then invokes a tool, like a Google Search API with the proper date and search terms. It "Observes" the search result (e.g., "Yankees won 5-3"), and synthesizes that fact into a final answer. November 2025 15
16. Introduction to Agents and Agent architectures This fundamental ability to interact with the world - whether using a search tool for a score , a financial API for a live stock price, or a database via Retrieval-Augmented Generation (RAG) is the core capability of a Level 1 agent. Level 2: The Strategic Problem-Solver Level 2 marks a significant expansion in capability, moving from executing simple tasks to strategically planning complex, multi-part goals. The key skill that emerges here is context engineering: the agent's ability to actively select, package, and manage the most relevant information for each step of its plan. An agent's accuracy depends on a focused, high-quality context. Context engineering curates the model's limited attention to prevent overload and ensure efficient performance. For instance, consider the "Mission": "Find a good coffee shop halfway between my office at 1600 Amphitheatre Parkway, Mountain View, and my client's office at 1 Market St, San Francisco." A Level 2 agent will start creating a plan: 1. Think: "I must first find the halfway point." • Act: Call the Maps tool with both addresses. • Observe: "The halfway point is Millbrae, CA." 2. Think: "Now I must find coffee shops in Millbrae. The user asked for 'good' ones, so I will search for places with a 4-star rating or higher." November 2025 16
17. Introduction to Agents and Agent architectures • Act: Call the google_places tool with query="coffee shop in Millbrae, CA", min_rating=4.0. (This is context engineering - it automatically created a new, focused search query from the previous step's output ). • Observe: "The search returns 'Millbrae Coffee' and 'The Daily Grind'." 3. Think: "I will synthesize these results and present them to the user." This strategic planning also enables proactive assistance, like an agent that reads a long flight confirmation email, engineers the key context (flight number, date), and acts by adding it to your calendar. Level 3: The Collaborative Multi-Agent System At the highest level, the paradigm shifts entirely. We move away from building a single, all- powerful "super-agent" and toward a "team of specialists" working in concert, a model that directly mirrors a human organization. The system's collective strength lies in this division of labor. Here, agents treat other agents as tools. Imagine a "Project Manager" agent receiving a "Mission": "Launch our new 'Solaris' headphones." The Project Manager agent doesn't do the entire work itself. It Acts by creating new Missions for its team of specialized agents much like how it works in the real life: 1. Delegates to MarketResearchAgent: "Analyze competitor pricing for noise-canceling headphones. Return a summary document by tomorrow." 2. Delegates to MarketingAgent: "Draft three versions of a press release using the 'Solaris' product spec sheet as context." November 2025 17
18. Introduction to Agents and Agent architectures 3. Delegates to WebDevAgent: "Generate the new product page HTML based on the attached design mockups." This collaborative model, while currently constrained by the reasoning limitations of today's LMs, represents the frontier of automating entire, complex business workflows from start to finish. Level 4: The Self-Evolving System Level 4 represents a profound leap from delegation to autonomous creation and adaptation. At this level, an agentic system can identify gaps in its own capabilities and dynamically create new tools or even new agents to fill them. It moves from using a fixed set of resources to actively expanding them. Following our example, the "Project Manager" agent, tasked with the 'Solaris' launch, might realize it needs to monitor social media sentiment, but no such tool or agent exists on its team. 1. Think (Meta-Reasoning): "I must track social media buzz for 'Solaris,' but I lack the capability." 2. Act (Autonomous Creation): Instead of failing, it invokes a high-level AgentCreator tool with a new mission: "Build a new agent that monitors social media for keywords 'Solaris headphones', performs sentiment analysis, and reports a daily summary." 3. Observe: A new, specialized SentimentAnalysisAgent is created, tested, and added to the team on the fly, ready to contribute to the original mission. This level of autonomy, where a system can dynamically expand its own capabilities, turns a team of agents into a truly learning and evolving organization. November 2025 18
19. Introduction to Agents and Agent architectures Core Agent Architecture: Model, Tools, and Orchestration We know what an agent does and how it can scale. But how do we actually build it? The transition from concept to code lies in the specific architectural design of its three core components. Model: The “Brain” of your AI Agent The LM is the reasoning core of your agent, and its selection is a critical architectural decision that dictates your agent's cognitive capabilities, operational cost, and speed. However, treating this choice as a simple matter of picking the model with the highest benchmark score is a common path to failure. An agent's success in a production environment is rarely determined by generic academic benchmarks. Real-world success demands a model that excels at agentic fundamentals: superior reasoning to navigate complex, multi-step problems and reliable tool use to interact with the world 7 . To do this well, start by defining the business problem, then test models against metrics that directly map to that outcome. If your agent needs to write code, test it on your private codebase. If it processes insurance claims, evaluate its ability to extract information from your specific document formats. This analysis must then be cross-referenced with the practicalities of cost and latency. The "best" model is the one that sits at the optimal intersection of quality, speed, and price for your specific task 8 . November 2025 19
20. Introduction to Agents and Agent architectures You may choose more than one model, a "team of specialists." You don't use a sledgehammer to crack a nut. A robust agent architecture might use a frontier model like Gemini 2.5 Pro for the heavy lifting of initial planning and complex reasoning, but then intelligently route simpler, high-volume tasks—like classifying user intent or summarizing text—to a much faster and more cost-effective model like Gemini 2.5 Flash. Model routing might be automatic or hard- coded but is a key strategy for optimizing both performance and cost 9 . The same principle applies to handling diverse data types. While a natively multimodal model like Gemini live mode 10 offers a streamlined path to processing images and audio, an alternative is to use specialized tools like the Cloud Vision API 11 or Speech-to-Text API 12 . In this pattern, the world is first converted to text, which is then passed to a language-only model for reasoning. This adds flexibility and allows for best-of-breed components, but also introduces significant complexity. Finally, the AI landscape is in a state of constant, rapid evolution. The model you choose today will be superseded in six months. A "set it and forget it" mindset is unsustainable. Building for this reality means investing in a nimble operational framework—an "Agent Ops" practice 13 . With a robust CI/CD pipeline that continuously evaluates new models against your key business metrics, you can de-risk and accelerate upgrades, ensuring your agent is always powered by the best brain available without requiring a complete architectural overhaul. Tools: The "Hands" of your AI Agent If the model is the agent's brain, tools are the hands that connect its reasoning to reality. They allow the agent to move beyond its static training data to retrieve real-time information and take action in the world. A robust tool interface is a three-part loop: defining what a tool can do, invoking it, and observing the result. November 2025 20
21. Introduction to Agents and Agent architectures Here are a few of the main types of tools agent builders will put into the “hands” of their agents. For a more complete deep dive see the agent tools focused whitepaper in this series. Retrieving Information: Grounding in Reality The most foundational tool is the ability to access up-to-date information. Retrieval- Augmented Generation (RAG) gives the agent a "library card" to query external knowledge, often stored in Vector Databases or Knowledge Graphs, ranging from internal company documents to web knowledge via Google Search. For structured data, Natural Language to SQL (NL2SQL) tools allow the agent to query databases to answer analytic questions like, "What were our top-selling products last quarter?" By looking things up before speaking— whether in a document or a database—the agent grounds itself in fact, dramatically reducing hallucinations. Executing Actions: Changing the World The true power of agents is unleashed when they move from reading information to actively doing things. By wrapping existing APIs and code functions as tools, an agent can send an email, schedule a meeting, or update a customer record in ServiceNow. For more dynamic tasks, an agent can write and execute code on the fly. In a secure sandbox, it can generate a SQL query or a Python script to solve a complex problem or perform a calculation, transforming it from a knowledgeable assistant into an autonomous actor 14 . November 2025 21
22. Introduction to Agents and Agent architectures This also includes tools for human interaction. An agent can use a Human in the Loop (HITL) tool to pause its workflow and ask for confirmation (e.g., ask_for_confirmation()) or request specific information from a user interface (e.g., ask_for_date_input()), ensuring a person is involved in critical decisions. HITL could be implemented via SMS text messaging and a task in a database. Function Calling: Connecting Tools to your Agent For an agent to reliably do “function calling” and use tools, it needs clear instructions, secure connections, and orchestration 15 . Longstanding standards like the OpenAPI specification provide this, giving the agent a structured contract that describes a tool's purpose, its required parameters, and its expected response. This schema lets the model generate the correct function call every time and interpret the API response. For simpler discovery and connection to tools, open standards like the Model Context Protocol (MCP) have become popular because they are more convenient 16 . Additionally, a few models have native tools, like Gemini with native Google Search, where the function invocation happens as part of the LM call itself 17 . The Orchestration Layer If the model is the agent's brain and the tools are its hands, the orchestration layer is the central nervous system that connects them. It is the engine that runs the "Think, Act, Observe" loop, the state machine that governs the agent's behavior, and the place where a developer's carefully crafted logic comes to life. This layer is not just plumbing; it is the conductor of the entire agentic symphony, deciding when the model should reason, which tool should act, and how the results of that action should inform the next movement. November 2025 22
23. Introduction to Agents and Agent architectures Core Design Choices The first architectural decision is determining the agent's degree of autonomy. The choice exists on a spectrum. At one end, you have deterministic, predictable workflows that call an LM as a tool for a specific task—a sprinkle of AI to augment an existing process. At the other end, you have the LM in the driver's seat, dynamically adapting, planning and executing tasks to achieve a goal. A parallel choice is the implementation method. No-code builders offer speed and accessibility, empowering business users to automate structured tasks and build simple agents rapidly. For more complex, mission-critical systems, code-first frameworks, such as Google's Agent Development Kit (ADK) 18 , provide the deep control, customizability, and integration capabilities that engineers require. Regardless of the approach, a production-grade framework is essential. It must be open, allowing you to plug in any model or tool to prevent vendor lock-in. It must provide precise control, enabling a hybrid approach where the non-deterministic reasoning of an LM is governed by hard-coded business rules. Most importantly, the framework must be built for observability. When an agent behaves unexpectedly, you cannot simply put a breakpoint in the model's "thought." A robust framework generates detailed traces and logs, exposing the entire reasoning trajectory: the model's internal monologue, the tool it chose, the parameters it generated, and the result it observed. Instruct with Domain Knowledge and Persona Within this framework, the developer's most powerful lever is to instruct the agent with domain knowledge and a distinct persona. This is accomplished through a system prompt or a set of core instructions. This isn't just a simple command; it is the agent's constitution. November 2025 23
24. Introduction to Agents and Agent architectures Here, you tell it, You are a helpful customer support agent for Acme Corp, ... and provide constraints, desired output schema, rules of engagement, a specific tone of voice, and explicit guidance on when and why it should use its tools. A few example scenarios in the instructions are usually very effective. Augment with Context The agent's “memory” is orchestrated into the LM context window at runtime. For a more complete deep dive see the agent memory focused whitepaper in this series. Short-term memory is the agent's active "scratchpad," maintaining the running history of the current conversation. It tracks the sequence of (Action, Observation) pairs from the ongoing loop, providing the immediate context the model needs to decide what to do next. This may be implemented as abstractions like state, artifacts, sessions or threads. Long-term memory provides persistence across sessions. Architecturally, this is almost always implemented as another specialized tool—a RAG system connected to a vector database or search engine. The orchestrator gives the agent the ability to pre-fetch and to actively query its own history, allowing it to "remember" a user's preferences or the outcome of a similar task from weeks ago for a truly personalized and continuous experience. 19 Multi-Agent Systems and Design Patterns As tasks grow in complexity, building a single, all-powerful "super-agent" becomes inefficient. The more effective solution is to adopt a "team of specialists" approach, which mirrors a human organization. This is the core of a multi-agent system: a complex process is November 2025 24
25. Introduction to Agents and Agent architectures segmented into discrete sub-tasks, and each is assigned to a dedicated, specialized AI agent. This division of labor allows each agent to be simpler, more focused, and easier to build, test, and maintain, which is ideal for dynamic or long-running business processes. Architects may rely on proven agentic design patterns, though agent capabilities and thus patterns are evolving rapidly. 20 For dynamic or non-linear tasks, the Coordinator pattern is essential. It introduces a "manager" agent that analyzes a complex request, segments the primary task, and intelligently routes each sub-task to the appropriate specialist agent (like a researcher, a writer, or a coder). The coordinator then aggregates the responses from each specialist to formulate a final, comprehensive answer. Figure 3: The “iterative refinement” pattern from https://cloud.google.com/architecture/choose-design-pattern-agentic-ai-system For more linear workflows, the Sequential pattern is a better fit, acting like a digital assembly line where the output from one agent becomes the direct input for the next. Other key patterns focus on quality and safety. The Iterative Refinement pattern creates a feedback loop, using a "generator" agent to create content and a "critic" agent to evaluate it against November 2025 25
26. Introduction to Agents and Agent architectures quality standards. For high-stakes tasks, the Human-in-the-Loop (HITL) pattern is critical, creating a deliberate pause in the workflow to get approval from a person before an agent takes a significant action. Agent Deployment and Services After you have built a local agent, you will want to deploy it to a server where it runs all the time and where other people and agents can use it. Continuing our analogy, deployment and services would be the body and legs for our agent. An agent requires several services to be effective, session history and memory persistence, and more. As an agent builder, you will also be responsible for deciding what you log, and what security measures you take for data privacy and data residency and regulation compliance. All of these services are in scope, when deploying agents to production. Luckily, agent builders can rely on decades of application hosting infrastructure. Agents are a new form of software after all and many of the same principles apply. Builders can rely on purpose-built, agent specific, deployment options like Vertex AI Agent Engine which support runtime and everything else in one platform 21 . For software developers who want to control their application stacks more directly, or deploy agents within their existing DevOps infrastructure, any agent and most agent services can be added to a docker container and deployed onto industry standard runtimes like Cloud Run or GKE 22 . November 2025 26
27. Introduction to Agents and Agent architectures Figure 4: Vertex AI Agent builder from https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/overview If you are not a software developer and a DevOps expert, the process of deploying your first agent might be daunting. Many agent frameworks make this easy with a deploy command or a dedicated platform to deploy the agent, and these should be used for early exploration and onboarding. Ramping up to a secure and production ready environment will usually require a bigger investment of time and application of best practices, including CI/CD and automated testing for your agents 23 . Agent Ops: A Structured Approach to the Unpredictable As you build your first agents, you will be manually testing the behavior, over and over again. When you add a feature, does it work? When you fix a bug, did you cause a different problem? Testing is normal for software development but it works differently with generative AI. November 2025 27
28. Introduction to Agents and Agent architectures The transition from traditional, deterministic software to stochastic, agentic systems requires a new operational philosophy. Traditional software unit tests could simply assert output == expected; but that doesn’t work when an agent's response is probabilistic by design. Also, because language is complicated, it usually requires a LM to evaluate “quality” – that the agent's response does all of what it should, nothing it shouldn’t, and with proper tone. Figure 5: Relationships between the operational domains of DevOps, MLOps, and GenAIOps from https://medium.com/@sokratis.kartakis/genai-in-production-mlops-or-genaiops-25691c9becd0 Agent Ops is the disciplined, structured approach to managing this new reality. It is a natural evolution of DevOps and MLOps, tailored for the unique challenges of building, deploying, and governing AI agents, turning unpredictability from a liability into a managed, measurable, and reliable feature. 24 For a more complete deep dive see the agent quality focused whitepaper in this series. November 2025 28
29. Introduction to Agents and Agent architectures Measure What Matters: Instrumenting Success Like an A/B Experiment Before you can improve your agent, you must define what "better" means in the context of your business. Frame your observability strategy like an A/B test and ask yourself: what are the Key Performance Indicators (KPIs) that prove the agent is delivering value? These metrics should go beyond technical correctness and measure real-world impact: goal completion rates, user satisfaction scores, task latency, operational cost per interaction, and—most importantly—the impact on business goals like revenue, conversion or customer retention. This top-down view will guide the rest of your testing, puts you on the path to metrics driven development, and will let you calculate a return on investment. Quality Instead of Pass/Fail: Using a LM Judge Business metrics don't tell you if the agent is behaving correctly. Since a simple pass/fail is impossible, we shift to evaluating for quality using an "LM as Judge." This involves using a powerful model to assess the agent's output against a predefined rubric: Did it give the right answer? Was the response factually grounded? Did it follow instructions? This automated evaluation, run against a golden dataset of prompts, provides a consistent measure of quality. Creating the evaluation datasets—which include the ideal (or "golden") questions and correct responses—can be a tedious process. To build these, you should sample scenarios from existing production or development interactions with the agent. The dataset must cover the full breadth of use cases that you expect your users to engage with, plus a few unexpected ones. While investment in evaluation pays off quickly, evaluation results should always be November 2025 29
30. Introduction to Agents and Agent architectures reviewed by a domain expert before being accepted as valid. Increasingly, the curation and maintenance of these evaluations is becoming a key responsibility for Product Managers with the support from Domain experts. Metrics-Driven Development: Your Go/No-Go for Deployment Once you have automated dozens of evaluation scenarios and established trusted quality scores, you can confidently test changes to your development agent. The process is simple: run the new version against the entire evaluation dataset, and directly compare its scores to the existing production version. This robust system eliminates guesswork, ensuring you are confident in every deployment. While automated evaluations are critical, don't forget other important factors like latency, cost, and task success rates. For maximum safety, use A/B deployments to slowly roll out new versions and compare these real-world production metrics alongside your simulation scores. Debug with OpenTelemetry Traces: Answering "Why?" When your metrics dip or a user reports a bug, you need to understand "why." An OpenTelemetry trace is a high-fidelity, step-by-step recording of the agent's entire execution path (trajectory), allowing you to debug the agent's steps. 25 With traces, you can see the exact prompt sent to the model, the model's internal reasoning (if available), the specific tool it chose to call, the precise parameters it generated for that tool, and the raw data that came back as an observation. Traces can be complicated the first time you look at them but they provide the details needed to diagnose and fix the root cause of any issue. Important trace details may be turned into metrics, but reviewing traces is primarily for debugging, not November 2025 30
31. Introduction to Agents and Agent architectures overviews of performance. Trace data can be seamlessly collected in platforms like Google Cloud Trace, which visualize and search across vast quantities of traces, streamlining root cause analysis. Cherish Human Feedback: Guiding Your Automation Human feedback is not an annoyance to be dealt with; it is the most valuable and data- rich resource you have for improving your agent. When a user files a bug report or clicks the "thumbs down" button, they are giving you a gift: a new, real-world edge case that your automated eval scenarios missed. Collecting and aggregating this data is critical; when you see a statistically significant number of similar reports or metric dips, you must tie the occurrences back to your analytics platform to generate insights and trigger alerts for operational issues. An effective Agent Ops process "closes the loop" by capturing this feedback, replicating the issue, and converting that specific scenario into a new, permanent test case in your evaluation dataset. This ensures you not only fix the bug but also vaccinate the system against that entire class of error ever happening again. Agent Interoperability Once you build your high quality agents, you want to be able to interconnect them with users and other agents. In our body parts analogy, this would be the face of the Agent. There is a difference between connecting to agents versus connecting agents with data and APIs; Agents are not tools 26 . Let’s assume you already have tools wired into your agents, now let’s consider how you bring your agents into a wider ecosystem. November 2025 31
32. Introduction to Agents and Agent architectures Agents and Humans The most common form of agent-human interaction is through a user interface. In its simplest form, this is a chatbot, where a user types a request and the agent, acting as a backend service, processes it and returns a block of text. More advanced agents can provide structured data, like JSON, to power rich, dynamic front-end experiences. Human in the loop (HITL) interaction patterns include intent refinement, goal expansion, confirmation, and clarification requests. Computer use is a category of tool where the LM takes control of a user interface, often with human interaction and oversight. A computer use enabled agent can decide that the next best action is to navigate to a new page, highlight a specific button, or pre-fill a form with relevant information 27 . Instead of an agent using an interface on behalf of the user, the LM can change the UI to meet the needs of the moment. This can be done with Tools which control UI (MCP UI) 28 , or specialized UI messaging systems which can sync client state with an agent (AG UI) 29 , and even generation of bespoke interfaces (A2UI) 30 . Of course, human interaction is not limited to screens and keyboards. Advanced agents are breaking the text barrier and moving into real-time, multimodal communication with "live mode" creating a more natural, human-like connection. Technologies like the Gemini Live API 31 enable bidirectional streaming, allowing a user to speak to an agent and interrupt it, just as they would in a natural conversation. This capability fundamentally changes the nature of agent-human collaboration. With access to a device's camera and microphone, the agent can see what the user sees and hear what they say, responding with generated speech at a latency that mimics human conversation. November 2025 32
33. Introduction to Agents and Agent architectures This opens up a vast array of use cases that are simply impossible with text, from a technician receiving hands-free guidance while repairing a piece of equipment to a shopper getting real-time style advice. It makes the agent a more intuitive and accessible partner. Agents and Agents Just as agents must connect with humans, they must also connect with each other. As an enterprise scales its use of AI, different teams will build different specialized agents. Without a common standard, connecting them would require building a tangled web of brittle, custom API integrations that are impossible to maintain. The core challenge is twofold: discovery (how does my agent find other agents and know what they can do?) and communication (how do we ensure they speak the same language?). The Agent2Agent (A2A) protocol is the open standard designed to solve this problem. It acts as a universal handshake for the agentic economy. A2A allows any agent to publish a digital "business card," known as an Agent Card. This simple JSON file advertises the agent's capabilities, its network endpoint, and the security credentials required to interact with it. This makes discovery simple and standardized. As opposed to MCP which focuses on solving transactional requests, Agent 2 Agent communication is typically for additional problem solving. Once discovered, agents communicate using a task-oriented architecture. Instead of a simple request-response, interactions are framed as asynchronous "tasks." A client agent sends a task request to a server agent, which can then provide streaming updates as it works on the problem over a long-running connection. This robust, standardized communication protocol is the final piece of the puzzle, enabling the collaborative, Level 3 multi-agent systems that represent the frontier of automation. A2A transforms a collection of isolated agents into a true, interoperable ecosystem. November 2025 33
34. Introduction to Agents and Agent architectures Agents and Money As AI agents do more tasks for us, a few of those tasks involve buying or selling, negotiating or facilitating transactions. The current web is built for humans clicking "buy," the responsibility is on the human. If an autonomous agent clicks “buy” it creates a crisis of trust – if something goes wrong, who is at fault? These are complex issues of authorization, authenticity, and accountability. To unlock a true agentic economy, we need new standards that allow agents to transact securely and reliably on behalf of their users. This emerging area is far from established, but two key protocols are paving the way. The Agent Payments Protocol (AP2) is an open protocol designed to be the definitive language for agentic commerce. It extends protocols like A2A by introducing cryptographically-signed digital "mandates." These act as verifiable proof of user intent, creating a non-repudiable audit trail for every transaction. This allows an agent to securely browse, negotiate, and transact on a global scale based on delegated authority from the user. Complementing this is x402, an open internet payment protocol that uses the standard HTTP 402 "Payment Required" status code. It enables frictionless, machine-to-machine micropayments, allowing an agent to pay for things like API access or digital content on a pay-per-use basis without needing complex accounts or subscriptions. Together, these protocols are building the foundational trust layer for the agentic web. Securing a Single Agent: The Trust Trade-Off When you create your first AI agent, you immediately face a fundamental tension: the trade-off between utility and security. To make an agent useful, you must give it power—the autonomy to make decisions and the tools to perform actions like sending emails or querying databases. However, every ounce of power you grant introduces a corresponding measure of risk. The primary security concerns are rogue actions—unintended or harmful behaviors— November 2025 34
35. Introduction to Agents and Agent architectures and sensitive data disclosure. You want to give your agent a leash long enough to do its job, but short enough to keep it from running into traffic, especially when that traffic involves irreversible actions or your company's private data. 32 To manage this, you cannot rely solely on the AI model's judgment, as it can be manipulated by techniques like prompt injection 33 . Instead, the best practice is a hybrid, defense-in- depth approach. 34 The first layer consists of traditional, deterministic guardrails—a set of hardcoded rules that act as a security chokepoint outside the model's reasoning. This could be a policy engine that blocks any purchase over $100 or requires explicit user confirmation before the agent can interact with an external API. This layer provides predictable, auditable hard limits on the agent's power. The second layer leverages reasoning-based defenses, using AI to help secure AI. This involves training the model to be more resilient to attacks (adversarial training) and employing smaller, specialized "guard models" that act like security analysts. These models can examine the agent's proposed plan before it's executed, flagging potentially risky or policy-violating steps for review. This hybrid model, combining the rigid certainty of code with the contextual awareness of AI, creates a robust security posture for even a single agent, ensuring its power is always aligned with its purpose. Agent Identity: A New Class of Principal In the traditional security model, there are human users which might use OAuth or SSO, and there are services which use IAM or service accounts. Agents add a 3rd category of principle. An agent is not merely a piece of code; it is an autonomous actor, a new kind of principal that requires its own verifiable identity. Just as employees are issued an ID badge, each agent on the platform must be issued a secure, verifiable "digital passport." This Agent November 2025 35
36. Introduction to Agents and Agent architectures Identity is distinct from the identity of the user who invoked it and the developer who built it. This is a fundamental shift in how we must approach Identity and Access Management (IAM) in the enterprise. Having each identity be verified and having access controls for all of them, is the bedrock of agent security. Once an agent has a cryptographically verifiable identity (often using standards like SPIFFE 35 ), it can be granted its own specific, least-privilege permissions. The SalesAgent is granted read/write access to the CRM, while the HRonboardingAgent is explicitly denied. This granular control is critical. It ensures that even if a single agent is compromised or behaves unexpectedly, the potential blast radius is contained. Without an agent identity construct, agents cannot work on behalf of humans with limited delegated authority. Principal entity Authentication / Verification Notes Users Authenticated with OAuth Human actors with full or SSO autonomy and responsibility for their actions Agents (new category of principles) Verified with SPIFFE Agents have delegated authority, taking actions on behalf of users Service accounts Integrated into IAM Applications and containers, fully deterministic, no responsible for actions Table 1: A non-exhaustive example of different categories of actors for authentication November 2025 36
37. Introduction to Agents and Agent architectures Policies to Constrain Access A policy is a form of authorization (AuthZ), distinct from authentication (AuthN). Typically, policies limit the capabilities of a principal; for example, “Users in Marketing can only access these 27 API endpoints and cannot execute DELETE commands.” As we develop agents, we need to apply permissions to the agents, their tools, other internal agents, context they can share, and remote agents. Think about it this way: if you add all the APIs, data, tools, and agents to your system, then you must constrain access to a subset of just those capabilities required to get their jobs done. This is the recommended approach: applying the principle of least privilege while remaining contextually relevant. 36 Securing an ADK Agent With the core principles of identity and policy established, securing an agent built with the Agent Development Kit (ADK) becomes a practical exercise in applying those concepts through code and configuration 37 . As described above, the process requires a clear definition of identities: user account (for example OAuth), service account (to run code), agent identity (to use delegated authority). Once authentication is handled, the next layer of defense involves establishing policies to constrain access to services. This is often done at the API governance layer, along with governance supporting MCP and A2A services. The next layer is building guardrails into your tools, models, and sub-agents to enforce policies. This ensures that no matter what the LM reasons or what a malicious prompt might suggest, the tool's own logic will refuse to execute an unsafe or out-of-policy action. This approach provides a predictable and auditable security baseline, translating abstract security policies into concrete, reliable code 38 . November 2025 37
38. Introduction to Agents and Agent architectures For more dynamic security that can adapt to the agent's runtime behavior, ADK provides Callbacks and Plugins. A before_tool_callback allows you to inspect the parameters of a tool call before it runs, validating them against the agent's current state to prevent misaligned actions. For more reusable policies, you can build plugins. A common pattern is a "Gemini as a Judge" 39 that uses a fast, inexpensive model like Gemini Flash-Lite or your own fine-tuned Gemma model to screen user inputs and agent outputs for prompt injections or harmful content in real time. For organizations that prefer a fully managed, enterprise-grade solution for these dynamic checks, Model Armor can be integrated as an optional service. Model Armor acts as a specialized security layer that screens prompts and responses for a wide range of threats, including prompt injection, jailbreak attempts, sensitive data (PII) leakage, and malicious URLs 40 . By offloading these complex security tasks to a dedicated service, developers can ensure consistent, robust protection without having to build and maintain these guardrails themselves. This hybrid approach within ADK—combining strong identity, deterministic in-tool logic, dynamic AI-powered guardrails, and optional managed services like Model Armor—is how you build a single agent that is both powerful and trustworthy. November 2025 38
39. Introduction to Agents and Agent architectures Figure 6: Security and Agents from https://saif.google/focus-on-agents Scaling Up from a Single Agent to an Enterprise Fleet The production success of a single AI agent is a triumph. Scaling to a fleet of hundreds is a challenge of architecture. If you are building one or two agents, your concerns are primarily about security. If you are building many agents, you must design systems to handle much more. Just like API sprawl, when agents and tools proliferate across an organization, November 2025 39
40. Introduction to Agents and Agent architectures they create a new, complex network of interactions, data flows, and potential security vulnerabilities. Managing this complexity requires a higher-order governance layer integrating all your identities and policies and reporting into a central control plane. Security and Privacy: Hardening the Agentic Frontier An enterprise-grade platform must address the unique security and privacy challenges inherent to generative AI, even when only a single agent is running. The agent itself becomes a new attack vector. Malicious actors can attempt prompt injection to hijack the agent's instructions, or data poisoning to corrupt the information it uses for training or RAG. Furthermore, a poorly constrained agent could inadvertently leak sensitive customer data or proprietary information in its responses. A robust platform provides a defense-in-depth strategy to mitigate these risks. It starts with the data, ensuring that an enterprise's proprietary information is never used to train base models and is protected by controls like VPC Service Controls. It requires input and output filtering, acting like a firewall for prompts and responses. Finally, the platform must offer contractual protections like intellectual property indemnity for both the training data and the generated output, giving enterprises the legal and technical confidence they need to deploy agents in production. Agent Governance: A Control Plane instead of Sprawl As agents and their tools proliferate across an organization, they create a new, complex network of interactions and potential vulnerabilities, a challenge often called "agent sprawl." Managing this requires moving beyond securing individual agents to implementing a higher- order architectural approach: a central gateway that serves as a control plane for all agentic activity. November 2025 40
41. Introduction to Agents and Agent architectures Imagine a bustling metropolis with thousands of autonomous vehicles—users, agents, and tools—all moving with purpose. Without traffic lights, license plates and a central control system, chaos would reign. The gateway approach creates that control system, establishing a mandatory entry point for all agentic traffic, including user-to-agent prompts or UI interactions, agent-to-tool calls (via MCP), agent-to-agent collaborations (via A2A), and direct inference requests to LMs. By sitting at this critical intersection, an organization can inspect, route, monitor, and manage every interaction. This control plane serves two primary, interconnected functions: 1. Runtime Policy Enforcement: It acts as the architectural chokepoint for implementing security. It handles authentication ("Do I know who this actor is?") and authorization ("Do they have permission to do this?"). Centralizing enforcement provides a "single pane of glass" for observability, creating common logs, metrics, and traces for every transaction. This transforms the spaghetti of disparate agents and workflows into a transparent and auditable system. 2. Centralized Governance: To enforce policies effectively, the gateway needs a source of truth. This is provided by a central registry—an enterprise app store for agents and tools. This registry allows developers to discover and reuse existing assets, preventing redundant work, while giving administrators a complete inventory. More importantly, it enables a formal lifecycle for agents and tools, allowing for security reviews before publication, versioning, and the creation of fine-grained policies that dictate which business units can access which agents. By combining a runtime gateway with a central governance registry, an organization transforms the risk of chaotic sprawl into a managed, secure, and efficient ecosystem. November 2025 41
42. Introduction to Agents and Agent architectures Cost and Reliability: The Infrastructure Foundation Ultimately, enterprise-grade agents must be both reliable and cost-effective. An agent that frequently fails or provides slow results has a negative ROI. Conversely, an agent that is prohibitively expensive cannot scale to meet business demands. The underlying infrastructure must be designed to manage this trade-off, securely and with regulatory and data sovereignty compliance. In some cases, the feature you need is scale-to-zero, when you have irregular traffic to a specific agent or sub-function. For mission-critical, latency-sensitive workloads, the platform must offer dedicated, guaranteed capacity, such as Provisioned Throughput 41 for LM services or 99.9% Service Level Agreements (SLAs) for runtimes like Cloud Run 42 . This provides a predictable performance, ensuring that your most important agents are always responsive, even under heavy load. By providing this spectrum of infrastructure options, coupled with comprehensive monitoring for both cost and performance, you establish the final, essential foundation for scaling AI agents from a promising innovation into a core, reliable component of the enterprise. How agents evolve and learn Agents deployed in the real world operate in dynamic environments where policies, technologies, and data formats are constantly changing. Without the ability to adapt, an agent's performance will degrade over time—a process often called "aging"—leading to a loss of utility and trust. Manually updating a large fleet of agents to keep pace with these changes is uneconomical and slow. A more scalable solution is to design agents that can learn and evolve autonomously, improving their quality on the job with minimal engineering effort. 43 November 2025 42
43. Introduction to Agents and Agent architectures How agents learn and self evolve Much like humans, agents learn from experience and external signals. This learning process is fueled by several sources of information: • Runtime Experience: Agents learn from runtime artifacts such as session logs, traces, and memory, which capture successes, failures, tool interactions, and decision trajectories. Crucially, this includes Human-in-the-Loop (HITL) feedback, which provides authoritative corrections and guidance. • External Signals: Learning is also driven by new external documents, such as updated enterprise policies, public regulatory guidelines, or critiques from other agents. This information is then used to optimize the agent’s future behavior. Instead of simply summarizing past interactions, advanced systems create generalizable artifacts to guide future tasks. The most successful adaptation techniques fall into two categories: • Enhanced Context Engineering: The system continuously refines its prompts, few- shot examples, and the information it retrieves from memory. By optimizing the context provided to the LM for each task, it increases the likelihood of success. • Tool Optimization and Creation: The agent’s reasoning can identify gaps in its capabilities and act to fill them. This can involve gaining access to a new tool, creating a new one on the fly (e.g., a Python script), or modifying an existing tool (e.g., updating an API schema). Additional optimization techniques, such as dynamically reconfiguring multi-agent design patterns or using Reinforcement Learning from Human Feedback (RLHF), are active areas of research. November 2025 43
44. Introduction to Agents and Agent architectures Example: Learning New Compliance Guidelines Consider an enterprise agent operating in a heavily regulated industry like finance or life sciences. Its task is to generate reports that must comply with privacy and regulatory rules (e.g., GDPR). This can be implemented using a multi-agent workflow: 1. A Querying Agent retrieves raw data in response to a user request. 2. A Reporting Agent synthesizes this data into a draft report. 3. A Critiquing Agent, armed with known compliance guidelines, reviews the report. If it encounters ambiguity or requires final sign-off, it escalates to a human domain expert. 4. A Learning Agent observes the entire interaction, paying special attention to the corrective feedback from the human expert. It then generalizes this feedback into a new, reusable guideline (e.g., an updated rule for the critiquing agent or refined context for the reporting agent). November 2025 44
45. Introduction to Agents and Agent architectures Figure 7: Sample multi agent workflow for compliance guidelines For instance, if a human expert flags that certain household statistics must be anonymized, the Learning Agent records this correction. The next time a similar report is generated, the Critiquing Agent will automatically apply this new rule, reducing the need for human intervention. This loop of critique, human feedback, and generalization allows the system to autonomously adapt to evolving compliance requirements 44 . November 2025 45
46. Introduction to Agents and Agent architectures Simulation and Agent Gym - the next frontier The design pattern we presented can be categorized as in-line learning, where agents need to learn with the resources and design pattern they were engineered with. More advanced approaches are now being researched, where there is a dedicated platform that is engineered to optimize the multi-agent system in offline processes with advanced tooling and capabilities, which are not part of the multi-agent run-time environment. The key attributes of such an Agent Gym 45 are: 1. It is not in the execution path. It is a standalone off-production platform, and therefore can have the assistance of any LM model, and offline tools, cloud application and more 2. It offers a simulation environment, so the agent can ‘exercise’ on new data and learn. This simulation environment is excellent for ‘trial-and-error’ with many optimizations pathways 3. It can call advance synthetic data generators, which guide the simulation to be as real as possible, and pressure test the agent (this can include advance techniques, such as red- teaming, dynamic evaluation and a family of critiquing agents) 4. The arsenal of the optimization tools is not fixed, and it can adopt new tools (either through open protocols such as MCP or A2A), or in a more advanced setting - learn new concepts and craft tools around them 5. Finally, even constructs such as Agent Gym, may not be able to overcome certain edge- case (due to the well known problem of ‘tribal knowledge’ in the enterprise). In those cases we see the Agent Gym able to connect to the human fabric of domain experts, and consult with them on the right set of outcomes to guide the next set of optimizations November 2025 46
47. Introduction to Agents and Agent architectures Examples of advanced agents Google Co-Scientist Co-Scientist is an advanced AI agent designed to function as a virtual research collaborator, accelerating scientific discovery by systematically exploring complex problem spaces. It enables researchers to define a goal, ground the agent in specified public and proprietary knowledge sources, and then generate and evaluate a landscape of novel hypotheses. In order to be able to achieve this, Co-Scientist spawns a whole ecosystem of agents collaborating with each other. Figure 8: The AI co-scientist design system November 2025 47
48. Introduction to Agents and Agent architectures Think of the system as a research project manager. The AI first takes a broad research goal and creates a detailed project plan. A "Supervisor" agent then acts as the manager, delegating tasks to a team of specialized agents and distributing resources like computing power. This structure ensures the project can easily scale up and that the team's methods improve as they work toward the final goal. Figure 9: Co-scientist multi agent workflow The various agents work for hours, or even days, and keep improving the generated hypotheses, running loops and meta loops that improve not only the generated ideas, but also the way that we judge and create new ideas. November 2025 48
49. Introduction to Agents and Agent architectures AlphaEvolve Agent Another example of an advanced agentis system is AlphaEvolve, an AI agent that discovers and optimizes algorithms for complex problems in mathematics and computer science. AlphaEvolve works by combining the creative code generation of our Gemini language models with an automated evaluation system. It uses an evolutionary process: the AI generates potential solutions, an evaluator scores them, and the most promising ideas are used as inspiration for the next generation of code. This approach has already led to significant breakthroughs, including: • Improving the efficiency of Google's data centers, chip design, and AI training. • Discovering faster matrix multiplication algorithms. • Finding new solutions to open mathematical problems. AlphaEvolve excels at problems where verifying the quality of a solution is far easier than finding it in the first place. Figure 10: Alpha Evolve design system November 2025 49
50. Introduction to Agents and Agent architectures AlphaEvolve is designed for a deep, iterative partnership between humans and AI. This collaboration works in two main ways: • Transparent Solutions: The AI generates solutions as human-readable code. This transparency allows users to understand the logic, gain insights, trust the results, and directly modify the code for their needs. • Expert Guidance: Human expertise is essential for defining the problem. Users guide the AI by refining evaluation metrics and steering the exploration, which prevents the system from exploiting unintended loopholes in the problem's definition. This interactive loop ensures the final solutions are both powerful and practical. The result of the agent is a continuous improvement of the code that keeps improving the metrics specified by the human. Figure 11: Algorithm evolution November 2025 50
51. Introduction to Agents and Agent architectures Conclusion Generative AI agents mark a pivotal evolution, shifting artificial intelligence from a passive tool for content creation to an active, autonomous partner in problem-solving. This document has provided a formal framework for understanding and building these systems, moving beyond the prototype to establish a reliable, production-grade architecture. We have deconstructed the agent into its three essential components: the reasoning Model (the "Brain"), the actionable Tools (the "Hands"), and the governing Orchestration Layer (the "Nervous System"). It is the seamless integration of these parts, operating in a continuous "Think, Act, Observe" loop, that unlocks an agent's true potential. By classifying agentic systems- from the Level 1 Connected Problem-Solver to the Level 3 Collaborative Multi-Agent System -architects and product leaders can now strategically scope their ambitions to match the complexity of the task at hand. The central challenge, and opportunity, lies in a new developer paradigm. We are no longer simply "bricklayers" defining explicit logic; we are "architects" and "directors" who must guide, constrain, and debug an autonomous entity. The flexibility that makes LMs so powerful is also the source of their unreliability. Success, therefore, is not found in the initial prompt alone, but in the engineering rigor applied to the entire system: in robust tool contracts, resilient error handling, sophisticated context management, and comprehensive evaluation. The principles and architectural patterns outlined here serve as a foundational blueprint. They are the guideposts for navigating this new frontier of software, enabling us to build not just "workflow automation," but truly collaborative, capable, and adaptable new members of our teams. As this technology matures, this disciplined, architectural approach will be the deciding factor in harnessing the full power of agentic AI. November 2025 51
52. Introduction to Agents and Agent architectures Endnotes 1. Julia Wiesinger, Patrick Marlow, et al. 2024 “Agents”. Available at: https://www.kaggle.com/whitepaper-agents. 2. Antonio Gulli, Lavi Nigam, et al. 2025 “Agents Companion”. Available at: https://www.kaggle.com/whitepaper-agent-companion. 3. Shunyu Yao, Y. et al., 2022, 'ReAct: Synergizing Reasoning and Acting in Language Models'. Available at: https://arxiv.org/abs/2210.03629. 4. Wei, J., Wang, X. et al., 2023, 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models'. Available at: https://arxiv.org/pdf/2201.11903.pdf. 5. Shunyu Yao, Y. et al., 2022, 'ReAct: Synergizing Reasoning and Acting in Language Models'. Available at: https://arxiv.org/abs/2210.03629. 6. https://www.amazon.com/Agentic-Design-Patterns-Hands-Intelligent/dp/3032014018 7. Shunyu Yao, et. al., 2024, ‘τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains’, Available at: https://arxiv.org/abs/2406.12045. 8. https://artificialanalysis.ai/guide 9. https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/vertex-ai-model-optimizer 10. https://gemini.google/overview/gemini-live/ 11. https://cloud.google.com/vision?e=48754805&hl=en 12. https://cloud.google.com/speech-to-text?e=48754805&hl=en 13. https://medium.com/google-cloud/genaiops-operationalize-generative-ai-a- practical-guide-d5bedaa59d78 14. https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/code-execution/overview 15. https://ai.google.dev/gemini-api/docs/function-calling 16. https://github.com/modelcontextprotocol/ 17. https://ai.google.dev/gemini-api/docs/google-search November 2025 52
53. Introduction to Agents and Agent architectures 18. https://google.github.io/adk-docs/ 19. https://google.github.io/adk-docs/sessions/memory/ 20. https://cloud.google.com/architecture/choose-design-pattern-agentic-ai-system 21. https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/overview 22. https://cloud.google.com/kubernetes-engine/docs/concepts/gke-and-cloud-run 23. https://github.com/GoogleCloudPlatform/agent-starter-pack 24. Sokratis Kartakis, 2024, ‘GenAI in Production: MLOps or GenAIOps?’. Available at: https://medium.com/ google-cloud/genai-in-production-mlops-or-genaiops-25691c9becd0. 25. Guangya Liu, Sujay Solomon, March 2025 “AI Agent Observability - Evolving Standards and Best Practice”. Available at: https://opentelemetry.io/blog/2025/ai-agent-observability/. 26. https://discuss.google.dev/t/agents-are-not-tools/192812 27. Damien Masson et. al, 2024, ‘DirectGPT: A Direct Manipulation Interface to Interact with Large Language Models’. Available at: https://arxiv.org/abs/2310.03691. 28. MCP UI is a system of controlling UI via MCP tools https://mcpui.dev/. 29. AG UI is a protocol of controlling UI via event passing and optionally shared state https://ag-ui.com/. 30. A2UI is a protocol of generating UI via structured output and A2A message passing https://github.com/google/A2UI. 31. https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-flash-live-api. 32. https://saif.google/focus-on-agents. 33. https://simonwillison.net/series/prompt-injection/. 34. https://storage.googleapis.com/gweb-research2023-media/pubtools/1018686.pdf. 35. https://spiffe.io/. 36. https://openreview.net/pdf?id=l9rATNBB8Y. 37. https://google.github.io/adk-docs/safety/. November 2025 53
54. Introduction to Agents and Agent architectures 38. https://google.github.io/adk-docs/callbacks/design-patterns-and-best-practices /#guardrails-policy-enforcement 39. TKTK 40. https://cloud.google.com/security-command-center/docs/model-armor-overview 41. https://cloud.google.com/vertex-ai/generative-ai/docs/provisioned-throughput/overview 42. https://cloud.google.com/run/sla 43. https://github.com/CharlesQ9/Self-Evolving-Agents 44. Juraj Gottweis, et. al., 2025, ‘Accelerating scientific breakthroughs with an AI co-scientist’. Available at: https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/. 45. Deepak Nathani et. al. 2025, ‘MLGym: A New Framework and Benchmark for Advancing AI Research Agents’, Available at: https://arxiv.org/abs/2502.14499. November 2025 54

trang chủ - Wiki
Copyright © 2011-2025 iteam. Current version is 2.148.0. UTC+08:00, 2025-11-14 15:45
浙ICP备14020137号-1 $bản đồ khách truy cập$