
A Comprehensive Tutorial on the Five Levels of Agentic AI Architectures: From Basic Prompt Responses to Fully Autonomous Code Generation and Execution

Apr 26, 2025 by admin

In this tutorial, we explore five levels of agentic architectures, from the simplest language-model calls to a fully autonomous code-generating system, all designed to run seamlessly on Google Colab. Starting with a basic “simple processor” that merely returns the model’s output, you will progressively build routing logic, integrate external tools, orchestrate multi-step workflows, and ultimately empower the model to plan, validate, refine, and execute its own Python code. Throughout each section, you’ll find detailed explanations, self-contained demo functions, and clear prompts that illustrate how to balance human control and machine autonomy in real-world AI applications.

import os
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
import re
import json
import time
import random
from IPython.display import clear_output

We import core Python and third-party libraries: os and time for environment and execution control, torch together with Hugging Face’s transformers (pipeline, AutoTokenizer, AutoModelForCausalLM) for model loading and inference, re and json for parsing LLM outputs, random for seeding and mock data, and clear_output to keep the Colab interface tidy.

MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
def get_model_and_tokenizer():
    if not hasattr(get_model_and_tokenizer, "model"):
        print(f"Loading model {MODEL_NAME}...")
        tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_NAME,
            torch_dtype=torch.float16,
            device_map="auto",
            low_cpu_mem_usage=True
        )
        get_model_and_tokenizer.model = model
        get_model_and_tokenizer.tokenizer = tokenizer
        print("Model loaded successfully!")
   
    return get_model_and_tokenizer.model, get_model_and_tokenizer.tokenizer

Here, we define MODEL_NAME to point at the TinyLlama 1.1B chat model and implement a lazy‐loading helper get_model_and_tokenizer() that downloads and initializes the tokenizer and model only once, caching them on first call to minimize overhead, and then returns the cached instances for all subsequent inference calls.
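As a quick, optional sanity check of the caching behavior, you can call the helper twice and confirm that both calls hand back the same objects (the variable names below are just illustrative):

model_a, tokenizer_a = get_model_and_tokenizer()   # downloads and caches on the first call
model_b, tokenizer_b = get_model_and_tokenizer()   # returns the cached instances immediately

assert model_a is model_b and tokenizer_a is tokenizer_b
print("Model and tokenizer are cached on the function object.")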


def generate_text(prompt, max_length=512):
    model, tokenizer = get_model_and_tokenizer()
   
    messages = [{"role": "user", "content": prompt}]
    formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)
   
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
   
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_length,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
        )
   
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
   
    response = generated_text.split("ASSISTANT: ")[-1].strip()
    return response

The generate_text function wraps the TinyLlama inference workflow: it retrieves the cached model and tokenizer, formats the user prompt into the chat template, tokenizes and moves inputs to the model’s device, then samples a response with temperature and top-p settings. After generation, it decodes the output and extracts just the assistant’s reply by splitting on the “ASSISTANT: ” marker.
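As a quick smoke test in Colab, you can call the helper directly; the prompt below is just an example, and max_length caps the number of newly generated tokens:

# Example one-off call to the generation helper
reply = generate_text("Explain the difference between a list and a tuple in Python.", max_length=128)
print(reply)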

Level 1: Simple Processor

At the simplest level, the code defines a straightforward text‐generation pipeline that treats the model purely as a language processor. When the user provides a prompt, the `simple_processor` function invokes the `generate_text` helper, which is built on the TinyLlama 1.1B chat model, to produce a free-form response. It then displays that response directly. Under the hood, `generate_text` ensures the model and tokenizer are loaded just once by caching them inside the `get_model_and_tokenizer` function, formats the prompt for the chat model, runs generation with sampling parameters for diversity, and extracts the assistant’s reply by splitting on the “ASSISTANT:” marker. This level demonstrates the most basic interaction pattern: input is received, output is generated, and program flow remains entirely under human control.

def simple_processor(prompt):
    """Level 1: Simple Processor - Model has no impact on program flow"""
    response = generate_text(prompt)
    return response


def demo_level1():
    print("\n" + "="*50)
    print("LEVEL 1: SIMPLE PROCESSOR DEMO")
    print("="*50)
    print("At this level, the AI has no control over program flow.")
    print("It simply takes input and produces output.\n")
   
    user_input = input("Enter your question or prompt: ") or "Write a short poem about artificial intelligence."
    print("\nProcessing your request...\n")
   
    output = simple_processor(user_input)
    print("OUTPUT:")
    print("-"*50)
    print(output)
    print("-"*50)

The simple_processor function embodies the Simple Processor of our agent hierarchy by treating the model purely as a text generator; it accepts a user-provided prompt and delegates to generate_text. It returns whatever the model produces without any branching or decision logic. The accompanying demo_level1 routine provides a minimal interactive loop, printing a clear header, soliciting user input (with a sensible default), invoking simple_processor, and then displaying the raw output, showcasing the most basic prompt-to-response workflow in which the AI exerts no influence over the program’s flow.
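If you prefer to run Level 1 without the interactive prompt (for example, when executing the whole notebook top to bottom), a direct call with a placeholder prompt behaves identically:

# Non-interactive Level 1 call with a placeholder prompt
print(simple_processor("Summarize what an agentic AI system is in two sentences."))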

Level 2: Router

The second level introduces conditional routing based on the model’s classification of the user’s query. The `router_agent` function first asks the model to classify a query into “technical,” “creative,” or “factual,” then normalizes the model’s response into one of those categories. Depending on which category is detected, the query is dispatched to a specialized handler, either `handle_technical_query`, `handle_creative_query`, or `handle_factual_query`, each of which wraps the user’s query in a system-style prompt tailored to the chosen tone and purpose. This routing mechanism provides the model with partial control over program flow, enabling it to guide the subsequent interaction path while still relying on human-defined handlers to generate the final output.

def router_agent(user_query):
    """Level 2: Router - Model determines basic program flow"""
   
    category_prompt = f"""Classify the following query into one of these categories:
    'technical', 'creative', or 'factual'.
   
    Query: {user_query}
   
    Return ONLY the category name and nothing else."""
   
    category_response = generate_text(category_prompt)
   
    category = category_response.lower()
    if "technical" in category:
        category = "technical"
    elif "creative" in category:
        category = "creative"
    else:
        category = "factual"
   
    print(f"Query classified as: {category}")
   
    if category == "technical":
        return handle_technical_query(user_query)
    elif category == "creative":
        return handle_creative_query(user_query)
    else:  
        return handle_factual_query(user_query)


def handle_technical_query(query):
    system_prompt = f"""You are a technical assistant. Provide detailed technical explanations.
   
    User query: {query}"""
   
    response = generate_text(system_prompt)
    return f"[Technical Response]\n{response}"


def handle_creative_query(query):
    system_prompt = f"""You are a creative assistant. Be imaginative and inspiring.
   
    User query: {query}"""
   
    response = generate_text(system_prompt)
    return f"[Creative Response]\n{response}"


def handle_factual_query(query):
    system_prompt = f"""You are a factual assistant. Provide accurate information concisely.
   
    User query: {query}"""
   
    response = generate_text(system_prompt)
    return f"[Factual Response]\n{response}"


def demo_level2():
    print("\n" + "="*50)
    print("LEVEL 2: ROUTER DEMO")
    print("="*50)
    print("At this level, the AI determines basic program flow.")
    print("It decides which processing path to take.\n")
   
    user_query = input("Enter your question or prompt: ") or "How do neural networks work?"
    print("\nProcessing your request...\n")
   
    result = router_agent(user_query)
    print("OUTPUT:")
    print("-"*50)
    print(result)
    print("-"*50)

The router_agent function implements Router behavior by first asking the model to classify the user’s query as “technical,” “creative,” or “factual,” then normalizing that classification and dispatching the query to the corresponding handler (handle_technical_query, handle_creative_query, or handle_factual_query), each of which wraps the original query in an appropriate system‐style prompt before calling generate_text. The demo_level2 routine provides a clear CLI-style interface, printing headers, accepting input (with a default), invoking router_agent, and displaying the categorized response, showcasing how the model can take basic control over program flow by choosing which processing path to follow.
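For a non-interactive check of the routing logic, you can pass a few sample queries directly; the examples below are only illustrative, and the category printed for each depends on the model’s classification:

# Illustrative, non-interactive routing calls
sample_queries = [
    "How does backpropagation update neural network weights?",   # expected: technical
    "Write a haiku about the ocean at dawn.",                    # expected: creative
    "What year did the first Moon landing take place?",          # expected: factual
]
for sample_query in sample_queries:
    print(router_agent(sample_query))
    print("=" * 50)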

Level 3: Tool Calling

At the third level, the code empowers the model to decide which of several external tools to invoke by embedding a JSON-based function selection protocol into the prompt. The `tool_calling_agent` presents the user’s question alongside a menu of potential tools, including weather lookup, web search simulation, current date and time retrieval, or direct response, and instructs the model to respond with a valid JSON message specifying the chosen tool and its parameters. A regex then extracts the first JSON object from the model’s output, and the code safely falls back to a direct response if parsing fails. Once the tool and arguments are identified, the corresponding Python function is executed, its result is captured, and a final model call integrates that result into a coherent answer. This pattern bridges LLM reasoning with concrete code execution by letting the model orchestrate which APIs or utilities to call.

def tool_calling_agent(user_query):
    """Level 3: Tool Calling - Model determines how functions are executed"""
   
    tool_selection_prompt = f"""Based on the user query, select the most appropriate tool from the following list:
    1. get_weather: Get the current weather for a location
    2. search_information: Search for specific information on a topic
    3. get_date_time: Get current date and time
    4. direct_response: Provide a direct response without using tools
   
    USER QUERY: {user_query}
   
    INSTRUCTIONS:
    - Return your response in valid JSON format
    - Include the tool name and any required parameters
    - For get_weather, include location parameter
    - For search_information, include query and depth parameter (basic or detailed)
    - For get_date_time, include timezone parameter (optional)
    - For direct_response, no parameters needed
   
    Example output format: {{"tool": "get_weather", "parameters": {{"location": "New York"}}}}"""
   
    tool_selection_response = generate_text(tool_selection_prompt)
   
    try:
        json_match = re.search(r'({.*})', tool_selection_response, re.DOTALL)
        if json_match:
            tool_selection = json.loads(json_match.group(1))
        else:
            print("Could not parse tool selection. Defaulting to direct response.")
            tool_selection = {"tool": "direct_response", "parameters": {}}
    except json.JSONDecodeError:
        print("Invalid JSON in tool selection. Defaulting to direct response.")
        tool_selection = {"tool": "direct_response", "parameters": {}}
   
    tool_name = tool_selection.get("tool", "direct_response")
    parameters = tool_selection.get("parameters", {})
   
    print(f"Selected tool: {tool_name}")
   
    if tool_name == "get_weather":
        location = parameters.get("location", "Unknown")
        tool_result = get_weather(location)
    elif tool_name == "search_information":
        query = parameters.get("query", user_query)
        depth = parameters.get("depth", "basic")
        tool_result = search_information(query, depth)
    elif tool_name == "get_date_time":
        timezone = parameters.get("timezone", "UTC")
        tool_result = get_date_time(timezone)
    else:
        return generate_text(f"Please provide a helpful response to: {user_query}")
   
    final_prompt = f"""User Query: {user_query}
    Tool Used: {tool_name}
    Tool Result: {json.dumps(tool_result)}
   
    Based on the user's query and the tool result above, provide a helpful response."""
   
    final_response = generate_text(final_prompt)
    return final_response


def get_weather(location):
    weather_conditions = ["Sunny", "Partly cloudy", "Overcast", "Light rain", "Heavy rain", "Thunderstorms", "Snowy", "Foggy"]
    temperatures = {
        "cold": list(range(-10, 10)),
        "mild": list(range(10, 25)),
        "hot": list(range(25, 40))
    }
   
    location_hash = sum(ord(c) for c in location)
    condition_index = location_hash % len(weather_conditions)
    season = ["winter", "spring", "summer", "fall"][location_hash % 4]
   
    temp_range = temperatures["cold"] if season in ["winter", "fall"] else temperatures["hot"] if season == "summer" else temperatures["mild"]
    temperature = random.choice(temp_range)
   
    return {
        "location": location,
        "temperature": f"{temperature}°C",
        "conditions": weather_conditions[condition_index],
        "humidity": f"{random.randint(30, 90)}%"
    }


def search_information(query, depth="basic"):
    mock_results = [
        f"First result about {query}",
        f"Second result discussing {query}",
        f"Third result analyzing {query}"
    ]
   
    if depth == "detailed":
        mock_results.extend([
            f"Fourth detailed analysis of {query}",
            f"Fifth comprehensive overview of {query}",
            f"Sixth academic paper on {query}"
        ])
   
    return {
        "query": query,
        "results": mock_results,
        "depth": depth,
        "sources": [f"source{i}.com" for i in range(1, len(mock_results) + 1)]
    }


def get_date_time(timezone="UTC"):
    current_time = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())
    return {
        "current_datetime": current_time,
        "timezone": timezone
    }


def demo_level3():
    print("\n" + "="*50)
    print("LEVEL 3: TOOL CALLING DEMO")
    print("="*50)
    print("At this level, the AI selects which tools to use and with what parameters.")
    print("It can process the results from tools to create a final response.\n")
   
    user_query = input("Enter your question or prompt: ") or "What's the weather like in San Francisco?"
    print("\nProcessing your request...\n")
   
    result = tool_calling_agent(user_query)
    print("OUTPUT:")
    print("-"*50)
    print(result)
    print("-"*50)

In the Level 3 implementation, the tool_calling_agent function prompts the model to choose among a predefined set of utilities, such as weather lookup, mock web search, or date/time retrieval, by returning a JSON object with the selected tool name and its parameters. It then safely parses that JSON, invokes the corresponding Python function to obtain structured data, and makes a follow-up model call to integrate the tool’s output into a coherent, user-facing response.
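You can also bypass the demo’s input() prompt and call the agent directly; the query below is only an example, and the weather values returned by the mock tool are synthetic:

# Illustrative, non-interactive tool-calling query
print(tool_calling_agent("What's the weather like in Tokyo right now?"))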

Level 4: Multi-Step Agent

The fourth level extends the tool-calling pattern into a full multi-step agent that manages its workflow and state. The `MultiStepAgent` class maintains an internal memory of user inputs, tool outputs, and agent actions. Each iteration generates a planning prompt that summarizes the entire memory, asking the model to choose one of several tools, such as web search simulation, information extraction, text summarization, or report creation, or to conclude the task with a final output. After executing the selected tool and appending its results back to memory, the process repeats until either the model issues a “complete” action or the maximum number of steps is reached. Finally, the agent collates the memory into a cohesive final response. This structure shows how an LLM can orchestrate complex, multi-stage processes while consulting external functions and refining its plan based on previous results.

class MultiStepAgent:
    """Level 4: Multi-Step Agent - Model controls iteration and program continuation"""
   
    def __init__(self):
        self.tools = {
            "search_web": self.search_web,
            "extract_info": self.extract_info,
            "summarize_text": self.summarize_text,
            "create_report": self.create_report
        }
        self.memory = []
        self.max_steps = 5
   
    def run(self, user_task):
        self.memory.append({"role": "user", "content": user_task})
       
        steps_taken = 0
        while steps_taken < self.max_steps:
            next_action = self.determine_next_action()
           
            if next_action["action"] == "complete":
                return next_action["output"]
           
            tool_name = next_action["tool"]
            tool_args = next_action["args"]
           
            print(f"\n📌 Step {steps_taken + 1}: Using tool '{tool_name}' with arguments: {tool_args}")
           
            tool_result = self.tools[tool_name](**tool_args)
           
            self.memory.append({
                "role": "tool",
                "content": json.dumps(tool_result)
            })
           
            steps_taken += 1
       
        return self.generate_final_response("Maximum steps reached. Here's what I've found so far.")
   
    def determine_next_action(self):
        context = "Current memory state:\n"
        for item in self.memory:
            if item["role"] == "user":
                context += f"USER INPUT: {item['content']}\n\n"
            elif item["role"] == "tool":
                context += f"TOOL RESULT: {item['content']}\n\n"
       
        prompt = f"""{context}
       
        Based on the above information, determine the next action to take.
        Choose one of the following options:
        1. search_web: Search for information (args: query)
        2. extract_info: Extract specific information from a text (args: text, target_info)
        3. summarize_text: Create a summary of text (args: text)
        4. create_report: Create a structured report (args: title, content)
        5. complete: Task is complete (include final output)
       
        Respond with a JSON object with the following structure:
        For tools: {{"action": "tool", "tool": "tool_name", "args": {{tool-specific arguments}}}}
        For completion: {{"action": "complete", "output": "final output text"}}
       
        Only return the JSON object and nothing else."""
       
        next_action_response = generate_text(prompt)
       
        try:
            json_match = re.search(r'({.*})', next_action_response, re.DOTALL)
            if json_match:
                next_action = json.loads(json_match.group(1))
            else:
                return {"action": "complete", "output": "I encountered an error in planning. Here's what I know so far: " + self.generate_final_response("Error in planning")}
        except json.JSONDecodeError:
            return {"action": "complete", "output": "I encountered an error in planning. Here's what I know so far: " + self.generate_final_response("Error in planning")}
           
        self.memory.append({"role": "assistant", "content": next_action_response})
        return next_action
   
    def generate_final_response(self, prefix=""):
        context = "Task history:\n"
        for item in self.memory:
            if item["role"] == "user":
                context += f"USER INPUT: {item['content']}\n\n"
            elif item["role"] == "tool":
                context += f"TOOL RESULT: {item['content']}\n\n"
            elif item["role"] == "assistant":
                context += f"AGENT ACTION: {item['content']}\n\n"
       
        prompt = f"""{context}
       
        {prefix} Generate a comprehensive final response that addresses the original user task."""
       
        final_response = generate_text(prompt)
        return final_response
   
    def search_web(self, query):
        time.sleep(1)  
       
        query_hash = sum(ord(c) for c in query)
        num_results = (query_hash % 3) + 2
       
        results = []
        for i in range(num_results):
            results.append(f"Result {i+1}: Information about '{query}' related to aspect {chr(97 + i)}.")
       
        return {
            "query": query,
            "results": results
        }
   
    def extract_info(self, text, target_info):
        time.sleep(0.5)  
       
        return {
            "extracted_info": f"Extracted information about '{target_info}' from the text: The text indicates that {target_info} is related to several key aspects mentioned in the content.",
            "confidence": round(random.uniform(0.7, 0.95), 2)
        }
   
    def summarize_text(self, text):
        time.sleep(0.5)
       
        word_count = len(text.split())
       
        return {
            "summary": f"Summary of the provided text ({word_count} words): The text discusses key points related to the subject matter, highlighting important aspects and providing context.",
            "original_length": word_count,
            "summary_length": round(word_count * 0.3)
        }
   
    def create_report(self, title, content):
        time.sleep(0.7)
       
        report_sections = [
            "## Introduction",
            f"This report provides an overview of {title}.",
            "",
            "## Key Findings",
            content,
            "",
            "## Conclusion",
            f"This analysis of {title} highlights several important aspects that warrant consideration."
        ]
       
        return {
            "report": "\n".join(report_sections),
            "word_count": len(content.split()),
            "section_count": 3
        }


def demo_level4():
    print("\n" + "="*50)
    print("LEVEL 4: MULTI-STEP AGENT DEMO")
    print("="*50)
    print("At this level, the AI manages the entire workflow, deciding which tools")
    print("to use, when to use them, and determining when the task is complete.\n")
   
    user_task = input("Enter a research or analysis task: ") or "Research quantum computing recent developments and create a brief report"
    print("\nProcessing your request... (this may take a minute)\n")
   
    agent = MultiStepAgent()
    result = agent.run(user_task)
    print("\nFINAL OUTPUT:")
    print("-"*50)
    print(result)
    print("-"*50)

The MultiStepAgent class maintains an evolving memory of user inputs and tool outputs, then repeatedly prompts the LLM to decide its next action, whether to search the web, extract information, summarize text, create a report, or finish, executing the chosen tool and appending the result until the task is complete or a step limit is reached. In doing so, it showcases a Level 4 agent that orchestrates multi-step workflows by letting the model control iteration and program continuation.
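To run the multi-step agent programmatically (the task string below is just an example), instantiate the class and call run(); you can also adjust max_steps if a task needs more or fewer iterations:

# Illustrative, non-interactive multi-step run
agent = MultiStepAgent()
agent.max_steps = 4   # optional: cap the number of planning/tool steps
report = agent.run("Research the main applications of reinforcement learning and create a brief report.")
print(report)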

Level 5: Fully Autonomous Agent

At the most advanced level, the `AutonomousAgent` class demonstrates a closed-loop system in which the model not only plans and executes but also generates, validates, refines, and runs new Python code. After the user task is recorded, the agent asks the model to produce a detailed plan, then prompts it to generate self-contained solution code, which is automatically cleaned of markdown formatting. A subsequent validation step queries the model for any syntax or logic issues; if issues are found, the agent asks the model to refine the code. The validated code is then wrapped with sandboxing utilities, such as safe printing, captured output buffers, and result-capture logic, and executed in a restricted local environment. Finally, the agent synthesizes a professional report explaining what was done, how it was accomplished, and the final results. This level exemplifies a truly autonomous AI system that can extend its capabilities through dynamic code creation and execution.

class AutonomousAgent:
    """Level 5: Fully Autonomous Agent - Model creates & executes new code"""
   
    def __init__(self):
        self.memory = []
   
    def run(self, user_task):
        self.memory.append({"role": "user", "content": user_task})
       
        print("🧠 Planning solution approach...")
        planning_message = self.plan_solution(user_task)
        self.memory.append({"role": "assistant", "content": planning_message})
       
        print("💻 Generating solution code...")
        generated_code = self.generate_solution_code()
        self.memory.append({"role": "assistant", "content": f"Generated code: ```python\n{generated_code}\n```"})
       
        print("🔍 Validating code...")
        validation_result = self.validate_code(generated_code)
        if not validation_result["valid"]:
            print("⚠ Code validation found issues - refining...")
            refined_code = self.refine_code(generated_code, validation_result["issues"])
            self.memory.append({"role": "assistant", "content": f"Refined code: ```python\n{refined_code}\n```"})
            generated_code = refined_code
        else:
            print("✅ Code validation passed")
       
        try:
            print("🚀 Executing solution...")
            execution_result = self.safe_execute_code(generated_code, user_task)
            self.memory.append({"role": "system", "content": f"Execution result: {execution_result}"})
           
            # Generate a final report
            print("📝 Creating final report...")
            final_report = self.create_final_report(execution_result)
            return final_report
           
        except Exception as e:
            return f"Error executing the solution: {str(e)}\n\nGenerated code was:\n```python\n{generated_code}\n```"
   
    def plan_solution(self, task):
        prompt = f"""Task: {task}


        You are an autonomous problem-solving agent. Create a detailed plan to solve this task.
        Include:
        1. Breaking down the task into subtasks
        2. What algorithms or approaches you'll use
        3. What data structures are needed
        4. Any external resources or libraries required
        5. Expected challenges and how to address them


        Provide a step-by-step plan.
        """
       
        return generate_text(prompt)
   
    def generate_solution_code(self):
        context = "Task and planning information:\n"
        for item in self.memory:
            if item["role"] == "user":
                context += f"USER TASK: {item['content']}\n\n"
            elif item["role"] == "assistant":
                context += f"PLANNING: {item['content']}\n\n"
       
        prompt = f"""{context}


        Generate clean, efficient Python code that solves this task. Include comments to explain the code.
        The code should be self-contained and able to run inside a Python script or notebook.
        Only include the Python code itself without any markdown formatting.
        """
       
        code = generate_text(prompt)
       
        code = re.sub(r'^```python\n|```$', '', code, flags=re.MULTILINE)
        return code

    def validate_code(self, code):
        prompt = f"""Code to validate:
        ```python
        {code}
        ```

        Examine the code for the following issues:
        1. Syntax errors
        2. Logic errors
        3. Inefficient implementations
        4. Security concerns
        5. Missing error handling
        6. Import statements for unavailable libraries

        If the code has any issues, describe them in detail. If the code looks good, state "No issues found."
        """

        validation_response = generate_text(prompt)

        if "no issues" in validation_response.lower() or "code looks good" in validation_response.lower():
            return {"valid": True, "issues": None}
        else:
            return {"valid": False, "issues": validation_response}

    def refine_code(self, original_code, issues):
        prompt = f"""Original code:
        ```python
        {original_code}
        ```

        Issues identified:
        {issues}

        Please provide a corrected version of the code that addresses these issues.
        Only include the Python code itself without any markdown formatting.
        """

        refined_code = generate_text(prompt)
        refined_code = re.sub(r'^```python\n|```$', '', refined_code, flags=re.MULTILINE)
        return refined_code

    def safe_execute_code(self, code, user_task):
        safe_imports = """
# Standard library imports
import math
import random
import re
import time
import json
from datetime import datetime

# Define a function to capture printed output
captured_output = []
original_print = print

def safe_print(*args, **kwargs):
    output = " ".join(str(arg) for arg in args)
    captured_output.append(output)
    original_print(output)

print = safe_print

# Define a result variable to store the final output
result = None

# Function to store the final result
def store_result(value):
    global result
    result = value
    return value
"""

        result_capture = """
# Store the final result if not already done
if 'result' not in locals() or result is None:
    try:
        # Look for variables that might contain the final result
        potential_results = [var for var in locals() if not var.startswith('_')
                             and var not in ['math', 'random', 're', 'time', 'json', 'datetime',
                                             'captured_output', 'original_print', 'safe_print',
                                             'result', 'store_result']]
        if potential_results:
            # Use the last defined variable as the result
            store_result(locals()[potential_results[-1]])
    except:
        pass
"""

        full_code = safe_imports + "\n# User code starts here\n" + code + "\n\n" + result_capture

        code_lines = code.split('\n')
        first_lines = code_lines[:3]
        print(f"\nExecuting (first 3 lines):\n{first_lines}")

        local_env = {}
        try:
            exec(full_code, {}, local_env)
            return {
                "output": local_env.get('captured_output', []),
                "result": local_env.get('result', "No explicit result returned")
            }
        except Exception as e:
            return {"error": str(e)}

    def create_final_report(self, execution_result):
        if isinstance(execution_result.get('output'), list):
            output_text = "\n".join(execution_result.get('output', []))
        else:
            output_text = str(execution_result.get('output', ''))

        result_text = str(execution_result.get('result', ''))
        error_text = execution_result.get('error', '')

        context = "Task history:\n"
        for item in self.memory:
            if item["role"] == "user":
                context += f"USER TASK: {item['content']}\n\n"

        prompt = f"""{context}

        EXECUTION OUTPUT:
        {output_text}

        EXECUTION RESULT:
        {result_text}

        {f"ERROR: {error_text}" if error_text else ""}

        Create a final report that explains the solution to the original task.
        Include:
        1. What was done
        2. How it was accomplished
        3. The final results
        4. Any insights or conclusions drawn from the analysis

        Format the report in a professional, easy to read manner.
        """

        return generate_text(prompt)


def demo_level5():
    print("\n" + "="*50)
    print("LEVEL 5: FULLY AUTONOMOUS AGENT DEMO")
    print("="*50)
    print("At this level, the AI generates and executes code to solve complex problems.")
    print("It can create, validate, refine, and run custom code solutions.\n")

    user_task = input("Enter a data analysis or computational task: ") or "Analyze a dataset of numbers [10, 45, 65, 23, 76, 12, 89, 32, 50] and create visualizations of the distribution"
    print("\nProcessing your request... (this may take a minute or two)\n")

    agent = AutonomousAgent()
    result = agent.run(user_task)
    print("\nFINAL REPORT:")
    print("-"*50)
    print(result)
    print("-"*50)

The AutonomousAgent class embodies the autonomy of a Fully Autonomous Agent by maintaining a running memory of the user’s task and systematically orchestrating five core phases: planning, code generation, validation, safe execution, and reporting. When the run is initiated, the agent prompts the model to generate a detailed plan for solving the task and stores this plan in memory. Next, it asks the model to create self-contained Python code based on that plan, strips away any markdown formatting, and then validates the code by querying the model for syntax, logic, performance, and security issues. If validation uncovers problems, the agent instructs the model to refine the code until it passes inspection. The finalized code is then wrapped in a sandboxed execution harness, complete with captured output buffers and automatic result extraction, and executed in an isolated local environment. Finally, the agent synthesizes a polished, professional report by feeding the execution results back into the model, producing a narrative that explains what was done, how it was accomplished, and what insights were gained. The accompanying demo_level5 function provides a straightforward, interactive loop that accepts a user task, runs the agent, and presents a comprehensive final report.
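A non-interactive run looks the same as the other levels (the task below is just an example); keep in mind that the generated code is executed with exec in the notebook’s own process, so it is worth reviewing tasks before running them:

# Illustrative, non-interactive autonomous run; generated code executes locally via exec
agent = AutonomousAgent()
report = agent.run("Compute the mean, median, and standard deviation of the numbers [10, 45, 65, 23, 76, 12, 89, 32, 50].")
print(report)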


Main Function: Running All Five Demos

def main():
    while True:
        clear_output(wait=True)
        print("\n" + "="*50)
        print("AI AGENT LEVELS DEMO")
        print("="*50)
        print("\nThis notebook demonstrates the 5 levels of AI agents:")
        print("1. Simple Processor - Model has no impact on program flow")
        print("2. Router - Model determines basic program flow")
        print("3. Tool Calling - Model determines how functions are executed")
        print("4. Multi-Step Agent - Model controls iteration and program continuation")
        print("5. Fully Autonomous Agent - Model creates & executes new code")
        print("6. Quit")
       
        choice = input("\nSelect a level to demo (1-6): ")
       
        if choice == "1":
            demo_level1()
        elif choice == "2":
            demo_level2()
        elif choice == "3":
            demo_level3()
        elif choice == "4":
            demo_level4()
        elif choice == "5":
            demo_level5()
        elif choice == "6":
            print("\nThank you for exploring the AI Agent levels!")
            break
        else:
            print("\nInvalid choice. Please select 1-6.")
       
        input("\nPress Enter to return to the main menu...")


if __name__ == "__main__":
    main()

Finally, the main function presents a simple, interactive menu loop that clears the Colab output for readability, displays all five agent levels alongside a quit option, and then dispatches the user’s choice to the corresponding demo function before waiting for input to return to the menu. This structure provides a cohesive, CLI-style interface enabling you to explore each agent level in sequence without manual cell execution.

In conclusion, by working through these five levels, we have gained practical insight into the principles of agentic AI and the trade-offs between control, flexibility, and autonomy. We have seen how a system can evolve from straightforward prompt-response behavior to complex decision-making pipelines and even self-modifying code execution. Whether you aim to prototype intelligent assistants, build data pipelines, or experiment with emerging AI capabilities, this progression framework provides a roadmap for designing robust and scalable agents.


Here is the Colab Notebook.