By implementing a proper agent swarm architecture, it’s possible to significantly decrease task completion time and increase accuracy. Very significantly!
You might have a skeptical look on your face when reading the title. Performance optimization via multiple AI agents? Yes and YES! Bear with me for a couple more lines, and you might be surprised how simple yet powerful this can be. As we all know, AI performance matters, and inefficient architectures can cost you a fortune… And this is exactly what we’re going to address in this post. Still not convinced? Don’t be surprised if you see up to 70% improvement in complex task completion. Sorry, did I just spoil the article? Probably!
Agent Swarms
Have you ever tried using a single LLM agent to handle complex, multi-step tasks? And have you noticed how they sometimes get stuck or confused? If so, you can skip to the next paragraph. For those of you who aren’t familiar with the topic, here’s what’s happening.
Every single agent, no matter how capable, has limitations when handling complex workflows alone. And no one can blame them for it. Well, maybe yes, but let's assume not, at least at the beginning of this post 😊. They need to juggle multiple contexts, goals, and steps simultaneously. That's why there's such a strong push from platforms like LangChain and AutoGen to implement multi-agent architectures. Yet if you pick up almost any agent solution, the commonly recommended "best" practice is still a single agent with a complex prompt. We're going to explain why that's not enough in the following lines.
Implementation Approaches
Let’s first look at what approaches we have for implementing AI agents and quickly explain what they mean. Based on them, it’s possible to create architectures that will significantly improve your results. There are three main ones:
- Single Agent – This is the simplest approach. One agent handles everything, from planning to execution.
- Manager-Worker – A manager agent delegates tasks to specialized worker agents.
- Collaborative Swarm – Multiple agents work together, sharing information and helping each other.
Here’s a practical implementation example of a swarm architecture using LangChain:
```python
from langchain.agents import create_agent
from langchain.tools import Tool

def create_swarm():
    # Create specialized worker agents; web_search, pdf_reader, calculator,
    # data_analyzer, text_generator and summarizer are tools defined elsewhere
    researcher = create_agent(
        tools=[web_search, pdf_reader],
        role="Research specialist"
    )
    analyst = create_agent(
        tools=[calculator, data_analyzer],
        role="Data analyst"
    )
    writer = create_agent(
        tools=[text_generator, summarizer],
        role="Content creator"
    )

    # Create a manager agent that can call the workers as tools
    manager = create_agent(
        tools=[
            Tool("researcher", researcher.run),
            Tool("analyst", analyst.run),
            Tool("writer", writer.run)
        ],
        role="Project manager"
    )

    return AgentSwarm(manager=manager, workers=[researcher, analyst, writer])

# Usage example
swarm = create_swarm()
result = swarm.execute_task(
    "Research recent AI trends, analyze their impact, and create a report"
)
```
Methodology
Two metrics will be followed: task completion time and accuracy score (AS). Both can be measured automatically.
AS is a metric I invented for this case study. I'm sure there's an official name for it somewhere, so don't crucify me for it. For the purposes of this post, it should be OK.
The formula below shows what AS means and how it's calculated:

```python
AS = (completed_subtasks / total_subtasks) * (correct_outputs / total_outputs)
```
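To make the metric concrete, here's a minimal helper that computes AS. The function name and the zero-division guard are my additions, not part of any library:

```python
def accuracy_score(completed_subtasks, total_subtasks,
                   correct_outputs, total_outputs):
    """Accuracy Score (AS): completion ratio weighted by correctness ratio."""
    if total_subtasks == 0 or total_outputs == 0:
        return 0.0  # no subtasks or outputs means nothing to score
    return (completed_subtasks / total_subtasks) * (correct_outputs / total_outputs)

# Example: 9 of 10 subtasks completed, 8 of 10 outputs correct
print(round(accuracy_score(9, 10, 8, 10), 2))  # → 0.72
```

Note that AS punishes both unfinished work and wrong answers: an agent that finishes everything but gets half of it wrong scores the same as one that only finishes half but gets it all right.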
Both metrics will be measured across different task types. We will simulate two complexity levels: Simple tasks and multi-step workflows. The results may be surprisingly different. For every architecture and task type, I’ve conducted 50 unique measurements with different random seeds and no caching. It’s 300 measurements in total.
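The measurement loop can be sketched roughly like this. The `run_task` callback is a hypothetical stand-in for whatever actually executes one task against a given architecture and returns its accuracy:

```python
import random
import time

ARCHITECTURES = ["single", "manager_worker", "swarm"]
TASK_TYPES = ["simple", "complex"]
RUNS = 50  # 3 architectures x 2 task types x 50 runs = 300 measurements

def benchmark(run_task):
    """Collect (elapsed_seconds, accuracy) samples per architecture/task-type pair.

    `run_task(architecture, task_type, seed)` is assumed to execute one
    un-cached task end to end and return its accuracy score.
    """
    results = {}
    for arch in ARCHITECTURES:
        for task_type in TASK_TYPES:
            samples = []
            for seed in range(RUNS):
                random.seed(seed)  # fresh seed per run, no caching
                start = time.perf_counter()
                accuracy = run_task(arch, task_type, seed)
                samples.append((time.perf_counter() - start, accuracy))
            results[(arch, task_type)] = samples
    return results
```

Wall-clock timing via `time.perf_counter()` is deliberately simple here; if your runs hit rate limits or shared infrastructure, you'll want more repetitions or a steadier environment.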
Results
Let’s have a look at the results:
| Architecture | Task Type | Time (s) | AS | Cost ($) |
| --- | --- | --- | --- | --- |
| Single Agent | Simple | 5.2 | 0.82 | 0.05 |
| Single Agent | Complex | 15.4 | 0.65 | 0.15 |
| Manager-Worker | Simple | 6.8 | 0.88 | 0.08 |
| Manager-Worker | Complex | 12.1 | 0.85 | 0.18 |
| Swarm | Simple | 7.1 | 0.94 | 0.10 |
| Swarm | Complex | 9.2 | 0.91 | 0.22 |
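As a quick back-of-the-envelope check, here's what the complex-task rows work out to when comparing the swarm against the single agent (numbers taken straight from the table):

```python
# Complex-task numbers from the results table above
single_time, single_as = 15.4, 0.65
swarm_time, swarm_as = 9.2, 0.91

time_reduction = (single_time - swarm_time) / single_time
as_improvement = (swarm_as - single_as) / single_as

print(f"Time reduced by {time_reduction:.0%}")       # → ~40%
print(f"Accuracy improved by {as_improvement:.0%}")  # → ~40%
```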
Quite a difference, right? What a small architectural change can do to task completion! For easier understanding, let’s look at the charts below:
[Insert performance charts]
And the differences are even bigger on complex tasks. I’d say almost game-changing.
As is obvious from both task types, the worst performer is the single agent. Manager-Worker seems like a better option, but it still struggles with complex tasks compared to the full swarm. Even though the task eventually gets completed, the result might not be optimal. It depends on many things, one of them being the amount of context switching required, which can confuse a single agent. And if your use case isn’t a simple chatbot that can get by with basic responses, you might lose precious accuracy.
On the other hand, if you implement a proper swarm architecture, it’s possible to get significantly better results in both time and accuracy. Win-win. Or maybe win-draw. I can imagine arguments from optimization enthusiasts about the increased costs, but whether we want to optimize for the last dollar when it might cost us reliability and user trust is a different story.
Interpretation
Agent architecture really matters! I have a pretty decent baseline when it comes to AI task completion. Once I implemented JUST these three architectures (it’s quite common that companies try many more variants), the differences became dramatically clear. I leave it up to you to decide whether 91% vs. 65% accuracy on complex tasks is a lot or not. Mhmhm… No, I don’t. It’s a hell of a lot.
There are many case studies about how much task completion accuracy affects user adoption. Experiment with it yourself and you might be as surprised by the results as I was.
Happy swarming! 😊