Swarm Code Reading
Source code: https://github.com/openai/swarm
My open-sourced demo of how to modify and use Swarm: https://github.com/haoruilee/huggingface-swarm
TL;DR: Swarm is a lightweight LLM multi-agent framework. It wraps each agent's functions into the OpenAI function-calling format and passes them as tools, so that through the model's tool choice one agent can hand the conversation off to another.
1. Overview of Swarm’s Components

Before delving into the `core.py` file, let’s understand the supporting modules:

- `types.py`: Defines the core data structures like `Agent`, `Response`, and `Result`, which are used throughout the code.
- `util.py`: Provides utility functions like `debug_print`, `merge_chunk`, and `function_to_json`, which assist in debugging, merging streamed responses, and converting functions to JSON.
2. Initialization of the Swarm Class

The `Swarm` class is initialized with a client:

```python
class Swarm:
    def __init__(self, client=None):
        if not client:
            client = OpenAI()
        self.client = client
```

- Purpose: This constructor sets up the `Swarm` object, ensuring it has a client for interacting with the OpenAI API.
- Key Points:
  - If no client is provided, it defaults to creating an `OpenAI` client instance.
  - This client is stored as `self.client`, which is used in subsequent methods.
3. Method: get_chat_completion

This method retrieves chat completions from the OpenAI model based on agent instructions and conversation history.

```python
def get_chat_completion(self, agent, history, context_variables, model_override, stream, debug):
```

- Steps:
  - Prepare Context Variables: converts `context_variables` into a `defaultdict` to handle missing keys gracefully.
  - Set Instructions: determines the agent’s instructions, either by calling it (if it’s a function) or directly using the provided string.
  - Compose Messages: starts with a system message (the instructions) and appends the conversation history.
  - Debugging: if debugging is enabled, prints the current state of messages.
  - Convert Functions to Tools: transforms agent functions into JSON-serializable objects using `function_to_json` from `util.py`, then removes `context_variables` from each tool’s parameters to avoid exposing internal details.
  - Create Parameters: prepares the parameters needed for the OpenAI API call, including tools, messages, model choice, and whether to use parallel tool calls.
  - API Call: sends the request to OpenAI’s API to get the chat completion.
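The tool-conversion and parameter-stripping steps can be sketched as follows. This is a simplified stand-in, not Swarm's actual implementation: the real `function_to_json` in `util.py` inspects parameter type hints, and `build_tools` is a hypothetical helper name used here for illustration.

```python
import inspect

def function_to_json(func):
    """Convert a Python function into an OpenAI-style tool schema (simplified)."""
    # The real converter maps Python types to JSON schema types;
    # here every parameter is typed as a string for brevity.
    properties = {
        name: {"type": "string"}
        for name in inspect.signature(func).parameters
    }
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {"type": "object", "properties": properties},
        },
    }

def build_tools(functions):
    tools = [function_to_json(f) for f in functions]
    for tool in tools:
        # Strip the internal context_variables parameter so it is
        # never exposed to the model
        tool["function"]["parameters"]["properties"].pop("context_variables", None)
    return tools
```

The key point is the `pop("context_variables", ...)`: the function still receives context variables at call time, but the model never sees that parameter in the tool schema.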
4. Method: handle_function_result

Handles the result returned by agent functions and ensures it’s properly formatted.

```python
def handle_function_result(self, result, debug):
```

- Steps:
  - Pattern Matching: matches the type of `result` using the `match` statement.
  - Result Object: if the result is already a `Result` object, it’s returned as-is.
  - Agent Object: if the result is an `Agent`, it’s wrapped in a `Result` with the agent’s name serialized as JSON.
  - Fallback: for any other type of result, attempts to convert it to a string; if conversion fails, raises a `TypeError` with a debug message.
5. Method: handle_tool_calls

Handles the execution of tool calls made by the assistant and returns a partial response.

```python
def handle_tool_calls(self, tool_calls, functions, context_variables, debug):
```

- Steps:
  - Build Function Map: creates a dictionary mapping function names to their respective implementations.
  - Iterate Through Tool Calls: for each tool call, checks if the tool exists in the function map; if missing, appends an error message to the response; otherwise parses the tool’s arguments and executes the corresponding function.
  - Handle Results: processes each function’s result using `handle_function_result`, appends the result to the response messages, and updates context variables.
  - Return Partial Response: returns a `Response` object containing messages, updated context variables, and any agent changes.
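The dispatch loop can be sketched as below, under the assumption that each tool call carries `.id`, `.function.name`, and `.function.arguments` (a JSON string), as the OpenAI SDK objects do. This simplified version returns a plain list of tool messages rather than Swarm's `Response`, and omits context-variable updates and handoffs:

```python
import json

def handle_tool_calls(tool_calls, functions, context_variables):
    # Map each function's name to its implementation
    function_map = {f.__name__: f for f in functions}
    messages = []
    for tool_call in tool_calls:
        name = tool_call.function.name
        if name not in function_map:
            # The model asked for a tool we don't have: report, don't crash
            messages.append({"role": "tool", "tool_call_id": tool_call.id,
                             "tool_name": name,
                             "content": f"Error: Tool {name} not found."})
            continue
        # Arguments arrive as a JSON string chosen by the model
        args = json.loads(tool_call.function.arguments)
        result = function_map[name](**args)
        messages.append({"role": "tool", "tool_call_id": tool_call.id,
                         "tool_name": name, "content": str(result)})
    return messages
```

Note the missing-tool branch appends an error message instead of raising, so a hallucinated tool name degrades gracefully into feedback the model can see on the next turn.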
6. Run and Stream Methods

The `run` and `run_and_stream` methods coordinate the overall conversation flow and tool execution. The `run` method handles sequential message processing and tool execution in a structured loop until a final response is generated, while the `run_and_stream` method streams responses in real time, providing chunks of data as they are processed. These methods ensure that agents interact efficiently and tool calls are executed when needed. Below is a simplified workflow diagram for better understanding:
```
+-----------+     +----------------------+     +--------------------+
|    run    | --> | get_chat_completion  | --> | handle_tool_calls  |
+-----------+     +----------------------+     +--------------------+
      ^                                                  |
      +----------- loop until no tool calls -------------+
```
- `run` Method:
  - Calls `get_chat_completion` to generate a response.
  - Executes tool calls if present and updates context variables.
  - Iterates until a final response is ready or the max turn limit is reached.
- `run_and_stream` Method:
  - Similar to `run`, but streams responses in real time.
  - Yields chunks of data for immediate processing.
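The control flow of the `run` loop can be condensed into a runnable sketch. Here the model is stubbed as a plain callable standing in for `get_chat_completion`, and `run_loop` and its message dicts are illustrative names, not Swarm's actual API:

```python
import json

def run_loop(model, functions, messages, max_turns=10):
    """model(history) -> assistant message dict; loop until no tool_calls."""
    function_map = {f.__name__: f for f in functions}
    history = list(messages)
    for _ in range(max_turns):
        reply = model(history)          # stand-in for get_chat_completion
        history.append(reply)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            break                       # final response: no tools requested
        for call in tool_calls:
            # Execute each requested tool and feed its result back
            result = function_map[call["name"]](**json.loads(call["arguments"]))
            history.append({"role": "tool", "name": call["name"],
                            "content": str(result)})
    return history
```

The `max_turns` bound is the same safety valve the text describes: even if the model keeps requesting tools, the loop terminates.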