Manager Agent Gym - Quick Start Guide
Get up and running with autonomous workflow management in 10 minutes
🚀 What You'll Build
By the end of this guide, you'll have:
- A working manager agent orchestrating a realistic workflow
- An understanding of how manager and worker agents coordinate tasks
- A complete example running locally with your chosen LLM provider
📋 Prerequisites
- Python 3.11+
- uv package manager
- OpenAI API key (get one at platform.openai.com)
⚡ 5-Minute Setup
Step 1: Install the Library
# Clone the repository
git clone https://github.com/DeepFlow-research/manager_agent_gym
cd manager_agent_gym
# Install with uv (recommended)
uv pip install -e .
# Install provider integrations (LLM + agents tooling)
uv pip install -e ".[openai,agents]"
# Alternative: Install with pip
pip install -e ".[openai,agents]"
Step 2: Configure API Keys
# Copy the example environment file
cp .env.example .env
# Edit .env file with your API keys
# The file should contain:
# OPENAI_API_KEY=sk-your-key-here
# ANTHROPIC_API_KEY=sk-ant-your-key-here # Optional
Note: The library uses pydantic-settings, which automatically picks up variables from the .env file, so there's no need to export environment variables manually.
Step 3: Run Your First Example
# Run the hello world example (recommended)
python examples/getting_started/hello_manager_agent.py
# Or launch the interactive CLI
python -m examples.cli
You should see output like:
🚀 Welcome to Manager Agent Gym!
📋 Creating workflow...
✅ Created workflow 'ICAAP Workflow' with 8 tasks
👥 Setting up agent registry...
✅ Registered 4 agents
🧠 Creating manager agent...
✅ Manager agent created with quality-focused preferences
🚀 Setting up execution engine...
✅ Execution engine ready
🎬 Starting workflow execution...
🎯 What Just Happened?
Your first manager agent just:
- 📋 Analyzed a complex workflow with 8 interconnected tasks
- 🧠 Made strategic decisions about task assignment and timing
- 👥 Coordinated a team of AI and simulated human agents
- ⚖️ Balanced multiple objectives (quality, time, cost, oversight)
- 📊 Tracked progress through discrete timesteps
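The "interconnected tasks" form a dependency graph: a task can only start once its prerequisites are finished. As a rough mental model, here is a toy sketch using Python's standard `graphlib` (not the library's `Workflow` class; the task names are hypothetical, not the real ICAAP tasks) that computes which tasks become available at each stage:

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
# Names are illustrative only; they are not the tasks in the ICAAP example.
dependencies = {
    "gather_data": set(),
    "risk_assessment": {"gather_data"},
    "capital_planning": {"risk_assessment"},
    "stress_testing": {"risk_assessment"},
    "draft_report": {"capital_planning", "stress_testing"},
    "board_review": {"draft_report"},
}


def ready_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves: every task in a wave has all prerequisites done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        batches.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return batches


for step, batch in enumerate(ready_batches(dependencies), start=1):
    print(f"wave {step}: {batch}")
```

Tasks in the same wave (here `capital_planning` and `stress_testing`) could run in parallel, which is one of the timing decisions a manager has to make.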
🔧 Understanding the Code
Let's break down the key components:
Manager Agent Creation
from manager_agent_gym import ChainOfThoughtManagerAgent, PreferenceWeights, Preference
# Define what the manager cares about
preferences = PreferenceWeights(
preferences=[
Preference(name="quality", weight=0.4, description="High-quality deliverables"),
Preference(name="time", weight=0.3, description="Reasonable timeline"),
Preference(name="cost", weight=0.2, description="Cost-effective execution"),
Preference(name="oversight", weight=0.1, description="Manageable oversight"),
]
)
# Create the AI manager
manager = ChainOfThoughtManagerAgent(
preferences=preferences,
model_name="gpt-4o-mini", # Default cost-effective OpenAI model
manager_persona="Strategic Project Coordinator"
)
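The weights above sum to 1.0, so each one reads as a share of the manager's attention. How the library combines them internally isn't covered in this guide; the sketch below assumes a simple weighted sum and uses a hypothetical `ToyPreference` dataclass (not the library's `Preference`) to show why weights are easiest to reason about once normalized:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPreference:
    """Hypothetical stand-in for a (name, weight) preference."""
    name: str
    weight: float


def normalize(prefs: list[ToyPreference]) -> list[ToyPreference]:
    """Rescale weights so they sum to 1.0 (raises if no weight is positive)."""
    total = sum(p.weight for p in prefs)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [ToyPreference(p.name, p.weight / total) for p in prefs]


def weighted_score(prefs: list[ToyPreference], scores: dict[str, float]) -> float:
    """Combine per-objective scores in [0, 1] into one number using the weights."""
    return sum(p.weight * scores[p.name] for p in prefs)


prefs = normalize([
    ToyPreference("quality", 0.4),
    ToyPreference("time", 0.3),
    ToyPreference("cost", 0.2),
    ToyPreference("oversight", 0.1),
])
assert math.isclose(sum(p.weight for p in prefs), 1.0)

# Hypothetical per-objective scores for one finished run.
print(weighted_score(prefs, {"quality": 0.9, "time": 0.5, "cost": 0.8, "oversight": 1.0}))
```

With normalized weights, the combined score stays on the same 0 to 1 scale as the individual objective scores.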
Workflow Execution
from manager_agent_gym import WorkflowExecutionEngine, AgentRegistry
# Set up the execution environment
# (`workflow`, `agent_registry` and `stakeholder` are assumed to be created beforehand)
engine = WorkflowExecutionEngine(
workflow=workflow,
agent_registry=agent_registry,
manager_agent=manager,
stakeholder_agent=stakeholder,
max_timesteps=20,
seed=42,
)
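# Run the simulation (top-level `await` works in a notebook or inside an `async def`;
# in a plain script, wrap the call with asyncio.run)
results = await engine.run_full_execution()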
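Because `run_full_execution()` is awaited, a plain script needs `asyncio.run` to drive it. The toy loop below is a hypothetical stand-in for the engine, not its real implementation. It illustrates the two knobs from the example: `max_timesteps` caps the number of discrete steps, and `seed` makes the random parts of a run repeatable. The state names are invented for this sketch.

```python
import asyncio
import random


async def toy_run(num_tasks: int, max_timesteps: int, seed: int) -> dict:
    """Hypothetical discrete-timestep loop: NOT the library's engine.

    Each timestep, every unfinished task completes with probability 0.3,
    drawn from a seeded RNG so the run is reproducible.
    """
    rng = random.Random(seed)
    remaining = num_tasks
    timestep = 0
    while remaining > 0 and timestep < max_timesteps:
        timestep += 1
        finished = sum(1 for _ in range(remaining) if rng.random() < 0.3)
        remaining -= finished
        await asyncio.sleep(0)  # yield to the event loop between steps
    return {
        "timesteps": timestep,
        "completed": num_tasks - remaining,
        # State labels are made up for this sketch.
        "state": "COMPLETED" if remaining == 0 else "MAX_TIMESTEPS_REACHED",
    }


# In a plain script (no running event loop), drive the coroutine with asyncio.run.
result = asyncio.run(toy_run(num_tasks=8, max_timesteps=20, seed=42))
print(result)
```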
🎨 Customization Options
Different Manager Types
# Strategic LLM-based manager (default); `prefs` is a PreferenceWeights instance like `preferences` above
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="gpt-4o-mini")
# Random baseline for comparison
from manager_agent_gym.core.manager_agent import RandomManagerAgentV2
manager = RandomManagerAgentV2(preferences=prefs, seed=42)
# Simple one-shot delegation
from manager_agent_gym.core.manager_agent import OneShotDelegateManagerAgent
manager = OneShotDelegateManagerAgent(preferences=prefs)
Different LLM Models
# OpenAI models
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="gpt-4o")
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="gpt-4o-mini")
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="o3")
# Anthropic models
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="claude-3-5-sonnet")
# Google models
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="gemini-2.0-flash")
Preference Tuning
# Quality-focused preferences
quality_focused = PreferenceWeights(preferences=[
Preference(name="quality", weight=0.6, description="Exceptional deliverables"),
Preference(name="time", weight=0.2, description="Reasonable timeline"),
Preference(name="cost", weight=0.1, description="Cost consideration"),
Preference(name="oversight", weight=0.1, description="Minimal oversight"),
])
# Speed-focused preferences
speed_focused = PreferenceWeights(preferences=[
Preference(name="time", weight=0.5, description="Fast delivery"),
Preference(name="quality", weight=0.3, description="Adequate quality"),
Preference(name="cost", weight=0.1, description="Cost consideration"),
Preference(name="oversight", weight=0.1, description="Minimal oversight"),
])
# Cost-focused preferences
cost_focused = PreferenceWeights(preferences=[
Preference(name="cost", weight=0.5, description="Minimize expenses"),
Preference(name="quality", weight=0.2, description="Acceptable quality"),
Preference(name="time", weight=0.2, description="Reasonable timeline"),
Preference(name="oversight", weight=0.1, description="Efficient oversight"),
])
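To see why the profile matters, consider two hypothetical plans scored on each objective. Assuming the manager prefers whichever plan has the higher weighted sum (a simplification; the real manager reasons with an LLM), the profiles above can pick different plans. The plan names and scores are invented for illustration:

```python
# Weights mirroring the three profiles above, as plain dicts.
PROFILES = {
    "quality_focused": {"quality": 0.6, "time": 0.2, "cost": 0.1, "oversight": 0.1},
    "speed_focused": {"time": 0.5, "quality": 0.3, "cost": 0.1, "oversight": 0.1},
    "cost_focused": {"cost": 0.5, "quality": 0.2, "time": 0.2, "oversight": 0.1},
}

# Hypothetical per-objective scores (0 = worst, 1 = best) for two candidate plans.
PLANS = {
    "careful_plan": {"quality": 0.9, "time": 0.3, "cost": 0.4, "oversight": 0.6},
    "fast_plan": {"quality": 0.5, "time": 0.9, "cost": 0.7, "oversight": 0.6},
}


def best_plan(weights: dict[str, float]) -> str:
    """Pick the plan with the highest weighted sum under one profile."""
    def score(plan: str) -> float:
        return sum(w * PLANS[plan][name] for name, w in weights.items())
    return max(PLANS, key=score)


for profile, weights in PROFILES.items():
    assert abs(sum(weights.values()) - 1.0) < 1e-9, profile  # each profile sums to 1
    print(f"{profile}: {best_plan(weights)}")
```

Under the quality-focused profile the careful plan wins; under the speed- and cost-focused profiles the fast plan wins.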
🌟 Try More Examples
Interactive CLI (Recommended)
# Interactive example selector with full scenario menu
python -m examples.cli
This opens an interactive menu where you can:
- Choose from 20+ realistic business scenarios
- Select different manager types and models
- Run parallel experiments
- Compare results across configurations
The CLI is the recommended way to run simulations as it provides the most comprehensive interface for experimentation.
Specific Scenarios
# Run a banking compliance workflow
python -m examples.cli --scenarios banking_license_application --manager-mode cot
# Run multiple scenarios in parallel
python -m examples.cli \
--scenarios data_science_analytics marketing_campaign \
--manager-mode cot \
--model-name gpt-4o \
--parallel-jobs 2
# Compare different manager types
python -m examples.cli \
--scenarios icaap \
--manager-mode cot random \
--model-name gpt-4o
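`--parallel-jobs` caps how many scenario runs execute at once. This guide doesn't describe the CLI's internals. A cap like this is commonly built with an `asyncio.Semaphore`, as in this hypothetical sketch where `fake_scenario` stands in for a real run:

```python
import asyncio


async def fake_scenario(name: str) -> str:
    """Hypothetical stand-in for one scenario run (sleeps instead of calling an LLM)."""
    await asyncio.sleep(0.01)
    return f"{name}: done"


async def run_all(names: list[str], parallel_jobs: int) -> tuple[list[str], int]:
    """Run scenarios concurrently, with at most `parallel_jobs` in flight at once."""
    gate = asyncio.Semaphore(parallel_jobs)
    in_flight = 0
    peak = 0

    async def guarded(name: str) -> str:
        nonlocal in_flight, peak
        async with gate:  # blocks while `parallel_jobs` runs are already active
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_scenario(name)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(n) for n in names))
    return list(results), peak


results, peak = asyncio.run(run_all(["icaap", "orsa", "marketing_campaign"], parallel_jobs=2))
print(results, "peak concurrency:", peak)
```

Bounding concurrency this way keeps API rate limits and costs predictable while still overlapping the slow, I/O-bound LLM calls.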
Programmatic Examples
import asyncio

from examples.run_examples import run_demo


async def main():
    # Run a specific scenario
    results = await run_demo(
        workflow_name="data_science_analytics",
        max_timesteps=30,
        model_name="gpt-4o",
        manager_agent_mode="cot",
        seed=42,
    )
    # Analyze results
    print(f"Completion rate: {results.completion_rate:.1%}")
    print(f"Total cost: ${results.total_cost:.2f}")
    print(f"Manager actions taken: {len(results.manager_actions)}")


asyncio.run(main())
📊 Understanding Results
After running an example, you'll see:
Execution Summary
📊 SUMMARY:
• Total timesteps: 15
• Tasks completed: 8/8
• Completion rate: 100.0%
• Final execution state: COMPLETED
Manager Actions
🧠 MANAGER ACTIONS TAKEN:
• assign_task: 5 times
• refine_task: 2 times
• send_message: 3 times
• create_task: 1 times
Performance Metrics
- Completion rate: Percentage of tasks successfully finished
- Timesteps: Discrete simulation steps taken
- Manager actions: Types and frequency of decisions made
- Cost tracking: Estimated and actual costs
- Quality scores: Evaluation against preferences
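The completion rate is completed tasks divided by total tasks, and the action breakdown is a frequency count. Assuming you have the raw counts (how they are exposed on the results object may differ; the `summarize` function and its inputs are hypothetical), the summary above can be reproduced like this:

```python
from collections import Counter


def summarize(tasks_total: int, tasks_completed: int, actions: list[str]) -> str:
    """Format a summary in the style of the output above from raw counts.

    `actions` is a hypothetical flat list of action-type names, one per manager action.
    """
    rate = tasks_completed / tasks_total if tasks_total else 0.0
    lines = [
        f"Tasks completed: {tasks_completed}/{tasks_total}",
        f"Completion rate: {rate:.1%}",
    ]
    # most_common() lists action types from most to least frequent.
    for action, count in Counter(actions).most_common():
        lines.append(f"{action}: {count} times")
    return "\n".join(lines)


example_actions = (
    ["assign_task"] * 5 + ["refine_task"] * 2 + ["send_message"] * 3 + ["create_task"]
)
print(summarize(8, 8, example_actions))
```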
🔍 Key Features Demonstrated
1. Autonomous Decision Making
The manager agent observes the workflow state and makes strategic decisions without human intervention.
2. Multi-Objective Optimization
Balances competing goals like quality vs. speed vs. cost based on your preferences.
3. Dynamic Coordination
Adapts to changing conditions, task failures, and new requirements as the workflow unfolds, one timestep at a time.
4. Realistic Simulation
Models human agent availability, AI agent capabilities, and real-world constraints.
5. Comprehensive Evaluation
Tracks multiple metrics beyond just task completion.
🎯 Next Steps
Explore More Scenarios
- Financial Services: banking_license_application, icaap, orsa
- Legal & Compliance: legal_global_data_breach, legal_contract_negotiation
- Technology: genai_feature_launch, data_science_analytics
- Marketing: marketing_campaign, brand_crisis_management
Customize for Your Use Case
- Create your own workflows by extending the Workflow class
- Define custom preferences for your specific domain
- Add specialized agents with domain-specific capabilities
- Implement custom evaluation metrics for your success criteria
Dive Deeper
- Read the full Library Documentation
- Read the research paper on arXiv: https://arxiv.org/abs/2510.02557
- Check out advanced examples in examples/
- Review the API reference in docs/
💡 Pro Tips
Performance Optimization
# Use faster models for experimentation
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="gpt-4o-mini")
# Limit timesteps for faster iteration
engine = WorkflowExecutionEngine(..., max_timesteps=10)
# Run multiple scenarios in parallel (a shell command, not Python)
python -m examples.cli --parallel-jobs 4
Debugging and Analysis
# Enable detailed logging
engine = WorkflowExecutionEngine(..., enable_timestep_logging=True)
# Save outputs for analysis
engine = WorkflowExecutionEngine(..., output_config=OutputConfig(base_dir="my_results/"))
# Use deterministic seeds
engine = WorkflowExecutionEngine(..., seed=42)
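A fixed seed makes the simulation's random draws repeatable, which is what lets you compare configurations fairly; LLM responses may still vary between runs. The idea in miniature, using a hypothetical `sample_decisions` helper:

```python
import random


def sample_decisions(seed: int, n: int = 5) -> list[str]:
    """Draw n hypothetical manager choices from a seeded RNG."""
    rng = random.Random(seed)  # independent generator; leaves the global random state alone
    options = ["assign_task", "refine_task", "send_message", "create_task"]
    return [rng.choice(options) for _ in range(n)]


run_a = sample_decisions(seed=42)
run_b = sample_decisions(seed=42)
assert run_a == run_b  # same seed, same sequence of choices
print(run_a)
```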
Cost Management
# Use cost-effective models
manager = ChainOfThoughtManagerAgent(preferences=prefs, model_name="gpt-4o-mini")
# Monitor token usage in outputs
# Check the execution results for API cost tracking
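LLM providers typically bill input and output tokens at separate per-token rates, so a back-of-the-envelope estimate is a weighted sum of token counts. Where the library records token usage in its outputs isn't specified here; this hypothetical helper assumes you already have the counts and your model's current prices:

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate API spend from token counts and per-million-token prices.

    Prices are inputs, not constants: look up current rates for your model
    on your provider's pricing page.
    """
    if min(input_tokens, output_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Placeholder prices for illustration only; they are not real rates for any model.
print(f"${estimate_cost_usd(200_000, 50_000, 1.00, 4.00):.2f}")
```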
🚨 Troubleshooting
Common Issues
API Key Not Found
# Check your .env file exists and contains your API key
cat .env
# Should show: OPENAI_API_KEY=sk-your-key-here
# If missing, copy from example and edit
cp .env.example .env
# Edit .env with your actual API keys
Module Import Errors
# Ensure you're in the project directory
cd manager_agent_gym
# Reinstall with uv (recommended)
uv pip install -e .
# Or with pip
pip install -e .
Slow Execution
- Use smaller models (gpt-4o-mini)
- Reduce max_timesteps
- Check your internet connection
Out of API Credits
- Check your OpenAI usage at platform.openai.com
- Consider using smaller models for testing
🎉 You're Ready!
You now have a working autonomous manager agent system! The AI manager can:
- 🧩 Break down complex goals into manageable tasks
- 👥 Coordinate teams of specialized agents
- ⚖️ Balance multiple competing objectives
- 📊 Adapt to changing conditions as the workflow unfolds
- 📋 Maintain governance and compliance
What's next? Try different scenarios, experiment with preferences, and see how the manager adapts to various challenges!
Happy orchestrating! 🎼