Description
Problem Statement
Enable meta-prompting for Strands agents - the ability to automatically optimize prompts based on execution patterns. This is critical for scenarios involving long-running agent trajectories where manual prompt tuning can't keep pace with evolving requirements. By learning from outcomes across multiple interactions, agents can achieve just-in-time optimization, adapting their instructions to match actual usage patterns rather than anticipated ones.
Currently, Strands agents use static prompts that require manual tuning once deployed to production. While agents can reason at runtime (CoT/ReAct), they have no built-in way to improve their instructions over time or adapt them just in time as usage patterns shift.
Key distinction from existing Strands Agents capabilities:
- CoT/ReAct (current): Runtime reasoning for individual queries ("How do I solve THIS task?")
- Prompt Optimization (proposed): Learning better instructions from patterns ("What instructions lead to better outcomes?")
Pain points:
- Agents can't easily learn from successful/failed interactions
- Performance degrades on edge cases without manual intervention
- Each team must manually tune prompts for their specific use case
- No systematic way to incorporate execution feedback
Proposed Solution
Add a native prompt optimization capability based on GEPA (Genetic-Pareto) that allows agents to automatically improve their prompts through reflective evolution.
# Simple API - just like enabling a conversation manager
agent = Agent(system_prompt="You are a helpful assistant")

# Enable optimization with one line
agent.enable_optimization(
    optimizer="gepa",            # Start with GEPA, extensible to other algorithms
    feedback="task_completion",  # Use built-in or custom feedback
)

# Agent now learns from interactions automatically
response = await agent("Help me debug this code")
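For context on the proposed default optimizer: as described in the GEPA paper, the algorithm keeps a pool of candidate prompts, scores each candidate on a small feedback set, and reflectively mutates parents sampled from the Pareto frontier (candidates that are best on at least one example) rather than only the best-on-average prompt. Below is a simplified, hypothetical sketch of that selection step - nothing in it is an existing Strands or GEPA API:

import random

# scores[candidate][i] is the candidate prompt's score on feedback example i
def pareto_candidates(scores: dict[str, list[float]]) -> list[str]:
    """Return candidates that achieve the best score on at least one example."""
    n_examples = len(next(iter(scores.values())))
    best_per_example = [max(s[i] for s in scores.values()) for i in range(n_examples)]
    return [
        cand for cand, s in scores.items()
        if any(s[i] >= best_per_example[i] for i in range(n_examples))
    ]

# Example: three candidate prompts scored on three feedback examples
scores = {
    "v1: generic assistant":        [0.4, 0.9, 0.5],
    "v2: adds debugging checklist": [0.8, 0.6, 0.5],
    "v3: verbose, off-topic":       [0.3, 0.5, 0.4],
}
parent = random.choice(pareto_candidates(scores))  # v1 or v2, never v3
# The sampled parent is then reflectively mutated into a new candidate prompt.

Sampling from the Pareto frontier keeps prompts that excel on different kinds of tasks in play instead of collapsing early to a single best-on-average candidate.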
Example of evolution (not runtime reasoning):
# Initial prompt:
"You are a helpful assistant"
# After 20 interactions with code debugging tasks, evolves to:
"You are a helpful assistant. When debugging code:
- First identify the error type and stack trace
- Check for common issues: null pointers, off-by-one errors, type mismatches
- Suggest minimal reproducible examples
- Provide step-by-step debugging strategies"
How it works:
- Agent executes tasks normally
- Feedback function evaluates outcomes (using existing signals)
- After N interactions, optimizer reflects on patterns across executions
- Prompts evolve to incorporate lessons learned
- Changes are validated before being applied (a rough sketch of this loop follows below)
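To make the trigger and validation mechanics concrete, here is a minimal sketch of how such a loop could be wired up. Every name in it (OptimizationLoop, record, maybe_evolve, reflect_fn) is a hypothetical illustration of the proposal, not an existing Strands API:

from dataclasses import dataclass, field
from typing import Callable

# Illustrative only - none of these names exist in the Strands SDK today.
@dataclass
class OptimizationLoop:
    """Record scored interactions, reflect every N, validate before applying."""
    feedback_fn: Callable[[str, dict], dict]  # returns {"score": float, "feedback": str}
    reflect_fn: Callable[[str, list], str]    # LLM-driven reflection: (prompt, traces) -> revised prompt
    reflect_every: int = 20
    history: list = field(default_factory=list)

    def record(self, output: str, context: dict) -> None:
        # The agent runs normally; each outcome is scored by the feedback function
        self.history.append((output, context, self.feedback_fn(output, context)))

    def maybe_evolve(self, current_prompt: str) -> str:
        # Wait until enough interactions have accumulated
        if len(self.history) < self.reflect_every:
            return current_prompt
        avg_score = sum(r["score"] for _, _, r in self.history) / len(self.history)
        # Reflect on the collected traces to propose a revised prompt
        candidate = self.reflect_fn(current_prompt, self.history)
        self.history.clear()
        # A real implementation would validate the candidate (e.g. replay it against
        # held-out interactions); this sketch only skips evolution when the current
        # prompt already scores perfectly
        return current_prompt if avg_score >= 1.0 else candidate

Keeping the reflection step injectable (reflect_fn) is what would let GEPA be the first supported optimizer while leaving room for other algorithms later, matching the optimizer="gepa" parameter above.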
The GEPA paper demonstrates 10-20% performance improvements with just 10-20 examples (vs 100k+ for RL approaches).
Use Case
1. Code Review Assistant
# Use existing PR metrics as feedback
def pr_feedback(output, context):
    acceptance_rate = context["suggestions_accepted"] / context["suggestions_made"]
    return {"score": acceptance_rate, "feedback": f"{acceptance_rate*100}% accepted"}
agent.enable_optimization(feedback=pr_feedback)
# Learns team conventions, common bugs, what actually gets fixed
2. Customer Support Bot
# Use ticket resolution as feedback
def ticket_feedback(output, context):
    if context["ticket_resolved"]:
        return {"score": 1.0, "feedback": "Resolved"}
    return {"score": 0.3, "feedback": f"User still confused about {context['issue']}"}
agent.enable_optimization(feedback=ticket_feedback)
# Learns common issues, effective explanations, when to escalate
3. Data Analysis Agent
# Use notebook execution results
def analysis_feedback(output, context):
    result = execute_notebook(output)
    if result.error:
        return {"score": 0, "feedback": f"Code error: {result.error}"}
    return {"score": 1.0, "feedback": "Analysis completed"}
agent.enable_optimization(feedback=analysis_feedback)
# Learns to use available libraries, avoid common errors, generate visualizations
Alternative Solutions
- External tools (DSPy, etc.): Poor integration, can't access Strands execution context
- Manual prompt management: Current approach - doesn't easily scale for long-running agents
- Fine-tuning: Needs 100k+ examples, expensive, not real-time
Additional Context
Developer Experience:
- No complex setup - just provide feedback function
- Use existing signals (test results, user actions, metrics)
- Start with 0 examples, learn from production
- Built-in feedback functions for common cases (one is sketched below)
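As an illustration of what a built-in feedback function could look like, here is a hypothetical task_completion implementation mirroring the custom examples above (the name and signature are assumptions, to be settled during API design):

# Hypothetical built-in feedback matching the "task_completion" option used earlier
def task_completion_feedback(output: str, context: dict) -> dict:
    if context.get("error"):
        return {"score": 0.0, "feedback": f"Run failed: {context['error']}"}
    if not output or not output.strip():
        return {"score": 0.2, "feedback": "No final answer produced"}
    return {"score": 1.0, "feedback": "Task completed"}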
Why this matters:
- First SDK with native prompt optimization and GEPA support built in
- Reduce operational costs through self-improvement
- Better user experience as agents adapt to specific domains
Note on related work: The Blue Helix OSINT blog (https://blogs.infoblox.com/security/blue-helix-agentic-osint-researcher/) demonstrates successful production use of evolutionary optimization for agent strategy evolution. While Blue Helix evolves search strategies and GEPA evolves instruction prompts, both validate that genetic algorithms can effectively optimize agent behavior. Future implementations could explore combining the two approaches so agents optimize both their instructions and their operational strategies.
This feature enables a new class of self-improving agents that get better with use, without requiring developers to constantly tune prompts manually for long-running agents.