Evaluator-Optimizer
The Evaluator-Optimizer pattern allows an LLM to improve the quality of a generation by refining a previous generation based on feedback from an evaluator. This evaluation-optimization loop can run multiple times.
Implementation
This pattern consists of three subtasks. Each subtask is implemented using the Task Execution pattern.
- A task to generate the initial result.
- A task to evaluate the result and provide feedback.
- A task to optimize a previous result based on the feedback.
The flow chart below shows the basic steps.
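In code, the three subtasks can be modeled as simple functional interfaces, as in the minimal sketch below. The interface and method names are illustrative assumptions, not part of any library; Evaluation is the result type shown later in the example.
package com.javaaidev.agenticpatterns.evaluatoroptimizer;

// A task to generate the initial result from the task input.
interface Generator {
  String generate(String input);
}

// A task to evaluate a result and provide feedback.
interface Evaluator {
  Evaluation evaluate(String result);
}

// A task to optimize a previous result based on the feedback.
interface Optimizer {
  String optimize(String previousResult, String feedback);
}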
Guides
Use Different Models
It's recommended to use different models for generation and evaluation to get better results. For example, we can use GPT-4o for generation and DeepSeek V3 for evaluation, or vice versa.
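A simple way to follow this guideline is to give the agent two separate chat clients, one for generation and one for evaluation. The LlmClient interface below is a placeholder assumption for whatever chat client abstraction you actually use.
// Placeholder chat client abstraction (an assumption for illustration only).
interface LlmClient {
  String complete(String prompt);
}

// The agent holds two clients backed by different models.
class CodeGenerationAgent {
  private final LlmClient generationClient;  // e.g. backed by GPT-4o
  private final LlmClient evaluationClient;  // e.g. backed by DeepSeek V3

  CodeGenerationAgent(LlmClient generationClient, LlmClient evaluationClient) {
    this.generationClient = generationClient;
    this.evaluationClient = evaluationClient;
  }
}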
Max Iterations
It's recommended to set a limit on the maximum number of evaluations. This reduces the response time and cuts down on costs.
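Putting this together with the subtask interfaces sketched earlier, the loop with an iteration cap might look like the following. This is a sketch under assumed names; maxIterations is not a setting from any specific library.
class EvaluatorOptimizerLoop {

  private final Generator generator;
  private final Evaluator evaluator;
  private final Optimizer optimizer;
  private final int maxIterations;

  EvaluatorOptimizerLoop(Generator generator, Evaluator evaluator,
      Optimizer optimizer, int maxIterations) {
    this.generator = generator;
    this.evaluator = evaluator;
    this.optimizer = optimizer;
    this.maxIterations = maxIterations;
  }

  String run(String input) {
    String result = generator.generate(input);
    for (int i = 0; i < maxIterations; i++) {
      Evaluation evaluation = evaluator.evaluate(result);
      if (evaluation.passed()) {
        break;  // stop early once the evaluator is satisfied
      }
      result = optimizer.optimize(result, evaluation.feedback());
    }
    return result;  // best result within the iteration budget
  }
}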
Evaluation Results
The evaluation result can be a boolean type with only yes or no values, or a numeric type with a score from 0 to 100. Numeric scores are more flexible, as we can set different thresholds to mark the result as passed or not passed. When combined with the Parallelization Workflow pattern, we can run multiple evaluations in parallel and use the average score as the final score.
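If a numeric score is preferred, the evaluation result could be modeled as in the sketch below. The threshold parameter and the averaging helper are assumptions for illustration.
import java.util.List;
import org.jspecify.annotations.Nullable;

/**
 * Evaluation result with a numeric score.
 *
 * @param score    Score from 0 to 100
 * @param feedback Feedback if improvements are needed
 */
record ScoredEvaluation(int score, @Nullable String feedback) {

  // The result passes if the score reaches the given threshold.
  boolean passed(int threshold) {
    return score >= threshold;
  }

  // Average score of multiple parallel evaluations, e.g. when combined
  // with the Parallelization Workflow pattern.
  static double averageScore(List<ScoredEvaluation> evaluations) {
    return evaluations.stream().mapToInt(ScoredEvaluation::score).average().orElse(0);
  }
}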
Example
Let's look at an example: code generation with multiple rounds of review.
Generate Initial Result
For generating the initial result, the prompt template contains only the task input. The template variable input represents the task input, and Java code is generated from it.
Goal: Generate Java code to complete the task based on the input.
{input}
The output of this task is the generated code.
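Rendering the template is a simple variable substitution; a minimal sketch (the class and method names are assumptions):
class GenerationPromptTemplate {

  private static final String TEMPLATE = """
      Goal: Generate Java code to complete the task based on the input.
      {input}
      """;

  // Replace the {input} variable with the actual task input.
  static String render(String input) {
    return TEMPLATE.replace("{input}", input);
  }
}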
Evaluate
After generating the initial result, we can evaluate the generated code. The template variable code is the code to evaluate. The template lists several rules for evaluating the code.
Goal: Evaluate this code implementation for correctness, time complexity, and best practices.
Requirements:
- Ensure the code has proper Javadoc documentation.
- Ensure the code follows good naming conventions.
- The code is passed if all criteria are met with no improvements needed.
{code}
Below is the type of the evaluation result. A boolean result is used here.
package com.javaaidev.agenticpatterns.evaluatoroptimizer;
import org.jspecify.annotations.Nullable;
/**
* Evaluation result
*
* @param passed Passed or not passed
* @param feedback Feedback if not passed
*/
public record Evaluation(boolean passed, @Nullable String feedback) {
}
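The evaluator's raw response needs to be mapped to this record. A minimal sketch using Jackson is shown below; the choice of Jackson is an assumption, and a framework's structured output support could be used instead.
import com.fasterxml.jackson.databind.ObjectMapper;

class EvaluationParser {

  private static final ObjectMapper MAPPER = new ObjectMapper();

  // Parse a JSON response such as {"passed": false, "feedback": "..."} into Evaluation.
  static Evaluation parse(String json) throws Exception {
    return MAPPER.readValue(json, Evaluation.class);
  }
}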
Optimize
If the evaluation result is not passed, the code can be optimized based on the evaluation feedback. The template variable code represents the code to optimize, while feedback represents the feedback from the evaluator.
Goal: Improve existing code based on feedback.
Requirements:
- Address all concerns in the feedback.
Code:
{code}
Feedback:
{feedback}
The output is the optimized code. The optimized code will be evaluated again until it passes the evaluation or the maximum number of iterations is reached.
Reference Implementation
See this page for the reference implementation and examples.