Agents can be evaluated objectively or subjectively

This graph gives a good overview of how agents can be evaluated (image from the Agentic AI course on DeepLearning.ai)
The way I like to think about this is:
- Can the evaluation condition be expressed mathematically? If so, we can evaluate with code
- If it can't be expressed mathematically, it most likely requires subjective, non-deterministic judgment, so a plain code check won't be enough
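
To make the first case concrete, here is a minimal sketch of an objective, code-based check. It assumes a hypothetical agent whose answer should end with a numeric total; the function names `extract_last_number` and `evaluate_objectively` and the tolerance value are my own illustration, not something from the course.

```python
import re
from typing import Optional


def extract_last_number(agent_output: str) -> Optional[float]:
    """Return the last number found in the agent's answer, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", agent_output)
    return float(matches[-1]) if matches else None


def evaluate_objectively(agent_output: str, expected: float, tol: float = 1e-6) -> bool:
    """Pass only if the extracted number is within `tol` of the expected value.

    The condition |value - expected| <= tol is purely mathematical,
    so no human or model judgment is needed to score it.
    """
    value = extract_last_number(agent_output)
    return value is not None and abs(value - expected) <= tol
```

For example, `evaluate_objectively("The invoice total is 42.50", 42.5)` passes, while an answer with no number in it fails. A question like "is this summary helpful?" has no such condition to write down, which is what pushes it into the subjective bucket.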