Discussion about this post

User's avatar
Pawel Jozefiak's avatar

Your point that effort controls and agent teams matter more than benchmark scores is exactly right. The /insights command and workflow infrastructure upgrades are what actually enable production deployment at scale, not incremental MMLU improvements.

I looked at what features drive competitive advantage: https://thoughts.jock.pl/p/ai-agent-landscape-feb-2026-data

Jenny Ouyang's avatar

Great breakdown Ilia! Especially as you include the when-not-to-use-it section, that is what people really need to understand.

14 more comments...

No posts

Ready for more?