Every other week there's a new GPT-vs-Claude-vs-Gemini benchmark on coding or math or reasoning. None of them tell you whether the model can actually make a decision under uncertainty, where the answer isn't in the training data and the result shows up two weeks later in a P&L. So I built a different kind of eval. Seven frontier LLMs, $100,000 of paper capital each, identical tools, identical prompts, identical data. Every Monday they pick stock…