How intelligent is a model that memorizes the answers before an exam? That’s the question facing OpenAI after it unveiled o3 in December and touted the model’s impressive benchmark results. At the time, some pundits hailed it as being close to AGI, the level at which artificial intelligence can match human performance on any task a user requires.
But money changes everything—even math tests, apparently.
OpenAI’s victory lap over o3’s stunning 25.2% score on FrontierMath, a challenging mathematical benchmark developed by Epoch AI, hit a snag when it emerged that the company wasn’t just acing the test: OpenAI had helped write it, too.
“We gratefully acknowledge OpenAI for their support in creating the benchmark,” Epoch AI wrote in an updated footnote on the FrontierMath whitepaper, a disclosure that was enough to raise red flags among enthusiasts.
Worse, OpenAI had not only funded FrontierMath’s development but also had access to its problems and solutions to use as it saw fit. Epoch AI later
Go to Source to See Full Article
Author: Jose Antonio Lanz
