OpenAI’s o3 Hits Human-Level Scores, But Is It Good Enough to Be AGI?

OpenAI’s latest AI model family has achieved what many thought impossible, scoring an unprecedented 87.5% on the challenging Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) benchmark, a result near the minimum threshold for what could theoretically be considered “human” performance.

The ARC-AGI benchmark tests how close a model is to achieving artificial general intelligence, meaning whether it can think, solve problems, and adapt like a human in different situations, even when it hasn’t been trained for them. Its tasks are easy for humans to solve but extremely hard for machines to understand and solve.

The San Francisco-based AI research company unveiled o3 and o3-mini last week as part of its “12 days of OpenAI” campaign—and just days after Google announced its own o1 competitor. The release showed that OpenAI’s upcoming model was closer to reaching artificial general intelligence than expected.

OpenAI’s new reasoning-focused model marks a fundamental shift in how AI systems approach complex reasoning. Unlike traditional large language models that rely on pattern matching, o3 introduces a novel “program synthesis” approach that allows it to tackle entirely new problems it hasn’t encountered before.

“This is not merely incremental improvement, but a genuine breakthrough,” the ARC team stated in their evaluation report. In a blog post, ARC Prize co-founder Francois Chollet went even further, suggesting that “o3 is a system capable of adapting to tasks it has never encountered before, arguably approaching human-level performance i

Author: Jose Antonio Lanz

DeCrypt