Anthropic, an artificial intelligence (AI) firm structured as a public benefit corporation, launched Claude 2 on July 11, marking another milestone in a year of seemingly nonstop progress from the burgeoning generative AI sector.
Introducing Claude 2! Our latest model has improved performance in coding, math and reasoning. It can produce longer responses, and is available in a new public-facing beta website at https://t.co/uLbS2JNczH in the US and UK. pic.twitter.com/jSkvbXnqLd
— Anthropic (@AnthropicAI) July 11, 2023
According to a company blog post, Claude 2 shows improvements across nearly every measurable category. Perhaps most noteworthy among the differences between it and its predecessor is how the researchers discuss their work.
The blog post announcing Claude 2 makes no mention of traditional machine learning benchmarks or scores against comparable models. Instead, Anthropic tested Claude and Claude 2 head-to-head on a series of exams meant to represent real-world knowledge, skills and problem-solving.
Claude 2 beat its predecessor across the board on knowledge, coding and other exams and, according to Anthropic, even scored well against human averages:
“When compared to college students applying to graduate school, Claude 2 scores above the 90th percentile on the GRE reading and writing exams, and similarly to the median applicant on quantitative reasoning.”
It is worth noting that many experts believe comparisons between human and AI test-takers are misleading, both because of the nature of human cognitive reasoning and because a large language model’s training data set likely contains information about the tests themselves. In essence, exams designed for humans may not genuinely “test” an AI’s ability to reason or demonstrate real knowledge or skill.
Along with the launch of Claude 2, Anthropic made the model available through a new public-facing beta website to users in the United States and United Kingdom.
By Tristan Greene