Wednesday, August 27, 2025

TECH REVIEW: WaPo

Staff quizzed AI bots with tough trivia, recent events questions and more. Some answers were impressive. Others were not.

Three volunteer librarians scored 900 answers from Bing Copilot, ChatGPT, Claude, Grok, Meta AI and Perplexity, as well as ...
 
Top AI models that have been shown to beat ChatGPT on accuracy
  • Anthropic's Claude: In 2025, Claude was reported to be the "overall winner" in a reading test and the only model that "never hallucinated". It achieved higher accuracy than ChatGPT in several languages.
  • Google's Gemini: Google's Gemini 2.5 Pro and Gemini Ultra scored higher than ChatGPT on published benchmarks. In 2024, Gemini Ultra had a 90% accuracy rate on the Massive Multitask Language Understanding (MMLU) test, outperforming GPT-4o's 88.7%. Gemini 2.5 Pro also led the LMArena leaderboard, which is based on human feedback.
  • DeepSeek: This open-source reasoning model, developed by a Chinese company, performs well in mathematics and coding. It was noted to be more cost-efficient than ChatGPT and scored well in hallucination tests. Perplexity's R1 1776 model is based on DeepSeek's R1.
  • Perplexity: Designed for factual accuracy, Perplexity provides sources for its claims and allows users to select from various underlying models, including GPT, Gemini, and Claude.
  • AI models that specialize: For specific tasks, smaller, specialized AI models can outperform general-purpose ones like ChatGPT.


