Technology

Revelation from Apple: GenAI Lacks True Human-Like Thinking

2025-06-11

Author: Wei Ling

Can AI Really Think Like Us?

In a study just released by Apple, researchers report that despite their fluent, seemingly rational outputs, today's advanced generative AI (GenAI) models fall short of genuine human-like reasoning. These systems are impressive on the surface, but they falter significantly as problems grow more complex.

The Study's Crucial Findings

Titled "The Illusion of Thinking," the paper describes how Apple researchers tested AI models, including Claude 3.7 Sonnet and DeepSeek-R1, in controlled settings. The team built puzzle environments whose complexity could be varied, then used them to compare the performance of conventional large language models (LLMs) with newer large reasoning models (LRMs).

The results were striking. On simpler tasks, standard LLMs outperformed LRMs, achieving higher accuracy while using fewer computing resources. As complexity increased, LRMs pulled ahead, but on the hardest puzzles their performance ultimately collapsed: beyond certain complexity thresholds, their accuracy fell to zero.

Questionable Evaluation Methods

The researchers caution that judging LRMs by traditional math benchmarks may be misleading. They argue that the models' tendency toward 'accuracy collapse' raises serious doubts about their reliability in real-world applications, and that current LRMs hit a clear ceiling: past a certain level of complexity, their reasoning becomes sharply less effective.

Dueling Opinions in the AI Arena

Not everyone shares Apple's pessimistic view of GenAI, however. Prominent AI expert Ethan Mollick argues that while the study's findings are valid, the conclusions being drawn from them are exaggerated. In a LinkedIn post, he contends that even existing AI technology is poised to have transformative effects, a view shared by many in the tech community.

Mollick also suggests that Apple itself may not be keeping pace in the AI race. Pointing to the company's lack of disclosed benchmarks, he suggests that its latest on-device AI models may lag behind Google's Gemma 3-4B and Alibaba's Qwen 3-4B.

Looking Ahead: Are We at a Crossroads?

As AI continues to evolve, the question remains whether future advances in GenAI can overcome the significant limitations highlighted in Apple's study. Because LRMs appeared only recently, new techniques could still substantially improve their capabilities. The technology holds considerable promise, but whether it can truly replicate human-like thought remains open to debate.