Despite much discussion about the risk of Artificial Intelligence (AI) replacing humans in the workforce, a new study suggests that the technology is still far from being able to do so. Scale AI, in collaboration with the Center for AI Safety (CAIS), put several popular AI models to the test, asking them to perform real-world tasks ranging from product design and game development to data analysis and scientific writing.
The results were disappointing. The Manus model achieved the best result, but a panel of 40 judges rated only 2.5% of its tasks as “acceptable work” by the standard of a reasonable client. Gemini 2.5 Pro came in last, with only 0.8% of tasks meeting expectations. The researchers point out that while AI models are improving on standard benchmarks, they still fall short of the quality demanded by the real job market.
According to Dan Hendrycks, director of CAIS and an advisor to Elon Musk’s company xAI, combining human labor with AI is currently more efficient, but in the future “like in chess, it will probably be more effective to use AI alone.” Even that combination has its costs, however: another study, from BetterUp and Stanford University’s Social Media Lab, shows that the use of AI in the workplace often reduces productivity.
In a survey of 1,150 American employees, 40% said that in the past month they had received “workslop” (low-quality work created by AI) that they or their colleagues had to rework. In an organization with 10,000 employees, that translates to about $9 million in lost productivity per year, according to the study’s estimates. The results suggest that, for now at least, AI needs humans as much as humans need it.

