These days, large language models can handle increasingly complex tasks, writing intricate code and engaging in sophisticated reasoning. But when it comes to four-digit multiplication, a task taught in elementary school, even state-of-the-art systems fail. Why?
A new paper posted to the arXiv preprint server by Xiaoyan Bai, a computer science Ph.D. student at the University of Chicago, and Chenhao Tan, faculty co-director of the Data Science Institute's Novel Intelligence Research Initiative, finds answers by reverse-engineering both failure and success.
They worked with collaborators from MIT, Harvard University, the University of Waterloo and Google DeepMind to probe AI's "jagged frontier"—a term for its capacity to excel at complex reasoning yet stumble on seemingly simple tasks.
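For context, the long multiplication taught in elementary school is a short, exact procedure: multiply digit by digit, carry, and sum the partial products. A minimal Python sketch (the function name and test values here are illustrative, not drawn from the paper):

```python
def long_multiply(a: int, b: int) -> int:
    """Schoolbook long multiplication, digit by digit with carries."""
    # Store digits least-significant first, as on paper read right to left.
    xs = [int(d) for d in str(a)][::-1]
    ys = [int(d) for d in str(b)][::-1]
    result = [0] * (len(xs) + len(ys))
    for i, x in enumerate(xs):
        carry = 0
        for j, y in enumerate(ys):
            total = result[i + j] + x * y + carry
            result[i + j] = total % 10   # keep the ones digit
            carry = total // 10          # carry the rest leftward
        result[i + len(ys)] += carry
    # Strip leading zeros and rebuild the integer from the digits.
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return int("".join(map(str, result[::-1])))

print(long_multiply(4738, 9162))  # agrees with 4738 * 9162
```

The point of the contrast: this is a fixed, mechanical algorithm with no ambiguity, which is exactly why its failure in otherwise capable models is so puzzling.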



