
Performance of a large language model on the reasoning tasks of a physician

What if every scientific paper you read was just the “highlight reel” of a much longer, messier, and more complicated movie? You see the breakthrough, but you never see the hundreds of hours of footage showing what didn’t work.

Ultimately, the ARA marks a shift toward a future where “The Last Human-Written Paper” isn’t the end of science, but the beginning of a much deeper, machine-readable conversation.

However, this shift toward radical transparency comes with its own set of hurdles. While ARAs make AI agents more efficient, the study found a “prior-run box” effect where seeing a human’s past failures actually limited an AI’s ability to think outside the box and find creative new solutions. There is also a significant cultural and technical gap to bridge: the system relies on researchers being willing to expose their “messy” unfinished work, and even with better data, the jump in actual experiment reproduction was relatively modest. Furthermore, the reliance on “compilers” to translate old papers into this new format risks baking in errors or “hallucinations” if the original source was vague, proving that while machine-readable data is powerful, it isn’t a magic fix for the inherent complexities of scientific discovery.


We systematically evaluated the medical reasoning abilities of an LLM across six diverse experiments, comparing the model with hundreds of expert physicians. Overall, the model outperformed physicians across experiments, including in cases utilizing real and unstructured clinical data taken directly from the health record in an emergency department. These diagnostic touchpoints mirror the high-stakes decisions taken in emergency medicine departments, where nurses and clinicians make time-sensitive choices with limited information. Our results showed that humans, GPT-4o, and o1 all improved their diagnostic abilities as more information was available; o1 outperformed humans at multiple touchpoints, with the widest gap at initial ER triage, where there is the least information available.

The rapid pace of improvement in LLMs has substantial implications for the science and practice of clinical medicine. Although applying AI to assist with clinical decision support is sometimes viewed as a high-risk endeavor (22, 23), greater use of these tools might serve to mitigate the human and financial costs of diagnostic error, delay, and lack of access (24, 25). Our findings suggest an urgent need for prospective trials to evaluate these technologies in real-world patient care settings, and for health care systems to prepare to invest in computing infrastructure and in clinician-AI interaction design that can facilitate the safe integration of AI tools into patient-care workflows. This includes the development of robust monitoring frameworks to oversee the broader implementation of AI clinical decision support systems (22), monitoring not just final diagnostic accuracy but other metrics crucial for successful deployment, including safety, efficiency, and cost.

We emphasize that our study addresses only text-based performance for both humans and machines; clinical medicine is multifaceted and awash with nontext inputs, including auditory (such as the patient’s level of distress) and visual information (for example, interpretation of medical imaging studies) that clinicians routinely use. Existing studies suggest that current foundation models are more limited in reasoning over nontext inputs (26, 27); future work is needed to assess how humans and machines may effectively collaborate (28) in use of nontext signals. This requires new benchmarks, trials, and technological solutions to more faithfully measure clinical encounters. Existing investment in increasingly pervasive ambient AI scribes and other passive monitoring technologies holds promise to serve as the basis for such investigations.

Metastatic cancer detection and management with artificial intelligence and augmented reality (Review)

Metastatic cancer remains a significant global health challenge, contributing to the majority of cancer-related mortality due to late detection, therapeutic resistance and the complexity of disseminated disease. Recent advances in artificial intelligence (AI) and augmented reality (AR) are transforming the landscape of metastatic cancer detection and management. AI-driven tools, including radiomics, deep learning models, and predictive analytics, enhance early identification of metastatic lesions, improve diagnostic accuracy, and support personalized treatment strategies by integrating multimodal clinical, imaging and molecular data. At the same time, AR technologies are increasingly applied in image-guided surgery, real-time tumor visualization and patient education, enabling more precise interventions and improved clinical decision-making.

Common asthma drug may turn off tumor ‘switch’ tied to immunotherapy resistance

A drug widely used to treat asthma and allergies may also help fight aggressive cancers, reports a new Northwestern Medicine study that uncovered how tumors hijack common white blood cells to evade immunotherapy.

The findings in mice and human tissues point to a practical, new way to improve treatment for tough tumors, such as triple-negative breast cancer, where immunotherapy often fails.

The study is published in Nature Cancer.

Dietary fats shape pancreatic cancer risk via ferroptosis

For decades, the relationship between fat and cancer has been treated as a question of quantity: Eat less fat, reduce your risk of developing cancer. But new research published April 29 in Cancer Discovery shows that for pancreatic cancer, the type of fat you consume matters more than the amount.

“It’s really the type of fat that you’re consuming, not just total fat content,” says Christian Felipe Ruiz, Ph.D., an associate research scientist in YSM’s Department of Genetics and lead author of the study. “Depending on the type of fat that you consume, it can go completely different ways. We found that some fats promote cancer, as we would expect, while other fats are really good at suppressing cancer.”

One fat in particular—oleic acid, the primary fatty acid in olive oil—may be accelerating tumor growth in ways scientists never anticipated. The result was surprising given oleic acid’s reputation in medicine. “It’s traditionally been considered a healthy type of fat for cardiovascular health,” Ruiz says.

Age does not appear to drive cardiovascular risk in pregnancy

Underlying cardiovascular risk, rather than older age, drives complications such as venous thromboembolism, cardiomyopathy and heart failure during pregnancy, according to new Weill Cornell Medicine research. The findings may encourage doctors to more actively address cardiovascular health in patients before they become pregnant.

The study, published in Nature Communications, suggests that instead of pregnancy becoming inherently riskier as people get older, it amplifies a person’s baseline cardiovascular risk, regardless of age.

“Pregnancy seems to be a uniform stress test, so to speak,” said the study’s lead author, Dr. Hooman Kamel, vice chair of clinical research and chief of neurocritical care in the Department of Neurology and the Helen and Albert Moon Professor of Neurology at Weill Cornell Medicine.

Battery-free skin-conformal wearable system can measure electrocardiogram signals

A research team led by Prof. Jerald Yoo from the Department of Electrical and Computer Engineering at Seoul National University (SNU) has developed a skin-conformal wearable health care system, “SkinECG,” capable of measuring electrocardiogram (ECG) signals without a battery. By combining energy harvesting with human body–coupled power transfer, the study presents a new solution to one of the most critical challenges in wearable devices: power supply.

The findings are published in Science Advances.

Wearable health care systems are emerging as next-generation medical technologies that enable real-time monitoring of physiological signals through body-worn sensors, allowing early detection of disease-related abnormalities.

Designing in situ power stations for future Mars missions

You’re in the lab analyzing Martian regolith samples in your cozy Mars habitat, serving on the fifth human mission to Mars. Power has been flowing flawlessly throughout the habitat thanks to the MARS-MES (Mars Atmospheric Resource & Multimodal Energy System), supporting the general habitat lighting, science lab, sleeping quarters, exercise equipment, the virtual reality headsets the crew use for rest & relaxation, oxygen and fuel generation, and water. All this comes from converting the Martian atmosphere into workable electricity.

While this scenario might be decades away, scientists on Earth are working hard today to make this concept a reality. This includes a team of scientists from China who propose a novel concept for converting the thin Martian atmosphere into heat and electricity. Their findings were recently published in National Science Review and could help revolutionize how electricity is produced on Mars through a process called in situ resource utilization (ISRU), without the need to ship power supplies from Earth.

For the study, the researchers propose several concepts for producing power and electricity on a future human Mars mission, including Martian air capture, in situ power generation and storage, and life support resources transformation. The team notes all these methods carry their own benefits and challenges while emphasizing the importance of using ISRU for powering future human Mars missions.

Single-molecule RNA mapping may reveal how shape shifts steer health and disease

Researchers from A*STAR Genome Institute of Singapore (A*STAR GIS) have developed a new method to study individual RNA molecules and reveal how their structures influence gene regulation, a fundamental process that affects how cells function in health and disease. Their work was published in Nature Methods.

RNA is best known for carrying genetic instructions from DNA to make proteins. However, RNA does more than act as a messenger. Like a string that can bend, fold and interact with other molecules, RNA can adopt different shapes that affect how it behaves in the cell. These shapes can influence how efficiently proteins are produced, how long RNA molecules last, and how diseases such as viral infections progress.

Until now, studying these structures in detail has been difficult because RNA is highly flexible and dynamic. Most existing methods only provide an average picture across many RNA molecules, making it harder to see how individual RNA molecules may fold differently, even when they come from the same gene.
