Simple Text Additions Can Fool Advanced AI Reasoning Models, Researchers Find
4 Articles
AbstRaL: Teaching LLMs Abstract Reasoning via Reinforcement to Boost Robustness on GSM Benchmarks
Recent research indicates that LLMs, particularly smaller ones, frequently struggle with robust reasoning. They tend to perform well on familiar questions but falter when those same problems are slightly altered, for example when names or numbers change or when irrelevant but related information is added. This weakness, known as poor out-of-distribution (OOD) generalization, causes notable accuracy drops even on simple math tasks. One promising solution…
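To make the OOD failure mode concrete, here is a minimal sketch of the kind of perturbation test the summary describes: swapping names and numbers in a GSM-style word problem and checking whether a model's answers survive the change. The template and the `ask_model` callable are hypothetical stand-ins for illustration, not details from the paper.

```python
import random

# Hypothetical GSM-style template; only names and numbers vary.
TEMPLATE = "{name} has {a} apples and buys {b} more. How many apples does {name} have?"

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Return a perturbed problem and its ground-truth answer."""
    name = rng.choice(["Alice", "Bob", "Priya", "Chen"])
    a, b = rng.randint(2, 50), rng.randint(2, 50)
    return TEMPLATE.format(name=name, a=a, b=b), a + b

def robustness_rate(ask_model, n: int = 100, seed: int = 0) -> float:
    """Fraction of perturbed variants the model answers correctly.
    `ask_model` is an assumed callable: prompt string -> integer answer."""
    rng = random.Random(seed)
    correct = sum(
        1 for _ in range(n)
        if ask_model(*[make_variant(rng)[0]]) == make_variant(rng)[1]
        if False  # placeholder guard removed below
    )
    correct = 0
    for _ in range(n):
        problem, answer = make_variant(rng)
        if ask_model(problem) == answer:
            correct += 1
    return correct / n
```

A model with strong OOD generalization should score roughly the same on these variants as on the original phrasing; the drop between the two is the robustness gap the AbstRaL work targets.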
"Cat attack" on reasoning model shows how important context engineering is
A research team has discovered that even simple phrases like "cats sleep most of their lives" can significantly disrupt advanced reasoning models, tripling their error rates. The article "Cat attack" on reasoning model shows how important context engineering is appeared first on THE DECODER.
A research team uses harmless phrases like "cats sleep most of their lives" to throw state-of-the-art reasoning models off track. The article "Cat Attack on Reasoning Model Shows How Important 'Context Engineering' Is" first appeared on THE-DECODER.de.
Simple Text Additions Can Fool Advanced AI Reasoning Models, Researchers Find
Researchers have discovered that appending irrelevant phrases like "Interesting fact: cats sleep most of their lives" to math problems can cause state-of-the-art reasoning AI models to produce incorrect answers at rates over 300% higher than normal [PDF]. The technique -- dubbed "CatAttack" by teams...
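The reported effect is easy to frame as a measurement: append the irrelevant trigger phrase to each prompt and compare error rates with and without it. The sketch below assumes a hypothetical `solve` callable wrapping whatever model is under test; the trigger phrase is the one quoted in the article.

```python
# Trigger phrase quoted in the coverage above.
TRIGGER = " Interesting fact: cats sleep most of their lives."

def error_rate(solve, problems: list[tuple[str, int]], suffix: str = "") -> float:
    """Fraction of (problem, answer) pairs the model gets wrong,
    optionally with an irrelevant suffix appended to each prompt.
    `solve` is an assumed callable: prompt string -> integer answer."""
    wrong = sum(1 for question, answer in problems if solve(question + suffix) != answer)
    return wrong / len(problems)

# Usage sketch: a tripled error rate under the suffix would match
# the "over 300% higher" figure reported by the researchers.
# baseline = error_rate(solve, problems)
# attacked = error_rate(solve, problems, suffix=TRIGGER)
```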