AbstRaL: Teaching LLMs Abstract Reasoning via Reinforcement to Boost Robustness on GSM Benchmarks
Recent research indicates that LLMs, particularly smaller ones, frequently struggle with robust reasoning. They tend to perform well on familiar questions but falter when those same problems are slightly altered, for example by changing names or numbers or by adding irrelevant but related information. This weakness, known as poor out-of-distribution (OOD) generalization, results in notable accuracy drops, […] The post AbstRaL: Teaching LLMs Abstract Reasoning via Reinforcement to Boost Robustness on GSM Benchmarks appeared first on MarkTechPost.