Enkefalos Research

At Enkefalos Technologies, we believe in research that translates into real impact.

Why Research at Enkefalos?

We do research to solve problems that matter in the real world.

Our clients operate in regulated, high-risk industries (insurance, finance, public safety).

These domains need trustworthy AI that can reason, infer, and adapt — not just autocomplete.

Generic LLMs are fragile and verbose. We’re fixing that by pushing the limits of model reasoning.

Each paper informs a product, whether it’s our InsurancGPT, custom GenAI solutions, or low-resource language models.

Impact of Noise on LLM-Models Performance in Abstraction and Reasoning Corpus (ARC) Tasks with Model Temperature Considerations

Abstract

Recent advancements in Large Language Models (LLMs) have sparked interest in their structured reasoning capabilities, particularly in abstraction and pattern recognition tasks. The Abstraction and Reasoning Corpus (ARC) benchmark serves as a key evaluation tool for assessing AI models’ ability to generalize and solve novel reasoning tasks. While GPT-4o successfully solves all ARC tasks at zero noise, models such as DeepSeek R1 and LLaMA 3.2 fail to solve any, raising questions about their abstraction and generalization capabilities beyond pattern matching. To investigate this further, we evaluate these models under varying noise levels and temperature settings. Our findings indicate that introducing noise significantly degrades performance across all models, underscoring their fragility under uncertain conditions. This suggests that while some models demonstrate reasoning abilities, they remain highly sensitive to input perturbations, limiting their robustness. By analyzing how different architectures handle noise and uncertainty, we provide insights into the limitations of current AI systems in structured reasoning. Our study highlights the need for more resilient AI models that can adapt to real-world complexity, informing future research on improving generalization, robustness, and alignment with human cognitive flexibility.
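To make the evaluation setup in the abstract concrete, the short Python sketch below illustrates one way to perturb ARC inputs with noise and query a model at a chosen temperature. It is not code from the paper: it assumes ARC tasks are stored in the standard JSON layout (train/test pairs of integer grids using 10 colors), flips each cell to a random different color with probability noise_level, and leaves the actual LLM call behind a user-supplied query_fn callable.

import ast
import random

def add_noise(grid, noise_level, num_colors=10, rng=None):
    """Flip each cell to a random different color with probability noise_level."""
    rng = rng or random.Random()
    noisy = []
    for row in grid:
        noisy_row = []
        for cell in row:
            if rng.random() < noise_level:
                # Replace the cell with any color other than its current value.
                noisy_row.append(rng.choice([c for c in range(num_colors) if c != cell]))
            else:
                noisy_row.append(cell)
        noisy.append(noisy_row)
    return noisy

def evaluate_task(task, noise_level, temperature, query_fn):
    """Build a few-shot prompt from the noised training pairs of one ARC task,
    ask the model to complete the test output, and check exact-match accuracy.
    query_fn(prompt, temperature=...) is a hypothetical wrapper around whatever
    LLM API is being evaluated and should return the model's text response."""
    parts = []
    for pair in task["train"]:
        parts.append(f"Input: {add_noise(pair['input'], noise_level)}\nOutput: {pair['output']}")
    test_pair = task["test"][0]
    parts.append(f"Input: {add_noise(test_pair['input'], noise_level)}\nOutput:")
    prompt = "\n\n".join(parts)

    raw = query_fn(prompt, temperature=temperature)
    try:
        # Expect the model to answer with a Python-style nested list of ints.
        prediction = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError):
        return False
    return prediction == test_pair["output"]

Sweeping noise_level (e.g. 0.0 to 0.3) and temperature over a task set, and averaging the per-task results, would reproduce the kind of noise-versus-performance comparison described above.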

Other White Papers

Whitepaper 2

Exploring Next Token Prediction in Theory of Mind (ToM) Tasks: Comparative Experiments with GPT-2 and LLaMA-2 AI Models

Whitepaper 3

Representational Alignment in Theory of Mind

Whitepaper 4

InsurancGPT: Secure and Cost-Effective LLMs for the Insurance Industry