Whitepapers - Enkefalos - Your partner for digital innovation

Schedule a Demo

Our Whitepapers

Each whitepaper below is a snapshot of our research journey, solving one core weakness in current AI systems.

Whitepaper 1

Impact of Noise on LLM-Models Performance in Abstraction and Reasoning Corpus (ARC) Tasks with Model Temperature Considerations

What’s the problem? Models like GPT-4o fail when even tiny noise is introduced into abstract reasoning tasks.
What we did: Used the ARC benchmark and systematically injected noise (0.05–0.3%) into grids to test model resilience.
Key Result: GPT-4o collapsed under minimal noise. LLaMA and DeepSeek failed even in noiseless conditions.
Use case: We now build noise-aware abstraction engines in AI copilots.

arXiv Link : https://arxiv.org/pdf/2504.15903

Whitepaper 2

Exploring Next Token Prediction in Theory of Mind (ToM) Tasks: Comparative Experiments with GPT-2 and LLaMA-2 AI Models

What’s the problem? LLMs lose grounding when you increase distractors in narrative tasks, failing to infer intent.
What we did: Inserted 0 to 64 distractor sentences in Theory-of-Mind stories and tracked token prediction.
Key Result: Both GPT-2 and LLaMA-2 showed significant performance degradation. LLaMA-2 resisted better but still struggled with nested beliefs.
Use case: We are designing intent-tracking models that maintain coherence under ambiguity.

arXiv Link : https://arxiv.org/pdf/2504.15903

Whitepaper 3

Representational Alignment in Theory of Mind

What’s the problem? Most models don’t organize internal knowledge by belief or perspective—they cluster by keywords.
What we did: Conducted triplet-based alignment tasks and evaluated similarity matrices.
Key Result: Alignment improved ToM reasoning accuracy. Our work was featured at the ICLR 2025 Re-Align Workshop.
Use case: This research powers our belief-aware reasoning layers in copilots.

OpenReview Link : https://openreview.net/pdf?id=MbQ9lZ5d5z

Whitepaper 4

InsuranceGPT: Secure and Cost-Effective LLMs for the Insurance Industry

What’s the problem? General LLMs don’t perform well on industry-specific tasks like claims, underwriting, or policy compliance.
What we did: Built InsuranceGPT, a fine-tuned Mistral-based model trained on NAIC, CPCU, and claims data.
Architecture: Combines Direct Preference Optimization (DPO) with Retrieval-Augmented Generation (RAG).
Key Result: Outperformed GPT-4 and GPT-3.5 in BLEU, METEOR, and BERTScore on insurance tasks.
Use case: Now deployed in production copilots across document intelligence, policy QA, and fraud detection.

Link : https://insurancgpt.ai/research.html#