As AI systems increasingly tackle complex reasoning tasks,
understanding how they internally structure mental states
is crucial. While prior research has explored representational
alignment in vision models, its role in higher-order cognition
remains under-examined, particularly in Theory of Mind (ToM) tasks.
This study evaluates how AI models encode and compare mental states
in ToM tasks, focusing on False Belief, Irony, and Faux Pas
reasoning. Using a triplet-based similarity framework,
we assess whether structured reasoning models (e.g., DeepSeek R1)
exhibit stronger representational alignment than token-based models such as LLaMA. While
AI models correctly answer individual ToM queries, they fail to recognize
broader conceptual structures, clustering stories by surface-level
textual similarity rather than by belief-based organization. This misalignment
persists across 0th-, 1st-, and 2nd-order ToM reasoning, highlighting
a fundamental gap between human and AI cognition. Moreover, explicit
reasoning mechanisms in DeepSeek do not reliably improve alignment, as
the models struggle to capture hierarchical ToM structures. To probe
this gap further, we propose extending representational analysis to temporally
evolving, multi-agent belief systems, capturing how beliefs about beliefs
shift over time and across interactions. Our findings suggest that achieving deeper
AI alignment requires moving beyond task accuracy toward developing structured,
human-like mental representations. Using triplet-based alignment metrics,
we propose a novel approach to quantify AI cognition and guide future improvements
in reasoning, interpretability, and social alignment. Additionally, we suggest
this representational framework as a potential foundation for a noninvasive,
scalable cognitive monitoring tool for early-stage dementia or Alzheimer’s disease,
analogous to fMRI-based biomarkers but deployable through everyday interactions
on mobile platforms.
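
To illustrate the kind of triplet-based alignment metric referred to above, the following sketch computes an odd-one-out agreement score between model embeddings of ToM stories and human triplet judgments. It is not the paper's actual implementation: the function names (`odd_one_out`, `triplet_alignment`), the data layout, and the use of cosine similarity over story embeddings are all assumptions for illustration.

```python
# Hypothetical sketch of a triplet-based alignment score: for each triplet of
# ToM stories, the model's "odd one out" is the story least similar to the
# other two under cosine similarity of its embeddings; alignment is the
# fraction of triplets where this choice matches the human judgment.
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def odd_one_out(embeddings: list[np.ndarray]) -> int:
    # Sum each story's similarity to the other two; the lowest total is the odd one out.
    totals = [
        sum(cosine_sim(embeddings[i], embeddings[j]) for j in range(3) if j != i)
        for i in range(3)
    ]
    return int(np.argmin(totals))


def triplet_alignment(
    story_embeddings: dict[str, np.ndarray],
    human_judgments: list[tuple[tuple[str, str, str], str]],
) -> float:
    # human_judgments: list of ((story_a, story_b, story_c), human_odd_one_out_id)
    matches = 0
    for ids, human_choice in human_judgments:
        embs = [story_embeddings[i] for i in ids]
        if ids[odd_one_out(embs)] == human_choice:
            matches += 1
    return matches / len(human_judgments)
```

Under this framing, chance-level agreement on three-way triplets is roughly one third, so scores well above that level would indicate that a model organizes stories by belief content rather than by surface-level textual similarity.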