Join our research team! We are recruiting interested undergraduate students for volunteer and paid positions.
Large Language Models (LLMs) are transforming how we interact with technology, enabling capabilities like text generation, translation, and complex question answering. However, a key challenge is the phenomenon of "hallucination," where LLMs produce factually incorrect or fabricated information, undermining their reliability. This is especially concerning as LLMs are increasingly used in critical sectors such as healthcare, education, and finance.
Our research addresses this issue by focusing on detecting "hallucination spans" (the parts of an LLM's output that are factually incorrect), particularly in multilingual contexts. We propose an approach that analyzes the "logits" (the model's internal, pre-softmax scores over its vocabulary) produced by LLMs during text generation. By framing hallucination detection as an anomaly detection problem, we treat these token-level logits as time-series data and look for unusual patterns indicative of factual inaccuracies.
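To make this concrete, the sketch below shows one way to collect token-level logit statistics from a causal language model and read them as a time series. It is a minimal illustration, not our exact pipeline: the model name ("gpt2" as a stand-in), the input sentence, and the two statistics shown are assumptions chosen for brevity.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumed stand-in: any causal LM that exposes logits works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The Eiffel Tower was completed in 1889 in Paris."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)          # outputs.logits: (1, seq_len, vocab_size)

logits = outputs.logits[0]             # drop the batch dimension -> (seq_len, vocab_size)
probs = torch.softmax(logits, dim=-1)  # per-position distribution over the vocabulary

# One scalar per position: each sequence below can be read as a time series
# whose unusual segments are candidate hallucination spans.
max_prob = probs.max(dim=-1).values                            # confidence in the top choice
entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)  # uncertainty at each step

for tok_id, p, h in zip(inputs["input_ids"][0], max_prob, entropy):
    token = tokenizer.decode([int(tok_id)])
    print(f"{token!r:>12}  max_prob={p.item():.3f}  entropy={h.item():.2f}")
```

Runs of low confidence or high entropy in these sequences are the kind of "unusual pattern" the anomaly-detection framing is designed to surface.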
We extract key features from the token-level logits, capturing statistical properties that distinguish reliable text from hallucinated content. A machine learning model is then trained to detect these anomalies, yielding probabilistic, character-level predictions of hallucination spans.
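The following sketch illustrates this feature-and-classifier step under stated assumptions: it takes per-token statistics like those above as given, uses synthetic stand-in data and labels, and picks a scikit-learn gradient-boosting classifier and a small set of window-based features purely for illustration; none of these choices should be read as the exact published setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def token_features(max_prob, entropy, window=3):
    """Per-token feature vectors: raw statistics plus local-window aggregates
    that capture how sharply each position deviates from its neighbourhood."""
    feats = []
    n = len(max_prob)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        feats.append([
            max_prob[i],
            entropy[i],
            np.mean(max_prob[lo:hi]),              # local confidence level
            np.std(entropy[lo:hi]),                # local uncertainty volatility
            entropy[i] - np.mean(entropy[lo:hi]),  # anomaly-style deviation from the window
        ])
    return np.array(feats)

# Toy stand-in data: real training would use many annotated LLM outputs with
# human-labelled hallucination spans.
rng = np.random.default_rng(0)
max_prob = rng.uniform(0.2, 1.0, size=200)
entropy = rng.uniform(0.0, 5.0, size=200)
labels = (entropy > 3.5).astype(int)   # synthetic labels for illustration only

X = token_features(max_prob, entropy)
clf = GradientBoostingClassifier().fit(X, labels)

# Probabilistic token-level hallucination scores.
token_scores = clf.predict_proba(X)[:, 1]
print(token_scores[:10])
```

Mapping these token-level scores back to character offsets (for example, via the tokenizer's offset mapping) is what turns them into the character-level span predictions described above.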
This method relies solely on signals the LLM itself produces during generation, without consulting external knowledge sources. Our approach demonstrates competitive performance in identifying hallucination spans across multiple languages, contributing to the responsible and trustworthy deployment of AI systems. By addressing hallucinations, we aim to improve the reliability and fairness of AI, ensuring equitable access to trustworthy technology across diverse languages and communities.
2024-2025