LLMs Don't Hallucinate – They Drift

Hacker News · 2h ago

Key Facts

  • A new framework challenges the common use of the term "hallucination" for AI errors, proposing "semantic drift" as a more accurate descriptor.
  • The framework introduces a method for measuring "fidelity decay," which quantifies how a model's output deviates from expected meaning over time.
  • This conceptual shift provides a structured diagnostic tool for analyzing and addressing reliability issues in large language models.
  • The approach reframes AI errors as predictable outcomes of complex processing rather than random, inexplicable failures.
  • The framework has been detailed in a recent conference contribution, signaling a move toward more rigorous evaluation metrics in AI research.

In This Article

  1. Quick Summary
  2. Redefining AI Errors
  3. Measuring Fidelity Decay
  4. Implications for AI Development
  5. A New Diagnostic Lens
  6. Looking Ahead

Quick Summary

The terminology used to describe artificial intelligence errors is being challenged. A new framework disputes the widely used term "hallucination" for large language model (LLM) failures and proposes a more precise alternative: semantic drift.

This conceptual shift is detailed in a recent conference contribution that introduces a method for measuring fidelity decay within AI systems. The framework provides a structured way to diagnose how and why model outputs deviate from expected or factual information, moving beyond anecdotal descriptions toward quantifiable metrics.

Redefining AI Errors

The term "hallucination" has become a catch-all for cases in which AI models generate incorrect or nonsensical information. The metaphor has been criticized as imprecise and anthropomorphic. The new framework argues that what is often called a hallucination is better understood as semantic drift: a gradual or sudden departure from the intended meaning or factual grounding.

This reframing is more than a change of label; it has practical implications for diagnosis and improvement. By viewing errors as drift, developers can trace how information degrades as it moves through the model's processing pipeline. The framework's method for measuring this decay offers a clearer lens through which to analyze model behavior.

  • Shifts from vague "hallucination" to measurable "semantic drift"
  • Introduces "fidelity decay" as a quantifiable metric
  • Provides a diagnostic framework for model errors

Measuring Fidelity Decay

At the core of the new framework is the concept of fidelity decay. This metric allows researchers to quantify how much a model's output drifts from a source of truth or a given prompt over time or through successive processing steps. It transforms a subjective observation into an objective measurement.

The framework establishes a systematic approach to tracking this decay. Instead of labeling an output as simply "wrong," analysts can now measure the degree of deviation. This enables more nuanced comparisons between different models, prompts, or architectural changes, focusing on the stability of semantic meaning rather than just factual accuracy.
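
The article does not say how the framework computes fidelity decay, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes decay can be approximated as one minus the cosine similarity between word counts of a reference text and each successive output. The function names, the bag-of-words comparison, and the sample sentences are all illustrative assumptions; a real evaluation would more plausibly compare sentence embeddings.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fidelity_decay(reference: str, outputs: list[str]) -> list[float]:
    """Hypothetical metric: 1 - similarity to the reference, per output."""
    ref_vec = bag_of_words(reference)
    return [1.0 - cosine_similarity(ref_vec, bag_of_words(o)) for o in outputs]


# Made-up sentences standing in for successive model outputs.
reference = "The report was published in March and covers three regions."
outputs = [
    "The report was published in March and covers three regions.",
    "The report was published in March and covers several regions.",
    "A summary appeared in spring and mentions many countries.",
]

decay = fidelity_decay(reference, outputs)
for step, value in enumerate(decay):
    print(f"step {step}: decay = {value:.2f}")
```

In this toy run the decay value rises as each output shares fewer words with the reference, which is the kind of graded deviation the article describes in place of a binary "wrong" label. Word overlap is a crude proxy for meaning: a faithful paraphrase would score as heavily drifted under this sketch.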

Implications for AI Development

Adopting the language of semantic drift and fidelity decay could reshape AI development and evaluation. It moves the conversation from blaming a model for "making things up" to understanding the systemic factors that cause information to degrade. This perspective encourages a more engineering-focused approach to reliability.

For developers, this means new tools for debugging and improving model performance. For users, it offers a more transparent understanding of AI limitations. The framework suggests that errors are not random failures but predictable outcomes of complex processing, which can be measured, monitored, and potentially mitigated through targeted interventions.

  • Enables precise tracking of information degradation
  • Facilitates comparison between different model architectures
  • Shifts focus to systemic causes of errors
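
As a rough illustration of what monitoring and comparison might look like in practice, the sketch below takes per-step decay scores for two configurations and reports each one's mean decay and the first step at which decay crosses a threshold. Everything here is an assumption made for illustration: the function names, the threshold of 0.30, the configuration labels, and the numbers, which are placeholders rather than measured results.

```python
from statistics import mean
from typing import Optional


def first_breach(decay_series: list[float], threshold: float) -> Optional[int]:
    """Index of the first step whose decay exceeds the threshold, else None."""
    for step, value in enumerate(decay_series):
        if value > threshold:
            return step
    return None


def summarize(decay_by_config: dict[str, list[float]], threshold: float) -> dict:
    """Mean decay and first threshold breach for each configuration."""
    return {
        name: {
            "mean_decay": mean(series),
            "first_breach": first_breach(series, threshold),
        }
        for name, series in decay_by_config.items()
    }


# Placeholder numbers for illustration only; they are not measured results.
decay_by_config = {
    "config_a": [0.00, 0.05, 0.08, 0.12],
    "config_b": [0.00, 0.10, 0.35, 0.60],
}

summary = summarize(decay_by_config, threshold=0.30)
for name, stats in summary.items():
    print(name, stats)
```

A summary of this shape lets two models, prompts, or architectures be compared on how quickly meaning degrades, not only on whether the final answer is correct.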

A New Diagnostic Lens

The proposed framework serves as a diagnostic tool for the AI community. By categorizing and measuring different types of drift, it helps identify specific failure modes within large language models. This structured analysis is crucial as these models become more integrated into critical applications where reliability is paramount.
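
The article does not list the drift categories the framework uses. As a hedged sketch of what such categorization could look like, the code below separates the "gradual" and "sudden" departures mentioned earlier by checking how much of the total drift arrives in a single step. The labels, the function name, and both cutoffs (`min_total`, `jump_share`) are hypothetical choices for illustration, not values taken from the framework.

```python
def classify_drift(
    decay_series: list[float],
    min_total: float = 0.1,
    jump_share: float = 0.6,
) -> str:
    """Label a decay series as 'stable', 'gradual', or 'sudden'.

    Hypothetical heuristic: drift is 'sudden' when the largest single-step
    increase accounts for at least `jump_share` of the total increase.
    """
    if len(decay_series) < 2:
        return "stable"
    total = decay_series[-1] - decay_series[0]
    if total < min_total:
        return "stable"
    increments = [b - a for a, b in zip(decay_series, decay_series[1:])]
    largest = max(increments)
    return "sudden" if largest / total >= jump_share else "gradual"


# Illustrative series, not measured data.
print(classify_drift([0.0, 0.1, 0.2, 0.3, 0.4]))
print(classify_drift([0.0, 0.02, 0.03, 0.5, 0.52]))
print(classify_drift([0.0, 0.01, 0.02]))
```

Separating the two patterns matters for diagnosis: a steady climb points to slow accumulation across steps, while a single jump points to one specific step where grounding was lost.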

The discussion around this framework has already begun within technical communities, highlighting a growing demand for more rigorous methods to assess AI performance. As the field matures, the ability to precisely measure and describe model behavior will be essential for building more trustworthy and effective AI systems.

Looking Ahead

The move from "hallucination" to "semantic drift" represents a maturation in the discourse surrounding artificial intelligence. It reflects a deeper understanding of how these complex systems operate and fail. This framework provides the vocabulary and methodology needed for more productive conversations about AI safety and reliability.

As research builds on this foundation, fidelity decay and semantic drift may become standard concepts in the evaluation of large language models. The change in terminology is a step toward AI that is not only more powerful but also more predictable and transparent in its operation.
