DeepSeek Unveils AI Training Breakthrough for Model Scaling

Business Insider, Jan 2

Key Facts

  • ✓ DeepSeek published a research paper on a new training method called Manifold-Constrained Hyper-Connections (mHC).
  • ✓ The method is designed to scale models without them becoming unstable or breaking.
  • ✓ Wei Sun, principal analyst for AI at Counterpoint Research, called the approach a 'striking breakthrough.'
  • ✓ The paper was co-authored by DeepSeek founder Liang Wenfeng.
  • ✓ DeepSeek is reportedly working toward the release of its next flagship model, R2.

In This Article

  1. Quick Summary
  2. The Technical Innovation: Manifold-Constrained Hyper-Connections
  3. Industry Analysts React to the Breakthrough
  4. Context: The Road to R2 and Market Position

Quick Summary

China's DeepSeek has opened 2026 by publishing a new AI training method that industry analysts are calling a significant advance for the sector. The research paper introduces a technique designed to scale large language models more effectively, without the instability that often accompanies growing model sizes. By letting parts of a model share richer internal information in a constrained manner, the method preserves training stability and computational efficiency.

The paper, co-authored by founder Liang Wenfeng, details a process dubbed Manifold-Constrained Hyper-Connections (mHC). This approach addresses the challenge of maintaining performance as models grow, a critical hurdle in current AI development. Analysts suggest this innovation could shape the evolution of foundational models and allow the company to bypass compute bottlenecks, potentially unlocking new leaps in intelligence.

The Technical Innovation: Manifold-Constrained Hyper-Connections

The Chinese AI startup published a research paper on Wednesday describing a method to train large language models that could shape "the evolution of foundational models." The paper introduces what DeepSeek calls Manifold-Constrained Hyper-Connections, or mHC, a training approach designed to scale models without them becoming unstable or breaking altogether.

As language models grow, researchers often try to improve performance by allowing different parts of a model to share more information internally. However, the more freely that information is mixed, the greater the risk that it becomes unstable as it passes through the model. DeepSeek's latest research lets models share richer internal communication in a constrained manner, preserving training stability and computational efficiency even as models scale.
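The trade-off described above can be illustrated with a small numerical sketch. It is not taken from DeepSeek's paper: the article does not say which constraint mHC applies, so the example assumes, purely for illustration, that several parallel streams are mixed at every layer by a square matrix, and that "constrained" means normalizing that matrix so its entries are non-negative and its rows and columns sum to roughly one. The names `normalize_mixer` and `run_layers`, and all sizes, are hypothetical.

```python
import numpy as np

def normalize_mixer(logits, iters=50):
    """Assumed constraint (illustrative only): make a square mixing matrix
    non-negative, with rows summing to one and columns summing to roughly one,
    by alternately rescaling columns and rows."""
    m = np.exp(logits - logits.max())  # strictly positive entries
    for _ in range(iters):
        m = m / m.sum(axis=0, keepdims=True)  # rescale columns
        m = m / m.sum(axis=1, keepdims=True)  # rescale rows (done last, so rows are exact)
    return m

def run_layers(mixers, streams):
    """Apply one mixing matrix per layer to the stack of streams."""
    for m in mixers:
        streams = m @ streams
    return streams

rng = np.random.default_rng(0)
n_streams, width, depth = 4, 8, 64  # hypothetical sizes

streams = rng.normal(size=(n_streams, width))
raw = [rng.normal(size=(n_streams, n_streams)) for _ in range(depth)]

# Unconstrained mixing: the signal is free to grow layer after layer.
free_out = run_layers(raw, streams)

# Constrained mixing: every layer outputs weighted averages of its inputs.
constrained_out = run_layers([normalize_mixer(r) for r in raw], streams)

print("largest input value:          ", np.abs(streams).max())
print("largest value, unconstrained: ", np.abs(free_out).max())
print("largest value, constrained:   ", np.abs(constrained_out).max())
```

Under this assumed constraint, each layer can only average the streams it receives, so the largest value cannot grow with depth, whereas a stack of unconstrained mixing matrices can amplify the signal at every layer. The sketch shows only that general idea and says nothing about how mHC itself is implemented.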

By redesigning the training stack end-to-end, the company is signaling that it can pair rapid experimentation with highly unconventional research ideas. This technical feat is viewed by industry observers as a statement of DeepSeek's internal capabilities.

Industry Analysts React to the Breakthrough

Analysts have reacted positively to the publication. Wei Sun, the principal analyst for AI at Counterpoint Research, called the approach a "striking breakthrough" and noted that DeepSeek combined various techniques to minimize the extra cost of training a model. She added that even with a slight increase in cost, the new training method could yield much higher performance.

Sun further stated that DeepSeek can "once again, bypass compute bottlenecks and unlock leaps in intelligence," referring to the company's "Sputnik moment" in January 2025. That month, the company unveiled its R1 reasoning model, which shook the tech industry and the US stock market by matching top competitors at a fraction of the cost.

Lian Jye Su, the chief analyst at Omdia, told Business Insider that the published research could have a ripple effect across the industry, with rival AI labs developing their own versions of the approach. "The willingness to share important findings with the industry while continuing to deliver unique value through new models showcases a newfound confidence in the Chinese AI industry," Su said. He added that openness is embraced as "a strategic advantage and key differentiator."

Context: The Road to R2 and Market Position

The paper comes as DeepSeek is reportedly working toward the release of its next flagship model, R2, following an earlier postponement. R2, which had been expected in mid-2025, was delayed after Liang expressed dissatisfaction with the model's performance. The launch was also complicated by shortages of advanced AI chips, a constraint that has increasingly shaped how Chinese labs train and deploy frontier models.

While the paper does not mention R2, its timing has raised eyebrows. DeepSeek previously published foundational training research ahead of its R1 model launch. Su said DeepSeek's track record suggests the new architecture will "definitely be implemented in their new model."

Wei Sun, however, is more cautious about the timeline. "There is most likely no standalone R2 coming," Sun said. Since DeepSeek has already integrated earlier R1 updates into its V3 model, she believes the technique could form the backbone of DeepSeek's V4 model instead. Despite these innovations, reports suggest that DeepSeek's updates to its R1 model failed to gain much traction in the tech industry, with distribution remaining a challenge compared to leading AI labs like OpenAI and Google, particularly in Western markets.