Huawei-led team claims it post-trained DeepSeek's 1.6-trillion-parameter model - 1,000 Ascend 910C chips used in training

The Great AI Decoupling: Can Domestic Silicon Actually Topple the Nvidia Empire?

For years, the narrative surrounding AI development in China has been one of desperation and workarounds. With U.S. export controls tightening the noose around high-end GPUs, the industry has been forced to ask a critical question: Can you build a frontier AI model without an Nvidia H100?

The claim by a Huawei-led research group that it completed full-parameter post-training of the 1.6-trillion-parameter DeepSeek V4-Pro on 1,000 Ascend 910C chips is more than just a technical milestone. If it holds up, it signals that the “silicon curtain” is falling and a parallel AI ecosystem is emerging.

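Pro Tip: When analysing AI hardware claims, always distinguish between inference (running a model) and training (creating/tuning a model). Training is far more demanding: each chip must hold gradients and optimizer state alongside the weights, and chips must exchange gradients at every step, which stresses memory capacity, memory bandwidth and interconnect speed.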
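
To see part of why, here is a rough estimate of the memory needed just for model states, using the 1.6-trillion-parameter and 1,000-chip figures from the report. The bytes-per-parameter values are assumptions (bf16 weights for inference, and the commonly cited ~16 bytes per parameter for mixed-precision Adam training), and activations are ignored, so treat this as an order-of-magnitude sketch, not the team's actual configuration.

```python
# Rough memory for model states only (weights, gradients, optimizer state).
# Assumptions, not figures from the report: bf16 weights for inference, and
# mixed-precision Adam for training at about 16 bytes per parameter
# (2 B bf16 weights + 2 B bf16 gradients + 12 B fp32 master weights,
# momentum and variance). Activations and communication buffers are ignored.

INFERENCE_BYTES_PER_PARAM = 2   # bf16 weights only
TRAINING_BYTES_PER_PARAM = 16   # mixed-precision Adam model states


def model_state_tb(num_params: float, bytes_per_param: int) -> float:
    """Memory in terabytes (1 TB = 1e12 bytes)."""
    return num_params * bytes_per_param / 1e12


def per_chip_gb(total_tb: float, num_chips: int) -> float:
    """Memory per chip if the states are sharded evenly (ZeRO-3/FSDP style)."""
    return total_tb * 1e12 / num_chips / 1e9


PARAMS = 1.6e12  # parameter count reported for DeepSeek V4-Pro
CHIPS = 1_000    # chip count reported for the post-training run

infer_tb = model_state_tb(PARAMS, INFERENCE_BYTES_PER_PARAM)
train_tb = model_state_tb(PARAMS, TRAINING_BYTES_PER_PARAM)

print(f"inference weights:     {infer_tb:.1f} TB")
print(f"training model states: {train_tb:.1f} TB")
print(f"per chip, {CHIPS} chips: {per_chip_gb(train_tb, CHIPS):.1f} GB")
```

Under these assumptions, training states alone come to roughly eight times the size of the weights, and they only become manageable when spread across hundreds of chips; gradient exchange between those chips then adds interconnect pressure on top.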

Beyond Inference: The Shift Toward Training Sovereignty

Historically, Chinese accelerators have been “inference-competent.” They could take a model trained on Nvidia hardware and run it reasonably well. However, the “holy grail” is training: the process of repeatedly adjusting a model’s weights (billions of them, or 1.6 trillion in this case) across massive datasets.

The move to full-parameter post-training is a significant leap. Unlike “adapter-based” tuning (like LoRA), which freezes the original weights and trains only a small set of added parameters, full-parameter tuning updates every single weight. For a behemoth like DeepSeek V4-Pro, this requires immense stability and high-speed chip-to-chip communication.
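
The difference in scale is easy to quantify for a single weight matrix. This sketch counts trainable parameters under full-parameter tuning versus LoRA; the layer size and rank are illustrative assumptions, not DeepSeek V4-Pro's actual dimensions.

```python
# Trainable parameters for one d_out x d_in weight matrix W.
# Full-parameter tuning updates every entry of W. LoRA freezes W and trains
# two low-rank factors, B (d_out x r) and A (r x d_in), whose product B @ A
# is added to W.


def full_trainable(d_in: int, d_out: int) -> int:
    return d_in * d_out


def lora_trainable(d_in: int, d_out: int, rank: int) -> int:
    return rank * (d_in + d_out)


d_in = d_out = 4096  # illustrative layer size (assumption)
rank = 16            # illustrative LoRA rank (assumption)

full = full_trainable(d_in, d_out)
lora = lora_trainable(d_in, d_out, rank)
print(full, lora, f"{100 * lora / full:.2f}%")
```

At these sizes LoRA trains under 1% of the matrix's parameters, which also shrinks gradient and optimizer memory; full-parameter tuning pays the full cost on every layer of the model.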

If Huawei can prove that the Ascend 910C can handle these workloads at scale, the strategic value of Nvidia’s dominance begins to erode. We are moving toward a future where “AI sovereignty” means owning the entire stack—from the sand in the chip to the tokens in the model.

The “Scaling Law” Workaround

One trend we are seeing is the use of sheer volume to compensate for individual chip inferiority. While a single Ascend 910C might not match an H100 in raw power, a cluster of 1,000+ chips working in concert can bridge the gap. The future of AI hardware in restricted markets will likely rely on massive-scale clustering and innovative interconnect fabrics to mimic the efficiency of a more powerful, single-chip architecture.
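
Whether a bigger cluster actually closes the gap depends heavily on communication. The following simplified cost model uses the textbook ring all-reduce estimate for synchronous data-parallel training. Every input (compute time, gradient size, link bandwidth) is a hypothetical placeholder, and real trillion-parameter runs mix several forms of parallelism and overlap communication with compute, so this is only a sketch of the trend.

```python
# Simplified cost model: synchronous data-parallel training with a ring
# all-reduce and no compute/communication overlap. All inputs are hypothetical.


def ring_allreduce_seconds(size_bytes: float, n: int, link_bytes_per_s: float,
                           latency_s: float = 0.0) -> float:
    """Ring all-reduce: each chip sends 2*(n-1)/n * size_bytes in 2*(n-1) steps."""
    if n == 1:
        return 0.0
    bandwidth_term = 2 * (n - 1) / n * size_bytes / link_bytes_per_s
    latency_term = 2 * (n - 1) * latency_s
    return bandwidth_term + latency_term


def scaling_efficiency(single_chip_compute_s: float, n: int, grad_bytes: float,
                       link_bytes_per_s: float) -> float:
    """Share of the ideal n-fold speed-up kept when a fixed batch is split n ways."""
    compute = single_chip_compute_s / n  # each chip's share of the step's compute
    comm = ring_allreduce_seconds(grad_bytes, n, link_bytes_per_s)
    return compute / (compute + comm)


# Hypothetical: 100 s of single-chip compute per step, 2 GB of gradients
# (about 1B parameters in bf16), 25 GB/s of effective link bandwidth.
for chips in (8, 64, 1_000):
    eff = scaling_efficiency(100.0, chips, 2e9, 25e9)
    print(f"{chips:>5} chips: {eff:.0%} of ideal")
```

Each chip's all-reduce traffic approaches a constant 2S/B while its share of compute shrinks as 1/N, so adding chips eventually stops paying off unless interconnect bandwidth grows too. With these placeholder inputs, efficiency falls from about 99% at 8 chips to under 40% at 1,000.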

Did you know? The DeepSeek V4-Pro was pre-trained on over 32 trillion tokens. To put that in perspective, that is roughly a few hundred million books’ worth of text, all processed to create a coherent “world model.”
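
How much text is 32 trillion tokens? A back-of-the-envelope conversion, with its assumptions made explicit: roughly 0.75 English words per token and 100,000 words per book are rules of thumb assumed here, and the real ratio depends on the tokenizer and on how much of the corpus is code, math or non-English text.

```python
# Back-of-the-envelope conversion of a token count into books.
# Assumed rules of thumb (not from the article): ~0.75 English words per
# token and ~100,000 words per book.

WORDS_PER_TOKEN = 0.75
WORDS_PER_BOOK = 100_000


def tokens_to_books(tokens: float) -> float:
    return tokens * WORDS_PER_TOKEN / WORDS_PER_BOOK


books = tokens_to_books(32e12)  # 32 trillion pre-training tokens
print(f"~{books / 1e6:.0f} million books")
```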

The Invisible Battle: CANN vs. CUDA

Hardware is only half the story. The real moat Nvidia has built isn’t just silicon; it’s CUDA, the programming platform (compiler, runtime and libraries) through which frameworks like PyTorch drive Nvidia GPUs. It is the industry standard, and moving away from it is like trying to switch an entire city from driving cars to riding hoverboards overnight.

Huawei’s CANN (Compute Architecture for Neural Networks) is the direct challenger. The difficulties DeepSeek reportedly faced with its R2 model, attributed to unstable performance and software gaps, highlight that the “software tax” is the biggest hurdle for domestic silicon.

The trend moving forward will be the “abstraction of the backend.” We will see more frameworks that allow developers to write code once and deploy it across different hardware (Nvidia, Huawei, AMD) without rewriting the entire stack. This “hardware-agnostic” AI development is the only way to break the CUDA monopoly.

Future Trend: The Rise of Efficiency-First Architectures

Because hardware is limited, we are entering an era of algorithmic efficiency. When you can’t buy more compute, you have to get smarter with the compute you have. This leads to several emerging trends:

Model Distillation: Creating smaller, “student” models that mimic the performance of 1.6-trillion-parameter “teacher” models.
Sparse Architectures: Moving away from dense models to Mixture-of-Experts (MoE), where only a fraction of the model activates for any given prompt.
Post-Training Optimization: A heavier reliance on high-quality, curated data for tuning rather than raw, brute-force pre-training.
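
The sparse-architecture idea can be illustrated with top-k routing, the basic mechanism behind many Mixture-of-Experts layers. In this toy sketch (scalar inputs, hand-picked router scores, simple multiplier “experts”), only the k highest-scoring experts run for a token. It is not DeepSeek's routing code.

```python
# Toy top-k Mixture-of-Experts routing for one token. Router scores are given
# directly instead of coming from a learned router, and the "experts" are
# scalar functions; production MoE layers add load balancing, capacity limits
# and other refinements.
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[float], float]


def top_k_route(scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - m) for i in top]  # stable softmax over top-k
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: float, experts: Sequence[Expert], scores: Sequence[float],
                k: int) -> Tuple[float, int]:
    """Run only the selected experts and mix their outputs by routing weight."""
    routes = top_k_route(scores, k)
    y = sum(w * experts[i](x) for i, w in routes)
    return y, len(routes)


# Four toy experts that scale their input by 1, 2, 3 and 4.
experts: List[Expert] = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
scores = [0.1, 2.0, -1.0, 2.0]  # hypothetical router scores for one token
y, active = moe_forward(1.0, experts, scores, k=2)
print(f"output={y}, experts run: {active} of {len(experts)}")
```

Only 2 of the 4 experts execute here, which is why an MoE model's per-token compute tracks its activated parameters rather than its total parameter count.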
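
Distillation can likewise be sketched in a few lines. The example implements the classic temperature-softened soft-target loss in the style of Hinton et al. (2015): a KL divergence between teacher and student output distributions, scaled by T². LLM distillation pipelines often use other variants (such as fine-tuning the student on teacher-generated text), so treat this as one illustrative form with made-up logits.

```python
# Soft-target distillation loss: the student is pushed to match the teacher's
# temperature-softened output distribution. Logits below are hypothetical.
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]
print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

The loss is zero only when the student reproduces the teacher's softened distribution, and it grows as the student's ranking of outputs diverges from the teacher's.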

The Geopolitical Bifurcation of AI

We are witnessing the birth of two distinct AI worlds. On one side is the Western ecosystem, powered by Nvidia and AMD; on the other, a Chinese ecosystem centered on Huawei and Biren.

This bifurcation will lead to different “flavors” of AI. Western models may continue to push the boundaries of raw scale, while Chinese models may lead the world in resource-constrained efficiency. In the long run, the side that learns to do more with less often wins the economic war.

Frequently Asked Questions

Q: What is “full-parameter post-training”?

A: It is a tuning process in which every weight in a neural network is updated to refine the model’s behavior, safety, and instruction-following capabilities, rather than freezing the original weights and training only a small set of added parameters, as adapter methods such as LoRA do.

Q: Why is the Ascend 910C important?

A: It represents China’s best attempt to create a domestic alternative to Nvidia’s high-end AI chips, aiming to reduce reliance on foreign technology amidst trade sanctions.

Q: Can these chips pre-train a model from scratch?

A: The reported post-training run is a notable result, but pre-training a frontier model from scratch is significantly more demanding. It remains unproven whether current domestic silicon can handle a full pre-training run of a trillion-parameter model with the same efficiency as Nvidia hardware.
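
The gap between post-training and pre-training can be sized with the common rule of thumb that training compute is about 6 × N × D FLOPs, where N is the number of parameters used per token (the activated parameters, for an MoE model) and D is the number of training tokens. Only the 32-trillion-token figure below comes from the article; the activated-parameter count and the post-training token budget are hypothetical placeholders.

```python
# Rule of thumb: training compute C ≈ 6 * N * D FLOPs (forward + backward),
# with N = parameters used per token (activated parameters for an MoE model)
# and D = training tokens. Attention and other overheads are ignored.


def training_flops(active_params: float, tokens: float) -> float:
    return 6 * active_params * tokens


ACTIVE_PARAMS = 50e9      # hypothetical activated parameters per token
PRETRAIN_TOKENS = 32e12   # pre-training corpus size stated in the article
POSTTRAIN_TOKENS = 100e9  # hypothetical post-training token budget

pre = training_flops(ACTIVE_PARAMS, PRETRAIN_TOKENS)
post = training_flops(ACTIVE_PARAMS, POSTTRAIN_TOKENS)
print(f"pre-training:  {pre:.1e} FLOPs")
print(f"post-training: {post:.1e} FLOPs")
print(f"ratio: {pre / post:.0f}x")
```

Because N cancels, the ratio is simply the ratio of token counts: with these placeholders, pre-training needs about 320 times the compute of the post-training run, before accounting for instability or restarts over a much longer job.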

Join the Conversation

Do you think domestic silicon can eventually outperform Nvidia, or is the CUDA moat too deep to cross? Let us know your thoughts in the comments below!
