Huawei-led team claims it post-trained DeepSeek’s 1.6-trillion-parameter model – 1,000 Ascend 910C chips used in training
The Great AI Decoupling: Can Domestic Silicon Actually Topple the Nvidia Empire?
For years, the narrative surrounding AI development in China has been one of desperation and workarounds. With U.S. export controls tightening the noose around high-end GPUs, the industry has been forced to ask a critical question: Can you build a frontier AI model without an Nvidia H100?
The recent claim by a Huawei-led research group that it completed full-parameter post-training of the 1.6-trillion-parameter DeepSeek V4-Pro on Ascend 910C chips is more than a technical milestone. It signals that a “silicon curtain” is descending and a parallel AI ecosystem is emerging.
Beyond Inference: The Shift Toward Training Sovereignty
Historically, Chinese accelerators have been “inference-competent”: they could take a model trained on Nvidia hardware and run it reasonably well. The “holy grail,” however, is training, the process of iteratively updating billions of weights over massive datasets.
The move to full-parameter post-training is a significant leap. Unlike adapter-based tuning such as LoRA, which freezes the base model and trains only a small set of added parameters, full-parameter tuning updates every weight. For a model as large as DeepSeek V4-Pro, this demands enormous memory, stable long-running jobs, and high-speed chip-to-chip communication.
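To make the LoRA-versus-full-parameter distinction concrete, the sketch below counts trainable parameters for one weight matrix and estimates the training-state memory for 1.6 trillion parameters. It is back-of-the-envelope arithmetic under stated assumptions. The hidden size and LoRA rank are hypothetical. The figure of 16 bytes per parameter is a common rule of thumb for mixed-precision Adam (weights, gradients, an fp32 master copy, two optimizer moments). Activations are ignored. The even split across 1,000 chips assumes fully sharded optimizer state, which the report does not specify.

```python
# Illustrative arithmetic only. ASSUMPTION: 16 bytes/param is a rule of thumb for
# mixed-precision Adam: bf16 weights (2) + bf16 grads (2) + fp32 master (4)
# + fp32 Adam first moment (4) + fp32 Adam second moment (4). Activations ignored.
BYTES_PER_PARAM_MIXED_ADAM = 2 + 2 + 4 + 4 + 4  # = 16


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA learns the update to a d_out x d_in matrix W as B @ A,
    with B of shape (d_out, rank) and A of shape (rank, d_in)."""
    return rank * (d_in + d_out)


def full_trainable_params(d_in: int, d_out: int) -> int:
    """Full-parameter tuning updates every entry of W."""
    return d_in * d_out


def training_state_bytes(num_params: int) -> int:
    """Approximate bytes for weights, gradients and Adam state."""
    return num_params * BYTES_PER_PARAM_MIXED_ADAM


if __name__ == "__main__":
    d = 8192  # hypothetical hidden size, chosen for illustration
    r = 16    # hypothetical LoRA rank
    full = full_trainable_params(d, d)
    lora = lora_trainable_params(d, d, r)
    print(f"full: {full:,} params, LoRA r={r}: {lora:,} params "
          f"({lora / full:.2%} of the matrix)")

    total = 1_600_000_000_000  # 1.6 trillion parameters
    state = training_state_bytes(total)
    # ASSUMPTION: state is sharded evenly over 1,000 chips (ZeRO-3/FSDP style).
    print(f"training state: ~{state / 1e12:.1f} TB total, "
          f"~{state / 1_000 / 1e9:.1f} GB per chip")
```

Run as a script, it reports that rank-16 LoRA trains about 0.39% of an 8192 × 8192 matrix. Full-parameter tuning of 1.6 trillion weights, by contrast, carries roughly 25.6 TB of training state, or about 25.6 GB per chip before activations. That gap is why full-parameter post-training is a cluster-scale problem rather than a single-accelerator one.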
If Huawei can prove that the Ascend 910C handles these workloads reliably at scale, Nvidia’s strategic leverage begins to erode. We are moving toward a future where “AI sovereignty” means owning the entire stack, from the sand in the chip to the tokens in the model.
The “Scaling Law” Workaround
One trend we are seeing is the use of sheer volume to compensate for individual chip inferiority. While a single Ascend 910C might not match an H100 in raw power, a cluster of 1,000+ chips working in concert can bridge the gap. The future of AI hardware in restricted markets will likely rely on massive-scale clustering and innovative interconnect fabrics to mimic the efficiency of a more powerful, single-chip architecture.
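The clustering argument can be written as simple arithmetic. Aggregate throughput is roughly chip count times per-chip throughput times a scaling efficiency, and that efficiency shrinks as communication overhead grows. The sketch below is a hypothetical model, not a benchmark. The 0.6× per-chip ratio and 80% efficiency in the example are placeholders, not measured Ascend 910C or H100 figures. It also uses the standard ring all-reduce cost, in which each chip sends 2(n - 1)/n times the gradient payload, to show why interconnect bandwidth matters.

```python
import math


def chips_needed(target_throughput: float, per_chip_throughput: float,
                 scaling_efficiency: float) -> int:
    """Chips required to reach a target aggregate throughput.

    scaling_efficiency in (0, 1] models losses from communication and
    synchronization; 1.0 would mean perfect linear scaling.
    """
    if not 0 < scaling_efficiency <= 1:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    return math.ceil(target_throughput / (per_chip_throughput * scaling_efficiency))


def ring_allreduce_bytes_per_chip(num_chips: int, payload_bytes: float) -> float:
    """Bytes each chip sends in a ring all-reduce of payload_bytes.

    The reduce-scatter and all-gather phases each move (n - 1) / n of the payload.
    """
    return 2 * (num_chips - 1) / num_chips * payload_bytes


if __name__ == "__main__":
    # HYPOTHETICAL inputs: match 1,000 "reference" chips using chips that each
    # deliver 0.6x the throughput, at 80% scaling efficiency. Not measured data.
    print(chips_needed(1_000, 0.6, 0.8))
    # Per-chip traffic approaches 2x the payload as the cluster grows.
    for n in (8, 1_000):
        print(n, ring_allreduce_bytes_per_chip(n, 1.0))
```

Under these placeholder inputs, about 2,084 weaker chips stand in for 1,000 reference chips. The all-reduce figures show that the bandwidth cost of synchronization depends mostly on per-link speed rather than cluster size, which is why interconnect fabrics matter so much. Latency terms, which do grow with chip count, are ignored here.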

The Invisible Battle: CANN vs. CUDA
Hardware is only half the story. The real moat Nvidia has built isn’t just silicon; it’s CUDA. CUDA is the software layer that allows developers to talk to the GPU. It is the industry standard, and moving away from it is like trying to switch an entire city from driving cars to riding hoverboards overnight.
Huawei’s CANN (Compute Architecture for Neural Networks) is the direct challenger. The difficulties DeepSeek reportedly faced when trying to train its R2 model on Ascend hardware, including unstable performance and gaps in the software stack, show that this “software tax” is the biggest hurdle for domestic silicon.
The trend moving forward will be the “abstraction of the backend.” We will see more frameworks that allow developers to write code once and deploy it across different hardware (Nvidia, Huawei, AMD) without rewriting the entire stack. This “hardware-agnostic” AI development is the only way to break the CUDA monopoly.
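Here is a minimal sketch of backend abstraction, assuming a simple registry pattern. Model code calls a `Backend` interface, and each hardware vendor would supply its own implementation. Everything here (`Backend`, `register`, `get_backend`, the `"reference"` backend) is hypothetical and written for illustration. The pure-Python matrix multiply stands in for the call that a real framework would dispatch to CUDA or CANN kernels.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type

Matrix = List[List[float]]


class Backend(ABC):
    """Hypothetical hardware interface that model code programs against."""

    @abstractmethod
    def matmul(self, a: Matrix, b: Matrix) -> Matrix: ...


_REGISTRY: Dict[str, Type[Backend]] = {}


def register(name: str) -> Callable[[Type[Backend]], Type[Backend]]:
    """Class decorator that makes a backend selectable by name."""
    def decorator(cls: Type[Backend]) -> Type[Backend]:
        _REGISTRY[name] = cls
        return cls
    return decorator


def get_backend(name: str) -> Backend:
    """Instantiate a registered backend, or fail with the available names."""
    if name not in _REGISTRY:
        raise ValueError(f"unknown backend {name!r}; available: {sorted(_REGISTRY)}")
    return _REGISTRY[name]()


@register("reference")
class ReferenceBackend(Backend):
    """Pure-Python stand-in; a vendor backend would call its own kernels here."""

    def matmul(self, a: Matrix, b: Matrix) -> Matrix:
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def linear_layer(backend: Backend, x: Matrix, w: Matrix) -> Matrix:
    """Model code: written once, unaware of which hardware executes it."""
    return backend.matmul(x, w)


if __name__ == "__main__":
    backend = get_backend("reference")
    print(linear_layer(backend, [[1, 2]], [[3], [4]]))
```

Supporting new hardware then means registering another `Backend` subclass, while `linear_layer` and the rest of the model code stay unchanged. Production frameworks layer much more on top, such as kernel dispatch, graph compilation, and memory management. The separation of concerns is the same.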
Future Trend: The Rise of Efficiency-First Architectures
Because hardware is limited, we are entering an era of algorithmic efficiency. When you can’t buy more compute, you have to get smarter with the compute you have. This leads to several emerging trends:
- Model Distillation: Training smaller “student” models to approximate the behavior of much larger “teacher” models, such as a 1.6-trillion-parameter one.
- Sparse Architectures: Moving from dense models to Mixture-of-Experts (MoE), where a router activates only a few experts for each token, so only a fraction of the model’s parameters does work at any step.
- Post-Training Optimization: A heavier reliance on high-quality, curated data for tuning rather than raw, brute-force pre-training.
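The first two trends can be sketched in a few lines. The code below shows the standard knowledge-distillation loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the temperature squared. It also shows a top-k Mixture-of-Experts router that activates only k experts per token. This is a toy illustration on plain Python lists. The temperature, k, and logits are arbitrary. Top-k-then-softmax gating is one common variant, not necessarily the routing scheme DeepSeek’s models use.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts for one token and renormalize their
    gate weights with a softmax; the remaining experts are skipped entirely."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))


if __name__ == "__main__":
    print(distillation_loss([0.5, 1.0, 2.0], [1.0, 2.0, 3.0]))
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

For example, `top_k_route([0.1, 2.0, -1.0, 1.5], k=2)` selects experts 1 and 3 and gives expert 1 the larger gate weight. The other two experts do no computation for that token, which is where MoE’s savings come from.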
The Geopolitical Bifurcation of AI
We are witnessing the birth of two distinct AI worlds: on one side, a Western ecosystem powered by Nvidia and AMD; on the other, a Chinese ecosystem centered on Huawei and Biren.

This bifurcation will lead to different “flavors” of AI. Western models may continue to push the boundaries of raw scale, while Chinese models may lead the world in resource-constrained efficiency. In the long run, the side that learns to do more with less often wins the economic war.
Frequently Asked Questions
Q: What is full-parameter post-training?
A: It is a tuning process in which every weight in a neural network is updated to refine the model’s behavior, safety, and instruction-following, rather than training only a small set of new parameters added on top.
Q: Why does the Ascend 910C matter?
A: It represents China’s best attempt to create a domestic alternative to Nvidia’s high-end AI chips, aiming to reduce reliance on foreign technology amid export controls and trade sanctions.
Q: Does this mean domestic chips can now train frontier models from scratch?
A: While post-training is a success, pre-training a frontier model from scratch is significantly more demanding. It remains unproven whether current domestic silicon can handle a full pre-training run of a trillion-parameter model as efficiently as Nvidia hardware.
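One way to see why pre-training is so much more demanding is a widely used approximation: training costs about 6 FLOPs per parameter per token (forward plus backward pass), so compute scales linearly with the number of tokens processed. The sketch below compares a pre-training run with a post-training run for the same model. All three inputs are hypothetical placeholders, not DeepSeek V4-Pro figures. For an MoE model, the parameter count should be the parameters active per token.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token.

    For a Mixture-of-Experts model, pass the parameters active per token.
    """
    return 6 * params * tokens


if __name__ == "__main__":
    active_params = 50e9      # HYPOTHETICAL active parameters per token
    pretrain_tokens = 15e12   # HYPOTHETICAL pre-training corpus size
    posttrain_tokens = 50e9   # HYPOTHETICAL post-training dataset size
    pre = training_flops(active_params, pretrain_tokens)
    post = training_flops(active_params, posttrain_tokens)
    print(f"pre-training: {pre:.2e} FLOPs, post-training: {post:.2e} FLOPs, "
          f"ratio ~{pre / post:.0f}x")
```

With these placeholder numbers, pre-training needs about 300 times the compute of post-training, simply because it processes 300 times as many tokens.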