DeepSeek: Domestic Chip Adaptation


As the DeepSeek phenomenon sweeps through the tech landscape, domestic GPU manufacturers are diving headlong into this adaptation wave. However, despite the apparent similarity of their moves, each company has distinct advantages and strategies that set it apart.

Today, industry reports often focus on the sheer number of companies adapting to DeepSeek. However, there is a notable lack of in-depth exploration of the differences among them. Is it a divergence in technological routes, varying performance levels, diverse ecosystem developments, or differing application scenarios that sets these companies apart?

Choosing Between Original and Distilled Models

When it comes to adapting DeepSeek models, the efforts of chip manufacturers generally fall into two groups. One focuses on adapting the original R1 and V3 models, while the other targets the lighter distilled versions derived from R1.

The distinctions among these three model types are significant:

DeepSeek R1 is positioned as an inference-first model, designed for scenarios requiring deep logical analysis and problem-solving. It excels at tasks such as mathematics, programming, and reasoning.

Conversely, DeepSeek V3 is a general-purpose large language model that supports efficient and flexible applications across a variety of natural language processing tasks, catering to the needs of multiple fields.


The original R1 and V3 models usually possess a larger parameter count, resulting in a more complex structure.

The DeepSeek-R1 series of distilled models offers lightweight versions with fewer parameters, intended to maintain a certain level of performance while reducing resource consumption. This makes them suitable for lightweight deployments and resource-constrained scenarios, such as edge-device inference and rapid AI application validation for small and medium-sized enterprises.

Even though manufacturers are racing to adapt to DeepSeek, the types of models they are adapting differ greatly.

While mainstream GPU vendors are accelerating the adaptation of DeepSeek models, only about half have explicitly announced support for the original R1 and V3 models. These models place extremely high demands on chip computing power, memory bandwidth, and multi-card interconnect technology. Companies like Huawei Ascend and Haiguang Information fall into this category.

The other manufacturers primarily support the DeepSeek-R1 series of distilled models (with parameter counts ranging from 1.5 billion to 8 billion). Since these distilled models are based on Tongyi Qianwen (Qwen) and LLaMA, platforms that can already support Qwen and LLaMA models can generally adapt to them with minimal extra effort. Companies like Moore Threads and Biren Technology are examples of this group.

Different model sizes suit different scenarios: cloud-side inference demands larger parameter counts and optimal performance, so it primarily uses the original R1 or V3 models; edge-side chips typically accommodate models in the 1.5B to 8B range, which are mature enough that no substantial extra work is needed.
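As a rough illustration (my own back-of-envelope sketch, not a figure from the article), the reason the 1.5B-8B range fits edge devices while the original R1/V3 models demand server clusters comes down to weight memory, which scales with parameter count times bytes per parameter:

```python
def model_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight-only memory footprint: parameters x bytes per parameter.

    Ignores KV cache and activations, so real deployments need headroom on top.
    """
    return params_billions * bytes_per_param  # 1e9 params x bytes, expressed in GB

# An 8B model in FP16 (2 bytes/param) needs ~16 GB of weights; INT8
# quantization halves that to ~8 GB, within reach of a single edge
# accelerator. A full 671B model in FP16 needs ~1342 GB, which is why
# it requires multi-card interconnects rather than one chip.
print(model_memory_gb(8, 2))    # 16.0
print(model_memory_gb(8, 1))    # 8.0
print(model_memory_gb(671, 2))  # 1342.0
```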

Company Advantages: What Sets Them Apart?

In addition to the differences in model types, companies have adopted various technological routes, resulting in distinct challenges during adaptation.

Firstly, considering the current technological ecosystem and practical application scenarios, running and adapting DeepSeek models primarily relies on Nvidia's hardware and its CUDA programming ecosystem.


As such, each manufacturer's adaptability is contingent upon their compatibility with the original development ecosystem.

This means that, at present, DeepSeek is mainly adapted to Nvidia chips, which influences the application and performance of other hardware platforms. How easily a large model like DeepSeek, developed on Nvidia GPUs, can be adapted depends on whether a chip is compatible with CUDA. Manufacturers with CUDA-compatible chips offer varying degrees of interoperability.

Secondly, GPUs vary in computational capacity (such as FLOPS and memory bandwidth), which directly affects how quickly DeepSeek can handle large-scale deep learning tasks. Some GPUs demonstrate superior efficiency ratios, making them suitable for running DeepSeek in low-power environments.
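Memory bandwidth in particular tends to dominate single-stream inference: a common rule of thumb (a simplification of mine, not from the article) is that generating each token requires reading every weight once, so decoding throughput is bounded by bandwidth divided by model size:

```python
def decode_tokens_per_sec(params_billions: float,
                          bytes_per_param: float,
                          bandwidth_gb_s: float) -> float:
    """Rough upper bound on single-stream decoding speed.

    Assumes each generated token streams all weights from memory once;
    ignores KV-cache traffic, batching, and compute limits.
    """
    model_size_gb = params_billions * bytes_per_param
    return bandwidth_gb_s / model_size_gb

# A 7B FP16 model (14 GB of weights) on a GPU with 1000 GB/s of
# memory bandwidth tops out around 71 tokens/s for one stream:
print(round(decode_tokens_per_sec(7, 2, 1000), 1))  # 71.4
```

This is why the article's point about memory bandwidth matters as much as raw FLOPS for inference-oriented chips.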

Commercial Applications of DeepSeek

The commercial deployment of DeepSeek can take various forms:

Cloud Deployment:

For instance, DeepSeek models can provide services through the Huawei Cloud platform, allowing enterprise customers to use DeepSeek's capabilities (such as image recognition, natural language processing, and speech recognition) via API calls or cloud services. Companies pay based on actual usage, such as computing resources or the number of API calls, reducing initial investment costs. This cloud service model eliminates the need for enterprises to deploy hardware on-site, allowing for quick implementation and application.
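To make the pay-per-use model concrete, here is a minimal billing sketch; the prices and volumes are hypothetical placeholders, not Huawei Cloud's actual rates:

```python
def monthly_api_cost(calls_per_day: int,
                     avg_tokens_per_call: int,
                     yuan_per_million_tokens: float,
                     days: int = 30) -> float:
    """Estimate a monthly pay-per-use bill from token consumption."""
    total_tokens = calls_per_day * avg_tokens_per_call * days
    return total_tokens / 1_000_000 * yuan_per_million_tokens

# Hypothetical workload: 10,000 calls/day, 1,500 tokens per call,
# billed at 8 yuan per million tokens.
print(monthly_api_cost(10_000, 1_500, 8))  # 3600.0 (yuan/month)
```

The appeal for smaller enterprises is visible here: costs scale smoothly with usage instead of requiring an up-front hardware purchase.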

On-Premise Deployment:

There are also integrated machine forms: currently, DeepSeek large model integrated machines are categorized into inference machines and training-inference machines.


The DeepSeek inference integrated machines come equipped with models of different sizes, such as DeepSeek-R1 32B, 70B, and the full 671B version, priced from hundreds of thousands to several million yuan and targeting companies sensitive to data security and privacy. The training-inference integrated machines are even more expensive, reaching millions of yuan, particularly those designed for pre-training and fine-tuning the DeepSeek-R1 32B model.

Enterprises can also deploy solutions themselves: those with extremely high performance requirements (such as autonomous driving or financial risk control) or stringent security demands (such as government and financial institutions) can deploy the DeepSeek model locally on hardware like GPU chips to achieve maximum performance.

The current commercial pattern shows that, because deploying GPU chips and DeepSeek models on-premises is costly, enterprise users typically first test on public clouds to confirm the technology fits their needs, and only then consider private cloud deployments or integrated machines. Small and medium-sized enterprises therefore lean more toward using these technologies through cloud services.
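The cloud-first-then-on-prem pattern is essentially a breakeven calculation. The sketch below (with hypothetical prices, not figures from the article) shows when an up-front integrated machine overtakes ongoing cloud spend:

```python
def breakeven_months(machine_cost_yuan: float,
                     monthly_cloud_cost_yuan: float,
                     monthly_opex_yuan: float = 0.0) -> float:
    """Months until a purchased integrated machine beats recurring cloud fees.

    Simplified: ignores depreciation, utilization, and staffing differences.
    """
    monthly_saving = monthly_cloud_cost_yuan - monthly_opex_yuan
    if monthly_saving <= 0:
        return float("inf")  # cloud stays cheaper indefinitely
    return machine_cost_yuan / monthly_saving

# Hypothetical: a 1,000,000-yuan inference machine vs 50,000 yuan/month of
# cloud spend, with 10,000 yuan/month of on-prem power and maintenance.
print(breakeven_months(1_000_000, 50_000, 10_000))  # 25.0
```

At low usage the breakeven horizon stretches toward infinity, which is why smaller enterprises stay on the cloud while heavy, steady users migrate to integrated machines.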

Of course, some enterprises that prioritize data security or urgently need high performance are investing hundreds of thousands or even millions of yuan to deploy integrated machines that meet their requirements. As DeepSeek's open-source models have developed, demand for privatized deployment has steadily emerged, creating a burgeoning market for integrated machines and attracting numerous companies to enter it.

Who is Excelling in Commercializing DeepSeek?

Around the DeepSeek opportunity, both Ascend and Haiguang have made significant strides toward commercialization.

Integrated machines are in high demand, benefiting Ascend:

Around 70% of businesses are expected to adopt DeepSeek based on Ascend technology.

Recently, companies such as Huakun Zhiyu, Baode, Shenzhou Kuntai, and Yangtze Computing have released DeepSeek integrated machines, all built on Ascend products.

Notably, as the frequency of DeepSeek integrated machine releases increases, the industrial alliance surrounding Ascend continues to broaden.

Reports indicate that more than 80 companies have rapidly adapted or launched DeepSeek series models based on Ascend technology, providing external services.

An additional 20 or more companies are expected to go live within the next two weeks. This means that about 70% of Chinese enterprises adopting DeepSeek are aligning themselves with Ascend.

Compared with imported GPU solutions, the localized services and teams behind Ascend chips significantly influence DeepSeek deployment outcomes. For instance, in a data center with thousands of cards, the automatic parallelism feature of the MindSpore toolchain reduces the amount of distributed-training code by 70%.

Haiguang: Penetrating diverse scenarios including intelligent computing centers and finance:

Haiguang's collaboration with DeepSeek covers critical scenarios such as intelligent computing centers, finance, and smart manufacturing.

In the realm of intelligent computing centers, Haiguang Information has partnered with QCloud Technology to launch the “Haiguang DCU + Base Stone Intelligent Computing + DeepSeek Model” solution, supporting flexible billing based on tokens to lower the entry barriers for enterprise AI applications.

In financial technology, Zhongke Jincai has collaborated with Haiguang Information Technology Co., Ltd. to launch an integrated software and hardware solution that combines self-developed multi-scenario models with Haiguang DCU acceleration cards and achieves in-depth adaptation with DeepSeek models.

In smart manufacturing, the Haiguang DCU empowers industrial visual inspection and automated decision-making by adapting the DeepSeek-Janus-Pro multimodal model, assisting companies like SANY Heavy Industry in achieving intelligent upgrades on production lines.

In data management, the smart data management platform developed by Kongtian will fully adapt to Haiguang DCU, embedding DeepSeek into the platform as a “super engine” to support data processing in fields like natural resources, energy, and aerospace.

Moreover, JD Cloud has also released a DeepSeek large model integrated machine that supports domestic AI acceleration chips like Huawei Ascend and Haiguang.

Opportunities for Domestic GPUs are Arising

With the rollout and widespread application of DeepSeek integrated machines, market demand for domestic chips is increasing significantly.

Yang Jian, CTO of Muxi Technology, noted that many non-Nvidia cards are expected to join the large-model post-training segment this year.

He believes that the privatization of large models like DeepSeek presents an opportunity for domestic chips.

“The opportunity for domestic GPUs in 2025 lies in privatized deployment, primarily focusing on post-training and inference of large models,” Yang noted. He explained that while Nvidia GPUs have become prevalent in the AI sector, they are becoming scarce in retail channels, and privatized deployments rely heavily on the retail market. Should the privatized deployment market take off, domestic cards could see considerable opportunity.

As overseas restrictions on chip computing power draw closer, global computing capabilities may evolve along two parallel paths and gradually decouple. By 2026-2027, the strong-GPU base for pre-training and post-training is expected to remain Nvidia in the US, while in China it will be handled partly by Nvidia and partly by domestic chips. The post-training segment will likely welcome more non-Nvidia cards this year, as its cluster requirements are relatively lower and do not demand thousands of cards.

People at Tianshui Zhixin indicate that as domestic models achieve breakthroughs, demand for compatibility with domestic chips is increasing, presenting substantial growth opportunities this year.

The surge of interest surrounding the DeepSeek model also signals opportunities for an explosion of AI applications, steering chip manufacturers toward the inference computing capabilities that AI requires. Last year, Chinese chip evaluations focused primarily on training, viewing domestic chips as alternatives to Nvidia in training scenarios.

