NVIDIA GB200: Surpassing Expectations in Progress and Demand with Amazing Industry Collaboration

1. CoWoS Demand Exceeds Expectations

Morgan Stanley's latest report mentions that NVIDIA's Blackwell chips are entering mass production, and more CoWoS capacity will likely be needed by 2025.

The forecast for TSMC's CoWoS capacity at the end of 2025 has been raised to 80-90k wafers per month (wpm), up from 68k wpm previously. In terms of capacity allocation, Nvidia alone accounts for nearly 70%, with Broadcom as the second-largest customer at about 13%. A point of interest is that AMD also takes 9%, which is roughly 13% of Nvidia's allocation.
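The allocation figures can be sanity-checked with a few lines of arithmetic. The sketch below is only illustrative: the shares are the approximate percentages quoted in the text, and the 85k wpm total is an assumed midpoint of the 80-90k range, not a figure from the report.

```python
# Customer shares of TSMC's end-2025 CoWoS capacity, as quoted in the text
# ("nearly 70%", "about 13%", "9%"), so treat them as approximate.
shares = {"Nvidia": 0.70, "Broadcom": 0.13, "AMD": 0.09}

# Assumption for illustration: midpoint of the 80-90k wpm forecast range.
capacity_wpm = 85_000


def wafers_per_month(share, capacity=capacity_wpm):
    """Wafers per month implied by a customer's share of total capacity."""
    return share * capacity


# AMD's allocation expressed as a fraction of Nvidia's allocation.
amd_vs_nvidia = shares["AMD"] / shares["Nvidia"]

# Whatever is left over goes to all other customers combined.
remaining_share = 1.0 - sum(shares.values())

for name, share in shares.items():
    print(f"{name}: about {wafers_per_month(share):,.0f} wpm")
print(f"AMD relative to Nvidia: {amd_vs_nvidia:.1%}")
print(f"All other customers: {remaining_share:.0%}")
```

Under these assumptions, AMD's allocation comes out at roughly 13% of Nvidia's, matching the figure in the text, and about 8% of capacity is left for all other customers.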

2. Strong Blackwell Demand, Beyond Expectations

Oracle recently announced the construction of a 130,000-card NVL72 supercluster, capable of providing 2.4 ZettaFLOPS of AI computing power. Oracle's request for increased GPU supply is beyond expectations, indicating that Nvidia's growth potential is far from being fully tapped.
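As a rough check on the scale involved, the headline numbers can be broken down per card and per rack. This is a back-of-the-envelope sketch using only the figures quoted above plus the fact that an NVL72 rack holds 72 GPUs; the text does not say which numeric precision the 2.4 ZettaFLOPS figure refers to, so the per-card number is only as meaningful as that headline figure.

```python
import math

cards = 130_000          # GPU count quoted in the text
cluster_flops = 2.4e21   # 2.4 ZettaFLOPS = 2.4 * 10**21 FLOPS
gpus_per_rack = 72       # an NVL72 rack holds 72 GPUs

# Implied peak compute per card, in PetaFLOPS (10**15 FLOPS).
per_card_pflops = cluster_flops / cards / 1e15

# Number of NVL72 racks needed to house all cards, rounded up.
racks_needed = math.ceil(cards / gpus_per_rack)

print(f"Implied compute per card: {per_card_pflops:.1f} PFLOPS")
print(f"NVL72 racks needed: {racks_needed:,}")
```

The implied figure is about 18.5 PFLOPS per card across roughly 1,800 racks.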

A key point here is that 100,000-card clusters will be the main attraction next year. Microsoft, Meta, Tesla, AWS, and Google have already built or are about to build their own 100,000-card AI clusters, so Oracle joining this group is quite unexpected. Oracle's current market value is $450 billion, and a 100,000-card cluster requires an investment of at least $5 billion. That is an extravagant outlay for a company with a net profit of only $10 billion in 2024.
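To put the size of this bet in perspective, the investment can be expressed per card and relative to Oracle's profit and market value. The sketch below uses only the round numbers given in the text; the $5 billion is described as a minimum, so the ratios are lower bounds.

```python
cluster_cost = 5e9        # minimum investment for a 100,000-card cluster (USD)
cards = 100_000
net_profit_2024 = 10e9    # Oracle's 2024 net profit as quoted (USD)
market_value = 450e9      # Oracle's market value as quoted (USD)

# All-in cost per card implied by the minimum investment.
cost_per_card = cluster_cost / cards

# Investment as a fraction of annual net profit and of market value.
share_of_profit = cluster_cost / net_profit_2024
share_of_market_value = cluster_cost / market_value

print(f"Implied cost per card: ${cost_per_card:,.0f}")
print(f"Share of 2024 net profit: {share_of_profit:.0%}")
print(f"Share of market value: {share_of_market_value:.1%}")
```

On these figures, a single cluster costs at least $50,000 per card and consumes at least half of a year's net profit, which is why the text calls the investment extravagant.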

This points to something important: for American tech giants, the urgency and importance of obtaining AI resources have not eased, and their heavy investment in computing resources will not change in the slightest because of the so-called "shift of computing power to inference." The implications for domestic A-share investment are discussed at the end of the article.

3. B-Series Capacity Demand Exceeds Expectations, Prompting TSMC to Expand CoWoS Capacity in 2025

Morgan Stanley estimates that TSMC will increase its CoWoS capacity to 80k-90k wpm in 2025 (previously 68k), and TSMC's capital expenditure in 2025 will reach $38 billion.

At the beginning of the month, we mentioned in the community that TSMC had addressed the poor CoWoS-L yield by changing materials, raising it from about 85% before the change to about 90% at the beginning of the month (8-9 percentage points lower than CoWoS-S). TSMC's goal is to raise the yield above 95% before mass production and then gradually approach the 98-99% yield of CoWoS-S.
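The effect of these yield improvements on usable output can be estimated directly. This is a simplified sketch that assumes good output scales linearly with yield for a fixed number of packaging starts; it uses only the yield percentages quoted above.

```python
# CoWoS-L yields quoted in the text.
yield_before = 0.85   # before the material change
yield_now = 0.90      # at the beginning of the month
yield_target = 0.95   # goal before mass production

# CoWoS-S yield range quoted in the text.
cowos_s_yield = (0.98, 0.99)


def output_gain(new_yield, old_yield):
    """Relative increase in good units for the same number of starts."""
    return new_yield / old_yield - 1


gain_now = output_gain(yield_now, yield_before)
gain_target = output_gain(yield_target, yield_before)

# Remaining gap to CoWoS-S, in fractions of a percentage scale.
gap_to_s = tuple(s - yield_now for s in cowos_s_yield)

print(f"Output gain so far: {gain_now:.1%}")
print(f"Output gain at target: {gain_target:.1%}")
print(f"Gap to CoWoS-S: {gap_to_s[0] * 100:.0f}-{gap_to_s[1] * 100:.0f} points")
```

Under this linear assumption, moving from 85% to 90% adds about 6% more good units from the same starts, and reaching 95% would add about 12%.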

Regarding CoWoS capacity, the expectation back in August was 70k wpm for next year and 90-100k wpm for 2026. Our view is that, now that the 2025 forecast has been raised, the 2026 forecast will be raised as well; we expect 110-120k wpm. Readers interested in the detailed reasoning and related topics are welcome to join the community chat and download the complete report.

4. Hopper Demand Is Healthy, and Blackwell Is Close to Mass Production

Regarding the potential inventory risk for H200, Morgan Stanley analyst Daniel Yen's view is that "the growth in Hopper GPU demand comes from smaller CSP (cloud service provider) customers, and the end market may be sovereign AI." We have also discussed sovereign AI in private. It is a very unexpected development, with Singapore and the Middle East aggressively promoting the construction of sovereign AI.

The main CSPs may still wait for GB200 cabinets. Regarding Blackwell production, the current production plan is 250k Blackwell chips per month, with 60% to be completed in 24Q4.

In revenue terms, the Blackwell chips produced in 24Q4 correspond to more than $10 billion in revenue for NVIDIA. This is well above the earlier expectation of $3-5 billion, let alone the earlier worries about delays and postponements. It means that Blackwell's mass production is going very smoothly.

5. The PCB Redesign Work for Switch Trays Has Basically Been Completed

The assembly work for GB200 server racks is still in the debugging stage. The heat and leakage issues have been resolved, and the PCB redesign work for Switch trays appears to be completed. The latest issue is interconnectivity, and the supply chain is working hard to solve it.

Regarding debugging, my view is that problems are entirely normal, because the engineering difficulty of NVL72 is extremely high and the entire supply chain is feeling its way forward. Other issues are likely to arise in the future, such as replacing overpass cables with PCBs, and none of these are major problems. My view on NVIDIA is:

Huang will not be bound by any technology; NVIDIA's philosophy in completing the overall design and iteration is high performance, low power consumption, and controllable costs. Whatever technology can solve the problem, that's what will be used. If copper cables are good, use copper cables; if optical modules are good, use optical modules; if CPO can be mass-produced and costs are controllable in the future, use CPO; if CPO encounters problems, switch back to optical modules.

Regarding Switch trays and various trays, here is an image.

6. The AI Chip Supply Chain Is the Most Efficient in the World, Without a Doubt

In the article titled "A supply chain with no elasticity from ASML to TSMC to Nvidia," we observed a supply chain that is monopolistic at every stage. What is truly astonishing is that each monopoly was earned in technological competition, through strong organizational capability and technological innovation. TSMC, in particular, has taken semiconductor wafer foundry services to the extreme, relying on a robust organizational system that can concentrate and redeploy resources effectively.

As mentioned in previous articles, after enduring the trials of the B-series delays, Nvidia's competitiveness will be strengthened once again, and its monopolistic position will continue to rise. Based on this, in the subsequent competition for inference cards, Nvidia's competitiveness and the market share it gains may exceed expectations.

The B-series faced issues from design to TSMC's CoWoS-L yield problems, and then encountered numerous challenges in server debugging, starting from July. However, it is incredible that two months later, the vast majority of these issues have been resolved.

The innovations in this supply chain are genuine zero-to-one technological advances. While they may not be of the highest order, the ability to solve design, material, assembly, and other problems within two months, without any link in the chain failing, is truly remarkable.

In contrast, domestic collaboration on this front has been underwhelming. It is an open secret that Huawei's 910C encountered some issues; we do not know the specifics, but the delay has been fairly long. This indicates that our capabilities at each stage of the chain are still relatively weak, and it is currently unrealistic to build a supply chain like Nvidia's. We can only rely on Huawei.

The relative backwardness is an objective reality; TSMC and ASML are not ordinary players and are not comparable to what we have domestically at present. What I want to say is that we are aware that since 2021, there has been a significant emergence of talent in the AI chip field domestically, including industrial professionals from Taiwan. However, inter-industry collaboration is indeed not systematic. We know that in the United States, there are only Nvidia and AMD for AI training cards, but domestically, we have at least ten unicorn companies...

The issue reflected behind this matter is somewhat severe, but I will not delve into it here... as it might make the article too long.

The significance of this matter is that relying solely on Huawei is ultimately quite challenging. Only when China's top-tier technology companies can collaborate closely in cutting-edge fields will we truly no longer be at risk of being choked by supply chain constraints.

As for domestic investment, from last year to this year the main opportunities in the AI hardware chain have been concentrated in the optical module and PCB sectors. Optical modules benefit more, so they started earlier and have risen more than PCBs. PCBs are a newly explored investment area this year, helped by low valuations and rapid growth; they do benefit, but they capture only part of the overall benefit.

As for copper cables, the market is simply too small, at just a few billion RMB. Its future growth rate, as mentioned earlier, is not fixed: whichever technology works best gets used, and there is no clear trend. Therefore, after the turmoil over replacing overpass cables with PCBs, it is prudent to remain cautious about copper cables.

The major opportunities in the future are still concentrated in optical modules and PCBs. For PCBs, the thesis is rising volume and rising prices for HDI boards; for optical modules, it is the expectations for 1.6T and CPO. Progress may be slower than hoped, but the industry trend is certain, and investing at the right valuation remains a sound approach.
