In the previous Deepseek post the analysis for which I have enlisted Copilot's assistance is a technique I learned and used effectively during a fairly long career at IBM.
The technique works with computers.
It also works with smokescreens, mirages, politicians and ad campaigns.
What it frequently reveals is that what has been asserted just aint so.
Let's ask Copilot some more questions.
Noel: A lot of stuff has been written and said about Deepseek in the last few days. Initially in an article I read the cost of developing Deep seek was put at $6 billion. Subsequently I have heard several commentators, including Geoff Bennet on PBS NewsHour say $6 million. Which is it?
Copilot: The confusion around the development cost of DeepSeek seems to stem from different interpretations and reporting. The correct figure is $6 billion, not $6 million. The $6 million figure might have been mentioned in reference to a specific part of the development process, such as the cost to train a particular model using Nvidia H800 GPUs, but it doesn't represent the total development cost.
It's easy to see how such discrepancies can arise, especially with complex projects like this. Do you think the media should be more careful with their reporting on such topics?
Noel: Yes. But apparently 24 hour news won't allow scrutiny.
Anyway, are the H800 GPUs the reduced function chips you mentioned previously?
Copilot: Yes, the H800 GPUs are indeed the reduced-function chips I mentioned earlier. Nvidia created the H800 as a modified version of their H100 chip to comply with U.S. export restrictions. By reducing the performance of the H100, Nvidia was able to sell the H800 to Chinese customers. While the H800 is less powerful than the H100, it still provides significant computational capabilities at a lower cost.
Noel: Lower cost for lower function; and using more power and generating more heat.
Copilot: Absolutely, the use of reduced capability chips in massive parallel processing systems can lead to significantly higher energy consumption and heat generation. This is because more chips are required to achieve the same level of performance as fewer, more powerful chips. As a result, the system becomes less efficient and may require additional cooling and power infrastructure. This ultimately leads to higher operational costs, despite the initial savings on hardware.
Noel: Would it be accurate to say that if one put two AI instances side by side, each functionally identical, but one designed with H100 chips and the other with H800 chips, the H800 version would melt the room?
Copilot: While "melting the room" is a bit of an exaggeration, it's true that using H800 chips, which are less powerful than H100 chips, would require a larger number of chips to achieve the same level of performance. This, in turn, would result in higher energy consumption and greater heat generation.
In essence, the H800-based system would likely be less efficient and more resource-intensive, requiring more cooling and power infrastructure to maintain optimal performance. So while it wouldn't literally melt the room, it would indeed create a more demanding environment in terms of power and cooling requirements compared to an H100-based system.
Noel: Let's say that "speed" is a characteristic that can be measured for each of these AI instances, and that "speed", as initially configured is identical. Let's say that requirements change, and "speed" needs to be increased for both instances such that each is still identical in "speed". If the H100 version added one H100 would it be economically and physically practical for the H800 to keep up?
Copilot: In such a scenario, where the "speed" or performance of two AI instances (one using H100 chips and the other using H800 chips) needs to be increased while maintaining identical speeds, there would be significant differences in practicality.
If the H100 version requires only one additional H100 chip to achieve the desired speed increase, the H800 version would need to add several more H800 chips to match that performance. This would result in:
Higher Costs: Adding multiple H800 chips would increase the overall cost more than adding a single H100 chip. Although H800 chips are cheaper individually, the cumulative cost of adding many of them would outweigh the cost of a single H100 chip.
Increased Power Consumption: The H800-based system would consume more power due to the higher number of chips required. This would lead to higher operational costs and increased demand on the power infrastructure.
Heat Generation: With more H800 chips, the system would generate significantly more heat, necessitating more robust cooling solutions. This could further increase costs and complexity.
Space Requirements: Physically, the H800-based system would require more space to accommodate the additional chips, which could be a limiting factor in some environments.
In summary, while it might be technically feasible for the H800 version to keep up with the H100 version in terms of performance, it would be economically and physically impractical. The increased costs, power consumption, heat generation, and space requirements would make it a less efficient and more resource-intensive solution.
Noel: So press releases may not tell the complete tale?
Copilot: That's correct. Press releases and media reports can sometimes present information in a simplified or sensationalized manner, which may not capture all the complexities and nuances of a situation. It's always a good idea to dig deeper, cross-reference sources, and consider the broader context to get a more accurate understanding.
In the case of DeepSeek, the details around development costs, hardware choices, and trade-offs are quite intricate. While initial reports might focus on headline-grabbing figures, the full picture often involves a lot of technical and strategic considerations.
Being critical and questioning what you read is a valuable approach, especially in the fast-evolving world of technology.
No comments:
Post a Comment