👋 If you are a new reader, my name is Danar Mustafa. I write about product management focusing on AI, tech, business and agile management. You can visit my website here.
In today’s article, I will talk about DeepSeek, the AI startup that shook the tech world.
What is DeepSeek?
DeepSeek is a Chinese artificial intelligence startup founded in July 2023 by Liang Wenfeng, who previously co-founded one of China’s leading hedge funds focused on AI-driven quantitative trading. The company is based in Hangzhou, China, and has quickly gained attention for its advancements in AI technology, particularly with its large language models that are said to rival those developed by major U.S. companies like OpenAI.
DeepSeek - Technology Capabilities
DeepSeek’s flagship product is an AI assistant that quickly became popular, becoming the most downloaded free app on Apple’s App Store shortly after its release. The company claims that its latest AI model shows advanced reasoning abilities and runs at a significantly lower cost than similar models from competitors. For instance, its R1 model can reportedly recognize when an approach to a math problem is not working and re-evaluate it.
The technology behind DeepSeek’s models runs on Nvidia’s H800 chips, a cut-down version of the H100 with reduced chip-to-chip bandwidth that Nvidia designed to comply with the U.S. export restrictions in place when it launched. (Liang’s hedge fund had previously used Nvidia A100 chips.) This choice suggests that DeepSeek aims to show that top-end hardware may not be essential for cutting-edge AI research.
DeepSeek - Market Impact
The emergence of DeepSeek has caused significant ripples in the stock market, particularly for shares of established tech companies like Nvidia and ASML. Analysts have noted that DeepSeek’s ability to deliver competitive AI models at a fraction of the cost contrasts sharply with the nearly $1 trillion that major U.S. firms are projected to invest over the coming years. The widely cited figure of around $6 million covers only the computing cost of the final training run of its V3 model, not its total development spending.
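The $6 million figure is a compute estimate, not a total budget. DeepSeek’s V3 technical report arrives at it by multiplying the H800 GPU-hours used in the final training run by an assumed rental price of $2 per GPU-hour, and notes that this excludes earlier research and experiments. The short Python sketch below reproduces that arithmetic using the report’s published GPU-hour numbers, which are taken as given here rather than independently verified.

```python
# GPU-hour figures as published in the DeepSeek-V3 technical report (taken as given).
PRETRAIN_GPU_HOURS = 2_664_000
CONTEXT_EXT_GPU_HOURS = 119_000
POSTTRAIN_GPU_HOURS = 5_000
# Rental price assumed by the report; not an actual invoice or hardware purchase cost.
PRICE_PER_GPU_HOUR_USD = 2.0


def training_cost_usd(gpu_hours: int, price_per_hour: float) -> float:
    """Compute-only cost of a training run: GPU-hours times hourly price."""
    return gpu_hours * price_per_hour


total_hours = PRETRAIN_GPU_HOURS + CONTEXT_EXT_GPU_HOURS + POSTTRAIN_GPU_HOURS
cost = training_cost_usd(total_hours, PRICE_PER_GPU_HOUR_USD)
print(f"{total_hours:,} GPU-hours -> ${cost / 1e6:.3f}M")
```

Running it prints `2,788,000 GPU-hours -> $5.576M`, which rounds to the widely quoted $6 million. Salaries, failed experiments, and the purchase of the GPUs themselves are not part of this number.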
This disruption has led to substantial declines in stock prices for companies heavily invested in AI technologies, with Nvidia experiencing a historic one-day loss of 17%, amounting to a $600 billion drop in market value. The implications of DeepSeek’s technology raise questions about future demand for high-performance chips and whether investors have overvalued tech stocks based on anticipated advancements in AI.
DeepSeek - How Did They Train Their AI Models So Cheaply?
DeepSeek managed to train its AI models at a fraction of the usual cost by getting the most out of the hardware it had access to: NVIDIA H800 GPUs rather than the more advanced H100. Because the H800 has lower chip-to-chip bandwidth and other performance caps, DeepSeek’s engineers focused on low-level code optimizations that used memory as efficiently as possible. This let them extract maximum performance from their existing resources instead of investing in more expensive hardware.
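To see why chip-to-chip bandwidth matters, here is a back-of-the-envelope sketch in Python. It assumes the commonly cited NVLink figures of roughly 900 GB/s for the H100 and 400 GB/s for the H800, and a hypothetical 10 GB of data exchanged between GPUs in one training step. Real transfers also pay latency and protocol overhead, which this sketch ignores.

```python
# Commonly cited NVLink bandwidth figures, treated here as assumptions (GB per second).
NVLINK_GB_PER_S = {"H100": 900.0, "H800": 400.0}


def transfer_seconds(payload_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized time to move a payload over a link, ignoring latency and overhead."""
    return payload_gb / bandwidth_gb_per_s


payload_gb = 10.0  # hypothetical data exchanged between GPUs in one step
for gpu, bandwidth in NVLINK_GB_PER_S.items():
    print(f"{gpu}: {transfer_seconds(payload_gb, bandwidth) * 1000:.1f} ms")
```

Under these assumptions the same exchange takes about 2.25 times as long on the H800 (roughly 11.1 ms versus 25.0 ms). That is time the GPUs may spend waiting unless the software is written to reduce or hide the communication, which is where low-level optimization pays off.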
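Traditional dense AI models use every parameter for every token, which wastes a lot of computation. DeepSeek’s models instead use a mixture-of-experts (MoE) architecture: a router sends each token to only a few of many specialized sub-networks (experts), so only a fraction of the model is active at any time. DeepSeek-V3, for example, has 671 billion parameters in total but activates about 37 billion per token. The catch is that a router can overload some experts while leaving others idle. The usual fix is an extra “auxiliary” loss that penalizes imbalance, but that loss can degrade model quality. DeepSeek’s Auxiliary-Loss-Free Load Balancing instead gives each expert an adjustable bias that steers the router away from busy experts and toward idle ones. Keeping the work evenly distributed across experts, without an extra loss term, significantly reduced their computational costs.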
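A minimal Python sketch of bias-based load balancing, assuming the formulation described in DeepSeek’s V3 technical report: each expert has a bias that is added to its routing score only when choosing the top-k experts, while the gate weights still come from the raw scores. After each step, the bias is lowered by a fixed amount γ for overloaded experts and raised by γ for underloaded ones. The router scores here are random stand-ins, and the expert count, k, γ, and the built-in skew toward experts 0 and 1 are all hypothetical.

```python
import math
import random

N_EXPERTS, TOP_K, GAMMA = 8, 2, 0.01  # hypothetical toy settings
SKEW = [2.0, 1.5] + [0.0] * (N_EXPERTS - 2)  # hypothetical router preference for experts 0 and 1


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route(scores, bias, k):
    """Choose top-k experts by score + bias; gate weights use the raw scores only."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i] + bias[i], reverse=True)
    chosen = ranked[:k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


def update_bias(bias, load, gamma):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    mean = sum(load) / len(load)
    return [b - gamma if count > mean else b + gamma if count < mean else b
            for b, count in zip(bias, load)]


def run(steps=100, tokens_per_step=256, seed=0):
    rng = random.Random(seed)
    bias = [0.0] * N_EXPERTS
    imbalance = []  # max load divided by mean load, one entry per step
    for _ in range(steps):
        load = [0] * N_EXPERTS
        for _ in range(tokens_per_step):
            # Random stand-in for the router's affinity scores for this token.
            scores = [sigmoid(SKEW[i] + rng.gauss(0.0, 1.0)) for i in range(N_EXPERTS)]
            for expert in route(scores, bias, TOP_K):
                load[expert] += 1
        imbalance.append(max(load) / (sum(load) / N_EXPERTS))
        bias = update_bias(bias, load, GAMMA)
    return imbalance, bias


imbalance, bias = run()
print(f"first step imbalance: {imbalance[0]:.2f}, last step: {imbalance[-1]:.2f}")
```

With the bias frozen at zero, experts 0 and 1 would keep receiving most tokens; once the bias adjusts, the max-to-mean load ratio shrinks. Because the bias only changes which experts are selected and never enters the training loss, balancing does not pull the gradients away from the model’s main objective.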
Running AI models, particularly during inference (the process of generating outputs), can consume a great deal of memory, because attention layers store a key vector and a value vector for every previous token in what is called the KV cache. DeepSeek used a technique called Low-Rank Key-Value (KV) Joint Compression, part of its Multi-head Latent Attention (MLA) design: instead of caching full keys and values, the model caches one smaller compressed vector per token and reconstructs the keys and values from it when needed. According to DeepSeek, this sharply reduces memory requirements with little or no loss in quality, which means faster results and lower operating costs.
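The toy Python sketch below illustrates the caching idea with hypothetical sizes and random weights: a down-projection compresses each token’s hidden state into a small shared latent vector, only that vector is cached, and separate up-projections rebuild the keys and values for all attention heads when they are needed. DeepSeek’s actual design includes further details (for example, how positional information is handled) that this sketch leaves out.

```python
import random

D_MODEL, N_HEADS, D_HEAD, D_LATENT = 64, 4, 16, 8  # hypothetical toy sizes


def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
W_down = random_matrix(D_LATENT, D_MODEL, rng)           # hidden state -> shared latent
W_up_k = random_matrix(N_HEADS * D_HEAD, D_LATENT, rng)  # latent -> keys for all heads
W_up_v = random_matrix(N_HEADS * D_HEAD, D_LATENT, rng)  # latent -> values for all heads

latent_cache = []  # one small latent vector per token instead of full keys and values
for _ in range(10):  # pretend we generate 10 tokens
    hidden = [rng.gauss(0.0, 1.0) for _ in range(D_MODEL)]
    latent_cache.append(matvec(W_down, hidden))

# When attention runs, keys and values are rebuilt from the cached latents.
keys = [matvec(W_up_k, c) for c in latent_cache]
values = [matvec(W_up_v, c) for c in latent_cache]

full_cache_floats = len(latent_cache) * 2 * N_HEADS * D_HEAD
latent_cache_floats = len(latent_cache) * D_LATENT
print(f"standard KV cache: {full_cache_floats} floats, latent cache: {latent_cache_floats} floats")
```

With these toy sizes the latent cache holds 80 numbers for 10 tokens versus 1,280 for full keys and values, a 16x reduction. The real savings depend on the model’s head count, head size, and latent size.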
Is DeepSeek free to use?
DeepSeek’s chatbot is free to use on the web and in its mobile app, though users need to create an account. Developers who access its models through the API pay per token, at prices well below those of many competitors. The startup has positioned its app as a competitive alternative to other AI assistants, emphasizing that it provides advanced capabilities at no cost to the user.
The DeepSeek app has gained significant traction since its launch, topping the app-store charts in several countries and attracting millions of downloads. Free access lets users explore the assistant without financial barriers, which is particularly appealing when many competing services charge subscription fees for their most capable models.
DeepSeek - US vs China in the AI Race
DeepSeek’s rapid rise also plays into broader geopolitical tensions between the U.S. and China regarding technological competition. Some experts suggest that the timing of DeepSeek’s announcements may be politically motivated, aiming to challenge U.S. export controls on semiconductor technologies and showcase China’s capabilities in AI innovation.