Economic Corner 11 - What should you see after a DeepSeek? 01/28/2025
Event created by richardmurray
Event details
This event began 01/28/2025 and repeats every year forever
DeepSeek and the quality of USA finance
MY THOUGHTS
600 billion dollars. Nvidia lost 17 percent of its value in a day. Like many USA firms or industries outside military products, they are weak, and have been from the 1900s to today.
DeepSeek said it cost 5 million dollars to produce a product rivaling any comparable computer program in storage/speed/calculation, at 1/20th of the cost. To me, this proves the value of the USA firms is incorrect, which is my issue. Tesla was given such a high value. The USA's financial environment allows for a bloating of firms, like Nvidia, like Tesla, that, to be blunt, have each lost huge market share which they shouldn't have. The fact that the best electric cars are made in China exposes Tesla's management to me. The fact that Nvidia is part of an industry that Biden gave billions of investment to, and is playing catchup, exposes the chip industry in the USA. The fact that OpenAI and Anthropic aren't open source, and have been outed for their financial dysfunction, demanding such investment while not making their code public, exposes them.
Yes, I will use this economic corner to share DeepSeek information as best I can. But my agenda is actually not about DeepSeek but the financial argument that the USA has a problem in its investment in technologies. There are those who believe that the one world has already been created and the USA is really the binder to all governments; in that mindset, no one is competing, because the USA is really the interchange between all governments. Human history proves that fissures that are wanted eventually become real, even if it takes a long time. The lesson in Chinese industries for all non-white-European governments is to consider how they research, how they approach technological development. Is it about the Massachusetts Institute of Technology (M.I.T.), is it about Stanford, is it about nepotism?
I remember being a college student, and I remember how often it was Blacks who graduated from an Oxford or an M.I.T. who would be given opportunities but didn't have the imagination or passion to do well with them. And the reason is simple, as anyone non-white-European knows: many people, including many Asians who go to college in the USA, are more interested in the appearance of intellect than in being an ambitious creative. And for the record, the Black people two generations earlier than mine, in my bloodline, earned multiple degrees or graduated from Ivy League schools, so my position is not against going to an Ivy League school or gaining multiple degrees, which I find so many Black people love to suggest, in a very enslaved way, when another Black person speaks of imagination or passion. Getting degrees is, for too many Black Descendants of Enslaved people, a Keeping up with the Joneses act, a comparison to other Blacks in a vain display to whites, not an important act of creativity or learning. The second article below may convince you of my point in this economic corner, which has been uttered by many Black DOSers since the end of the war between the states in the USA.
I quote the first article below, and the source article the quotes are from is present.
Liang told Chinese tech publication 36Kr that the decision was motivated by scientific curiosity, not a desire to make a profit. “I couldn’t find a commercial reason to start DeepSeek even if you asked me,” he said. “Because it’s not commercially viable. Basic research has a very low return on investment. When OpenAI’s early investors gave it money, they probably didn’t think about the return they would get. Rather, they really wanted to do this business.”
...
While OpenAI o1 costs $15 per million incoming tokens and $60 per million outgoing tokens, the DeepSeek Reasoner API based on the R1 model offers $0.55 per million incoming tokens and $2.19 per million outgoing tokens.
...
To train its models, the High-Flyer hedge fund purchased more than 10,000 NVIDIA H100 GPUs before the US export restrictions were introduced in 2022. Billionaire and Scale AI CEO Alexandr Wang recently told CNBC that he estimates that DeepSeek now has about 50,000 NVIDIA H100 chips that they cannot talk about, precisely because of US export controls. If this estimate is correct, then compared to the leading companies in the AI industry, such as OpenAI, Google, and Anthropic, this is very small. After all, each of them has more than 500,000 GPUs.
...
This also calls into question the feasibility of the Stargate project, an initiative under which OpenAI, Oracle, and SoftBank promise to build next-generation AI data centers in the United States, allegedly willing to spend up to $500 billion.
DeepSeek provides detailed technical reports explaining how the models work, as well as code that anyone can look at and try to copy.
Code on Hugging Face
https://huggingface.co/deepseek-ai/DeepSeek-R1
The code on GitHub
https://github.com/deepseek-ai/DeepSeek-R1
ARTICLES
Where DeepSeek came from and who is behind the AI lab that shocked Silicon Valley
Taras Mishchenko
Editor-in-Chief of Mezha.Media. Taras has more than 15 years of experience in IT journalism, writes about new technologies and gadgets.
28.01.2025 at 09:56
A new artificial intelligence model, DeepSeek-R1, from the Chinese laboratory DeepSeek appeared as if from nowhere. For the general public, the first mentions of it began to appear in the media only last week, and now it seems that everyone is talking about DeepSeek. Moreover, in just a week, the DeepSeek app has overtaken the well-known ChatGPT in the US App Store rankings. The model has also skyrocketed to the top downloads on the Hugging Face developer platform, as developers rush to try it out and understand what this release can bring to their AI projects. So, logical questions arise: where did DeepSeek come from, who is behind this startup, and why has it made so much noise? I will try to answer them in this article.
Where DeepSeek came from
Given the history of Chinese tech companies, one would have expected DeepSeek to be a project of giants like Baidu, Alibaba, or ByteDance. But this AI lab was launched in 2023 by High-Flyer, a Chinese hedge fund founded in 2015 by entrepreneur Liang Wenfeng. He made a fortune using AI and algorithms to identify patterns that could affect stock prices. The hedge fund quickly gained popularity in China, and was able to raise more than 100 billion yuan (about $15 billion). Since 2021, this figure has dropped to about $8 billion, but High-Flyer is still one of the most important hedge funds in the country.
As High-Flyer’s core business overlapped with the development of AI models, the hedge fund accumulated GPUs over the years and created Fire-Flyer supercomputers to analyze financial data. In the wake of the growing popularity of ChatGPT, a chatbot from the American company OpenAI, Liang, who also holds a master’s degree in computer science, decided in 2023 to invest his fund’s resources in a new company called DeepSeek, which was to create its own advanced models and develop general artificial intelligence (AGI).
Liang told Chinese tech publication 36Kr [ https://36kr.com/p/2272896094586500 ] that the decision was motivated by scientific curiosity, not a desire to make a profit. “I couldn’t find a commercial reason to start DeepSeek even if you asked me,” he said. “Because it’s not commercially viable. Basic research has a very low return on investment. When OpenAI’s early investors gave it money, they probably didn’t think about the return they would get. Rather, they really wanted to do this business.”
According to Liang, when he assembled DeepSeek’s R&D team, he also didn’t look for experienced engineers to build a consumer-facing product. Instead, he focused on doctoral students from top universities in China, including Peking University, Tsinghua University, and Beihang University, who were eager to prove themselves. Many of them had published in top journals and won awards at international academic conferences, but had no industry experience, according to Chinese technology publication QBitAI. [ https://www.qbitai.com/2025/01/241000.html ; identity of workers at DeepSeek]
“Our main technical positions are mostly filled by people who graduated this year or within the last one or two years,” Liang said in an interview in 2023. He believes that students may be better suited for high-investment, low-return research. “Most people, when they are young, can fully commit to a mission without utilitarian considerations,” Liang explained. His pitch to potential employees is that DeepSeek was created to “solve the world’s toughest questions.”
Liang, who is personally involved in DeepSeek’s development, uses the proceeds from his hedge fund to pay high salaries to top AI talent. Along with TikTok owner ByteDance, DeepSeek is known in China for providing top compensation to AI engineers, and staff are based in offices in Hangzhou and Beijing.
Liang positions DeepSeek as a uniquely “local” company, staffed by PhDs from leading Chinese universities. In an interview with the domestic press last year, he said that his core team “didn’t have any people who came back from abroad. They are all local… We have to develop the best talent ourselves.” DeepSeek’s identity as a purely Chinese LLM company has earned it popularity at home, as this approach is fully in line with Chinese government policy.
This week, Liang was the only representative of China’s AI industry chosen to participate in a highly publicized meeting of entrepreneurs with the country’s second-in-command, Li Qiang. Entrepreneurs were told to “focus on breakthroughs in key technologies.”
Not much is known about how DeepSeek started building its own large language models (LLMs), but the lab quickly opened its source code, and it is likely that, like many Chinese AI developers, it relied on open-source projects created by Meta, such as the Llama model and the PyTorch machine learning library. At the same time, DeepSeek’s particular focus on research makes it a dangerous competitor for OpenAI, Meta, and Google, as the AI lab is, at least for now, willing to share its discoveries rather than protect them for commercial gain. DeepSeek has not raised funds from outside and has not yet taken significant steps to monetize its models. However, it is not known for certain whether the Chinese government is involved in financing the company.
What makes the DeepSeek-R1 AI model unique
In November, DeepSeek first announced that it had achieved performance that surpassed the leading-edge OpenAI o1 model, but at the time it only released a limited R1-lite-preview model. With the release of the full DeepSeek-R1 model last week and the accompanying white paper, the company introduced a surprising innovation: a deliberate departure from the traditional supervised fine-tuning (SFT) process that is widely used for training large language models (LLMs).
SFT is a standard approach for AI development and involves training models on prepared datasets to teach them step-by-step reasoning, often referred to as a chain of thought (CoT). However, DeepSeek challenged this assumption by skipping SFT entirely and instead relying on reinforcement learning (RL) to train DeepSeek-R1.
According to Jeffrey Emanuel, a serial investor and CEO of blockchain company Pastel Network, DeepSeek managed to outpace Anthropic in the application of the chain of thought (CoT), and now they are practically the only ones, apart from OpenAI, who have made this technology work on a large scale.
At the same time, unlike OpenAI, which is incredibly secretive about how these models actually work at a low level and does not provide the actual model weights to anyone other than partners like Microsoft, these DeepSeek models are completely open and permissively licensed. They have released extremely detailed technical reports explaining how the models work, as well as code that anyone can look at and try to copy.
With R1, DeepSeek essentially cracked one of the holy grails of AI: getting models to reason step by step without relying on massive teacher datasets. Their DeepSeek-R1-Zero experiment showed something remarkable: using pure reinforcement learning with carefully designed reward functions, the researchers were able to get the models to develop complex reasoning capabilities completely autonomously. It wasn’t just problem solving: the model organically learned to generate long chains of thought, check its own work, and allocate more computational time to more complex problems.
In this way, the model learned to revise its thinking on its own. What is particularly interesting is that during training, DeepSeek observed what they called an “aha moment,” a phase when the model spontaneously learned to revise its chain of thought mid-process when faced with uncertainty. This behavior was not explicitly programmed, but emerged naturally from the interaction between the model and the reinforcement learning environment. The model literally stopped itself, flagged potential problems in its reasoning, and restarted with a different approach, all without being explicitly trained to do so.
DeepSeek also solved one of the main problems in reasoning models: language consistency. Previous attempts at chain-of-thought reasoning often resulted in models mixing languages or producing incoherent output. DeepSeek solved this problem by smartly rewarding language consistency during RL training, sacrificing a slight performance hit for a much more readable and consistent output.
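To make that reward design concrete, here is a minimal, purely illustrative Python sketch. DeepSeek has not published its exact reward code; the R1 paper describes rule-based rewards for answer correctness and language consistency only in general terms, so the "Answer:" format, the ASCII heuristic, and the 0.1 weight below are all assumptions for illustration.

import re

def accuracy_reward(completion: str, reference_answer: str) -> float:
    # Reward 1.0 if the final answer (the text after "Answer:") matches the reference.
    match = re.search(r"Answer:\s*(.+)", completion)
    if match and match.group(1).strip() == reference_answer.strip():
        return 1.0
    return 0.0

def language_consistency_reward(completion: str) -> float:
    # Fraction of whitespace-separated tokens that are plain ASCII: a crude proxy
    # for "the chain of thought stays in one language" (here, English).
    tokens = completion.split()
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t.isascii()) / len(tokens)

def total_reward(completion: str, reference_answer: str,
                 consistency_weight: float = 0.1) -> float:
    # The article describes trading a small accuracy hit for readable output;
    # this weight is a made-up stand-in for whatever DeepSeek actually used.
    return (accuracy_reward(completion, reference_answer)
            + consistency_weight * language_consistency_reward(completion))

if __name__ == "__main__":
    sample = "Let me check my work... 12 * 4 = 48. Answer: 48"
    print(total_reward(sample, "48"))  # 1.1: correct answer, fully English output

During RL training, a score like this would be computed for each sampled completion and fed to the policy-update step; the point is only that such rewards can be simple programmatic rules rather than a learned reward model.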
As a result, DeepSeek-R1 achieves high accuracy and efficiency. At AIME 2024, one of the toughest math competitions for high school students, R1 achieved 79.8% accuracy, which is in line with OpenAI’s o1 model. At MATH-500, it reached 97.3%, and at the Codeforces programming competition, it reached the 96.3rd percentile. But perhaps most impressively, DeepSeek was able to distill these capabilities down to much smaller models: its 14-billion-parameter version outperforms many models several times its size, showing that reasoning power depends not only on the number of parameters but also on how you train the model to process information.
However, the uniqueness of DeepSeek-R1 lies not only in the new approach to model training, but also in the fact that it is the first time a Chinese AI model has gained such great popularity in the West. Users, of course, immediately went to ask it questions about Tiananmen Square and Taiwan that were sensitive to the Chinese government, and quickly realized that DeepSeek was censored. Indeed, it would be futile to expect a Chinese AI lab to not comply with Chinese law or policy.
However, many developers consider this censorship to be an infrequent edge case in real-world use, one that can be mitigated by fine-tuning. Therefore, it is unlikely that the question of DeepSeek-R1’s ethical use will stop the many developers and users who want access to the latest AI development essentially for free.
Of course, for many, the security of the data remains a question mark, as DeepSeek-R1 probably stores it on Chinese servers. But as a precautionary measure, you can try the model on Hugging Face in sandbox mode [ https://huggingface.co/deepseek-ai/DeepSeek-R1 ] , or even run it locally on your PC if you have the necessary hardware. In such cases, the model will not be fully functional, but it will remove the issue of data transfer to Chinese servers.
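For readers who do want to try the local route, the sketch below uses the Hugging Face transformers library with one of DeepSeek's small distilled checkpoints, since the full 671-billion-parameter R1 will not fit on consumer hardware. The model ID points to a real repository, but the prompt and generation settings are illustrative choices, not official guidance.

# A minimal local-inference sketch, assuming transformers and torch are installed
# and that your machine can hold a ~1.5B-parameter model.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # small distilled variant
    device_map="auto",  # uses a GPU if one is available, otherwise CPU
)

prompt = "How many prime numbers are there between 1 and 20? Think step by step."
result = generator(prompt, max_new_tokens=512, do_sample=False)
print(result[0]["generated_text"])

Run this way, nothing leaves your machine, which is exactly the data-transfer concern the paragraph above raises.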
How much did it cost to develop DeepSeek-R1?
To train its models, the High-Flyer hedge fund purchased more than 10,000 NVIDIA H100 GPUs before the US export restrictions were introduced in 2022. Billionaire and Scale AI CEO Alexandr Wang recently told CNBC that he estimates that DeepSeek now has about 50,000 NVIDIA H100 chips that they cannot talk about, precisely because of US export controls. If this estimate is correct, then compared to the leading companies in the AI industry, such as OpenAI, Google, and Anthropic, this is very small. After all, each of them has more than 500,000 GPUs.
According to NVIDIA engineer Jim Fan, DeepSeek trained its base model, called V3, with a budget of $5.58 million over two months. However, it is difficult to estimate the total cost of training DeepSeek-R1. The use of 60,000 NVIDIA GPUs could potentially cost hundreds of millions of dollars, so the exact figures remain speculative.
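As a rough sanity check on the $5.58 million figure, the arithmetic below follows the commonly cited breakdown from DeepSeek's own V3 technical report (about 2.788 million H800 GPU-hours priced at an assumed $2 rental rate per GPU-hour). Note that the $2 rate is itself an assumption in that report, and the total excludes hardware purchases, research staff, and failed experiments.

# Back-of-the-envelope reconstruction of the ~$5.58M training-cost figure.
gpu_hours = 2_788_000        # H800 GPU-hours reported for training DeepSeek-V3
rate_per_gpu_hour = 2.00     # assumed rental price in USD per GPU-hour

total_cost = gpu_hours * rate_per_gpu_hour
print(f"Estimated training cost: ${total_cost / 1e6:.2f}M")  # -> $5.58M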
Why DeepSeek-R1 shocked Silicon Valley
DeepSeek largely disrupts the business model of OpenAI and other Western companies working on their own closed AI models. After all, DeepSeek-R1 not only performs better than the best open-source alternative, Meta’s Llama 3; it also transparently shows the entire chain of thought in its answers. This is a blow to the reputation of OpenAI, which has hitherto hidden the thought chains of its models, citing trade secrets and the fact that it does not want to embarrass users when the model is wrong.
In addition, DeepSeek’s success emphasizes that cost-effective and efficient AI development methods are realistic. We have already determined that in the case of a Chinese company, it is difficult to calculate the cost of development, and there may always be “surprises” in the form of multi-billion dollar government funding. But at the moment, DeepSeek-R1, with a similar level of accuracy to OpenAI o1, is much cheaper for developers. While OpenAI o1 costs $15 per million incoming tokens and $60 per million outgoing tokens, the DeepSeek Reasoner API based on the R1 model offers $0.55 per million incoming tokens and $2.19 per million outgoing tokens.
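To see what those per-token prices mean in practice, here is a small back-of-the-envelope comparison; the monthly token volumes are hypothetical, chosen only to illustrate the gap.

# Hypothetical monthly API bill at the per-million-token prices quoted above.
PRICES = {  # USD per million tokens: (input, output)
    "OpenAI o1": (15.00, 60.00),
    "DeepSeek Reasoner (R1)": (0.55, 2.19),
}

input_tokens = 10_000_000   # hypothetical monthly input volume
output_tokens = 2_000_000   # hypothetical monthly output volume

for name, (in_price, out_price) in PRICES.items():
    cost = (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price
    print(f"{name}: ${cost:,.2f} per month")

# Prints $270.00 for o1 versus $9.88 for DeepSeek: roughly 27x cheaper
# at these prices and this (made-up) workload.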
However, while DeepSeek’s innovations are groundbreaking, they have by no means given the Chinese AI lab market leadership. As DeepSeek has published its research, other AI model development companies will learn from it and adapt. Meta and Mistral, a French open-source model development company, may be a bit behind, but it will probably only take them a few months to catch up with DeepSeek. As Yann LeCun, a leading AI researcher at Meta, said: “The idea is that everyone benefits from the ideas of others. No one is ‘ahead’ of anyone and no country is ‘losing’ to another. No one has a monopoly on good ideas. Everyone learns from everyone.”
DeepSeek’s offerings are likely to continue to lower the cost of using AI models, which will benefit not only ordinary users but also startups and other businesses interested in AI. But if developing a DeepSeek-R1 model with fewer resources does turn out to be a reality, it could be a problem for AI companies that have invested heavily in their own infrastructure. In particular, years of operating and capital expenditures by OpenAI and others could be wasted.
The market doesn’t yet know the final answer to whether AI development will indeed require less computing power in the future, but it is already reacting nervously, with a drop in shares of NVIDIA and other suppliers of AI data center components. This also calls into question the feasibility of the Stargate project, an initiative under which OpenAI, Oracle, and SoftBank promise to build next-generation AI data centers in the United States, allegedly willing to spend up to $500 billion.
But on the other hand, while American companies will still have excess capacity for the development of artificial intelligence, China’s DeepSeek, with the US export restrictions on chips still in place, may face a severe shortage. If we assume that resource constraints have indeed pushed it to innovate and allowed it to create a competitive product, the lack of computing power will simply prevent it from scaling, while competitors will catch up. Therefore, despite all the innovation of DeepSeek, it is still too early to say that Chinese companies will be able to compete with Western AI tech giants, even if we put aside the issues of censorship and data security.
Question and Answer excerpts from 疯狂的幻方：一家隐形AI巨头的大模型之路 (“Crazy High-Flyer: A Hidden AI Giant’s Road to Large Models”)
...
36Kr: What deductions and assumptions have you made about the business model?
Liang Wenfeng: What we want now is to share most of our training results publicly, so that openness can be combined with commercialization. We hope that more people, even a small app, can use large models at a low cost, instead of the technology being only in the hands of some people and companies, forming a monopoly.
...
36Kr: In any case, it's a bit crazy for a commercial company to pursue this kind of research exploration with unlimited investment.
Liang Wenfeng: If you have to find a commercial reason, you may not find one, because it can't be found.
From a business point of view, basic research has a very low return on investment. When OpenAI's early investors put in money, they must not have been thinking about how much return they would get back; they really wanted to do it.
What we are more certain of now is that, since we want to do this and have the ability, we are one of the most suitable candidates at this point in time.
...
36Kr: How do you see the competitive landscape of large models?
Liang Wenfeng: Large companies definitely have advantages, but if they can't apply them quickly, they may not be able to keep at it, because they need to see results.
The top startups also have solid technology, but like the old wave of AI startups, they have to face commercialization problems.
...
36Kr: Talent for large-model entrepreneurship is also scarce, and some investors say that many suitable people may only be in the AI labs of giants such as OpenAI and Facebook AI Research. Do you go overseas to poach this kind of talent?
Liang Wenfeng: If you are pursuing short-term goals, it is right to find someone with existing experience. But if you look at the long term, experience is not so important; basic ability, creativity, and passion are more important. From this point of view, there are many suitable candidates in China.
36Kr: Why isn't experience so important?
Liang Wenfeng: Doing this doesn't require someone who has already done it. High-Flyer's principle in recruiting is to look at ability, not experience. Our core technical positions are mostly filled by fresh graduates and people who graduated one or two years ago.
36Kr: Do you think experience is an obstacle when it comes to business innovation?
Liang Wenfeng: When you do something, experienced people will tell you, without thinking, how it should be done, while people without experience will explore repeatedly, think seriously about what should be done, and then find a solution that fits the current actual situation.
36Kr: High-Flyer entered the industry as a layman, with no finance pedigree at all, and became a leader within a few years. Is this recruitment rule one of the secrets?
Liang Wenfeng: Our core team, even myself, didn't have quantitative experience at the beginning, which is very special. It can't be called the secret of our success, but it is part of High-Flyer's culture. We don't deliberately shy away from experienced people; it's just that we care more about ability.
Take the sales positions as an example. Our two main salespeople are both outsiders to this industry. One originally worked in the foreign trade of German machinery, and the other originally worked in the back office of a brokerage. When they entered the industry, they had no experience, no resources, no track record.
And now we may be the only big private equity firm that focuses on direct sales. Direct selling means there is no need to split fees with middlemen, so the profit margin is higher at the same scale and performance. Many companies try to imitate us, but they do not succeed.
36Kr: Why do many firms try to imitate you but not succeed?
Liang Wenfeng: Because that alone is not enough for innovation to happen. It has to match the company's culture and management.
In fact, our salespeople could do almost nothing in their first year, and only in the second year did they start to make progress. But our assessment criteria are different from those of ordinary companies. We don't have KPIs, and we don't have so-called tasks.
36Kr: What are your assessment criteria?
Liang Wenfeng: We are not like ordinary companies, which value the number of orders customers place. Our salespeople's sales and commissions are not good at the beginning, but we encourage them to develop their own circles, meet more people, and build greater influence.
Because we believe that an honest salesperson who can be trusted by customers may not be able to get customers to place orders in a short period of time, but he can make customers feel that he is a reliable person.
URL
https://36kr.com/p/2272896094586500
Prior entry
https://aalbc.com/tc/topic/11445-economiccorner010/
POST URL
https://aalbc.com/tc/topic/11447-economiccorner011/
PRIOR EDITION
https://aalbc.com/tc/events/event/166-economic-corner-10-online-divestiture-01282025/
NEXT EDITION
https://aalbc.com/tc/events/event/193-economic-corner-12-02122025/
