On-Device LLM Spring
The Spring of On-Device Large Language Models
- This spring will be the spring of on-device large language models;
- The older generation of on-device large language models, such as Gemma 2, Phi-3, Llama 3.1, and Qwen 2.5, has a large gap in output quality compared to large models of 70B, 100B parameters and above, and can only be regarded as toys;
- A new generation of on-device large language models, led by DeepSeek-R1 32B and QwQ 32B, uses distillation, reinforcement learning, chain-of-thought reasoning, and similar techniques, and their level approaches or even surpasses previous-generation flagship models such as GPT-4o. Most importantly, these training methods and weights are all public;
- On this basis, Microsoft launched Phi-4 14B, Google launched Gemma 3 27B, and Mistral launched Mistral Small 3.1 24B, continuously raising the level of on-device large language models;
- In the near future, perhaps within the first half of this year, on-device large language models (that is, models that can be deployed and run on local consumer-grade hardware) may lag only slightly behind the ultra-large cloud flagships: even if they cannot match them in the breadth and accuracy of knowledge, external knowledge bases can close much of that gap.
- That’s why Altman was so anxious to write to the US government, asking it to treat DeepSeek as a second Huawei and block it completely;
- Altman originally believed that scaling up models would form a moat, and that restricting Chinese companies’ access to compute cards would preserve the US lead in AI;
- As a result, Chinese companies were forced to find another way, and achieved breakthroughs in efficiency;
- The evolution of on-device large language models can fairly be called a revolution against the ultra-large cloud models: if an on-device model has sufficiently powerful reasoning and thinking capability, then with the help of an external knowledge base it can approach or even reach the level of the ultra-large cloud models; at that point, no one will spend a lot of money on OpenAI’s services.
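The argument above that an external knowledge base can compensate for a small model's narrower knowledge is essentially retrieval-augmented generation: retrieve relevant facts first, then hand them to the model as context. Below is a minimal, self-contained sketch of that idea; the keyword-overlap scorer and all function names are illustrative assumptions (a real system would use embedding search and feed the prompt to a local model endpoint):

```python
def score(query: str, doc: str) -> int:
    """Crude relevance score: number of lowercase words shared by query and doc."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the k documents most relevant to the query."""
    ranked = sorted(knowledge_base, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]

def build_prompt(query: str, knowledge_base: list[str]) -> str:
    """Prepend retrieved facts so a small local model can answer beyond its weights."""
    context = "\n".join(retrieve(query, knowledge_base))
    return f"Use the following facts to answer.\nFacts:\n{context}\n\nQuestion: {query}"

# Toy knowledge base; a real one would be large and dynamically updated.
kb = [
    "QwQ 32B is a 32-billion-parameter reasoning model.",
    "Distillation transfers capability from a large teacher model to a smaller student.",
    "The frontal lobe is associated with planning in humans.",
]

print(build_prompt("What is QwQ 32B?", kb))
```

The point of the sketch is the division of labor: the model supplies reasoning, while breadth and freshness of knowledge live in the retrievable store outside the weights.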
My Humble Opinion
- What distinguishes humans from other creatures is mainly the capacity for rational thinking, which may originate in the remarkable structure of some region of the brain’s neural network (perhaps located in the frontal lobe);
- The current progress of on-device large language models shows that small, refined models can also contain powerful thinking capabilities not inferior to those of ultra-large-scale models;
- Further, if a refined basic neural network model with rational thinking ability can be obtained through training on mathematical logic (and perhaps spatial perception);
- Then with this basic model, perhaps assisted by other auxiliary models, and connected to an external, huge, and dynamic knowledge base, we could get a real AI that works for us autonomously;
- Perhaps this AI cannot have self-awareness, emotions, morals, or thoughts like humans do, because those may be inseparable from the human body’s own flesh-and-blood senses;
- But wouldn’t an AI workhorse that works diligently and has no selfish desires be even better?
- But maybe all of the above guesses are wrong, and self-awareness and emotions will naturally emerge from the connection of massive amounts of language and image information.