Thoughts of Grok 3

About Grok 3

  • Grok 3 was officially released at noon yesterday, and Musk personally promoted it.
  • Training scale: trained on roughly 200,000 H100 GPUs;
    • The live broadcast also mentioned that, because of hardware failure rates during actual training, only roughly 100K+ of the cards were effectively working at any given time;
    • Judging from OpenAI’s eagerness to build its own supercomputing center, OpenAI’s training cluster is probably no larger than Musk’s.
  • Model parameter scale:
    • I asked this question directly at https://grok.com/chat;
    • Its answer: 1.8 trillion parameters;
    • That would put it at roughly the same scale as GPT-4o.
  • Capability:
    • Judging from the current https://lmarena.ai/ leaderboard, the base model currently ranks first;
    • However, it is still roughly on par with o3-mini, Gemini 2.0, and DeepSeek R1, and has not reached the heights Musk boasted of before its release.

Some Thoughts

  • Essentially all of the linguistic material in human history (text, audio, and video) has already been compressed into the neural networks of large language models;
  • Some time ago, several prominent figures in the AI field argued that scaling up base models was a path approaching its end;
  • The reasoning-chain approach demonstrated by OpenAI and DeepSeek is a new path for continuing to improve model capability;
  • A recent paper (https://arxiv.org/abs/2502.03387) argues that less is more; I think this points to a new direction.

Less is More

  • The advantage of large models over humans lies mainly in breadth of knowledge, not depth of thinking; essentially, this is because all of the knowledge in human history has been compressed into that huge neural network.
  • The advantage of humans is logical reasoning: from a small number of tokens we can derive very deep results, as in mathematics.
  • In the animal kingdom, the human brain is not the largest neural network, and the human brain-to-body-weight ratio is similar to that of gorillas. Besides thinking, the brain also has to manage the body, so the portion of the network actually used for thinking, and the energy it consumes, is not very large.
  • The history of human scientific development reveals that the scientific laws behind phenomena are generally very concise and elegant.
    • Simple patterns can evolve into very complex phenomena through iteration (see the small sketch after this list).
  • Humans are not born as a blank slate. The brain generated by genes is a basic neural network with some magical structural patterns.
    • The human brain is extremely plastic, and human intelligence can adapt to various growth environments accordingly;
    • It is conceivable that humans born on Earth, humans born on Mars in the future, and humans growing up on spaceships will have differences in their thinking and behavior patterns;
    • The most basic neural network carried by the brain of a baby at birth is the treasure of biological evolution and a wonder of the universe.
    • The neural networks that various animals carry at birth are determined by their genes. Some species have existed for hundreds of millions of years, yet their genes hardly change, so their basic neural networks do not change either, and advanced intelligence never emerges.
    • The same may be true for humans: the upper limit of our intelligence is set by this basic neural network, unless a genetic mutation occurs.
    • When thinking fast, the neural network works on probability, so human sensibility is imprecise, vague, intuitive, and artistic.
    • But the human neural network must have some magical structural patterns that give us the ability to think rationally;
    • These magical structures may reside mainly in the frontal lobe, or they may be distributed globally; at the very least, they act on the brain as a whole;
    • Rational thinking is slow, painful, and energy-consuming.
  • To make large language models think like humans, we need to find or generate this magical neural network structure.
    • This magical structure should be compact and small in scale;
    • Its mechanism may be something like simple patterns evolving into very complex patterns through iteration.
    • To generate this magical structure, we can only rely on solving mathematical problems, not on other routes such as solving programming problems;
    • Programming is essentially engineering, not science, and the solutions can be diverse;
      • When I use AI for programming, the same requirement given to the same AI will still produce different solutions.
    • Interestingly, modern mathematics has proven that mathematics is essentially a human way of thinking rather than something that exists objectively.
      • In other words: human genes determine our basic neural network, the structure of that network determines the human way of thinking, and the most essential, lowest-level part of that way of thinking is mathematics.
  • At present, one line of research builds a digital world and lets AI live in it, in the hope that it will evolve into artificial intelligence.
    • I think this approach may be able to evolve animal-level intelligence, but to evolve human-level intelligence, those magical structures that produce mathematical thinking are absolutely necessary.
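A tiny illustration of the "simple patterns iterating into complexity" idea mentioned above. The sketch below is my own example, not taken from the post or the cited paper: it runs Wolfram's elementary cellular automaton Rule 110, whose update rule is just an 8-entry lookup table over three neighboring cells, yet whose evolved pattern is famously complex (Rule 110 is even Turing-complete).

```python
# A minimal sketch of "simple rules iterating into complex behavior":
# Wolfram's elementary cellular automaton Rule 110. The entire update rule
# fits in one byte, yet the evolved pattern is highly complex.

RULE = 110  # one bit of this number per 3-cell neighborhood (000..111)

def step(cells):
    """Apply one Rule 110 update with wrap-around boundaries."""
    n = len(cells)
    return [
        (RULE >> ((cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def run(width=64, generations=32):
    cells = [0] * (width - 1) + [1]  # start from a single live cell
    for _ in range(generations):
        print("".join("#" if c else "." for c in cells))
        cells = step(cells)

if __name__ == "__main__":
    run()
```

Even though the rule is a single byte, the printed history after a few dozen iterations is already intricate; this is the flavor of "simple patterns evolving into very complex phenomena through iteration."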

The Future of AI

  • Perhaps there is another possibility: the digital world can never produce human-like intelligence, but it can produce intelligence that surpasses humans in specific fields.
  • For example, the current large language models are far beyond humans in the breadth of knowledge; in the near future, AI will definitely surpass humans in things like programming.
  • For example, human self-awareness and emotions may be closely tied to our endocrine chemistry and our flesh-and-blood bodies, and may therefore be difficult to simulate in the digital world.
  • If we could decode human self-awareness and emotions and let human consciousness enter the digital world, then humans could abandon these fragile bodies and achieve immortality.
  • Perhaps even once we finally know how to form those magical structures, we still will not be able to give AI human-like self-awareness and emotions; what we get will simply be an assistant that helps us solve scientific problems. That would not be a bad future.
  • Just as the Go that AlphaGo plays is difficult for humans to understand, if an artificial intelligence that excels at reasoning is born, the methods it uses to solve scientific problems, and the conclusions it reaches, will be completely incomprehensible to humans.