Mechanism and Performance are different.

The ceiling on test performance is a moving target. We revise the difficulty of standardized tests every year, and there is no limit to how many tests one could justify as a good measure of “intelligence.” If we continue down this short-sighted path, we will never know whether we’ve achieved AGI. There will be no “AGI test”.

Rather, I believe real machine intelligence will be achieved when we’ve replicated the mechanisms behind the inner workings of the neocortex on computers.

Why is the mechanism, the means by which we achieve something, so important? Three examples illustrate this well:

A bird and an airplane can both fly. But that doesn’t mean the airplane understands flight in the adaptable, self-driven, generalized, and natural way a bird does. The mechanisms behind their flight are profoundly different, but at the end of the day, they both fly.

An LLM can accurately predict the next word in a sequence and, with enough scale, generate responses that sound comprehensive. But that doesn’t mean the LLM understands English the way humans do, at least according to the latest neuroscience literature. The mechanisms behind how each agent crafts language are very different, yet their performative ability is similar.
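To make that mechanism concrete, here is a toy next-word predictor in Python, a simple bigram counter. This is a deliberately tiny sketch, nothing like a real LLM’s architecture, but it makes the point: it produces plausible next words purely from co-occurrence statistics, with no comprehension anywhere in the loop.

```python
from collections import Counter, defaultdict

# A toy bigram model: all it does is count which word follows which.
corpus = "the cat sat on the mat and the cat slept on the sofa".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    if word not in following:
        return None
    return following[word].most_common(1)[0][0]

# Pure statistics, zero understanding of what a cat is:
print(predict_next("the"))  # 'cat'
```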

Let’s say you know the study habits of two students preparing for a test on quadratic equations. Student A’s strategy is to memorize the quadratic formula and plug in numbers to find the answer. Student B’s strategy is to learn factoring techniques and solve quadratic equations by factoring and completing the square. They perform equally well on the test. Would it be fair to say Student A is as intelligent as Student B? Not quite. You could argue yes, because they scored the same, but the counterpoint is: how do you know the test was a good indicator of their intelligence? Their methodologies were different, yet their performance was the same.
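Here is a minimal Python sketch of the two strategies side by side (the function names and example equation are my own illustration). Both mechanisms return identical answers, which is exactly what the test sees:

```python
import math

def solve_by_formula(a, b, c):
    """Student A's mechanism: plug coefficients into the quadratic formula."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # no real roots
    root = math.sqrt(disc)
    return sorted(((-b + root) / (2 * a), (-b - root) / (2 * a)))

def solve_by_completing_square(a, b, c):
    """Student B's mechanism: complete the square, then unwind it."""
    # Divide through by a: x^2 + p*x + q = 0
    p, q = b / a, c / a
    # Complete the square: (x + p/2)^2 = (p/2)^2 - q
    rhs = (p / 2) ** 2 - q
    if rhs < 0:
        return None  # no real roots
    root = math.sqrt(rhs)
    return sorted((-p / 2 + root, -p / 2 - root))

# Same "test performance", different mechanisms:
print(solve_by_formula(1, -5, 6))            # [2.0, 3.0]
print(solve_by_completing_square(1, -5, 6))  # [2.0, 3.0]
```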

What these scenarios have in common is that when you evaluate on performance alone, two methodologies that work in entirely different ways can achieve similar results.

This means that the first AGI system might not actually do as well on the SAT, math olympiad, or bar exam as modern-day LLMs like ChatGPT and their future versions. I think that when the mechanisms by which an AGI is able to “think” are the same as the ones in primates, dogs, and other “smart” mammals with a neocortex, that will be AGI. Because in principle, it will be a dumbed-down version of the human brain, and I would always argue a 5-year-old child is more “intelligent” than ChatGPT.

The best set of criteria for evaluating whether something is AGI is rooted in the mechanisms of our brains – Hawkins' 1000 Brains Theory:

  1. The machine needs to learn continuously (Continual Lifelong Learning). It needs to learn from its mistakes and update its world model, forming new connections to acquire new knowledge without replacing or deleting old ones.
  2. The machine needs to learn via movement (embodiment). Movement leads to location; a world model built without movement will be biased.
  3. The machine needs to build many models. Each cortical column of the neocortex learns models of thousands of objects, and the binding problem (arriving at one unified perception) is resolved by voting across columns. A machine needs to acquire the same process (a minimal sketch of this voting step follows this list).
  4. The machine needs to use reference frames to store knowledge. Thinking is a kind of movement: it emerges from connecting the dots across locations in reference frames. If a machine can’t use motion, it cannot think. Cortical columns do this with cells like grid cells and place cells.
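To illustrate criterion 3, here is a hypothetical Python sketch of the voting step. The column models, features, and voting rule are my own illustration of the idea, not Hawkins’ actual algorithm: each column proposes candidate objects consistent with its local input, and the shared perception is whichever object wins the vote.

```python
from collections import Counter

# Hypothetical sketch: each "cortical column" holds its own model and,
# from a partial sensory input, proposes candidate objects.
# (Illustrative only, not Hawkins' actual algorithm.)
COLUMN_MODELS = [
    # column 1: feels a curved surface -> could be a cup or a ball
    {"curved": ["cup", "ball"], "flat": ["table"]},
    # column 2: feels a handle -> strongly suggests a cup
    {"handle": ["cup"], "curved": ["cup", "ball", "vase"]},
    # column 3: feels a rim -> cup or vase
    {"rim": ["cup", "vase"], "flat": ["table"]},
]

def perceive(features):
    """Each column votes for the objects consistent with its local feature;
    the object with the most votes becomes the single shared perception."""
    votes = Counter()
    for column, feature in zip(COLUMN_MODELS, features):
        for candidate in column.get(feature, []):
            votes[candidate] += 1
    return votes.most_common(1)[0][0] if votes else None

# Three columns sense different parts of the same object at once:
print(perceive(["curved", "handle", "rim"]))  # 'cup'
```

The point of the sketch is that no single column ever “sees” the whole object; the unified perception exists only in the consensus across many partial models.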