Neural networks are all the rage in computing these days. Many engineers believe that, with enough computing power and a few clever tweaks, they could become as smart as humans. Recent successes in game playing and protein-fold prediction have poured fuel on the AI fire. Some say we may be on the brink of the mysterious Singularity, when humans and computers merge and we become immortal gods.
Let’s turn the clock back to the beginning of neural networks. In computer science terms, they are actually an old technology. The earliest version, the perceptron (a single-layer neural network), was invented in the late 1950s, inspired by McCulloch and Pitts’s first mathematical model of the neuron. However, the perceptron was ignored for decades after Marvin Minsky (1927-2016) proved that it could not learn the simple XOR logic function.
The XOR function says that exactly one of X and Y can be true, never both; you can’t have your cake and eat it too. The problem with the poor perceptron is that it can only capture half of the rule. It can learn that you may eat your cake, and it can learn that you may keep your cake. What the perceptron cannot grasp is the exception: that you can’t have your cake AND eat it at the same time.
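The failure is easy to demonstrate. Below is a minimal pure-Python sketch of the classic perceptron learning rule; the learning rate and epoch count are arbitrary illustrative choices. Trained on OR, which a single straight line can separate, the perceptron becomes perfect. Trained on XOR, no line works, so it never does.

```python
# A minimal perceptron (single-layer, threshold output) trained with the
# classic perceptron learning rule: w += lr * (target - prediction) * input.

def train_perceptron(samples, epochs=100, lr=0.1):
    w = [0.0, 0.0]   # one weight per input
    b = 0.0          # bias term (a learnable threshold)
    for _ in range(epochs):
        for x, target in samples:
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = target - pred            # +1, 0, or -1: right or wrong, nothing more
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

def accuracy(samples, w, b):
    hits = sum(1 for x, t in samples
               if (1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0) == t)
    return hits / len(samples)

OR  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, b = train_perceptron(OR)
print("OR accuracy: ", accuracy(OR, w, b))   # linearly separable: learns it perfectly

w, b = train_perceptron(XOR)
print("XOR accuracy:", accuracy(XOR, w, b))  # not separable: never reaches 100%
```

The perceptron convergence theorem guarantees success on OR; on XOR the weights just cycle forever, because no single line can put (0,1) and (1,0) on one side and (0,0) and (1,1) on the other.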
At first, researchers believed they just needed to give the perceptron more nodes to figure out XOR. It turns out the real problem was the perceptron learning algorithm. A person can easily program a perceptron network by hand to compute XOR, but the perceptron learning algorithm becomes hopelessly slow once the network has many nodes. Totally impractical. So Minsky’s logic-based research agenda ruled the early days of AI while neural networks languished in the shadows.
So why do perceptrons learn so slowly? The reason is that they are modeled on the activity of neurons in the brain.
Neurons in the brain operate on the so-called “all or nothing” principle. When enough charge builds up at the neuron’s synapses, the neuron fires. Until then, the neuron does nothing at all. The neuron can thus be seen as an on-off switch: it fires, or it doesn’t, with no in-between stage. And that makes learning difficult.
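Such an “all or nothing” neuron can be sketched in a few lines of Python. The weights and threshold below are made-up numbers chosen purely for illustration:

```python
# A toy "all or nothing" neuron: it sums its weighted inputs and fires
# (outputs 1) only if the accumulated charge reaches the threshold.

def fires(inputs, weights, threshold):
    charge = sum(x * w for x, w in zip(inputs, weights))
    return 1 if charge >= threshold else 0

print(fires([1, 1, 0], [0.4, 0.3, 0.9], threshold=0.5))  # charge 0.7 >= 0.5: fires (1)
print(fires([1, 0, 0], [0.4, 0.3, 0.9], threshold=0.5))  # charge 0.4 <  0.5: silent (0)
```

Note that the output is identical whether the charge misses the threshold by a hair or by a mile; that lost information is exactly the problem described next.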
To understand why the “all or nothing” principle makes learning difficult, imagine playing a “hot or cold” game where you search a room for a treasure. A helpful onlooker calls out “Hotter!” as you get closer to the treasure and “Colder!” as you move away from it. This signal is very effective in guiding you to the wealth.
But what if we tweak the game? Now the onlooker only tells you whether you have found the treasure. If you find it, the onlooker says “Yes.” If you don’t, the onlooker says “No.” This version takes far longer to play, because the answers no longer carry any information that narrows the search.
We have the same situation with the “all or nothing” principle used by the perceptron. When the perceptron makes a prediction, it is only told whether that prediction is right or wrong. There is no indication of whether the current prediction is closer to or further from the correct answer than an alternative would be. This makes it difficult for the perceptron to update itself in the right direction, and the difficulty only grows as the structure of the perceptron becomes more complex.
This state of affairs was completely changed by the discovery of the backpropagation algorithm. The key insight was to stop trying to copy how neurons in the brain work. Instead of relying on the “all or nothing” principle, the perceptron receives a “hotter” or “colder” signal. The trick is to generate the perceptron’s output with a differentiable function, instead of the simple threshold function that simulated the “all or nothing” principle.
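The difference is easy to see numerically. Here is a sketch using the sigmoid, one common differentiable choice: the slope of the threshold (step) function is zero almost everywhere, so it offers no “hotter” or “colder” signal, while the sigmoid’s slope is nonzero everywhere and always points toward an improvement.

```python
import math

def step(z):
    """The old "all or nothing" threshold output."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """A smooth, differentiable stand-in for the threshold."""
    return 1.0 / (1.0 + math.exp(-z))

def numeric_derivative(f, z, h=1e-6):
    """Central-difference estimate of the slope of f at z."""
    return (f(z + h) - f(z - h)) / (2 * h)

for z in (-2.0, -0.5, 0.5, 2.0):
    print(f"z={z:+.1f}  step slope={numeric_derivative(step, z):.4f}  "
          f"sigmoid slope={numeric_derivative(sigmoid, z):.4f}")
```

The step function’s slope prints as 0.0000 at every sampled point: no learning signal. The sigmoid’s slope is always positive, telling the learner both which direction is “hotter” and by how much.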
The good thing about using a differentiable function is that the “hotter” or “colder” signal can be passed back to the remaining nodes of the multilayer perceptron network, and the share of “hotter” or “colder” contributed by each node and its connections can be calculated exactly. This allows for very accurate updates to the multilayer perceptron, and makes efficient learning practical. Thus, the modern neural network was born.
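To see this in action, here is a minimal pure-Python sketch of backpropagation training a small multilayer perceptron on the very XOR function that defeated the original perceptron. The network size (four hidden nodes), learning rate, and epoch count are illustrative choices, not anything prescribed by the algorithm.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# The XOR training set: true exactly when one input is 1.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

N_HIDDEN = 4
# Hidden layer: each node has two input weights plus a bias.
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(N_HIDDEN)]
# Output node: one weight per hidden node plus a bias.
w_o = [random.uniform(-1, 1) for _ in range(N_HIDDEN + 1)]

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    y = sigmoid(sum(w_o[i] * h[i] for i in range(N_HIDDEN)) + w_o[-1])
    return h, y

def train_epoch(lr=0.5):
    total = 0.0
    for x, t in DATA:
        h, y = forward(x)
        total += (y - t) ** 2
        # Backpropagation: the error signal at the output, scaled by the
        # sigmoid's slope, is the "hotter/colder" gradient...
        d_y = (y - t) * y * (1 - y)
        for i in range(N_HIDDEN):
            # ...and each hidden node's exact share of it is found by
            # passing the signal back through its connection weight.
            d_h = d_y * w_o[i] * h[i] * (1 - h[i])
            w_o[i] -= lr * d_y * h[i]
            w_h[i][0] -= lr * d_h * x[0]
            w_h[i][1] -= lr * d_h * x[1]
            w_h[i][2] -= lr * d_h
        w_o[-1] -= lr * d_y
    return total / len(DATA)  # mean squared error this epoch

losses = [train_epoch() for _ in range(15000)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
for x, t in DATA:
    print(x, "->", round(forward(x)[1], 2), "target", t)
```

After training, the outputs approach the XOR targets and the loss falls sharply from its starting value: the graded gradient signal lets the hidden nodes, which the perceptron learning rule could never reach, share the blame for each error.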
At the same time, the umbilical cord between the neural network and the brain was severed. Practicality won out over biological reality.
This brings us to the point: the modern neural network is a better learning algorithm than the old perceptron precisely because it is a better learning algorithm than the brain. The brain’s neurons are limited by the “all or nothing” principle, which makes rapid learning impossible. In contrast, the differentiable “hotter” or “colder” functions used in neural networks enable programmers to train networks with trillions of parameters.
Which raises the question: what if the human mind could learn better than a neural network?
We’ll look at that possibility shortly.
You may also wish to read: Artificial neural networks can show that the mind is not the brain. Because artificial neural networks are a better version of the brain, whatever neural networks cannot do, the brain cannot do. Yet the human mind can perform tasks that an artificial neural network (ANN) cannot. Since the brain works like an ANN, the mind cannot be merely what the brain does. (Eric Holloway)