Abstract

The connectionist paradigm builds intelligence out of one repeated element: the neuron. This note traces that element from its biological original, through the first mathematical model of it, the McCulloch-Pitts neuron of 1943, to the modern artificial neuron that powers today’s networks. The three are not the same object, and the differences between them are exactly where learning enters the story.

To follow the trajectory of the connectionist paradigm mapped in the evolution of AI paradigms, the right place to begin is the unit from which every network is assembled. A neural network is a composition of many copies of one simple computation, and most of what a network can do follows from how that computation is defined and from how its parameters are set.

The biological neuron

The brain is a massive, decentralised, parallel network, and its computational power rests on a single cell type repeated billions of times.

  • Dendrites form the input surface, receiving electrochemical signals from neighbouring neurons.
  • The cell body (soma) integrates those incoming signals.
  • The axon and its terminals transmit an output to other neurons once the integrated input crosses a threshold, firing an all-or-nothing action potential.

This biological picture has two aspects that recur throughout the rest of the note:

  • the thresholding: a neuron does not pass its input through smoothly, but fires only once the accumulated signal is strong enough;
  • the plasticity: the strength of a synapse, the connection between two neurons, changes with activity. Plasticity is the biological substrate of learning and memory, and it is precisely the property that the earliest mathematical models would leave out.

The McCulloch-Pitts neuron (1943)

The first mathematical model of a neuron came from Warren McCulloch and Walter Pitts in 1943, and it was deliberately austere. Their unit takes binary inputs (each 0 or 1), forms a simple sum of them, and outputs 1 if that sum reaches a fixed threshold and 0 otherwise. Some connections are excitatory and add to the sum; others are inhibitory and can veto firing entirely.

The decisive feature of the McCulloch-Pitts neuron is what it does not have: it does not learn. Its weights and threshold are fixed in advance, chosen by hand to implement a specific logical function. A single unit can be wired to compute a logical AND, OR, or NOT, and McCulloch and Pitts proved that networks of such units can, in principle, compute any Boolean function at all.
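The wiring described above can be sketched in a few lines of Python. This is a minimal illustration, not the original 1943 formalism: the function name `mp_neuron`, the argument layout, and the particular thresholds chosen for each gate are assumptions made for the example.

```python
def mp_neuron(excitatory, inhibitory, threshold):
    """McCulloch-Pitts style unit: binary inputs, fixed threshold, inhibitory veto."""
    # Any active inhibitory input vetoes firing outright.
    if any(inhibitory):
        return 0
    # Otherwise fire (1) only if the plain sum of excitatory inputs reaches the threshold.
    return 1 if sum(excitatory) >= threshold else 0


# Each gate is a fixed wiring chosen by hand; nothing here is learned.
def AND(a, b):
    return mp_neuron([a, b], [], threshold=2)


def OR(a, b):
    return mp_neuron([a, b], [], threshold=1)


def NOT(a):
    # No excitatory inputs and threshold 0: fires unless the inhibitory input is on.
    return mp_neuron([], [a], threshold=0)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
    print("NOT 0:", NOT(0), "NOT 1:", NOT(1))
```

Changing the function a unit computes means changing its threshold or its wiring by hand, which is exactly the limitation the following sections address.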

Why 1943 mattered

The McCulloch-Pitts result joined two worlds that had been separate: the biology of neurons and the mathematics of logic. It established that threshold units, suitably wired, are a model of computation. This is the conceptual birth of the field, but it describes a network that is designed, not trained. The weights encode a function the engineer already knows; nothing about them is discovered from data.

What was missing: learning

The McCulloch-Pitts neuron is a fixed circuit. Turning it into something that improves from experience required two things: the weights had to become adjustable, and a rule for adjusting them had to be specified. Both arrived within the next fifteen years.

  • In 1949 Donald Hebb proposed that a connection strengthens when the units it joins are active together, the principle later compressed into “cells that fire together wire together”, giving a first account of how synaptic strengths might change with experience.
  • In 1958 Frank Rosenblatt’s Perceptron turned that intuition into an algorithm: a unit whose weights are corrected whenever it misclassifies an example, so that its decision boundary moves in response to mistakes.
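The error-correction idea in the second point can be sketched as follows. This is a simplified textbook form of the perceptron rule, not Rosenblatt's original formulation; the names `predict` and `train_perceptron`, the learning rate `lr`, the epoch limit, and the OR dataset are all assumptions made for the example.

```python
def predict(weights, bias, x):
    """Threshold unit with adjustable weights: fire if the weighted sum plus bias is >= 0."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s >= 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Adjust weights only when an example is misclassified."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                # Move the decision boundary toward classifying x correctly.
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: stop early
            break
    return weights, bias


# Logical OR is linearly separable, so the rule settles on a correct boundary.
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(or_data)

if __name__ == "__main__":
    for x, target in or_data:
        print(x, "target:", target, "predicted:", predict(weights, bias, x))
```

The unit is the same threshold device as before; the only new ingredient is that its weights start at zero and are found from examples.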

The detailed account of the Perceptron, and of what it could and could not represent, is given in the history of neural networks.

The move from McCulloch-Pitts to the Perceptron is the move from a neuron that is wired to one that is trained. Everything modern follows from it.

The modern artificial neuron

Today’s artificial neuron keeps the skeleton of the McCulloch-Pitts unit, a sum of inputs followed by a thresholding decision, but generalises every part of it so that learning by gradient descent becomes possible.

Aspect      | Biological neuron       | McCulloch-Pitts (1943) | Modern artificial neuron
Inputs      | Electrochemical signals | Binary (0 or 1)        | Real-valued
Weights     | Synaptic strengths      | Fixed, hand-set        | Real-valued and learned
Integration | Summation in the soma   | Sum of inputs          | Weighted sum plus a bias
Output rule | All-or-nothing spike    | Hard threshold         | Smooth nonlinear activation
Learning    | Synaptic plasticity     | None                   | Gradient-based weight updates

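A modern neuron forms a real-valued weighted sum of its inputs, adds a bias, and passes the result through a nonlinear activation function:

$$ y = \varphi\left( \sum_{i=1}^{n} w_i x_i + b \right) $$

where $x_i$ are the inputs, $w_i$ the weights, $b$ the bias, and $\varphi$ the activation function.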

The modern unit departs from the 1943 model in three essential respects. The inputs and weights are now real-valued, so the unit measures degree rather than mere presence. The weights are learned from data instead of being set by hand. And the hard threshold is replaced by a differentiable activation such as the sigmoid or hyperbolic tangent, or by ReLU, which is differentiable everywhere except at zero; this is what lets the error signal flow back through the unit during backpropagation. The McCulloch-Pitts threshold blocked gradients, because a step function has zero slope everywhere except at the jump, where the slope is undefined; the differentiable activation lets them pass.
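A small sketch makes the contrast concrete. It assumes a sigmoid activation and a squared-error loss on a single example; those choices, the function names, the learning rate, and the sample values are illustrative assumptions rather than anything fixed by the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def step(z):
    """Hard threshold, for comparison with the sigmoid."""
    return 1.0 if z >= 0 else 0.0


def neuron(x, w, b):
    """Weighted sum plus bias, passed through a sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


def gradient_step(x, target, w, b, lr=0.5):
    """One gradient-descent update for the loss 0.5 * (y - target)**2."""
    y = neuron(x, w, b)
    # Chain rule: dL/dz = (y - target) * sigmoid'(z), and sigmoid'(z) = y * (1 - y).
    delta = (y - target) * y * (1.0 - y)
    new_w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
    new_b = b - lr * delta
    return new_w, new_b


x, target = [0.5, -1.0], 1.0
w, b = [0.1, 0.2], 0.0
before = neuron(x, w, b)
for _ in range(100):
    w, b = gradient_step(x, target, w, b)
after = neuron(x, w, b)

# Finite-difference slope of the hard threshold away from its jump: exactly zero,
# so a gradient-based update would receive no signal through it.
eps = 1e-6
step_slope = (step(-0.15 + eps) - step(-0.15 - eps)) / (2 * eps)

if __name__ == "__main__":
    print("output before training:", before)
    print("output after training: ", after)
    print("slope of step function at z = -0.15:", step_slope)
```

The sigmoid's nonzero slope is what turns an output error into a usable weight update; with the step function the same update would be multiplied by zero.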

From a single unit to a network

A single neuron is a linear combination of its inputs followed by a nonlinearity, and on its own it can separate only data that a single linear boundary (a line in two dimensions, a hyperplane in general) can divide. The expressive power of connectionism comes not from the unit but from the composition of many units into layers, where the output of one layer becomes the input of the next.

That composition is what lets a network bend and refold its input space until a problem no single boundary could solve becomes separable. The canonical demonstration is the XOR function, unsolvable by one unit and solvable by a small network of them, developed in the history of neural networks; the general principle of networks of adaptive units is the subject of connectionism.
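The XOR case can be illustrated with three threshold units whose weights are set by hand. The decomposition used here, XOR as the AND of an OR unit and a NAND unit, is one common choice among several, and the names `unit` and `xor_net` are hypothetical. The grid search at the end is only a sanity check over small integer weights, not a proof that no single unit can work.

```python
from itertools import product


def unit(x, w, b):
    """Threshold unit: fire if the weighted sum plus bias is >= 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0


def xor_net(a, b):
    # Hidden layer: two units, each drawing its own linear boundary.
    h_or = unit([a, b], [1, 1], -1)     # fires if at least one input is on
    h_nand = unit([a, b], [-1, -1], 1)  # fires unless both inputs are on
    # Output unit: AND of the two hidden units.
    return unit([h_or, h_nand], [1, 1], -2)


# Sanity check: no single unit with integer weights and bias in [-3, 3] computes XOR.
grid = range(-3, 4)
single_unit_solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if all(unit([a, c], [w1, w2], b) == (a ^ c) for a, c in product((0, 1), repeat=2))
]

if __name__ == "__main__":
    for a, c in product((0, 1), repeat=2):
        print(a, c, "XOR:", xor_net(a, c))
    print("single-unit solutions found:", len(single_unit_solutions))
```

The hidden units re-encode the four input points so that the output unit faces a problem one boundary can solve, which is the refolding of the input space described above.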

Summary

The artificial neuron is an exercise in deliberate abstraction. McCulloch and Pitts kept the threshold and discarded the chemistry; the modern unit keeps the weighted sum and discards the hard threshold, trading it for a differentiable activation and learnable weights. What survives across all three versions, biological, logical, and trainable, is one idea: a unit that integrates many inputs and responds once their combination is strong enough. What changes, and what made deep learning possible, is that the connection strengths are no longer given but learned.