1. The Geometric Nature of Graph-Structured Data
The foundational premise of Graph Neural Networks (GNNs) rests upon abandoning the rigid, grid-like topologies (Euclidean data) assumed by traditional convolutional or recurrent networks. Images and text possess an intrinsic, absolute coordinate system (e.g., “top-left pixel”, “first word”). In stark contrast, graphs represent non-Euclidean data where relationships dictate structure, and absolute positioning is undefined.
Important
For rigorous theoretical analysis, frameworks are typically constrained to simple, undirected graphs: unweighted, bidirectional edges with no self-loops.
Graph Definitions and Notation
Formally, a graph is defined by the tuple $G = (V, E)$:
- $V$ (Vertices/Nodes): The discrete set of entities, with cardinality $|V| = n$. Nodes are arbitrarily indexed by $i \in \{1, \dots, n\}$.
- $E$ (Edges/Links): The set of pairwise relational connections, $E \subseteq V \times V$. An undirected edge connecting node $i$ and node $j$ is denoted by the unordered pair $(i, j)$.
The Neighborhood Function
The defining characteristic of a node in a non-Euclidean space is not its coordinate, but its connectivity. The complete set of adjacent entities to a focal node $i$ forms its neighborhood, mathematically denoted as $\mathcal{N}(i) = \{\, j : (i, j) \in E \,\}$.
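A minimal sketch of the neighborhood function, assuming a hypothetical edge set stored as unordered pairs:

```python
# Hypothetical simple undirected graph on 4 nodes, edges as a set of pairs.
edges = {(0, 1), (1, 2), (2, 3), (0, 2)}

def neighborhood(i, edges):
    """Return N(i): the set of all nodes adjacent to node i."""
    return ({v for (u, v) in edges if u == i}
            | {u for (u, v) in edges if v == i})

print(neighborhood(2, edges))  # {0, 1, 3}
```

Because edges are unordered, membership of node $i$ is checked in both positions of each pair.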
Feature Representation Formulation
Graphs encapsulate rich, domain-specific semantics. The intrinsic properties of each node $i$ are parameterized by a $d$-dimensional feature vector, $\mathbf{x}_i \in \mathbb{R}^d$. To enable highly parallelized tensor operations, these vectors are aggregated row-wise into a continuous Node Feature Matrix, $\mathbf{X} \in \mathbb{R}^{n \times d}$.
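The row-wise stacking can be sketched as follows, using hypothetical 3-dimensional attributes for a 3-node graph:

```python
import numpy as np

# Hypothetical per-node attributes (e.g., atom properties in a molecule).
node_features = {
    0: [1.0, 0.0, 2.0],
    1: [0.5, 1.5, 0.0],
    2: [2.0, 2.0, 1.0],
}

# Stack one row per node: X has shape (n, d).
X = np.stack([node_features[i] for i in sorted(node_features)])
assert X.shape == (3, 3)  # n = 3 nodes, d = 3 features
```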
2. Topological Encoding: The Adjacency Matrix and the Ordering Bottleneck
To process topological connectivity via differentiable tensor operations, the relational graph must be projected into a matrix format. The canonical mechanism is the Adjacency Matrix, $\mathbf{A} \in \{0, 1\}^{n \times n}$.
Constructing $\mathbf{A}$ requires a fundamental compromise: we must impose an arbitrary sequence, or “ordering,” upon the nodes. Given a chosen index ordering:

$$A_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}$$

Because $G$ is undirected, $\mathbf{A}$ is strictly symmetric ($\mathbf{A} = \mathbf{A}^\top$).
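A sketch of this construction, assuming a hypothetical edge list; symmetry falls out of writing each undirected edge in both directions:

```python
import numpy as np

n = 4
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]  # hypothetical undirected graph

A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected edge: store both directions

assert (A == A.T).all()   # symmetry: A equals its transpose
assert A.trace() == 0     # simple graph: no self-loops on the diagonal
```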
The Illusion of Coordinates
It may seem natural to flatten the adjacency matrix and feed it into a standard Multi-Layer Perceptron (MLP), but this is a catastrophic theoretical error. The physical reality of a graph (e.g., a molecule) exists independently of how we label its atoms. The ordering is merely an artificial coordinate system we imposed to write down the matrix. Because the space of isomorphic representations (different matrices representing the exact same graph) scales factorially with $n$ (up to $n!$ distinct orderings), relying on massive datasets to force a standard MLP to “unlearn” this artificial ordering is mathematically intractable.
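The problem can be demonstrated in a few lines: the same path graph, written down under an arbitrary relabeling, yields a different adjacency matrix even though the topology is identical.

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3)]     # path graph 0-1-2-3
relabel = {0: 2, 1: 0, 2: 3, 3: 1}   # an arbitrary node relabeling

def adjacency(edges, n=4):
    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        A[i, j] = A[j, i] = 1
    return A

A1 = adjacency(edges)
A2 = adjacency([(relabel[i], relabel[j]) for i, j in edges])

print((A1 == A2).all())  # False: same topology, different matrix
```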
*[Figure: isomorphic graphs — the exact same topological structure resulting in drastically different adjacency matrices purely due to node relabeling.]*
3. The Algebraic Formulation of Isomorphism: Permutations
To systematically neutralize the combinatorial explosion, neural architectures must embed a strict structural inductive bias. The network must be mathematically aware that re-indexing the nodes does not alter the underlying entity. This requires formalizing the concept of node reordering via Permutation Matrices.
Permutation Matrix Definition
A permutation matrix $\mathbf{P} \in \{0, 1\}^{n \times n}$ is an orthogonal Boolean matrix containing exactly one entry of $1$ in each row and each column. An entry $P_{ij} = 1$ constitutes an algebraic instruction: “relocate the data currently at index $j$ to index $i$.”
When an arbitrary permutation $\mathbf{P}$ is applied to a graph’s underlying data structures, the matrices must transform synchronously to preserve the topology:
- Permuting the Feature Matrix: $\mathbf{X} \to \mathbf{P}\mathbf{X}$. Node attributes are relocated via standard pre-multiplication (which shuffles the rows).
- Permuting the Adjacency Matrix: $\mathbf{A} \to \mathbf{P}\mathbf{A}\mathbf{P}^\top$. Relocating an edge requires moving both its origin and its destination. Algebraically, this is achieved by pre-multiplying by $\mathbf{P}$ (shuffling the rows) and post-multiplying by its transpose $\mathbf{P}^\top$ (shuffling the columns).
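Both transformations can be checked numerically. This sketch builds a permutation matrix from rows of the identity and verifies that $\mathbf{P}\mathbf{X}$ and $\mathbf{P}\mathbf{A}\mathbf{P}^\top$ encode exactly the same relabeling:

```python
import numpy as np

n = 3
X = np.arange(6).reshape(n, 2).astype(float)  # hypothetical node features
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])                     # path graph 0-1-2

perm = [2, 0, 1]                # new row i takes old row perm[i]
P = np.eye(n, dtype=int)[perm]  # permutation matrix: reordered identity rows

X_p = P @ X         # rows shuffled
A_p = P @ A @ P.T   # rows AND columns shuffled together

assert (X_p == X[perm]).all()
assert (A_p == A[perm][:, perm]).all()  # same edges under the new labels
```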
4. Architectural Symmetries: Invariance vs. Equivariance
If a graph is an entity independent of its node labels, any robust predictive mapping must satisfy rigorous symmetry constraints. The network’s architectural design must natively guarantee one of two properties, contingent upon the resolution of the predictive task:
*[Figure: the theoretical difference between permutation invariance (a many-to-one mapping converging to a single representation) and permutation equivariance (a mapping that shifts synchronously with the input).]*
Permutation Invariance (Graph-Level Objectives)
When predicting a macroscopic property of the entire graph (e.g., determining if a molecule is toxic), the final scalar or vector must remain immutable regardless of how the input matrices are shuffled: $f(\mathbf{P}\mathbf{X}, \mathbf{P}\mathbf{A}\mathbf{P}^\top) = f(\mathbf{X}, \mathbf{A})$ for every permutation matrix $\mathbf{P}$. The global output is an invariant anchor.
Intuition: If you rotate a molecule in space or list its atoms in a different order on paper, its total mass does not change.
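A canonical invariant readout is sum pooling over node embeddings; since addition is order-independent, shuffling the rows cannot change the result:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))         # hypothetical node embeddings
P = np.eye(5)[rng.permutation(5)]   # random permutation matrix

graph_readout = X.sum(axis=0)       # f(X): sum over all nodes

# f(PX) == f(X): the graph-level output ignores node ordering.
assert np.allclose((P @ X).sum(axis=0), graph_readout)
```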
Permutation Equivariance (Node-Level Objectives)
When the objective is localized to individual nodes (e.g., classifying which specific users in a network are malicious bots), the output tensor must mirror the input’s transformation perfectly: $F(\mathbf{P}\mathbf{X}, \mathbf{P}\mathbf{A}\mathbf{P}^\top) = \mathbf{P}\,F(\mathbf{X}, \mathbf{A})$. If we shuffle the input nodes, the network’s localized predictions must undergo the exact same permutation.
Intuition: If “User A” is moved from row 1 to row 5 in our input matrix, the prediction classifying “User A as a bot” must automatically move from row 1 to row 5 in the output tensor.
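A minimal equivariant layer is one round of neighborhood sum aggregation, $F(\mathbf{X}, \mathbf{A}) = \mathbf{A}\mathbf{X}$ (a stripped-down message pass, used here only for illustration). Since $(\mathbf{P}\mathbf{A}\mathbf{P}^\top)(\mathbf{P}\mathbf{X}) = \mathbf{P}\mathbf{A}\mathbf{X}$, the output permutes in lockstep with the input:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 3
X = rng.normal(size=(n, d))
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T                       # random simple undirected graph

P = np.eye(n)[rng.permutation(n)]

def aggregate(A, X):
    """Each node sums its neighbors' features: F(X, A) = A X."""
    return A @ X

# Equivariance: F(PX, P A P^T) == P F(X, A).
assert np.allclose(aggregate(P @ A @ P.T, P @ X), P @ aggregate(A, X))
```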
Designing parameterized differentiable layers (such as message-passing layers) that intrinsically respect these symmetries without needing to “learn” them from data is the core triumph of Geometric Deep Learning.