The role of abstraction in geometric deep learning
The fundamental mechanism underlying the representation of complex systems in geometric deep learning is abstraction. Before any computational processing can occur, physical and conceptual phenomena must be abstracted into a purely mathematical framework.
In classical machine learning, structured data is conventionally encountered in the form of sequences (one-dimensional arrays) and images (two-dimensional arrays). However, these familiar structures are merely highly constrained special cases of a far more powerful and universal mathematical abstraction: the graph.
By defining a graph as G = (V, E), where V represents a set of objects (nodes or vertices) and E represents their connections (edges or links), an astonishing variety of phenomena can be modeled. The true power of this formulation lies in its capacity for feature allocation: both nodes and edges can hold complex, multi-dimensional data attributes, either discrete or continuous.
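As a minimal sketch of this abstraction, a graph can be stored as two dictionaries, one keyed by node and one by edge, with arbitrary feature payloads attached to each (the labels and weights below are purely illustrative):

```python
# G = (V, E): nodes and edges stored alongside feature attributes.
nodes = {0: {"label": "A"}, 1: {"label": "B"}, 2: {"label": "C"}}
edges = {(0, 1): {"weight": 0.7}, (1, 2): {"weight": 1.3}}

def neighbors(v):
    """Return the nodes adjacent to v (treating edges as undirected)."""
    return [b if a == v else a for (a, b) in edges if v in (a, b)]

print(neighbors(1))  # [0, 2]
```

Any richer structure (node feature vectors, edge types, directionality) is expressed by changing what those attribute dictionaries contain, not the graph machinery itself.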
Understanding how diverse data modalities map into this framework is the foundational step of Geometric Deep Learning. This mapping can be broadly divided into two regimes:
- data with implicit (regular) topologies
- data with explicit (heterogeneous) topologies.
1. The Euclidean Disguise: Implicit Graphs
Conceptualizing text or images as graphs is often counterintuitive. Because their structure is strictly regular and predictable, their underlying graph nature is usually hidden behind dense multidimensional arrays (tensors). Unmasking this structure, however, establishes the exact intuition required to comprehend more complex, non-grid-like domains.
1.1 Images as Grid Graphs
An image is typically conceptualized as a rectangular grid of pixels, represented programmatically as a tensor (for instance, a standard array of floats). Viewed through the lens of graph theory, an image is a purely regular grid graph.
- Nodes (V): Every individual pixel acts as a node. The information stored within each node is a feature vector; for a standard color image, this is a 3-dimensional vector encoding the RGB values.
- Edges (E): Adjacency is defined by spatial proximity. Excluding the borders, every node is connected via undirected edges to exactly 8 neighboring pixels (horizontal, vertical, and diagonal).
The Adjacency Matrix of an Image
If a simple 5 × 5 image (25 pixels) is analyzed and its nodes are strictly ordered, an adjacency matrix A (where A_ij = 1 if nodes i and j share an edge) can be constructed. Since every pixel connects to its neighbors in a strictly repeating pattern, the resulting matrix exhibits a distinct, highly redundant banded structure. This rigid regularity constitutes the precise mathematical prior that Convolutional Neural Networks (CNNs) exploit.
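The banded structure can be verified directly by building the 8-neighbour grid adjacency in plain Python (a sketch for the 25-pixel example; no image data is needed, only the grid shape):

```python
def grid_adjacency(h, w):
    """Adjacency matrix of an h-by-w grid graph with 8-neighbour connectivity."""
    n = h * w
    A = [[0] * n for _ in range(n)]
    for r in range(h):
        for c in range(w):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) == (0, 0):
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        A[r * w + c][rr * w + cc] = 1
    return A

A = grid_adjacency(5, 5)
# An interior pixel (node 12, the centre) has exactly 8 neighbours;
# a corner pixel (node 0) has only 3.
print(sum(A[12]), sum(A[0]))  # 8 3
```

Printing the full matrix row by row makes the repeating diagonal bands visible: every interior row has the same pattern of 1s, merely shifted.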
1.2 Text and Sequences as Line Graphs
Digitizing text involves mapping characters, words, or sub-word tokens to specific indices. This creates a sequence, which translates topologically to a simple, unbranching directed graph.
- Nodes (V): The individual indices representing tokens or characters.
- Edges (E): Directed links pointing strictly forward from one token to the immediate next in the temporal or logical sequence.
- Topology: The adjacency matrix for text is extraordinarily sparse, consisting merely of a single diagonal line of 1s shifted by one position, reflecting that word i only connects to word i + 1.
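The shifted-diagonal pattern is easy to see in code. A sketch for a four-token sequence:

```python
def sequence_adjacency(n):
    """Directed line-graph adjacency: token i points only to token i + 1."""
    A = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1
    return A

for row in sequence_adjacency(4):
    print(row)
# [0, 1, 0, 0]
# [0, 0, 1, 0]
# [0, 0, 0, 1]
# [0, 0, 0, 0]
```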
Architectural Implications: RNNs vs. Transformers
The linear, sequential graph representation perfectly mirrors the inductive bias of Recurrent Neural Networks (RNNs). However, modern architectures approach text differently. Transformers discard rigid sequential edges and instead treat text as a fully connected graph, where the model actively learns the interaction weights (attention) between all possible token pairs, regardless of their physical distance in the sentence.
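The contrast can be sketched with a toy softmax over pairwise scores: instead of a fixed 0/1 adjacency, every token pair receives a learned, continuous edge weight (the score values below are made up, standing in for what a trained model would produce):

```python
import math

def attention_weights(scores_row):
    """Softmax: turn one token's raw pairwise scores into edge weights."""
    m = max(scores_row)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores_row]
    total = sum(exps)
    return [e / total for e in exps]

# Every token attends to every other token: a fully connected graph
# whose edge weights are learned rather than fixed by position.
scores = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
weights = [attention_weights(row) for row in scores]
```

Each row of `weights` sums to 1, so the attention matrix acts as a dense, normalized adjacency over all token pairs.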
2. In-the-Wild Data: Explicit and Heterogeneous Graphs
While grid and sequence graphs possess fixed neighborhood sizes, data “in the wild” is vastly more heterogeneous. The number of neighbors (the degree of a node) varies drastically across the topology, making dense tensors highly inefficient and establishing the graph as the only natural descriptive tool.
2.1 Molecular and Chemical Topologies
Molecules are the building blocks of matter, existing as atoms and electrons interacting in 3D space. When a pair of atoms stabilizes at a specific distance, the atoms share a covalent bond.
- Nodes: Atoms. In computational chemistry (e.g., modeling Caffeine or Citronellal), node attributes are usually discrete variables indicating the atom type (carbon, nitrogen, hydrogen, oxygen).
- Edges: Chemical bonds. These edges also carry discrete feature variables representing the bond type (single, double, aromatic).
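A worked sketch using a deliberately small molecule, formaldehyde (H2C=O), rather than caffeine: atom types are discrete node features and bond types are discrete edge features.

```python
# Formaldehyde (H2C=O) as a molecular graph.
atoms = {0: "C", 1: "O", 2: "H", 3: "H"}
bonds = {(0, 1): "double", (0, 2): "single", (0, 3): "single"}

def degree(v):
    """Number of bonds incident to atom v."""
    return sum(v in edge for edge in bonds)

print([degree(v) for v in atoms])  # [3, 1, 1, 1]
```

Note that node degree varies from atom to atom, a first taste of the irregular neighborhoods discussed in this section.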
2.2 Infrastructure and Continuous Variables
Graphs perfectly model physical networks, often requiring the edges to carry continuous, rather than discrete, information.
- Rail Networks: When mapping a transportation system, cities act as nodes and the railway lines as edges. Crucially, each edge can be weighted by a continuous variable, such as the average journey time between two cities. Assuming the journey from London to Cambridge takes the same time as the reverse, these edges are symmetrical (undirected).
- Electrical Circuits: A hardware schematic operates as a graph where the nodes are electronic components (resistors, capacitors) and the edges are the conductive wires enabling the flow of current.
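The rail example can be sketched as a weighted undirected graph by storing each journey time once per unordered city pair (the times below are illustrative, not timetable data):

```python
# Weighted undirected edges: journey times in minutes, stored once
# per unordered pair so symmetry holds by construction.
journey_time = {
    frozenset({"London", "Cambridge"}): 50,
    frozenset({"London", "Oxford"}): 55,
    frozenset({"Cambridge", "Oxford"}): 150,
}

def time_between(a, b):
    """Symmetric lookup: time_between(a, b) == time_between(b, a)."""
    return journey_time[frozenset({a, b})]

assert time_between("London", "Cambridge") == time_between("Cambridge", "London")
```

Using `frozenset` keys encodes undirectedness in the data structure itself, so asymmetry is impossible to represent by accident.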
2.3 Directed Information and Citation Networks
In numerous domains, relationships are inherently asymmetrical, requiring directed graphs.
- The World Wide Web: Web pages act as nodes and hyperlinks as edges. A link from Page A to Page B provides no guarantee that Page B links back, dictating a strictly directed topology.
- Citation Networks: In academia, scientists routinely cite previous work. A citation network treats published papers as nodes—often embedding rich features like the word embeddings of abstracts—and citations as directed edges pointing from a recent paper to an older, foundational one.
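Directionality changes what simple statistics mean: in a citation graph, in-degree counts how often a paper is cited, and the reverse edge need not exist. A sketch with made-up paper identifiers:

```python
# Directed edges point from a newer paper to the older work it cites.
cites = {
    ("paper_2021", "paper_2009"),
    ("paper_2021", "paper_2015"),
    ("paper_2015", "paper_2009"),
}

def in_degree(p):
    """How many papers cite p."""
    return sum(dst == p for _, dst in cites)

print(in_degree("paper_2009"), in_degree("paper_2021"))  # 2 0
```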
2.4 Biological and Social Interaction
When studying collective behavior, the interaction patterns themselves constitute the primary object of study.
- Social Networks: People or institutions are modeled as nodes, and their relationships (friendships, collaborations) as edges. Whether analyzing interactions in a Karate club or the dialogue exchanges between characters in Shakespeare’s Othello, social graphs lack the identical, repeating adjacency matrices of images, showcasing dense clusters (communities) and isolated hubs instead.
- Protein Interaction Networks: Nodes represent individual proteins, and the edges express a continuous variable indicating the biochemical strength of the interaction between protein pairs.
3. Advanced Abstractions: Multi-Relational and Computational Graphs
The abstraction extends even further when analyzing environments that contain multiple distinct ontological categories within the same structure.
- Corporate Knowledge Graphs: A complex organizational database is a highly heterogeneous graph. It comprises multiple kinds of nodes (e.g., people, digital documents, physical meetings) connected by multiple kinds of edges capturing different relational properties (e.g., an edge denoting a person “was present at” a meeting, versus a document “referencing” another document).
- Dataflow and Visual Scene Graphs: In computer vision, an image can be parsed into a “scene graph” where tagged objects are nodes and their spatial/semantic relationships are edges. Similarly, programming code, machine learning architectures, and raw mathematical equations are fundamentally dataflow graphs, where variables are nodes and the mathematical operations bridging them are edges.
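A dataflow graph can be sketched in a few lines: the expression y = (a + b) * c becomes a graph whose nodes are variables and operations, and evaluation is a traversal from leaves to the output (node names and structure are illustrative):

```python
# The expression y = (a + b) * c as a tiny dataflow graph:
# operation nodes list their input nodes; leaves are variables.
graph = {
    "add": {"op": lambda a, b: a + b, "inputs": ["a", "b"]},
    "mul": {"op": lambda x, c: x * c, "inputs": ["add", "c"]},
}

def evaluate(node, values):
    """Recursively evaluate a node given leaf variable values."""
    if node in values:  # leaf variable
        return values[node]
    spec = graph[node]
    args = [evaluate(i, values) for i in spec["inputs"]]
    return spec["op"](*args)

print(evaluate("mul", {"a": 2, "b": 3, "c": 4}))  # 20
```

This is the same shape of abstraction that automatic-differentiation frameworks apply to entire neural network architectures.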
4. The Scale and Diversity of Graph Topologies
Because real-world graphs model arbitrary phenomena, their structural statistics vary by orders of magnitude. Connectivity density, measured by the average number of edges per node (degree), fundamentally changes the required computational approach.
| Dataset | Domain | Total Graphs | Nodes | Edges | Min Degree | Mean Degree | Max Degree |
|---|---|---|---|---|---|---|---|
| Karate Club | Social network | 1 | 34 | 78 | 1 | 4.5 | 17 |
| QM9 | Small molecules | 134k | – | – | 1 | 2 | 5 |
| Cora | Citation network | 1 | 23,166 | 91,500 | 1 | 7.8 | 379 |
| Wikipedia (EN) | Knowledge graph | 1 | M | M | – | 62.24 | – |
Note
These summary statistics illustrate the vastness of the domain. Exact numbers are highly dependent on the specific featurization and thresholding decisions made during dataset construction.
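As a quick sanity check on such statistics, mean degree follows directly from the node and edge counts. Using the Karate Club figures quoted in the table (34 nodes, 78 undirected edges):

```python
def mean_degree(num_nodes, num_edges, directed=False):
    """Average edges per node; an undirected edge counts at both endpoints."""
    factor = 1 if directed else 2
    return factor * num_edges / num_nodes

# Zachary's Karate Club: 2 * 78 / 34
print(round(mean_degree(34, 78), 2))  # 4.59
```

The same two counts therefore pin down the mean degree exactly, while the min and max degrees depend on the full degree distribution.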