Graphs offer a powerful framework for mapping interconnectivity across many domains, including human relationships, routing, and bioinformatics. While Graph Neural Networks (GNNs) have recently set a new standard for processing complex relational data, from molecules to recommendation systems, truly mastering them requires looking back. In this post, we’ll explore the “traditional” framework of graph modeling to understand how to view data through a relational lens and build robust machine learning solutions.
Graph
A graph $G = (V, E)$ consists of a set of nodes $V$, where each node represents an object or entity (for example, a person, an image patch, or a word token). Nodes may be connected to one another through edges, forming an edge set $E$ that describes the relationships or interactions between nodes.
In this example, the graph has nodes $\{v_1, v_2, v_3\}$ and adjacency matrix

$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$

A value of $1$ in $A_{ij}$ means there is an edge between node $i$ and node $j$, while a value of $0$ means no edge exists. For instance, $A_{12} = 1$ indicates that node $v_1$ is connected to node $v_2$, and by symmetry $A_{21} = 1$. Since the matrix is symmetric and all diagonal entries are zero, this graph is undirected and contains no self-loops.
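As a quick sanity check, here's how this adjacency matrix might look in code (a minimal sketch using NumPy; 0-indexed positions 0–2 stand in for $v_1$–$v_3$):

```python
import numpy as np

# Adjacency matrix of the three-node example graph.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

# Undirected graph: the matrix is symmetric.
assert (A == A.T).all()
# No self-loops: the diagonal is all zeros.
assert (np.diag(A) == 0).all()

print(A[0, 1])  # 1 -> v1 and v2 share an edge
```

The two assertions encode exactly the two structural facts stated above: symmetry (undirected) and a zero diagonal (no self-loops).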
Thanks to the kind people on the internet, you can explore graph types and the meaning of “self-loop” here.
In some other cases, each edge may have a weight that represents the strength, distance, or importance of the connection. For instance, in a routing problem, a weight could represent the physical distance between two cities.
In a weighted matrix $W$, the entry $W_{12}$ might be large, indicating a strong connection between $v_1$ and $v_2$, while $W_{13}$ could be much smaller, indicating a weaker one. An entry of $W_{23} = 0$ means there is no direct edge between $v_2$ and $v_3$.
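A weighted graph can be sketched the same way. The values below are purely illustrative (not taken from any real dataset): a strong $v_1$–$v_2$ edge, a weak $v_1$–$v_3$ edge, and no $v_2$–$v_3$ edge at all.

```python
import numpy as np

# Weighted adjacency matrix with illustrative values.
W = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.0],
              [0.1, 0.0, 0.0]])

print(W[0, 1])  # 0.9 -> strong connection between v1 and v2
print(W[1, 2])  # 0.0 -> no direct edge between v2 and v3
```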
Tasks
Okay, now that we understand the simplest definition of a graph, we can identify the core tasks for machine learning on graphs. These tasks are defined by which element of the graph we wish to predict.
Node Classification
Node classification is the task of predicting a label for each node in a graph. Formally, given a graph $G = (V, E)$ with adjacency matrix $A$, and a set of known labels for a subset of nodes $V_L \subset V$, the goal is to predict the labels for the remaining unlabeled nodes $V \setminus V_L$.
A model learns a function $f_\theta$ so that:

$$\hat{y}_v = f_\theta(v, A, X) \quad \text{for each unlabeled node } v$$

Here, $X$ represents any initial node features.
Using the example graph with three nodes $\{v_1, v_2, v_3\}$ and adjacency matrix $A$, if we know the labels for nodes $v_1$ and $v_2$, we can predict the label for $v_3$. This prediction leverages the homophily principle: connected nodes often share similar properties.
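A minimal way to exploit homophily, with no learning at all, is a majority vote over a node's labeled neighbors. In this toy sketch the label names are made up for illustration, and node $v_3$'s label is predicted from its neighbors $v_1$ and $v_2$:

```python
import numpy as np

# Fully connected three-node example graph (0-indexed: position 0 is v1, etc.).
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

# Known labels for v1 and v2; v3 (index 2) is unlabeled.
labels = {0: "red", 1: "red"}

def predict_label(node, A, labels):
    """Predict a node's label by majority vote over its labeled neighbors (homophily)."""
    neighbor_labels = [labels[j] for j in np.flatnonzero(A[node]) if j in labels]
    return max(set(neighbor_labels), key=neighbor_labels.count)

print(predict_label(2, A, labels))  # 'red' -- both labeled neighbors are red
```

Real node classifiers replace the majority vote with a learned function of the node features and graph structure, but the intuition is the same.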
Edge Classification
Edge classification is the task of predicting a label or property for each edge in a graph. Formally, given a graph $G = (V, E)$, we aim to assign a label $y_{uv}$ to each edge $(u, v) \in E$. In many applications, we may also predict the existence of a missing edge (link prediction).
A model for edge classification learns a function $g_\theta$ that maps pairs of node features and the graph structure to edge labels:

$$\hat{y}_{uv} = g_\theta(x_u, x_v, A)$$

where $\theta$ are the model parameters.
In the example graph with adjacency matrix $A$, we might predict the type of relationship between connected nodes. For instance, in a weighted adjacency matrix $W$, the weight $W_{uv}$ could be the target label, or we might classify edges as “strong” or “weak” based on their weights.
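As a toy illustration of the “strong vs. weak” idea, we can threshold the weights of an assumed weighted matrix (the values and the 0.5 threshold are hypothetical choices, not from the post):

```python
import numpy as np

# Illustrative weighted adjacency matrix.
W = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.0],
              [0.1, 0.0, 0.0]])

def classify_edges(W, threshold=0.5):
    """Label each existing edge 'strong' or 'weak' by comparing its weight to a threshold."""
    edge_labels = {}
    n = W.shape[0]
    for i in range(n):
        for j in range(i + 1, n):  # undirected: visit each pair once
            if W[i, j] > 0:
                edge_labels[(i, j)] = "strong" if W[i, j] >= threshold else "weak"
    return edge_labels

print(classify_edges(W))  # {(0, 1): 'strong', (0, 2): 'weak'}
```

A learned edge classifier would replace this fixed rule with a function of the endpoint features, but the input/output shape of the task is the same: pairs of nodes in, edge labels out.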
Community Detection
Community detection, or graph clustering, aims to partition the node set $V$ into disjoint subsets (communities) such that nodes within the same community are densely connected, while nodes in different communities are sparsely connected.
Given the adjacency matrix $A$, the goal is to find a mapping $c: V \to \{1, \dots, k\}$ that assigns each node to one of $k$ communities. This is often an unsupervised task, but it can be guided by known labels or constraints.
In the example graph with three nodes and symmetric adjacency matrix $A$, note that all nodes are mutually connected. Therefore, the entire graph might be considered a single community. However, if we remove the edge between $v_1$ and $v_2$ (setting $A_{12} = A_{21} = 0$), we might partition the graph into two communities: $\{v_1, v_3\}$ and $\{v_2\}$, or $\{v_1\}$ and $\{v_2, v_3\}$, depending on the algorithm.
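Real community-detection algorithms (e.g., modularity-based methods) are more sophisticated, but the crudest possible version, treating each connected component as its own community, can be sketched with a BFS. The example matrix below isolates one node so that two components actually appear:

```python
from collections import deque

def connected_components(adj):
    """Assign each node a community id by flood-filling connected components (BFS)."""
    n = len(adj)
    community = [-1] * n
    next_id = 0
    for start in range(n):
        if community[start] != -1:
            continue  # already assigned to a component
        community[start] = next_id
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and community[v] == -1:
                    community[v] = next_id
                    queue.append(v)
        next_id += 1
    return community

# v1 and v2 connected, v3 isolated -> two communities.
adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
print(connected_components(adj))  # [0, 0, 1]
```

Note that connected components only find communities when the graph is actually disconnected; density-based methods can split a single connected graph into multiple communities.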
Graph Classification
Graph classification is the task of predicting a label for an entire graph . This is common in applications where each graph represents a distinct entity, such as a molecule in chemistry or a document in natural language processing.
Given a set of graphs $\{G_1, \dots, G_N\}$ and their corresponding labels $\{y_1, \dots, y_N\}$, the goal is to learn a function $h_\theta$ that maps a graph to a label:

$$\hat{y}_i = h_\theta(G_i)$$

where $\theta$ are the model parameters.
For example, consider a set of graphs, each representing a molecule. The adjacency matrix might indicate bonds between atoms (nodes), and node features could include atom types. The task could be to classify each molecule as toxic or non-toxic.
In the context of our example, if we consider the three-node graph as one entity (e.g., a small molecule), we would assign a single label to the entire structure, regardless of the individual node labels.
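One simple, traditional way to get a single representation for a whole graph is to compute handcrafted summary statistics. This sketch (the feature choices are my own, not from the post) builds a tiny feature dictionary that a downstream classifier could consume:

```python
import numpy as np

def graph_features(A):
    """Handcrafted graph-level descriptors: node count, edge count, mean degree."""
    A = np.asarray(A)
    degrees = A.sum(axis=1)
    return {
        "num_nodes": int(A.shape[0]),
        "num_edges": int(degrees.sum() // 2),  # undirected: each edge is counted twice
        "mean_degree": float(degrees.mean()),
    }

# The fully connected three-node example graph.
A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
print(graph_features(A))  # {'num_nodes': 3, 'num_edges': 3, 'mean_degree': 2.0}
```

For real molecule classification, these descriptors would be joined by node features (atom types) and richer structural statistics, then fed to any standard classifier.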
Embeddings
Now, how do we turn the graph into a numerical representation that a model can learn from?
The answer is embeddings.
Unlike in modern GNNs, where embeddings are learned and adaptive, embeddings used to be handcrafted. This meant they were carefully designed by combining node/edge features with domain-specific knowledge.
There are many different strategies for this, and I haven’t dived deep into them yet. While some strategies, like degree centrality and betweenness centrality, seem relatively easy to understand (I think?), others require a deeper mathematical background (in my opinion — or is it a skill issue? Maybe).
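Degree centrality, at least, is easy to compute by hand: a node's degree divided by the maximum possible degree, $n - 1$. A quick sketch on the example graph:

```python
import numpy as np

# The fully connected three-node example graph.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

# Degree centrality: each node's degree normalized by the maximum possible degree (n - 1).
n = A.shape[0]
degree_centrality = A.sum(axis=1) / (n - 1)
print(degree_centrality)  # [1. 1. 1.] -- every node connects to both others
```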
I will be back for the next post in this graph series once I understand them better.
See you. I hope this personal note is helpful.