
My Graph Notes #1: Learning the Basics Before the Hard Stuff

Graphs offer a powerful framework for mapping interconnectivity across various domains, including human relationships, routing, and bioinformatics. While Graph Neural Networks (GNNs) have recently set a new standard for processing complex relational data and powering recommendation systems, truly mastering them requires looking back. In this post, we’ll explore the “traditional” framework of graph modeling to understand how to view data through a relational lens and build robust machine learning solutions.


Graph

A graph $\mathcal{G}$ consists of a set of nodes $\mathcal{N} = \{n_0, n_1, n_2, \dots, n_{N-1}\}$, where each node represents an object or entity (for example, a person, an image patch, or a word token). Nodes may be connected to one another through edges, forming a set of edges $\mathcal{E} \subseteq \mathcal{N} \times \mathcal{N}$ that describes the relationships or interactions between nodes. A common way to encode this structure numerically is an adjacency matrix $A$:

$$
A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}
$$

In this example, the graph has $N = 3$ nodes: $\{n_0, n_1, n_2\}$. A value of $1$ in $A_{ij}$ means there is an edge between node $n_i$ and node $n_j$, while a value of $0$ means no edge exists. For instance, $A_{01} = 1$ indicates that node $n_0$ is connected to node $n_1$, and $A_{02} = 1$ indicates that $n_0$ is also connected to $n_2$. Since the matrix is symmetric ($A_{ij} = A_{ji}$) and all diagonal entries are zero, this graph is undirected and contains no self-loops.
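To make this concrete, here is a minimal NumPy sketch that encodes the example matrix and checks the two properties we just read off it (symmetry and an all-zero diagonal):

```python
import numpy as np

# Adjacency matrix of the three-node example graph.
A = np.array([
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
])

# Undirected graph: A equals its transpose.
assert (A == A.T).all()

# No self-loops: every diagonal entry is zero.
assert (np.diag(A) == 0).all()

# Recover the edge list from the upper triangle (each edge listed once).
edges = [(i, j) for i in range(len(A)) for j in range(i + 1, len(A)) if A[i, j]]
print(edges)  # [(0, 1), (0, 2), (1, 2)]
```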

Thanks to the kind people on the internet, you can explore graph types and the meaning of “self-loop” here.

In other cases, each edge may have a weight $w(e)$ that represents the strength, distance, or importance of the connection. For instance, in a routing problem, a weight could represent the physical distance between two cities.

$$
W = \begin{bmatrix} 0 & 2.5 & 0.8 \\ 2.5 & 0 & 0 \\ 0.8 & 0 & 0 \end{bmatrix}
$$

In this weighted matrix $W$, the connection between $n_0$ and $n_1$ has a strength of $2.5$, while the connection between $n_0$ and $n_2$ is much weaker at $0.8$. Note that $W_{12} = 0$, meaning there is no direct edge between $n_1$ and $n_2$.
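The same NumPy pattern extends to the weighted case. A small sketch, assuming the convention from the text that a zero off-diagonal entry means “no edge”:

```python
import numpy as np

# Weighted adjacency matrix of the example graph.
W = np.array([
    [0.0, 2.5, 0.8],
    [2.5, 0.0, 0.0],
    [0.8, 0.0, 0.0],
])

# Recover the unweighted structure by thresholding: a zero
# off-diagonal entry means no edge exists.
A = (W > 0).astype(int)

print(W[0, 1])  # 2.5 -> strong n0-n1 connection
print(A[1, 2])  # 0   -> no direct edge between n1 and n2
```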

Tasks

Okay, now that we understand the simplest definition of graphs, we can identify the core tasks for machine learning on graphs. These tasks are defined by the element of the graph we wish to predict.

Node Classification

Node classification is the task of predicting a label for each node in a graph. Formally, given a graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$ with adjacency matrix $A$, and a set of known labels $\{y_i\}$ for a subset of nodes $\mathcal{N}_L \subset \mathcal{N}$, the goal is to predict the labels for the remaining unlabeled nodes $\mathcal{N}_U = \mathcal{N} \setminus \mathcal{N}_L$.

A model learns a function $f$ so that:

$$
\hat{y}_j = f(A, X)_j \quad \text{for} \quad n_j \in \mathcal{N}_U
$$

Here, $X$ represents any initial node features.

Using the example graph with three nodes and adjacency matrix $A$, if we know the labels for nodes $n_0$ and $n_1$, we can predict the label for $n_2$. This prediction leverages the homophily principle: connected nodes often share similar properties.
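As an illustration of homophily rather than of any particular model, here is a minimal sketch in which $f$ degenerates to a majority vote over a node’s labeled neighbors (the label names are made up):

```python
import numpy as np
from collections import Counter

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

# Hypothetical labels for the labeled subset N_L = {n0, n1}.
known = {0: "red", 1: "red"}

def predict_by_neighbors(A, known, j):
    """Majority vote over the labeled neighbors of node j."""
    votes = [known[i] for i in np.nonzero(A[j])[0] if i in known]
    return Counter(votes).most_common(1)[0][0] if votes else None

print(predict_by_neighbors(A, known, 2))  # 'red'
```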

Edge Classification

Edge classification is the task of predicting a label or property for each edge in a graph. Formally, given a graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$, we aim to assign a label $y_{ij}$ to each edge $(n_i, n_j) \in \mathcal{E}$. In many applications, we may also predict the existence of a missing edge.

A model for edge classification learns a function $g$ that maps pairs of node features and the graph structure to edge labels:

$$
\hat{y}_{ij} = g(A, X, n_i, n_j; \theta)
$$

where $\theta$ are the model parameters.

In the example graph with adjacency matrix $A$, we might predict the type of relationship between connected nodes. For instance, in a weighted adjacency matrix $W$, the weight $w(e)$ could be the target label, or we might classify edges as “strong” or “weak” based on their weights.
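A minimal sketch of that last idea, with a made-up threshold standing in for a learned $g$:

```python
import numpy as np

W = np.array([[0.0, 2.5, 0.8],
              [2.5, 0.0, 0.0],
              [0.8, 0.0, 0.0]])

THRESHOLD = 1.0  # hypothetical cut-off between "weak" and "strong"

def classify_edges(W, threshold=THRESHOLD):
    """Label each existing edge as 'strong' or 'weak' by its weight."""
    labels = {}
    n = len(W)
    for i in range(n):
        for j in range(i + 1, n):
            if W[i, j] > 0:  # an edge exists between n_i and n_j
                labels[(i, j)] = "strong" if W[i, j] >= threshold else "weak"
    return labels

print(classify_edges(W))  # {(0, 1): 'strong', (0, 2): 'weak'}
```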

Community Detection

Community detection, or graph clustering, aims to partition the node set $\mathcal{N}$ into $k$ disjoint subsets $\{C_1, C_2, \dots, C_k\}$ such that nodes within the same community are densely connected, while nodes in different communities are sparsely connected.

Given the adjacency matrix $A$, the goal is to find a mapping $h: \mathcal{N} \rightarrow \{1, 2, \dots, k\}$ that assigns each node to a community. This is often an unsupervised task, but can be guided by known labels or constraints.

In the example graph with three nodes and symmetric adjacency matrix $A$, note that all nodes are mutually connected. Therefore, the entire graph might be considered a single community. However, if we remove the edge between $n_1$ and $n_2$ (setting $A_{12} = A_{21} = 0$), we might partition the graph into two communities: $\{n_0, n_1\}$ and $\{n_2\}$, or $\{n_0, n_2\}$ and $\{n_1\}$, depending on the algorithm.
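Our three-node example is too small for a clustering objective to say much, so here is a minimal sketch on a graph with an unambiguous two-community structure, using NetworkX’s greedy modularity heuristic (one of many possible algorithms):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Two triangles joined by a single edge: a graph with a clear
# two-community structure.
G = nx.barbell_graph(3, 0)

communities = greedy_modularity_communities(G)
print([sorted(c) for c in communities])  # e.g. [[0, 1, 2], [3, 4, 5]]
```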

Graph Classification

Graph classification is the task of predicting a label for an entire graph $\mathcal{G}$. This is common in applications where each graph represents a distinct entity, such as a molecule in chemistry or a document in natural language processing.

Given a set of graphs $\{\mathcal{G}_1, \mathcal{G}_2, \dots, \mathcal{G}_M\}$ and their corresponding labels $\{y_1, y_2, \dots, y_M\}$, the goal is to learn a function $p$ that maps a graph to a label:

$$
\hat{y}_m = p(\mathcal{G}_m; \phi)
$$

where $\phi$ are the model parameters.

For example, consider a set of graphs, each representing a molecule. The adjacency matrix might indicate bonds between atoms (nodes), and node features could include atom types. The task could be to classify each molecule as toxic or non-toxic.

In the context of our example, if we consider the three-node graph as one entity (e.g., a small molecule), we would assign a single label to the entire structure, regardless of the individual node labels.
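In the pre-GNN spirit of this post, one simple way to realize $p$ is to summarize each graph with a few handcrafted statistics and feed them to an off-the-shelf classifier. A toy sketch (the four graphs and their binary labels are made up):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def graph_features(A):
    """Handcrafted graph-level descriptor: node count, edge count,
    mean degree, and max degree."""
    A = np.asarray(A)
    degrees = A.sum(axis=1)
    return [len(A), A.sum() / 2, degrees.mean(), degrees.max()]

# Toy dataset of adjacency matrices with made-up binary labels
# (think "toxic" vs. "non-toxic" molecules).
graphs = [
    np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]]),                         # triangle
    np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]),                         # 3-node path
    np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]),  # star
    np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]),  # 4-node path
]
labels = [1, 0, 1, 0]

X = np.array([graph_features(A) for A in graphs])
clf = DecisionTreeClassifier().fit(X, labels)
print(clf.predict(X))  # [1 0 1 0] on the training set
```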

Embeddings

Now, how do we turn the graph into a numerical representation that a model can learn from?

The answer is embeddings.

Unlike in modern GNNs, where embeddings are learned and adaptive, embeddings used to be handcrafted: carefully designed by combining node/edge features with domain-specific knowledge.

There are many different strategies for this, and I haven’t dived deep into them yet. While some strategies, like degree centrality and betweenness centrality, seem relatively easy to understand (I think?), others require a deeper mathematical background (in my opinion, or is it a skill issue? Maybe).
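As a small preview of those two easier strategies, here is a sketch using NetworkX on the example graph with the edge between $n_1$ and $n_2$ removed (so the centralities actually differ between nodes):

```python
import networkx as nx

# The example graph with the n1-n2 edge removed,
# so that n0 becomes a clear "hub".
G = nx.Graph([(0, 1), (0, 2)])

# Degree centrality: fraction of other nodes each node touches.
print(nx.degree_centrality(G))       # {0: 1.0, 1: 0.5, 2: 0.5}

# Betweenness centrality: n0 lies on the only n1-n2 shortest path.
print(nx.betweenness_centrality(G))  # {0: 1.0, 1: 0.0, 2: 0.0}
```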

I will be back for the next post in this graph series once I understand them better.

See you. I hope this personal note is helpful.

