Skip to main content

Disjoint Sets

Track which elements belong to the same group, and merge groups -- in nearly O(1)O(1) per operation.

Disjoint Sets
Union-Find
find
walk up to root
Fast Find
goal: near O(1) per call
Path Compression
rewire nodes to root on return
union
merge two groups
Fast Union
goal: keep tree shallow
Union by Rank
attach shorter under taller
Optimal
O(α(n)) amortized per operation
+

When to use

  • Connected components in an undirected graph
  • Cycle detection in an undirected graph
  • Kruskal's minimum spanning tree
  • Grouping elements that share a property transitively

Find with Path Compression

Every element points to a parent. find walks up until it hits a root -- a node whose parent is itself. That root is the group's representative.

Two elements are in the same group iff find(a) == find(b).

Without optimization a chain of nn elements makes find O(n)O(n). Path compression fixes this: on the way back from the root, rewire every visited node to point directly to it.

The tree flattens itself. Future calls on any of those nodes cost O(1)O(1).

InitializeInitial state: nodes 0–4 form a chain. find(4) must walk 4 hops to reach root 0.
01234
parent[]0001122334
1 / 6
Step 1 of 60%

Union by Rank

Find the roots of a and b. If they differ, make one root the parent of the other. Returns True if a merge happened, False if they were already connected.

Naively always attaching one tree under the other can grow a tall chain. Union by rank always attaches the shorter tree under the taller one, keeping height small. rank only increments when two trees of equal rank merge.

InitializeTwo groups: A={0,1,2} root=0 rank=1, B={3,4} root=3 rank=0. union(2, 4) will merge them.
01234
rank[]1001020304parent[]0001023334
1 / 13
Step 1 of 130%

Metadata

Size: how many elements in each group?

Each element starts with size = 1. On merge, size[root_a] += size[root_b]. size[x] is only valid when x is a root -- call find(x) first. Use it when a problem asks for the largest component or the size of a specific group.

Count: how many groups remain?

Starts at nn. Decrements by 1 on each successful union. getCount() directly answers: how many connected components are there right now?

Union return value: cycle detection

union returns False when a and b already share a root. In an undirected graph, that means the edge (u, v) connects two already-connected nodes -- a cycle exists.

Array vs. HashMap internally

Array vs HashMap
Full
Quiz
are elements integers in [0, n)?
yes -- dense indices
array
no
can you map them to a small integer range yourself?
yes
array
no -- strings, tuples, sparse coords
hashmap

When to use HashMap

Implementation

Analysis

class DisjointSets:
def __init__(self, size):
self.parent = list(range(size)) # each element is its own root initially
self.rank = [0] * size # upper bound on tree height per root
self.size = [1] * size # component size (1 per node initially)
self.count = size # number of disjoint components
 
def find(self, element):
if self.parent[element] == element:
return element
# path compression: flatten the chain to the root on the way back
self.parent[element] = self.find(self.parent[element])
return self.parent[element]
 
def union(self, a, b):
root_a = self.find(a)
root_b = self.find(b)
if root_a != root_b:
# union by rank: attach shorter tree under taller to keep height small
if self.rank[root_a] > self.rank[root_b]:
self.parent[root_b] = root_a
self.size[root_a] += self.size[root_b]
elif self.rank[root_a] < self.rank[root_b]:
self.parent[root_a] = root_b
self.size[root_b] += self.size[root_a]
else:
self.parent[root_b] = root_a
self.size[root_a] += self.size[root_b]
self.rank[root_a] += 1 # only grows when both trees have equal rank
self.count -= 1
return True # merged: two components became one
return False # already connected: no merge happened
 
def getCount(self):
return self.count
 
def getSize(self, a):
return self.size[self.find(a)]
 
def getSizes(self):
for i in range(len(self.parent)):
if self.parent[i] == i:
yield self.parent[i], self.size[i]