Disjoint Sets
Track which elements belong to the same group, and merge groups -- in nearly per operation.
When to use
- Connected components in an undirected graph
- Cycle detection in an undirected graph
- Kruskal's minimum spanning tree
- Grouping elements that share a property transitively
Find with Path Compression
Every element points to a parent. find walks up until it hits a root -- a node whose parent is itself. That root is the group's representative.
Two elements are in the same group iff find(a) == find(b).
Without optimization a chain of elements makes find . Path compression fixes this: on the way back from the root, rewire every visited node to point directly to it.
The tree flattens itself. Future calls on any of those nodes cost .
Union by Rank
Find the roots of a and b. If they differ, make one root the parent of the other. Returns True if a merge happened, False if they were already connected.
Naively always attaching one tree under the other can grow a tall chain. Union by rank always attaches the shorter tree under the taller one, keeping height small. rank only increments when two trees of equal rank merge.
Metadata
Size: how many elements in each group?
Each element starts with size = 1. On merge, size[root_a] += size[root_b]. size[x] is only valid when x is a root -- call find(x) first. Use it when a problem asks for the largest component or the size of a specific group.
Count: how many groups remain?
Starts at . Decrements by 1 on each successful union. getCount() directly answers: how many connected components are there right now?
Union return value: cycle detection
union returns False when a and b already share a root. In an undirected graph, that means the edge (u, v) connects two already-connected nodes -- a cycle exists.