Optimal (minimum-bit) lossless variable-length compression algorithm.
+-------+ +-----+
| Alice | --------------------------------------> | Bob |
+-------+ +-----+
1. Generate Huffman codes from message 1. Extract Huffman codes
2. Encode message using codes 2. Decode message using codes
3. Send encoded message + codes
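Step 3 on Alice's side implies some agreed wire format for shipping the codes next to the payload. The line-based framing below is purely an assumption for illustration (the notes don't specify one):

```cpp
#include <string>
#include <unordered_map>
using namespace std;

// Hypothetical framing: one "<char><code>" line per symbol,
// then a blank line, then the encoded bit string.
string packPayload(const unordered_map<char, string>& codes, const string& encodedBits) {
    string out;
    for (const auto& [c, code] : codes) out += string(1, c) + code + "\n";
    out += "\n" + encodedBits;
    return out;
}
```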
Operation | Big O |
---|---|
Generating Huffman codes | O(n log n) |
Encoding | O(n) |
Decoding | O(n) |
Example message: optimal compression scheme (spaces are ignored in the counts below)
o p t i m a l c r e s n h
3 2 1 2 3 1 1 2 1 2 3 1 1
t a l r n h p i c e s o m
1 1 1 1 1 1 2 2 2 2 3 3 3
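The table above is produced by a single counting pass over the message; a minimal sketch (spaces are skipped, since the table has no entry for them):

```cpp
#include <string>
#include <unordered_map>
using namespace std;

unordered_map<char, int> countLetters(const string& message) {
    unordered_map<char, int> occurrences;
    for (char c : message)
        if (c != ' ') occurrences[c]++;  // the frequency table ignores spaces
    return occurrences;
}
```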
Iteration 1:
l r n h p i c e ta s o m
1 1 1 1 2 2 2 2 2 3 3 3
Iteration 2:
n h p i c e ta lr s o m
1 1 2 2 2 2 2 2 3 3 3
Iteration 3:
p i c e ta lr nh s o m
2 2 2 2 2 2 2 3 3 3
Iteration 4:
c e ta lr nh s o m pi
2 2 2 2 2 3 3 3 4
Iteration 5:
ta lr nh s o m pi ce
2 2 2 3 3 3 4 4
Iteration 6:
nh s o m pi ce talr
2 3 3 3 4 4 4
Iteration 7:
o m pi ce talr nhs
3 3 4 4 4 5
Iteration 8:
pi ce talr nhs om
4 4 4 5 6
Iteration 9:
talr nhs om pice
4 5 6 8
Iteration 10:
om pice talrnhs
6 8 9
Iteration 11:
talrnhs ompice
9 14
Iteration 12:
talrnhsompice
23
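Each iteration above removes the two lightest entries and pushes back their combination; a sketch replaying that loop with std::priority_queue on the weights from the frequency table (tie-breaking among equal weights may differ from the hand-run above, but the final total is the same):

```cpp
#include <queue>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// Repeatedly merge the two lightest entries until one remains.
pair<string, int> mergeAll(const vector<pair<string, int>>& entries) {
    using Entry = pair<int, string>;  // (weight, label), lightest first
    priority_queue<Entry, vector<Entry>, greater<Entry>> pq;
    for (const auto& [label, weight] : entries) pq.push({weight, label});
    while (pq.size() > 1) {
        auto [w1, l1] = pq.top(); pq.pop();
        auto [w2, l2] = pq.top(); pq.pop();
        pq.push({w1 + w2, l1 + l2});  // combined node goes back into the queue
    }
    return {pq.top().second, pq.top().first};
}
```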
23
/ \
9 14
/ \ / \
4 5 8 |
/ \ / \ / \ |
2 2 2 | 4 4 6
/ \ / \ / \ | / \ / \ / \
t a l r n h s p i c e o m
1 1 1 1 1 1 3 2 2 2 2 3 3
23
0/ \1
9 14
0/ \1 0/ \1
4 5 8 |
0/ \1 0/ \1 0/ \1 |
2 2 2 | 4 4 6
0/ \1 0/ \1 0/ \1 | 0/ \1 0/ \1 0/ \1
t a l r n h s p i c e o m
^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
0000 | | | | | | | | | | | |
0001 | | | | | | | | | | |
0010 | | | | | | | | | |
0011 | | | | | | | | |
0100 | | | | | | | |
0101 | | | | | | |
011 | | | | | |
1000 | | | | |
1001 | | | |
1010 | | |
1011 | |
110 |
111
o p t i m a l c o m p r e s s i o n
110 1000 0000 1001 111 0001 0010 1010 110 111 1000 0011 1011 011 011 1001 110 0100
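Reading the codes off the tree and concatenating them gives the bit string above: 65 bits for the 18 letters of "optimal compression" (spaces dropped), versus 18 × 8 = 144 bits in plain 8-bit ASCII. A sketch that checks the arithmetic:

```cpp
#include <string>
#include <unordered_map>
using namespace std;

// Codes as read off the 0/1-labelled tree above
const unordered_map<char, string> exampleCodes = {
    {'t', "0000"}, {'a', "0001"}, {'l', "0010"}, {'r', "0011"},
    {'n', "0100"}, {'h', "0101"}, {'s', "011"},  {'p', "1000"},
    {'i', "1001"}, {'c', "1010"}, {'e', "1011"}, {'o', "110"},
    {'m', "111"},
};

string encodeWith(const unordered_map<char, string>& codes, const string& message) {
    string bits;
    for (char c : message) bits += codes.at(c);  // look up each letter's code
    return bits;
}
```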
#include <queue>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>
using namespace std;

struct Node {
    int weight;
    char letter;              // only meaningful for leaves
    Node *left = nullptr, *right = nullptr;
};
// Min-heap ordering: the lightest node comes out first
struct ByWeight {
    bool operator()(const Node* a, const Node* b) const { return a->weight > b->weight; }
};
// Walk the tree: 0 for left edges, 1 for right edges
void assignCodes(const Node* n, const string& prefix, unordered_map<char, string>& codes) {
    if (!n->left && !n->right) { codes[n->letter] = prefix; return; }
    assignCodes(n->left, prefix + "0", codes);
    assignCodes(n->right, prefix + "1", codes);
}
unordered_map<char, string> huffmanCodes(string_view message) {
    // Count the occurrences of each letter
    unordered_map<char, int> occurrences;
    for (char c : message) occurrences[c]++;
    // Create priority queue of leaf nodes
    priority_queue<Node*, vector<Node*>, ByWeight> pq;
    for (auto [c, count] : occurrences) pq.push(new Node{count, c});
    // Create binary tree: repeatedly merge the two lightest subtrees
    while (pq.size() > 1) {
        Node* a = pq.top(); pq.pop();
        Node* b = pq.top(); pq.pop();
        pq.push(new Node{a->weight + b->weight, '\0', a, b});
    }
    // Create codes from binary tree (nodes leaked for brevity)
    unordered_map<char, string> codes;
    if (!pq.empty()) assignCodes(pq.top(), "", codes);
    return codes;
}
string huffmanEncode(const unordered_map<char, string>& codes, string_view message);
string huffmanDecode(const unordered_map<char, string>& codes, string_view encoded);
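The encode/decode pair can be sketched like this (the table maps each char to its code string; decoding greedily grows a bit prefix and matches it against the reversed table, which is safe because Huffman codes are prefix-free, i.e. no code is a prefix of another):

```cpp
#include <string>
#include <string_view>
#include <unordered_map>
using namespace std;

string huffmanEncode(const unordered_map<char, string>& codes, string_view message) {
    string bits;
    for (char c : message) bits += codes.at(c);
    return bits;
}

string huffmanDecode(const unordered_map<char, string>& codes, string_view encoded) {
    unordered_map<string, char> byCode;  // invert code -> char
    for (const auto& [c, code] : codes) byCode[code] = c;
    string message, current;
    for (char bit : encoded) {
        current += bit;  // grow the candidate code one bit at a time
        if (auto it = byCode.find(current); it != byCode.end()) {
            message += it->second;  // prefix-freeness: first hit is correct
            current.clear();
        }
    }
    return message;
}
```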
Huffman codes generated from typical English letter frequencies:
Char | Code |
---|---|
space | 111 |
e | 010 |
t | 1101 |
a | 1011 |
o | 1001 |
i | 1000 |
n | 0111 |
s | 0011 |
h | 0010 |
r | 0001 |
l | 10101 |
d | 01101 |
c | 00001 |
u | 00000 |
f | 110011 |
m | 110010 |
w | 110001 |
y | 101001 |
p | 101000 |
g | 011001 |
b | 011000 |
v | 1100000 |
k | 11000011 |
x | 110000100 |
j | 1100001011 |
q | 11000010101 |
z | 11000010100 |
Using the standard English-frequency Huffman codes is more efficient for short messages because you don't need to send the codes along with the message. It's only worth encoding a message with its own Huffman codes if custom encoding length + custom codes length < English encoding length.
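That inequality is easy to check for a concrete message by summing code lengths under each table; a sketch comparing the custom codes derived above against the English table for the letters of "optimal compression scheme" (this counts payload bits only; the cost of shipping the custom table still has to be added on the custom side):

```cpp
#include <string>
#include <unordered_map>
using namespace std;

size_t encodedBits(const unordered_map<char, string>& codes, const string& message) {
    size_t bits = 0;
    for (char c : message) bits += codes.at(c).size();  // length of each code
    return bits;
}

// Custom codes from the tree above
const unordered_map<char, string> customCodes = {
    {'t', "0000"}, {'a', "0001"}, {'l', "0010"}, {'r', "0011"},
    {'n', "0100"}, {'h', "0101"}, {'s', "011"},  {'p', "1000"},
    {'i', "1001"}, {'c', "1010"}, {'e', "1011"}, {'o', "110"},
    {'m', "111"},
};

// The subset of the English-frequency table needed for this message
const unordered_map<char, string> englishCodes = {
    {'e', "010"},   {'t', "1101"},  {'a', "1011"},  {'o', "1001"},
    {'i', "1000"},  {'n', "0111"},  {'s', "0011"},  {'h', "0010"},
    {'r', "0001"},  {'l', "10101"}, {'c', "00001"}, {'m', "110010"},
    {'p', "101000"},
};
```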