This module implements functionalities relating to Huffman encoding and decoding.
AUTHOR:
Bases: sage.structure.sage_object.SageObject
This class implements the basic functionalities of Huffman codes.
It can build a Huffman code from a given string, or from the information of a dictionary associating to each key (the elements of the alphabet) a weight (most of the time, a probability value or a number of occurrences).
INPUT:
source – can be either
- A string from which the Huffman encoding should be created.
- A dictionary that associates to each symbol of an alphabet a numeric value. If we consider the frequency of each alphabetic symbol, then source is considered as the frequency table of the alphabet with each numeric (non-negative integer) value being the number of occurrences of a symbol. The numeric values can also represent weights of the symbols. In that case, the numeric values are not necessarily integers, but can be real numbers.
In order to construct a Huffman code for an alphabet, we use exactly one of the following methods:
Examples:
sage: from sage.coding.source_coding.huffman import Huffman, frequency_table
sage: h1 = Huffman("There once was a french fry")
sage: for letter, code in h1.encoding_table().iteritems():
... print "'"+ letter + "' : " + code
'a' : 0111
' ' : 00
'c' : 1010
'e' : 100
'f' : 1011
'h' : 1100
'o' : 11100
'n' : 1101
's' : 11101
'r' : 010
'T' : 11110
'w' : 11111
'y' : 0110
We can obtain the same result by “training” the Huffman code with the following table of frequency:
sage: ft = frequency_table("There once was a french fry"); ft
{'a': 2, ' ': 5, 'c': 2, 'e': 4, 'f': 2, 'h': 2, 'o': 1, 'n': 2, 's': 1, 'r': 3, 'T': 1, 'w': 1, 'y': 1}
sage: h2 = Huffman(ft)
Once h1 has been trained, and hence possesses an encoding table, it is possible to obtain the Huffman encoding of any string (possibly the same) using this code:
sage: encoded = h1.encode("There once was a french fry"); encoded
'11110110010001010000111001101101010000111110111111010001110010110101001101101011000010110100110'
We can decode the above encoded string in the following way:
sage: h1.decode(encoded)
'There once was a french fry'
Obviously, if we try to decode a string using a Huffman instance which has been trained on a different sample (and hence has a different encoding table), we are likely to get some random-looking string:
sage: h3 = Huffman("There once were two french fries")
sage: h3.decode(encoded)
' wehnefetrhft ne ewrowrirTc'
This does not look like our original string.
Instead of using frequency, we can assign weights to each alphabetic symbol:
sage: from sage.coding.source_coding.huffman import Huffman
sage: T = {"a":45, "b":13, "c":12, "d":16, "e":9, "f":5}
sage: H = Huffman(T)
sage: L = ["deaf", "bead", "fab", "bee"]
sage: E = []
sage: for e in L:
... E.append(H.encode(e))
... print E[-1]
...
111110101100
10111010111
11000101
10111011101
sage: D = []
sage: for e in E:
... D.append(H.decode(e))
... print D[-1]
...
deaf
bead
fab
bee
sage: D == L
True
Decode the given string using the current encoding table.
INPUT:
OUTPUT:
EXAMPLES:
This is how a string is encoded and then decoded:
sage: from sage.coding.source_coding.huffman import Huffman
sage: str = "Sage is my most favorite general purpose computer algebra system"
sage: h = Huffman(str)
sage: encoded = h.encode(str); encoded
'00000110100010101011000011101010011100101010011011011100111101110010110100001011011111000001110101010001010110011010111111011001110100101000111110010011011100101011100000110001100101000101110101111101110110011000101011000111111101101111010010111001110100011'
sage: h.decode(encoded)
'Sage is my most favorite general purpose computer algebra system'
TESTS:
Of course, the string one tries to decode has to be a binary one. If not, an exception is raised:
sage: h.decode('I clearly am not a binary string')
Traceback (most recent call last):
...
ValueError: Input must be a binary string.
Encode the given string based on the current encoding table.
INPUT:
OUTPUT:
EXAMPLES:
This is how a string is encoded and then decoded:
sage: from sage.coding.source_coding.huffman import Huffman
sage: str = "Sage is my most favorite general purpose computer algebra system"
sage: h = Huffman(str)
sage: encoded = h.encode(str); encoded
'00000110100010101011000011101010011100101010011011011100111101110010110100001011011111000001110101010001010110011010111111011001110100101000111110010011011100101011100000110001100101000101110101111101110110011000101011000111111101101111010010111001110100011'
sage: h.decode(encoded)
'Sage is my most favorite general purpose computer algebra system'
Returns the current encoding table.
INPUT:
OUTPUT:
EXAMPLES:
sage: from sage.coding.source_coding.huffman import Huffman
sage: str = "Sage is my most favorite general purpose computer algebra system"
sage: h = Huffman(str)
sage: T = sorted(h.encoding_table().items())
sage: for symbol, code in T:
... print symbol, code
...
101
S 00000
a 1101
b 110001
c 110000
e 010
f 110010
g 0001
i 10000
l 10011
m 0011
n 110011
o 0110
p 0010
r 1111
s 1110
t 0111
u 10001
v 00001
y 10010
Returns the Huffman tree corresponding to the current encoding.
INPUT:
OUTPUT:
EXAMPLES:
sage: from sage.coding.source_coding.huffman import Huffman
sage: str = "Sage is my most favorite general purpose computer algebra system"
sage: h = Huffman(str)
sage: T = h.tree(); T
Digraph on 39 vertices
sage: T.show(figsize=[20,20])
Return the frequency table corresponding to the given string.
INPUT:
OUTPUT:
EXAMPLES:
The frequency table of a non-empty string:
sage: from sage.coding.source_coding.huffman import frequency_table
sage: str = "Stop counting my characters!"
sage: T = sorted(frequency_table(str).items())
sage: for symbol, code in T:
... print symbol, code
...
3
! 1
S 1
a 2
c 3
e 1
g 1
h 1
i 1
m 1
n 2
o 2
p 1
r 2
s 1
t 3
u 1
y 1
The frequency of an empty string:
sage: frequency_table("")
{}