Build a data compression system using Huffman's algorithm
Question
Build a data compression system using Huffman's algorithm. In other words, you are given a sequence of alphabetic characters and asked to construct its Huffman code. Your “output.txt” file should give the bit representation of every character that appears in the input file, and must not list characters that are NOT present in it. All characters in the input file will be lower-case English letters.
Sample output.txt (any valid Huffman code in this format):
a:0
b:101
c:100
d:111
e:1101
f:1100
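A valid Huffman code is prefix-free: no codeword is the beginning of another, which is what lets a bit stream be decoded unambiguously. The sketch below checks that property for text in the "letter:code" format shown above. The helper names `parse_codes` and `is_prefix_free` are made up for this illustration and are not part of the assignment.

```python
def parse_codes(text):
    """Parse 'letter:code' lines into a dict (hypothetical helper)."""
    codes = {}
    for line in text.splitlines():
        line = line.strip()
        if line:
            symbol, bits = line.split(":", 1)
            codes[symbol] = bits
    return codes


def is_prefix_free(codes):
    """Return True if no codeword is a prefix of a different codeword."""
    words = sorted(codes.values())
    # After sorting, a codeword that is a prefix of any other codeword
    # is also a prefix of its immediate neighbour, so checking
    # adjacent pairs is enough.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


sample = "a:0\nb:101\nc:100\nd:111\ne:1101\nf:1100\n"
print(is_prefix_free(parse_codes(sample)))  # prints True
```

This only confirms that the table is decodable. It does not confirm that the code lengths are optimal for a particular input.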
Sample input.txt (all characters on a single line with no spaces between):
ffffeeeeeeeeefccccccccccccbbbbbbbbbbbbbddddddddddddddddaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
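Huffman's algorithm starts from a table of how often each character occurs. A minimal sketch of that counting step follows, using `collections.Counter` from the standard library. The function name `letter_counts` and the short test string are assumptions chosen for illustration. Anything that is not a lower-case English letter (a trailing newline, a stray space) is skipped.

```python
from collections import Counter


def letter_counts(text):
    """Count lower-case English letters, ignoring every other character."""
    return Counter(ch for ch in text if "a" <= ch <= "z")


# Illustrative input, not the sample file.
counts = letter_counts("aaab bc\n")
for letter in sorted(counts):
    print("%s occurs %d time(s)" % (letter, counts[letter]))
```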
Explanation / Answer
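def huffman(p):
    """Return a dict mapping each symbol in p to its Huffman code.

    p maps symbols to weights; raw counts work as well as probabilities.
    """
    # Base cases: an empty table has no codes, and a lone symbol
    # still needs a one-bit code.
    if len(p) <= 1:
        return {symbol: '0' for symbol in p}
    if len(p) == 2:
        return dict(zip(p.keys(), ['0', '1']))
    # Merge the two least frequent symbols into one combined symbol.
    prime = p.copy()
    a1, a2 = lowest_prob_pair(p)
    p1, p2 = prime.pop(a1), prime.pop(a2)
    prime[a1 + a2] = p1 + p2
    # Solve the smaller problem, then split the merged symbol's code.
    c = huffman(prime)
    ca1a2 = c.pop(a1 + a2)
    c[a1], c[a2] = ca1a2 + '0', ca1a2 + '1'
    return c


def lowest_prob_pair(p):
    """Return the two symbols with the smallest weights."""
    assert len(p) >= 2
    sorted_p = sorted(p.items(), key=lambda item: item[1])
    return sorted_p[0][0], sorted_p[1][0]


# Count how often each lower-case letter occurs; skip newlines and spaces.
char_counts = {}
with open("input.txt", "r") as input_file:
    for ch in input_file.read():
        if 'a' <= ch <= 'z':
            char_counts[ch] = char_counts.get(ch, 0) + 1

codes = huffman(char_counts)

# Write one "letter:code" line per character, e.g. "a:0".
# Mode "w" overwrites any output left over from an earlier run.
with open("output.txt", "w") as output_file:
    for key in sorted(codes):
        output_file.write("%s:%s\n" % (key, codes[key]))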
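The recursive solution above re-sorts the whole table at every level of recursion. A common alternative keeps the pending subtrees in a min-heap so that the two lightest ones can be popped directly. The sketch below uses the standard `heapq` module. The name `huffman_codes` and the example weights are assumptions for illustration, and the exact bit patterns it produces may differ from the recursive version, since any valid Huffman code is acceptable.

```python
import heapq
from itertools import count


def huffman_codes(freq):
    """Build a Huffman code from a {symbol: weight} dict using a min-heap."""
    if len(freq) == 1:
        # A lone symbol still needs a one-bit code.
        return {symbol: "0" for symbol in freq}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    # Each heap entry is (weight, tiebreak, {symbol: code built so far}).
    heap = [(w, next(tiebreak), {s: ""}) for s, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees puts one more bit in front of every code.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2] if heap else {}


# Illustrative weights, not read from any file.
weights = {"a": 5, "b": 2, "c": 1}
for symbol, bits in sorted(huffman_codes(weights).items()):
    print("%s:%s" % (symbol, bits))
```

Carrying a small dict of partial codes in each heap entry avoids building an explicit tree, at the cost of rewriting those strings on every merge. For a 26-letter alphabet that overhead is negligible.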