Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Python 3.x Huffman Trees This is what needs to be completed HuffmanHeap.py: Enco

ID: 3920061 • Letter: P

Question

Python 3.x Huffman Trees

This is what needs to be completed HuffmanHeap.py:

Encoder.py

Thanks for the help if you could explain how you did what you did that would be great too

Question 1 (17 points): Purpose: To implement an ADT to support efficient execution of a HuffmanTree encoder Degree of Difficulty: Easy to Moderate Phoenix Wright, ace attorney, loves being lazy - erm, efficient. Phoenix is also short on money. He has heard of text compression, and thinks it might be a good idea to compress all his text files to save on valuable disk space (disk space is expensive for him!). Larry Butz, his useless IT friend, wrote most of the code for text compression using Huffman trees, but did not finish the job (surprising no one). On the Assignment 10 Moodle page, you'll find a program named Encoder.py. The program follows a simple design, and is well-documented. All the pieces should be familiar to you, as they are similar to discussion in class. The Encoder.py program imports two supporting ADTs » HuffmanTree.py Defines the Huf fmanTree class . HuffmanHeap.py Defines the HuffmanHeap class. The HuffmanTree class was discussed in lecture, and its definition should be familiar. There is one change that you should note: the function build_codec that builds the codec from a HuffmanTree object is novw a method in the Huf fmanTree class The ADT HuffmanHeap, was also discussed in lecture, though the code was not given. A heap is a kind of queue that guarantees that the dequeue operation always removes and returns the smallest value in the queue (this is not the FIFO protoco). The HuffmanHeap stores HuffmanTree objects, and has the following methods . __init_.(self, leafs): Creates a new HuffmanHeap object. . enqueue(self, tree): Adds the given HuffmanTree to the heap . dequeue(self): Removes and returns the smallest HuffmanTree in the heap ·-len--(self): Returns the number of HuffmanTrees in the heap. Defining this method allows pro- grammers to call len ) on HuffmanHeap objects In lecture we saw that we could implement these operations so that they all have O(n) worst case time complexity, where n is the number of HuffmanTree objects in the HuffmanHeap. However, we also de- scribed in lecture how these operations could be implemented so that they all have worst case time com plexity of O(1). Hint: use 2 lists The HuffmanHeap class is incomplete. Your job is to complete the 4 methods described above, so that the Encoder.py program correctly encodes text files. (When you've completed Questions 1 and 2, you should be able to encode and decode any text files. f you want to test your encoder, you can try decoding with the program given for Question 2.) NOTE: some symbols do not work for our encoding solution. apostrophes, semi-colons and non-standard symbols may have errors

Explanation / Answer

import heapq

import os

from functools import total_ordering

class HeapNode:

def __init__(self, char, freq):

self.char = char

self.freq = freq

self.left = None

self.right = None

# defining comparators less_than and equals

def __lt__(self, other):

return self.freq < other.freq

def __eq__(self, other):

if(other == None):

return False

if(not isinstance(other, HeapNode)):

return False

return self.freq == other.freq

class HuffmanCoding:

def __init__(self, path):

self.path = path

self.heap = []

self.codes = {}

self.reverse_mapping = {}

# functions for compression:

def make_frequency_dict(self, text):

frequency = {}

for character in text:

if not character in frequency:

frequency[character] = 0

frequency[character] += 1

return frequency

def make_heap(self, frequency):

for key in frequency:

node = HeapNode(key, frequency[key])

heapq.heappush(self.heap, node)

def merge_nodes(self):

while(len(self.heap)>1):

node1 = heapq.heappop(self.heap)

node2 = heapq.heappop(self.heap)

merged = HeapNode(None, node1.freq + node2.freq)

merged.left = node1

merged.right = node2

heapq.heappush(self.heap, merged)

def make_codes_helper(self, root, current_code):

if(root == None):

return

if(root.char != None):

self.codes[root.char] = current_code

self.reverse_mapping[current_code] = root.char

return

self.make_codes_helper(root.left, current_code + "0")

self.make_codes_helper(root.right, current_code + "1")

def make_codes(self):

root = heapq.heappop(self.heap)

current_code = ""

self.make_codes_helper(root, current_code)

def get_encoded_text(self, text):

encoded_text = ""

for character in text:

encoded_text += self.codes[character]

return encoded_text

def pad_encoded_text(self, encoded_text):

extra_padding = 8 - len(encoded_text) % 8

for i in range(extra_padding):

encoded_text += "0"

padded_info = "{0:08b}".format(extra_padding)

encoded_text = padded_info + encoded_text

return encoded_text

def get_byte_array(self, padded_encoded_text):

if(len(padded_encoded_text) % 8 != 0):

print("Encoded text not padded properly")

exit(0)

b = bytearray()

for i in range(0, len(padded_encoded_text), 8):

byte = padded_encoded_text[i:i+8]

b.append(int(byte, 2))

return b

def compress(self):

filename, file_extension = os.path.splitext(self.path)

output_path = filename + ".bin"

with open(self.path, 'r+') as file, open(output_path, 'wb') as output:

text = file.read()

text = text.rstrip()

frequency = self.make_frequency_dict(text)

self.make_heap(frequency)

self.merge_nodes()

self.make_codes()

encoded_text = self.get_encoded_text(text)

padded_encoded_text = self.pad_encoded_text(encoded_text)

b = self.get_byte_array(padded_encoded_text)

output.write(bytes(b))

print("Compressed")

return output_path

""" functions for decompression: """

def remove_padding(self, padded_encoded_text):

padded_info = padded_encoded_text[:8]

extra_padding = int(padded_info, 2)

padded_encoded_text = padded_encoded_text[8:]

encoded_text = padded_encoded_text[:-1*extra_padding]

return encoded_text

def decode_text(self, encoded_text):

current_code = ""

decoded_text = ""

for bit in encoded_text:

current_code += bit

if(current_code in self.reverse_mapping):

character = self.reverse_mapping[current_code]

decoded_text += character

current_code = ""

return decoded_text

def decompress(self, input_path):

filename, file_extension = os.path.splitext(self.path)

output_path = filename + "_decompressed" + ".txt"

with open(input_path, 'rb') as file, open(output_path, 'w') as output:

bit_string = ""

byte = file.read(1)

while(len(byte) > 0):

byte = ord(byte)

bits = bin(byte)[2:].rjust(8, '0')

bit_string += bits

byte = file.read(1)

encoded_text = self.remove_padding(bit_string)

decompressed_text = self.decode_text(encoded_text)

output.write(decompressed_text)

print("Decompressed")

return output_path