
(Python) George Kingsley Zipf (1902–1950) observed that the frequency of the kth


Question

(Python)

George Kingsley Zipf (1902–1950) observed that the frequency of the kth most common word in a text is roughly proportional to 1/k. This means that there is a constant value C such that for most words w in the text the following is true:

If word w is the kth most common, then freq(w) · k ≈ C

Here, by frequency of word w, freq(w), we mean the number of times the word occurs in the text divided by the total number of words in the text.
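As a rough illustration of the arithmetic only, here is a minimal sketch on a made-up sentence. The sentence and the variable names are assumptions for the example, and a text this small is far too short to actually follow Zipf's observation.

```python
from collections import Counter

# Toy text (hypothetical); real texts need thousands of words for the pattern to appear
words = "the cat and the dog and the bird".split()
total = len(words)                      # 8 words in total
ranked = Counter(words).most_common(2)  # [('the', 3), ('and', 2)]

# freq(w) * k, where k is the rank (1 = most common)
products = [count / total * k for k, (word, count) in enumerate(ranked, start=1)]
print(products)
```

Here "the" has rank 1 and frequency 3/8, and "and" has rank 2 and frequency 2/8, so the two products are 3/8 · 1 and 2/8 · 2.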

Implement function zipf() that takes a file name as input and verifies Zipf’s observation by printing the value freq(w) · k for the first 10 most frequent words w in the file. Ignore capitalization and punctuation when processing the file.

You need to create a text file called frankenstein.txt. The output should look like this:

>>> zipf('frankenstein.txt')

0.0557319552019

0.0790477076165

0.113270715149

0.140452498306

0.139097394747

0.141648177917

0.129359248582

0.119993091629

0.122078888284

0.13497894275

Explanation / Answer

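In the program below, k is the rank of each word (1 for the most common word, 2 for the next, and so on) and freq(w) is the word's count divided by the total number of words in the file. Change the file path at the bottom to wherever you saved frankenstein.txt.

import re
from collections import Counter

def zipf(filename):
    # Read the whole file into one string
    with open(filename) as f:
        passage = f.read()
    # Lowercase to ignore capitalization; \w+ keeps word characters and drops punctuation
    words = re.findall(r'\w+', passage.lower())
    total = len(words)
    word_counts = Counter(words)
    # k is the rank, starting at 1 for the most frequent word
    for k, (word, count) in enumerate(word_counts.most_common(10), start=1):
        # freq(w) * k, where freq(w) = count / total
        print(count / total * k)

# Raw string so the backslashes in the Windows path are not read as escapes (\t, \f)
zipf(r'c:\temp\frankenstein.txt')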
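
To try the calculation without a file on disk, the following is a minimal sketch that works on a string and returns the values instead of printing them. The name zipf_values, its n parameter, and the regular expression are assumptions for this example, not part of the assignment; the pattern keeps contractions such as "don't" as one word, which is only one possible reading of "ignore punctuation".

```python
import re
from collections import Counter

def zipf_values(text, n=10):
    """Return freq(w) * k for the n most common words in text (hypothetical helper)."""
    # Lowercase, then match runs of letters with an optional apostrophe part ("don't")
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    total = len(words)
    if total == 0:
        return []  # no words, nothing to rank
    ranked = Counter(words).most_common(n)
    # k is the rank, starting at 1 for the most frequent word
    return [count / total * k for k, (word, count) in enumerate(ranked, start=1)]

sample = "Don't stop. Don't go, don't!"
print(zipf_values(sample, n=2))
```

In the sample, "don't" appears 3 times out of 5 words, so its product is 3/5 · 1, and "stop" appears once, so its product is 1/5 · 2. To check a real text, read the file into a string and pass it in, for example zipf_values(open('frankenstein.txt').read()).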