Question
(Python)
George Kingsley Zipf (1902–1950) observed that the frequency of the kth most common word in a text is roughly proportional to 1/k. This means that there is a constant value C such that for most words w in the text the following is true:
If word w is the kth most common, then freq(w) × k ≈ C
Here, by frequency of word w, freq(w), we mean the number of times the word occurs in the text divided by the total number of words in the text.
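As a small illustration of this definition, the sketch below computes freq(w) for a made-up sentence. The sample text and the helper name word_frequencies are hypothetical and not part of the assignment.

```python
from collections import Counter

def word_frequencies(words):
    """Map each word to its count divided by the total number of words."""
    total = len(words)
    counts = Counter(words)
    return {word: count / total for word, count in counts.items()}

# Hypothetical sample: 8 words, of which "the" occurs 3 times.
sample = "the cat saw the dog chase the bird".split()
freqs = word_frequencies(sample)
print(freqs["the"])  # 3 / 8 = 0.375
```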
Implement a function zipf() that takes a file name as input and verifies Zipf’s observation by printing the value freq(w) × k for each of the 10 most frequent words w in the file. Ignore capitalization and punctuation when processing the file.
The output should look like this. You need to create a text file called frankenstein.txt:
>>> zipf('frankenstein.txt')
0.0557319552019
0.0790477076165
0.113270715149
0.140452498306
0.139097394747
0.141648177917
0.129359248582
0.119993091629
0.122078888284
0.13497894275
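If Zipf's observation held exactly, all ten numbers would be equal. As a rough check, the sketch below takes the values from the sample output above and reports their spread and mean, then divides each by its rank k to recover freq(w). The variable names are illustrative only.

```python
from statistics import mean

# Values of freq(w) * k copied from the sample output (ranks 1 to 10).
products = [
    0.0557319552019, 0.0790477076165, 0.113270715149, 0.140452498306,
    0.139097394747, 0.141648177917, 0.129359248582, 0.119993091629,
    0.122078888284, 0.13497894275,
]

print(min(products), max(products))  # spread of the ten products
print(mean(products))                # a rough estimate of the constant C

# Dividing each product by its rank k gives back freq(w) for that word.
freqs = [p / k for k, p in enumerate(products, start=1)]
print(freqs)
```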
Explanation / Answer
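In the program below, k is not a fixed constant: it is the rank of the word (1 for the most common word, 2 for the next, and so on). For each of the 10 most common words the program prints freq(w) × k, where freq(w) is the word's count divided by the total number of words in the file.
import re
from collections import Counter

def zipf(filename):
    # Read the whole file and lowercase it so capitalization is ignored.
    with open(filename) as f:
        passage = f.read().lower()
    # \w+ matches runs of word characters, which drops punctuation.
    words = re.findall(r'\w+', passage)
    total = len(words)
    word_counts = Counter(words)
    # k is the rank (1 = most common); print freq(w) * k for the top 10.
    for k, (word, occurrence) in enumerate(word_counts.most_common(10), start=1):
        print(occurrence / total * k)

# Raw string so that \t and \f in the Windows path are not read as escapes.
zipf(r'c:\temp\frankenstein.txt')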
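For testing, it can help to have a variant that returns the ten products instead of printing them. The sketch below is one way to do that. The name zipf_values, the top parameter, and the use of str.translate with string.punctuation to strip punctuation are choices made here, not part of the original answer.

```python
import string
from collections import Counter

def zipf_values(filename, top=10):
    """Return freq(w) * k for the `top` most common words in the file."""
    with open(filename, encoding="utf-8") as f:
        text = f.read().lower()
    # Delete ASCII punctuation, then split on whitespace.
    text = text.translate(str.maketrans("", "", string.punctuation))
    words = text.split()
    total = len(words)
    ranked = Counter(words).most_common(top)
    # k is the 1-based rank of each word in the ranking.
    return [count / total * k for k, (_, count) in enumerate(ranked, start=1)]
```

Printing each element of zipf_values('frankenstein.txt') gives a listing in the same form as the sample output. Note that deleting punctuation turns a word like "don't" into "dont", so the counts depend slightly on how punctuation is handled.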