


Question

Some Problems about Python

How do I search for certain words in a .json file?

I have a project that requires me to collect tweets from Twitter. I end up with a .json file like this:

{"text": "IMG_8271 https://t.co/PZKIIfh0Ym", "coordinates": [-123.01179819, 37.69899379]}
{"text": "We're #hiring! Click to apply: Barista - https://t.co/6ljgCrRYX0 #Job #barista #Hospitality #Framingham, MA #Jobs", "coordinates": [-71.4161565, 42.279286]}
{"text": "Can you recommend anyone for this #job in #Miami, FL? https://t.co/pLHHmAqzfW #Hospitality #Hiring", "coordinates": [-80.4326521, 25.8433674]}
{"text": "See our latest #DesMoines, IA #job and click to apply: Lab Support Outside Sales Representative (Account... - https://t.co/AVhEJE4NGg", "coordinates": [-93.6091064, 41.6005448]}
{"text": "@dmvgayevents #WetNWildWednesdays WEDNESDAY, NOVEMBER 9TH CLUB BUNNS 608 WEST LEXINGTON\u2026 https://t.co/tqot5bbvS2", "coordinates": [-76.62400928, 39.29153126]}
{"text": "Hello (@ Mercy Medical Center - @mercycr in Cedar Rapids, IA) https://t.co/nqvx2KhLdl", "coordinates": [-91.65579585, 41.97795944]}
{"text": "Autumn colors. @ City of Oakland https://t.co/agREI75ceA", "coordinates": [-122.19445454, 37.74303723]}
{"text": "Want to work in #Livermore, CA? View our latest opening: https://t.co/3Cs9PoVOWM #Job #SupplyChain #Jobs #Hiring https://t.co/Jfd9g52B1v", "coordinates": [-121.7680088, 37.6818745]}
{"text": "@TestSheepNZ @ilarihenrik @mubbashir I generally lack respect for simplistic solutions", "coordinates": [-122.04138774, 37.31605322]}
{"text": "You are important. Act like it. #ctinjurylawyer @ Connecticut Trial Firm, LLC https://t.co/GKrJdtMNQ4", "coordinates": [-72.6260605, 41.7226105]}
{"text": "6 direction of space \ud83d\ude80", "coordinates": [-99.6300776, 36.8417518]}

Then I need to search for a certain word, such as "bieber", and separate each matching tweet into words, like this:


justin bieber...doesn't deserve the award..eminem deserves it.
The words of the tweet should be considered:
['justin', 'bieber', 'doesn', 't', 'deserve', 'the', 'award', 'eminem', 'deserves', 'it']

I don't know how to deal with the JSON. Please give me some sample code to help me get started. Thank you!

Explanation / Answer

#!/usr/bin/env python
"""Extract selected fields from a file of JSON tweets, one tweet per line.

Usage: python <this script> <input .json file> <output file>
"""
import calendar
import json
import sys
import time


def parse_json_tweet(line):
    """Parse one JSON tweet; return its fields, or None for a non-English tweet."""
    tweet = json.loads(line)
    if tweet['lang'] != 'en':
        return None

    date = tweet['created_at']
    tweet_id = tweet['id']
    nfollowers = tweet['user']['followers_count']
    nfriends = tweet['user']['friends_count']

    # For a retweet, use the text of the original tweet.
    if 'retweeted_status' in tweet:
        text = tweet['retweeted_status']['text']
    else:
        text = tweet['text']

    hashtags = [hashtag['text'] for hashtag in tweet['entities']['hashtags']]
    users = [user_mention['screen_name'] for user_mention in tweet['entities']['user_mentions']]
    urls = [url['expanded_url'] for url in tweet['entities']['urls']]

    media_urls = []
    if 'media' in tweet['entities']:
        media_urls = [media['media_url'] for media in tweet['entities']['media']]

    return [date, tweet_id, text, hashtags, users, urls, media_urls, nfollowers, nfriends]


# Optional helper, left disabled; it needs the third-party "requests" package.
# def follow_shortlinks(shortlinks):
#     """Follow redirects in list of shortlinks, return dict of resulting long URLs"""
#     links_followed = {}
#     for shortlink in shortlinks:
#         request_result = requests.get(shortlink)
#         links_followed[shortlink] = request_result.url
#     return links_followed


if __name__ == "__main__":
    with open(sys.argv[1], 'r', encoding='utf-8') as fin, \
         open(sys.argv[2], 'w', encoding='utf-8') as fout:
        # Efficient line-by-line read of big files.
        for line in fin:
            try:
                parsed = parse_json_tweet(line)
            except (ValueError, KeyError, TypeError):
                # Skip lines that are not valid JSON or lack an expected field.
                continue
            if parsed is None:
                # Non-English tweet.
                continue
            tweet_gmttime, tweet_id, text, hashtags, users, urls, media_urls, nfollowers, nfriends = parsed

            # "created_at" looks like "Mon Feb 17 14:14:44 +0000 2014".
            try:
                c = time.strptime(tweet_gmttime.replace("+0000", ''), '%a %b %d %H:%M:%S %Y')
            except ValueError:
                print("pb with tweet_gmttime", tweet_gmttime, line)
                continue

            # The timestamp is GMT, so use calendar.timegm; time.mktime would treat it as local time.
            tweet_unixtime = calendar.timegm(c)
            fout.write(str([tweet_unixtime, tweet_gmttime, tweet_id, text, hashtags, users, urls, media_urls, nfollowers, nfriends]) + " ")
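
The script above expects full tweet objects (with 'lang', 'user', 'entities' and so on), while the file shown in the question has only "text" and "coordinates" on each line. For that simpler file, a minimal sketch of the search could look like the following. The names tokenize and search_tweets are made up for this example, and it assumes that a "word" is any run of ASCII letters or digits, which is what the expected output in the question suggests. The sample tweets and their coordinates are placeholders, not real data.

```python
import json
import re


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_tweets(lines, keyword):
    """Yield (words, coordinates) for each tweet whose words include keyword.

    `lines` is any iterable of strings, each holding one JSON object,
    for example an open file.
    """
    keyword = keyword.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        tweet = json.loads(line)
        words = tokenize(tweet["text"])
        if keyword in words:
            yield words, tweet.get("coordinates")


if __name__ == "__main__":
    # Placeholder data in the same shape as the file in the question.
    sample_lines = [
        json.dumps({"text": "justin bieber...doesn't deserve the award..eminem deserves it.",
                    "coordinates": [0.0, 0.0]}),
        json.dumps({"text": "Autumn colors. @ City of Oakland",
                    "coordinates": [0.0, 0.0]}),
    ]
    for words, coordinates in search_tweets(sample_lines, "bieber"):
        print(words, coordinates)
```

To run it on a real file, pass an open file instead of the sample list, for example open("tweets.json", encoding="utf-8"), where tweets.json stands for whatever your file is called. Note that this matches whole words only, so searching for "biber" would not find "bieber", and the ASCII-only pattern drops accented letters and emoji.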
