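How do I create a pandas DataFrame from the Twitter Search API?
I'm using the Twitter Search API, which returns a dictionary of dictionaries. My goal is to create a DataFrame from a list of keys in the response dictionary.
An example of the API response is here: Example Response
The list of keys I want from the statuses dictionary: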
keys = ["created_at", "text", "in_reply_to_screen_name", "source"]
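I want to loop over the values for each of these keys in the returned statuses dictionary and put them in a DataFrame with the keys as columns.
My current code loops over a single key, collects the values in a list, and then adds that list to the DataFrame, but I would like a way to handle several keys at once. Current code: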
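import pandas as pd

# w is the word to be queried
w = 'keyword'
# count of tweets to return
count = 1000
# API call (twitter is an already-authenticated client object)
query = twitter.search.tweets(q=w, count=count)

def data_l2(q, k1, k2):
    # Collect q[k1][i][k2] for every result
    data = []
    for results in q[k1]:
        data.append(results[k2])
    return data

def data_l3(q, k1, k2, k3):
    # Three-level variant; called but not defined in the post, so this body is assumed
    return [results[k2][k3] for results in q[k1]]

screen_names = data_l3(query, "statuses", "user", "screen_name")
# tweets was not defined in the post; assumed to be the "text" values
tweets = data_l2(query, "statuses", "text")
data = {'screen_names': screen_names,
        'tweets': tweets}
frame = pd.DataFrame(data)
frame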
Answer:
I'll share a more general solution, since I've been working with the Twitter API. Suppose you have the IDs of the tweets you want in a list called my_ids:
# Fetch tweets from the Twitter API using the following loop
# (api is assumed to be an authenticated tweepy API object):
list_of_tweets = []
# Tweets that can't be found are saved in the list below:
cant_find_tweets_for_those_ids = []
for each_id in my_ids:
    try:
        list_of_tweets.append(api.get_status(each_id))
    except Exception as e:
        cant_find_tweets_for_those_ids.append(each_id)
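The loop above can also be wrapped in a small helper that returns the successes and the failures together. This is only a sketch: fetch_statuses and the stand-in fake_get_status are hypothetical names used for illustration and are not part of tweepy. With a real client you would pass api.get_status as the first argument.

```python
def fetch_statuses(get_status, ids):
    """Call get_status(id) for each id and return (found, missing_ids)."""
    found, missing = [], []
    for each_id in ids:
        try:
            found.append(get_status(each_id))
        except Exception:
            # Any id whose lookup raises is recorded instead of stopping the loop.
            missing.append(each_id)
    return found, missing


# Hypothetical stand-in for api.get_status so the sketch runs offline:
# odd ids fail, even ids succeed.
def fake_get_status(tweet_id):
    if tweet_id % 2:
        raise LookupError(f"no status for {tweet_id}")
    return {"id": tweet_id}


found, missing = fetch_statuses(fake_get_status, [1, 2, 3, 4])
```

Passing the lookup function in as an argument keeps the helper testable without network access.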
In the next block we pull the JSON part out of each tweepy Status object we downloaded and add them all to a list...
my_list_of_dicts = []
for each_json_tweet in list_of_tweets:
    my_list_of_dicts.append(each_json_tweet._json)
...and write that list to a txt file:
import json

with open('tweet_json.txt', 'w') as file:
    file.write(json.dumps(my_list_of_dicts, indent=4))
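As a quick sanity check, the write and read steps can be tried on a couple of made-up dictionaries before running them on real tweets. This is a minimal sketch: sample_dicts is hypothetical data standing in for my_list_of_dicts, and the file goes into a temporary directory so nothing is overwritten.

```python
import json
import os
import tempfile

# Hypothetical sample standing in for my_list_of_dicts.
sample_dicts = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]

# Write the list as indented JSON into a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "tweet_json.txt")
with open(path, "w", encoding="utf-8") as file:
    json.dump(sample_dicts, file, indent=4)

# Read it back; json.load returns the same list of dictionaries.
with open(path, encoding="utf-8") as file:
    loaded = json.load(file)
```

With the default ensure_ascii=True the file contains only ASCII characters, so the explicit encoding is mostly a precaution.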
Now we create a DataFrame from the tweet_json.txt file (I picked a few keys that were relevant to my use case, but you can use whichever keys you need):
import json
import pandas as pd

my_demo_list = []
with open('tweet_json.txt', encoding='utf-8') as json_file:
    all_data = json.load(json_file)
    for each_dictionary in all_data:
        tweet_id = each_dictionary['id']
        whole_tweet = each_dictionary['text']
        # Everything from the first 'https' onward (assumes the text contains a link)
        only_url = whole_tweet[whole_tweet.find('https'):]
        favorite_count = each_dictionary['favorite_count']
        retweet_count = each_dictionary['retweet_count']
        created_at = each_dictionary['created_at']
        whole_source = each_dictionary['source']
        # Keep only the link text of the <a ... rel="nofollow">device</a> markup
        only_device = whole_source[whole_source.find('rel="nofollow">') + 15:-4]
        source = only_device
        # The 'retweeted_status' key is only present on retweets
        retweeted_status = each_dictionary.get('retweeted_status', 'Original tweet')
        if retweeted_status == 'Original tweet':
            url = only_url
        else:
            retweeted_status = 'This is a retweet'
            url = 'This is a retweet'
        my_demo_list.append({'tweet_id': str(tweet_id),
                             'favorite_count': int(favorite_count),
                             'retweet_count': int(retweet_count),
                             'url': url,
                             'created_at': created_at,
                             'source': source,
                             'retweeted_status': retweeted_status,
                             })
tweet_json = pd.DataFrame(my_demo_list, columns=['tweet_id', 'favorite_count',
                                                 'retweet_count', 'created_at',
                                                 'source', 'retweeted_status', 'url'])
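For the case in the question, where the wanted keys are already in a list, the same idea can be written more compactly. The following is a sketch under assumptions: statuses is hypothetical sample data shaped like the "statuses" list of a search response, not real API output, and pd.json_normalize requires pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical sample shaped like the "statuses" list of a search response.
statuses = [
    {
        "created_at": "Mon Jan 01 10:00:00 +0000 2018",
        "text": "hello",
        "in_reply_to_screen_name": None,
        "source": '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
        "user": {"screen_name": "alice"},
    },
    {
        "created_at": "Mon Jan 01 11:00:00 +0000 2018",
        "text": "world",
        "in_reply_to_screen_name": "alice",
        "source": '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
        "user": {"screen_name": "bob"},
    },
]

keys = ["created_at", "text", "in_reply_to_screen_name", "source"]

# One row per status, one column per requested key; absent keys become None.
frame = pd.DataFrame([{k: s.get(k) for k in keys} for s in statuses], columns=keys)

# Nested values such as user -> screen_name can be flattened with json_normalize,
# which names the resulting column "user.screen_name".
flat = pd.json_normalize(statuses)
frame["screen_name"] = flat["user.screen_name"]
```

With a real response, statuses would be query["statuses"] from the question's code. Using s.get(k) rather than s[k] avoids a KeyError when a status lacks one of the keys.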