用python写了个批量关注知乎用户的程序，但是关注的数量（网页显示）与代码中提示的数量明显不一样。

Z时代
2024-01-10
分类：技术分享

自己手动大量关注用户提示
点击header提示
#-- coding: utf-8 --

__author__ = 'Wu_cf'
from auth import islogin
from auth import Logging
import requests,cookielib,sys,urllib,urllib2,os,json,re,threading,Queue,time
from bs4 import BeautifulSoup
requests = requests.Session()
requests.cookies = cookielib.LWPCookieJar('cookies')
try:
    requests.cookies.load(ignore_discard=True)
except:
    print u"尚未登录知乎=="
if islogin() != True:
    print u"请重新登录=="
#字符编码设置
reload(sys)
sys.setdefaultencoding('utf8')
'''
1:每一个用户都有一个hash_id应该是其用户标志，可以在chorme浏览器看得到这个，然后在html页面全局搜索即可用正则获得该值
2：可以在批量获得某个话题下面的用户。
3：更换话题，假如想关注NBA底下的所有用户，需要首先获得nba这个话题的link-id
4:start关键字这么来的 t = int(time.time()) 十位数的时间戳
'''
topic_url = 'http://www.zhihu.com/topic/19579266/followers'
#得到xsrf
def getXsrf():
    r = requests.get(topic_url)
    raw_xsrf = re.findall('xsrf(.*)', r.text)
    _xsrf = raw_xsrf[0][9:-3]
    return _xsrf
#得到要关注的用户的hash_id
def getHash():
    global header_info
    hash_id_all = []
    post_url = topic_url
    xsrf = getXsrf()
    header_info = {
    "Accept":"*/*",
    "Accept-Encoding":"gzip,deflate,sdch",
    "Accept-Language":"zh-CN,zh;q=0.8",
    "Connection":"keep-alive",
    "Content-Length":"127",
    "Content-Type":"application/x-www-form-urlencoded; charset=UTF-8",
    "DNT":"1",
    "Host":"www.zhihu.com",
    "Origin":"http://www.zhihu.com",
    "Referer":topic_url,
    "User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36",
    "X-Requested-With":"XMLHttpRequest",
    }
    for i in range(15):
         x = 0 + i * 20
         start = int(time.time())
         payload={"offset":x, "start":start, "_xsrf":xsrf,}
        # time.sleep(3)
         result = requests.get(post_url, data=payload, headers=header_info)
         #print result.text
         raw_hash_id = re.findall('<a href=.*? name=.*? class=.*? id=(.*?)>.*?</a>', result.text)
         for item in raw_hash_id:
             if item[4:36] > 32:
               hash_id_all.append(item[4:36])
         print "get hash_id_page",i
    return hash_id_all
#进行关注操作
def getFocus():
    hash_id = getHash()
    xsrf = getXsrf()
    i = 0
    for x in hash_id:
        i = i + 1
        params = json.dumps({"hash_id":x})
        payload={"method":"follow_member", "params":params, "_xsrf":xsrf,}
        click_url = 'http://www.zhihu.com/node/MemberFollowBaseV2'
        try:
            #延时
            #time.sleep(1)
            result = requests.post(click_url, data=payload, headers=header_info)
        except Exception as e:
            print u"不能关注了"
        # 这个参数表示返回msg 如果r为0即表示关注成功
        response = json.loads(result.content)
        if response["r"] == 0:
            print u"关注成功"," ",response["r"]," ",i
        else:
            print u"fucking"
    print u"就这么多了！！！！"
def main():
   getFocus()
if __name__ == '__main__':
    main()

回答：

猜测知乎有部分用户列表是ajax获取的，因此抓主html看不到

回答：

getFocus()方法中，每次循环开始时都把i加1（不管有没有关注成功）。把它改成关注成功时再加1试试？

有些地方可以改进一下，比如获取hash_id只要一步就够了：

ids = re.compile(r'id=\\"pp-(.*?)\\"', result.text)

以上是用python写了个批量关注知乎用户的程序，但是关注的数量（网页显示）与代码中提示的数量明显不一样。的全部内容，来源链接： utcz.com/a/164882.html

用python写了个批量关注知乎用户的程序，但是关注的数量（网页显示）与代码中提示的数量明显不一样。

回答：

回答：

其他人也看了：