Accessing session cookies in a Scrapy spider

Z时代
2024-01-10
Category: Q&A

I am trying to access the session cookies from within a spider. I first log in to a social network with the spider:

    def parse(self, response):
        return [FormRequest.from_response(response,
                formname='login_form',
                formdata={'email': '...', 'pass':'...'},
                callback=self.after_login)]

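In after_login, I would like to access the session cookies so that I can pass them to another module (Selenium here) for further processing of pages with the authenticated session.

I would like something like this: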

    def after_login(self, response):
        # process response
        .....
        # access the cookies of that session to access another URL in the
        # same domain with the authenticated session.
        # Something like:
        session_cookies = XXX.get_session_cookies()
        data = another_function(url, session_cookies)

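Unfortunately, response.cookies does not return the session cookies.

How can I get the session cookies? I have been looking at the cookies middleware, scrapy.contrib.downloadermiddleware.cookies, and at scrapy.http.cookies, but there does not seem to be any straightforward way to access the session cookies.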

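Some more details about my original question:

Unfortunately, I used your idea but I did not see the cookies, although I know for sure that they exist, since the scrapy.contrib.downloadermiddleware.cookies middleware does print them out! These are exactly the cookies I want to grab.

So here is what I am doing:

The after_login(self, response) method receives the response variable after proper authentication, and then I access a URL with the session data: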

    def after_login(self, response):
        # testing to see if I can get the session cookies
        cookieJar = response.meta.setdefault('cookie_jar', CookieJar())
        cookieJar.extract_cookies(response, response.request)
        cookies_test = cookieJar._cookies
        print("cookies - test:", cookies_test)
        # URL access with authenticated session
        url = "http://site.org/?id=XXXX"
        request = Request(url=url, callback=self.get_pict)
        return [request]

As the output below shows, there are indeed cookies, but I fail to capture them with cookieJar:

cookies - test: {}
2012-01-02 22:44:39-0800 [myspider] DEBUG: Sending cookies to: <GET http://www.facebook.com/profile.php?id=529907453>
    Cookie: xxx=3..........; yyy=34.............; zzz=.................; uuu=44..........

So I would like to get a dictionary containing the keys xxx, yyy, etc. with the corresponding values.

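Answer:

A classic example is a login server that provides a new session ID after a successful login. This new session ID should then be used with another request.

Here is the code, picked up from the source, that seems to work for me.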

    # header values are bytes under Python 3, so decode before splitting
    print('cookie from login', response.headers.getlist('Set-Cookie')[0].decode('utf-8').split(";")[0].split("=", 1)[1])
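The one-liner above keeps only the value of the first Set-Cookie header. Since the question asks for a dictionary of all cookie names and values, here is a minimal sketch of one way to build it with the standard library's `http.cookies.SimpleCookie`. The function name `cookies_from_set_cookie` and the sample header values are made up for illustration; in a Scrapy callback the input would presumably be `response.headers.getlist('Set-Cookie')`.

```python
from http.cookies import SimpleCookie


def cookies_from_set_cookie(raw_headers):
    """Turn a list of raw Set-Cookie header values into a {name: value} dict."""
    cookies = {}
    for raw in raw_headers:
        # Scrapy hands back header values as bytes; accept str as well.
        text = raw.decode("latin-1") if isinstance(raw, bytes) else raw
        parsed = SimpleCookie()
        parsed.load(text)
        # Attributes such as Path or HttpOnly live on the morsel,
        # so only the real cookie names show up as keys here.
        for name, morsel in parsed.items():
            cookies[name] = morsel.value
    return cookies


# Made-up values standing in for response.headers.getlist('Set-Cookie')
example_headers = [
    b"xxx=3abc; Path=/; HttpOnly",
    b"yyy=34def; Path=/",
]
session_cookies = cookies_from_set_cookie(example_headers)
print(session_cookies)
```

Note that this only sees cookies set by the response it is given. If the login goes through a redirect, the cookies may have been set on an earlier response.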

Code:

    # requires: from scrapy import Request
    def check_logged(self, response):
        # value of the first Set-Cookie header (headers are bytes under Python 3)
        set_cookie = response.headers.getlist('Set-Cookie')[0].decode('utf-8')
        tmpCookie = set_cookie.split(";")[0].split("=", 1)[1]
        print('cookie from login', tmpCookie)
        cookieHolder = dict(SESSION_ID=tmpCookie)
        # print(response.text)
        if "my name" in response.text:
            # the URL and callback are placeholders: fill in your own
            yield Request(url="<<new url for another server>>",
                          cookies=cookieHolder,
                          callback=self.next_callback)  # <<another function here>>
        else:
            print("login failed")
            return
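The question also mentions handing the cookies over to Selenium. As a rough sketch, assuming the cookies are already available as a plain name/value dict, they can be reshaped into the dicts that Selenium's `driver.add_cookie()` accepts. The helper name `to_selenium_cookies`, the `SESSION_ID` value and the `site.org` domain are illustrative only, and the browser call itself is left as a comment so the snippet runs without Selenium installed.

```python
def to_selenium_cookies(cookies, domain):
    """Build the list of cookie dicts that Selenium's add_cookie() accepts."""
    return [
        {"name": name, "value": value, "domain": domain, "path": "/"}
        for name, value in cookies.items()
    ]


# Hypothetical values; in the spider they would come from the login response.
selenium_cookies = to_selenium_cookies({"SESSION_ID": "abc123"}, "site.org")
for cookie in selenium_cookies:
    print(cookie)
    # With a real browser session one would call driver.add_cookie(cookie)
    # after first loading a page on the same domain.
```

Selenium only accepts a cookie for the domain of the page currently loaded, which is why the comment suggests opening a page on that domain first.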

Source: utcz.com/qa/425138.html
