python - How to scrape a socket connection that is only available after logging in

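I need to scrape a socket connection that is only available after logging in. The socket pushes data to the web page at irregular intervals, and at the moment the only way I can get that data is by refreshing the page over and over. Is there a good way to scrape a socket that requires a web login?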

===
Further details:

What I have implemented so far

Desired result (not yet implemented)

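This is how the site's socket delivers data to the browser. What I want is to write a socket client in Python that connects to this socket and then waits for the server to push data to me.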

The problem in general terms

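In general, scraping the data at a single point in time is not hard. But for a long-lived connection, that is, continuous scraping and continuous, reactive monitoring, I can't find a good approach online. If I schedule a periodic scraping job instead, then once the sampling interval gets small (under 1 second), the computing and other costs become too high and I am likely to get blocked. Is there a better way?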

Asked by PHP中文网 · 2741 days ago · 462 views

All replies (2)

  • PHP中文网 · 2017-04-18 10:32:17

    HTTP is stateless, so your logged-in status is established by sending one or more special values to the server with each request (usually in the Cookie field of the request headers).
    Capture the HTTP traffic, then include these values when you simulate the requests.
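A minimal sketch of reusing the browser's login state, assuming the session is carried in cookies and that you copied the value of the `Cookie` request header from the browser's developer tools (Network tab). The cookie names `sessionid` and `csrftoken` and the helper names below are hypothetical placeholders; the real names depend on the site.

```python
def parse_cookie_header(raw: str) -> dict[str, str]:
    """Turn a copied 'Cookie' header value such as "a=1; b=2" into a dict."""
    cookies: dict[str, str] = {}
    for part in raw.split(";"):
        # Each piece looks like "name=value"; the value itself may contain "=".
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def auth_headers(cookies: dict[str, str]) -> dict[str, str]:
    """Build request headers that replay the logged-in session."""
    return {"Cookie": "; ".join(f"{name}={value}" for name, value in cookies.items())}


if __name__ == "__main__":
    # Hypothetical value copied from the browser's request headers.
    copied = "sessionid=abc123; csrftoken=xyz789"
    cookies = parse_cookie_header(copied)
    print(cookies)                          # {'sessionid': 'abc123', 'csrftoken': 'xyz789'}
    print(auth_headers(cookies)["Cookie"])  # sessionid=abc123; csrftoken=xyz789
```

The resulting headers can be passed as `headers=` to `urllib.request.Request`, or sent with a WebSocket handshake, which is an ordinary HTTP GET and carries the same Cookie header the browser would send. Session cookies usually expire, so a long-running scraper may have to log in again and refresh them.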


    Update:
    Judging by the status code, the connection was switched to a WebSocket (a successful WebSocket upgrade returns 101 Switching Protocols), so the page itself must be opening this connection to the other side's server. Look at the page source; there should be something like var ws = new WebSocket("ws://ip:3000");.
    Check what the other side's client sends and expects, then rewrite the ws.onmessage callback. What goes inside that function is entirely up to you: you can use it to detect that new content has arrived, or forward the new content to another server for processing.

    You can look at the client part of the article 网页实时聊天之PHP实现websocket ("Real-time web chat: implementing WebSocket in PHP") and adapt it to your needs.

  • 怪我咯 · 2017-04-18 10:32:17

    Just find a WebSocket client library and use it to connect.
