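I'm writing a crawler for Taobao, but it runs on a Hong Kong server, and I've run into a problem:
Every time I crawl the Taobao homepage, I'm automatically redirected to the Hong Kong version of Taobao,
so the page source and the content are both different.
How should I handle this situation?
To put it simply, take scraping 58.com (58同城) as an example:
if I'm in Quanzhou and want to scrape the Beijing site, how do I do that?
When I open the site from my IP I'm always redirected to Beijing, but I want to scrape the 58 homepage directly.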
PHP中文网2017-04-18 10:21:01
Disable redirects. Taking requests as an example:
import requests
r = requests.get('http://github.com/', allow_redirects=False)
r.status_code  # a 3xx redirect status (e.g. 301 or 302)
r.url  # http://github.com/ -- still the original URL, not https
r.headers['Location']  # https://github.com/ -- the redirect destination
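To build on the snippet above, here is a minimal sketch of how the unfollowed response could be inspected to see where the server wanted to send you. The helper names `redirect_target` and `leaves_host` are hypothetical, not part of requests; they only take the values that `r.url`, `r.status_code` and `r.headers` would supply.

```python
from urllib.parse import urljoin, urlsplit

# Status codes that carry a Location header pointing somewhere else.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(request_url, status_code, headers):
    """Return the absolute redirect destination, or None if this is not a redirect."""
    if status_code not in REDIRECT_CODES:
        return None
    location = headers.get("Location")
    if location is None:
        return None
    # Location may be relative, so resolve it against the URL we requested.
    return urljoin(request_url, location)


def leaves_host(request_url, target_url):
    """True if the redirect points at a different hostname (e.g. a regional site)."""
    return urlsplit(request_url).hostname != urlsplit(target_url).hostname


# Usage with requests (hypothetical, needs network access):
#   r = requests.get(url, allow_redirects=False)
#   target = redirect_target(r.url, r.status_code, r.headers)
#   if target and leaves_host(r.url, target):
#       print("redirected to another site:", target)
```

Note that with redirects disabled the response body is usually just a short redirect stub, so this mainly tells you that a geographic redirect is happening; it does not by itself give you the page you wanted.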
PHP中文网2017-04-18 10:21:01
If you want to scrape Beijing, just use the city's subdomain, but note that it is protected by a PGTID parameter:
http://bj.58.com/?PGTID=0d000...
I suggest using selenium.
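A rough sketch of the selenium suggestion, assuming Chrome and a matching driver are installed. `city_url` and `fetch_rendered` are hypothetical helper names, and the `http://<city>.58.com/` pattern is inferred only from the `bj` URL above, so other city codes would need to be checked on the site.

```python
def city_url(city_code):
    """Build a city homepage URL, following the bj.58.com pattern shown above."""
    return f"http://{city_code}.58.com/"


def fetch_rendered(url, timeout_seconds=30):
    """Load a page in a real browser and return the rendered HTML."""
    # Imported lazily so city_url() can be used without selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.set_page_load_timeout(timeout_seconds)
        driver.get(url)  # the browser handles cookies, scripts and redirects
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


# Usage (hypothetical, opens a browser window):
#   html = fetch_rendered(city_url("bj"))
```

A real browser handles whatever cookies or parameters the site sets on its own, which is presumably why it was suggested here, at the cost of being much slower than plain HTTP requests.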
迷茫2017-04-18 10:21:01
Sometimes the server redirects based on the geographic location associated with your IP address. In that case there is probably no way around it other than using a proxy.
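For illustration, a minimal sketch of routing requests through a proxy located in the target region. The proxy host, port and credentials are placeholders you would have to supply yourself, and `build_proxies` and `fetch_via_proxy` are hypothetical helper names; only the `proxies`, `timeout` and `allow_redirects` arguments belong to requests itself.

```python
from urllib.parse import quote


def build_proxies(host, port, user=None, password=None):
    """Build the proxies mapping that requests expects for an HTTP proxy."""
    auth = ""
    if user is not None:
        # Percent-encode credentials so characters like '@' do not break the URL.
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    proxy_url = f"http://{auth}{host}:{port}"
    # Use the same proxy for both plain HTTP and HTTPS traffic.
    return {"http": proxy_url, "https": proxy_url}


def fetch_via_proxy(url, proxies, timeout=10):
    """Fetch a URL through the proxy without following redirects."""
    # Imported lazily so build_proxies() works without requests installed.
    import requests

    return requests.get(url, proxies=proxies, timeout=timeout, allow_redirects=False)


# Usage (placeholder proxy address, needs network access):
#   proxies = build_proxies("proxy.example.com", 8080)
#   r = fetch_via_proxy("http://www.taobao.com/", proxies)
#   print(r.status_code, r.headers.get("Location"))
```

Keeping `allow_redirects=False` here makes it easy to confirm whether the proxy's location actually stops the geographic redirect before you scrape in bulk.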