
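python - Why does urlopen report 404: Not Found for a website that I can clearly access?

Some people say it is caused by a proxy.
My browser does often have a proxy enabled, but I have turned it off.
I specifically inspected the HTTP messages, and none of them went through a proxy.
The error still occurs.

Code: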

import urllib.request

url = "http://news.dbanotes.net/"
req = urllib.request.Request(url)

page = urllib.request.urlopen(req).read().decode("UTF-8")
print(page)

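Python version: 3.5.1

Error message:
urllib.error.HTTPError: HTTP Error 404: Not Found

The following causes can probably be ruled out:

As for anti-crawling measures, I think that is unlikely too.
First, I tried many URLs and the split is roughly 60/40: some can be accessed and some cannot. Besides, our school's official website is so..., I really don't believe it would...
Second, I added a User-Agent header and still could not access the site.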

PHPz, 2905 days ago, 540

Replies (5)

  • 大家讲道理

    大家讲道理 2017-04-18 09:42:11

    Your code runs without any problem for me on Python 3.5.2 under Windows.
    I suggest capturing the packets and comparing the request your script sends with the one the browser sends when it opens the page.

    Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32
    >>> import urllib.request
    >>> url = "http://news.dbanotes.net/"
    >>> req = urllib.request.Request(url)
    >>> page = urllib.request.urlopen(req).read()
    >>> page
    b'<html><head><link rel="stylesheet" type="text/css" href="http://news.dbanotes.net/news.css">\n<script type="text/javascript" src="http://news.dbanotes.net/jailbreak.js"></script>\n<link rel="shortcut icon" ...'
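
As a complement to a packet capture, the failed response itself can be inspected from Python. The following is a minimal sketch, not part of the original reply: `summarize_http_error` and `fetch` are hypothetical helper names, and the sketch only relies on the `code`, `reason`, `headers` and `read()` members of `urllib.error.HTTPError`.

```python
import urllib.error
import urllib.request


def summarize_http_error(err, body_limit=200):
    """Collect the parts of a failed response worth comparing with the browser's."""
    return {
        "status": err.code,                        # e.g. 404
        "reason": err.reason,                      # e.g. "Not Found"
        "headers": dict(err.headers.items()),      # response headers from the server
        "body_start": err.read(body_limit),        # first bytes of the error page
    }


def fetch(url):
    """Fetch url; on an HTTP error, print a summary before re-raising."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        print(summarize_http_error(err))
        raise
```

Calling `fetch("http://news.dbanotes.net/")` would then show the server's headers and the start of the error page, which often hints at whether the 404 comes from the site itself or from a filter in front of it.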
    

  • 伊谢尔伦

    伊谢尔伦 2017-04-18 09:42:11

    This may be related to the User-Agent value you send, because some websites check it to block non-browser clients from crawling.
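
To see what the server actually receives, it can help to look at the User-Agent that urllib sends when none is set. The sketch below is an illustration under assumptions: the helper names and the `BROWSER_UA` string are placeholders, not values from the thread. It reads the default from the opener's `addheaders` list and builds an opener with a different value.

```python
import urllib.request

# Placeholder browser-like value; copy the real string from your own browser.
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")


def default_user_agent():
    """Return the User-Agent a fresh urllib opener sends by default."""
    opener = urllib.request.build_opener()
    return dict(opener.addheaders)["User-agent"]


def opener_with_user_agent(user_agent):
    """Build an opener whose default User-Agent is replaced by user_agent."""
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-agent", user_agent)]
    return opener
```

`default_user_agent()` returns a string beginning with `Python-urllib/`, which is easy for a site to recognise. A request through `opener_with_user_agent(BROWSER_UA).open(url)` would carry the replacement value instead.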

  • 巴扎黑

    巴扎黑 2017-04-18 09:42:11

    Copy the headers and cookies from your browser and add them to the urllib Request object,
    so that the request imitates the browser.
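
A rough sketch of this suggestion follows. The header values and the cookie string are made-up placeholders to be replaced with what the browser's developer tools show, and `build_browser_like_request` is a hypothetical helper name.

```python
import urllib.request


def build_browser_like_request(url, browser_headers, cookie=None):
    """Return a Request carrying headers (and optionally a cookie) copied from a browser."""
    headers = dict(browser_headers)
    if cookie:
        headers["Cookie"] = cookie
    return urllib.request.Request(url, headers=headers)


# Placeholder values (assumptions); copy the real ones from your browser.
example_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/115.0",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.5",
}

req = build_browser_like_request(
    "http://news.dbanotes.net/", example_headers, cookie="session=PLACEHOLDER"
)
# page = urllib.request.urlopen(req).read().decode("UTF-8")
```

It is usually safer to leave `Accept-Encoding` out when copying headers, since urllib does not decompress gzip responses on its own.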

  • 天蓬老师

    天蓬老师 2017-04-18 09:42:11

    A likely reason is that the User-Agent header your program sends has been blocked by the server. Try changing the User-Agent header.

  • 阿神

    阿神 2017-04-18 09:42:11

    You don't need a Request object; just call urlopen with the URL directly.
