
The following exceptions keep occurring when I run my crawler program

WBOY
2016-06-24 12:25:32

The program I wrote runs, but one of the following exceptions keeps occurring and interrupts it. When I run the program again, there is no problem. This has happened many times:

1. java.net.SocketTimeoutException: Read timed out
2. java.net.SocketTimeoutException: connect timed out
3. java.net.ConnectException: Connection timed out: connect

After tracing them, I confirmed that all three exceptions occur while executing this code:

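import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// POST to one sub-page; timeout() is in milliseconds (300000 ms = 5 minutes)
Document doc = Jsoup.connect(url)
        .data("query", "Java")
        .userAgent("Mozilla")
        .cookie("auth", "token")
        .timeout(300000)
        .post();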
Could you please tell me what is going on and how to fix it? This program crawls web page data, so it has to load the URLs of tens of thousands of sub-pages in a loop. Is that the cause? How can I solve this? Please give me some advice, I'm at my wits' end....


Reply to the discussion (solution)

Either the network is unstable, or the requested web page has expired.
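The three exceptions in the question point to network trouble rather than a bug in the parsing code. In the JDK, SocketTimeoutException extends java.io.InterruptedIOException and ConnectException extends java.net.SocketException, so both are subclasses of java.io.IOException, which is what Jsoup's post() declares. A minimal sketch of a helper that decides whether a failure is worth retrying might look like this; the class name NetErrors and the choice of which exceptions count as "transient" are assumptions for illustration, not part of Jsoup.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

// Hypothetical helper (not part of Jsoup): classifies the exceptions from the question.
class NetErrors {
    // "Read timed out" and "connect timed out" are SocketTimeoutExceptions;
    // "Connection timed out: connect" is a ConnectException. All of them usually
    // mean a flaky network or a slow server, so a later attempt may succeed.
    static boolean isTransient(IOException e) {
        return e instanceof SocketTimeoutException  // read or connect timeout
            || e instanceof ConnectException;       // TCP connection could not be established
    }
}
```

Other IOExceptions (for example an HTTP error status reported by Jsoup) are treated as non-transient here, which is a judgment call you may want to adjust.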

You are not doing any exception handling. You are performing I/O, and remote I/O at that, so you cannot be completely sure that nothing in the whole runtime environment will go wrong.
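Because failures like these are often temporary, one common way to cope is to retry a failed request a few times with a growing pause before giving up. The sketch below is generic: it takes the request as a java.util.concurrent.Callable, so in the crawler the call site could look like `Retry.withBackoff(() -> Jsoup.connect(url).timeout(300000).post(), 3, 1000)`. The names Retry, withBackoff, maxAttempts and baseDelayMs are hypothetical, and the attempt count and delays are arbitrary starting values.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical retry helper (not part of Jsoup): runs a request and, on an
// IOException, waits baseDelayMs, then 2x, 4x, ... before trying again.
class Retry {
    static <T> T withBackoff(Callable<T> request, int maxAttempts, long baseDelayMs)
            throws Exception {
        long delay = baseDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return request.call();
            } catch (IOException e) {
                // Out of attempts: rethrow so the caller can log or skip this page.
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delay); // give the network or server time to recover
                delay *= 2;          // exponential backoff
            }
        }
    }
}
```

Waiting longer after each failure, rather than retrying immediately, avoids hammering a server that is already slow to respond.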

So at the very least you must catch the exception and then resume from the task that was interrupted.
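The advice above can be sketched as a crawl loop that catches the exception per page and records its position in a checkpoint file, so a restarted run continues where the previous one stopped instead of starting over from the first of the tens of thousands of URLs. This is a minimal sketch under assumptions: ResumableCrawl, PageFetcher and the checkpoint file format (a single index) are hypothetical, the fetcher stands in for the real Jsoup call, and Files.readString/writeString require Java 11 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: crawl a list of URLs, saving progress after every page
// so that a restarted run resumes at the page that was interrupted.
class ResumableCrawl {
    // Stand-in for the real request, e.g. Jsoup.connect(url)...post().
    interface PageFetcher {
        void fetch(String url) throws IOException;
    }

    // Returns the URLs that failed during this run, for a later retry pass.
    static List<String> run(List<String> urls, Path checkpoint, PageFetcher fetcher)
            throws IOException {
        int start = 0;
        if (Files.exists(checkpoint)) {
            // Resume: the file holds the index of the next URL to process.
            start = Integer.parseInt(Files.readString(checkpoint).trim());
        }
        List<String> failed = new ArrayList<>();
        for (int i = start; i < urls.size(); i++) {
            try {
                fetcher.fetch(urls.get(i));
            } catch (IOException e) {
                // One timeout no longer aborts the whole crawl; note it and move on.
                failed.add(urls.get(i));
            }
            // Record progress after every page (success or recorded failure).
            Files.writeString(checkpoint, Integer.toString(i + 1), StandardCharsets.UTF_8);
        }
        return failed;
    }
}
```

If the JVM is killed in the middle of a request, only that one page is fetched again on restart; the failed list could also be written to a file if it must survive a crash.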
