How can I crawl the website more quickly and effectively?

WBOY
2016-08-31 08:41:07, 1353 views

Hello everyone. I am basically a layman. When I was in school I played Westward Journey and QQ Fantasy, and later I learned a little Button Wizard (a scripting language similar to VB) to help me play those games. That is my entire programming foundation.

When I crawl other people's websites, I first save the URLs that need to be crawled in a TXT or Excel file.

I then use Button Wizard to open the browser and simulate manual input (shortcut keys or mouse clicks) to enter each URL from the TXT or Excel file.

Next I simulate manually selecting the page content, and use the string functions mid, right, left, len, and instr to extract the strings I need.

Finally I save the results to Excel or a TXT file.
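
For comparison, the same four steps can be done without driving a browser at all. What follows is only a minimal sketch in Python, which is not one of the languages listed in this question; it uses nothing beyond the standard library. The function names `fetch_html`, `extract_between`, and `crawl`, the one-URL-per-line input file, and the choice of the `<title>` tag as the text to extract are all assumptions made for illustration.

```python
import csv
import urllib.request


def fetch_html(url, timeout=10):
    """Download only the HTML document; images, Flash and video are never requested."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_between(text, start_marker, end_marker):
    """Return the text between two markers, much like combining instr and mid."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end].strip()


def crawl(url_file, output_file, fetch=fetch_html):
    """Read one URL per line, extract each page title, and write the rows to a CSV file."""
    with open(url_file, encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]
    with open(output_file, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["url", "title"])
        for url in urls:
            html = fetch(url)
            # A missing marker yields None, which csv writes as an empty cell.
            writer.writerow([url, extract_between(html, "<title>", "</title>")])
```

Calling `crawl("urls.txt", "result.csv")` (both file names are hypothetical) would produce a CSV file that Excel can open directly, so the Excel automation step is no longer needed.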

This approach consumes a lot of machine resources: it uses a lot of CPU and a lot of bandwidth, because the browser loads many files I do not need, such as images, Flash, and mpg files.
Errors also occur often. Sometimes it is an Excel error or a script error, and very often it is a browser error.
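
One common way to deal with both problems is to request each page over plain HTTP, which downloads the HTML only, and to retry when a request fails instead of letting the whole run stop. The sketch below shows that idea in Python with the standard library only. The name `fetch_with_retry`, its parameters, and the default of three attempts are assumptions for illustration, not a recommendation for any particular site.

```python
import time
import urllib.request


def fetch_with_retry(url, attempts=3, delay=1.0, opener=urllib.request.urlopen):
    """Try to download a page several times; return None if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=10) as response:
                return response.read()
        except OSError as error:
            # urllib.error.URLError and socket timeouts are both subclasses of OSError.
            print(f"attempt {attempt} failed for {url}: {error}")
            if attempt < attempts:
                time.sleep(delay)
    return None
```

Because a failure returns `None` rather than raising, the calling loop can record the URL as failed and move on to the next one.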

Please tell me, how do you do it?

I currently know PHP, MySQL, JavaScript, jQuery, and Ajax, and I also understand JSON, XML, and HTML data.

I hope your answers can build on what I already know. Convenience is the main thing, of course, so if there is a more convenient approach, please tell me about it.

In addition, as for the browser's debugging tools, that is, the F12 panel, I know how to look at the JS output.

If you have any ideas, you are welcome to answer. I am starting from a low level, so basically any answer will help me. Thank you!
