怪我咯2017-04-17 17:35:35
Add a robots.txt to tell crawlers not to crawl your website. It won't forcibly ban anything, though; it's just a convention that both parties need to abide by.
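A minimal robots.txt, placed at the site root, might look like the sketch below. The paths and the bot name `BadBot` are made-up examples, not from any real site:

```
# Ask all crawlers to stay out of one directory
User-agent: *
Disallow: /private/

# Ask one specific (hypothetical) crawler to stay away entirely
User-agent: BadBot
Disallow: /
```

As the answer says, this is purely advisory: well-behaved crawlers check it before fetching, but nothing enforces it.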
巴扎黑2017-04-17 17:35:35
I don't know whether the crawler you're talking about is Baidu's crawler or one that someone wrote themselves.
For Baidu's crawler, the robots.txt method above is enough. There are many ways to block other people's crawlers, such as dynamically generating all the classes and ids, because crawlers usually parse the HTML and locate what they want by class or id.
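To see why class names matter so much to a crawler, here is a minimal sketch of the parsing step the answer describes, using only the standard library. The HTML snippet and the `price` class name are illustrative assumptions:

```python
# Sketch: a crawler typically locates data by a known class or id.
# The HTML and the "price" class below are made-up examples.
from html.parser import HTMLParser

class ClassExtractor(HTMLParser):
    """Collects the text of every tag carrying the target class."""
    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.capture = False
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        if self.target_class in classes:
            self.capture = True

    def handle_data(self, data):
        if self.capture:
            self.results.append(data.strip())
            self.capture = False

html = '<div class="price">19.99</div><div class="name">Widget</div>'
parser = ClassExtractor("price")
parser.feed(html)
print(parser.results)  # → ['19.99']
```

If the site regenerates class names on every render, the hard-coded `"price"` selector above finds nothing, which is exactly the defense the answer suggests.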
大家讲道理2017-04-17 17:35:35
It also depends on what kind of crawler it is
A gentleman? Or a villain?
If the crawler abides by the robots.txt agreement, then it's fine
But that is only a gentleman's agreement
If you run into a villain, there's nothing you can do
迷茫2017-04-17 17:35:35
1) You can try gzip-compressing your JS. Many crawlers will not fetch gzip-compressed JS.
2) Analyze the web server's logs. If someone is maliciously accessing your key resources from a fixed IP, you can try banning that IP.
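The log-analysis idea in point 2 can be sketched as a small script that counts requests per IP and flags heavy hitters as ban candidates. The log lines and the threshold of 100 are illustrative assumptions, not from the original answer:

```python
# Sketch: count requests per IP in an access log and flag IPs above a
# threshold as candidates for banning. Threshold and log are made up.
from collections import Counter

def ban_candidates(log_lines, threshold=100):
    """Return IPs whose request count meets or exceeds the threshold."""
    hits = Counter(line.split()[0] for line in log_lines if line.strip())
    return sorted(ip for ip, n in hits.items() if n >= threshold)

# Toy log in the common access-log shape: IP first, then the request.
log = ['10.0.0.5 - - [17/Apr/2017] "GET /data HTTP/1.1" 200'] * 150
log += ['192.168.1.9 - - [17/Apr/2017] "GET / HTTP/1.1" 200'] * 3
print(ban_candidates(log))  # → ['10.0.0.5']
```

In practice you would feed this the real access log and then ban the flagged IPs at the firewall or web-server level.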
天蓬老师2017-04-17 17:35:35
It's useless. If your website is open to people, it is naturally open to crawlers too, unless you move it to an internal network. Rather than focusing on blocking crawlers, you might as well improve quality. The classified-ads sites these days all crawl each other, yet the user experience barely improves.
迷茫2017-04-17 17:35:35
Pfft, you can scramble the classes and ids so there is no pattern and even regular expressions won't match
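One way to scramble class names, sketched under the assumption that the server renders the HTML (and the matching CSS) per response. All names here are hypothetical:

```python
# Sketch: generate a fresh random class name per render so a crawler has
# no stable selector to hard-code. Names and markup are illustrative.
import secrets

def random_class(prefix="c"):
    """Return a class name with a random hex suffix, e.g. 'c-9f3a2b'."""
    return f"{prefix}-{secrets.token_hex(3)}"

def render_price(value):
    # The server would emit matching per-response CSS for this class;
    # a crawler matching a fixed class name finds nothing stable.
    cls = random_class()
    return f'<span class="{cls}">{value}</span>'

print(render_price("19.99"))  # class attribute differs on every render
```

The trade-off is that your own CSS and JS must also be generated against the per-response names, which complicates caching.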
阿神2017-04-17 17:35:35
I wonder whether you could dynamically generate all the page content with JS
巴扎黑2017-04-17 17:35:35
First of all, it is hard to block 100% of crawlers, unless the site is on an internal network as mentioned above.
But you can take some measures to stop low-tech crawlers from scraping your website.
For specific measures, there is an article on Zhihu you can read.
Hope it helps