What is robots.txt? | SEO | php.cn

藏色散人

May 23, 2019 am 11:01 AM

Robots.txt is the first file that search engines look at when they visit a website. It is a plain text file that specifies which parts of the site's content search engines may crawl. When a search spider visits a site, it first checks whether robots.txt exists in the site's root directory; if it does, the spider determines the scope of its visit from the file's contents.

What is robots.txt?

While building a website, we may have some content that we do not want search engines to crawl or do not want to show up in search results. What should we do? How do we tell search engines not to crawl that content? This is where robots.txt comes in handy.

The robots.txt file tells spiders which files on the server they are allowed to crawl. Reputable search engine spiders honor these rules.

If the file exists, the spider determines the scope of its access from the file's contents; if the file does not exist, search spiders can access every page on the site that is not password protected.
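A minimal sketch of that decision in Python, assuming the crawler has already tried to download robots.txt and passes `None` when the file does not exist. The helper name `can_crawl` is hypothetical; the parsing uses the standard library's `urllib.robotparser`.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: Optional[str], agent: str, url: str) -> bool:
    """Hypothetical helper: may `agent` crawl `url` under this robots.txt?

    `robots_txt` is the file's text, or None if the site has no robots.txt.
    """
    if robots_txt is None:
        # No robots.txt: nothing is restricted, so every page may be crawled.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text we already have; no network access
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /admin/\n"
    print(can_crawl(None, "Googlebot", "https://www.example.com/admin/"))       # True
    print(can_crawl(rules, "Googlebot", "https://www.example.com/admin/"))      # False
    print(can_crawl(rules, "Googlebot", "https://www.example.com/index.html"))  # True
```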

Syntax: The simplest robots.txt file uses two rules:

• User-agent: the robot to which the following rules apply

• Disallow: the URL path (a page or directory) to be blocked
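To see how a User-agent line scopes the Disallow rule beneath it, the sketch below feeds a two-rule file to Python's standard `urllib.robotparser`. The `/private/` directory, the `example.com` host, and the choice of Baiduspider are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# A two-rule robots.txt: the Disallow applies only to the robot named above it.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://www.example.com/private/report.html"
print(parser.can_fetch("Baiduspider", page))  # False: the rule names this robot
print(parser.can_fetch("Googlebot", page))    # True: no group applies to Googlebot
```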

There are a few points to keep in mind:

1. robots.txt must be placed in the root directory of the website.

2. The file must be named robots.txt, and the name must be all lowercase.

3. robots.txt is the first page that search engines visit on a website.

4. Each group of rules in robots.txt must begin with a User-agent line.
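Because crawlers only look for the file at the root of the host, its location can be derived from any page URL. A small sketch in Python; the function name `robots_txt_url` is hypothetical. The lowercase name matters because URL paths are generally case sensitive, so `/Robots.txt` may not be the same resource.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves `page_url`.

    robots.txt lives in the root directory, so only the scheme and the
    host (including any port) are kept; path, query, and fragment are replaced.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("https://www.example.com/blog/post.html?id=3"))
    # https://www.example.com/robots.txt
    print(robots_txt_url("http://example.com:8080/shop/cart"))
    # http://example.com:8080/robots.txt
```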

robots.txt misunderstandings

Misunderstanding 1: I want spiders to crawl every file on my website, so I don't need a robots.txt file. After all, if the file does not exist, search spiders can by default access every page on the site that is not password protected.

Whenever a user requests a URL that does not exist, the server records a 404 (Not Found) error in its log. Likewise, whenever a search spider requests a robots.txt file that does not exist, the server logs a 404 error, so you should add a robots.txt file to your website.

Misunderstanding 2: Allowing search spiders to crawl every file in robots.txt increases the website's inclusion rate.

Letting spiders crawl the site's program scripts, style sheets, and similar files does not raise its inclusion rate; it only wastes server resources. You should therefore use the robots.txt file to keep search spiders away from these files.

Specific files that need to be excluded are detailed in the article Tips on Using Robots.txt.

Misunderstanding 3: Search spiders waste server resources when crawling pages, so robots.txt should block every search spider from crawling any page.

If you do this, search engines cannot crawl any of the site's pages, and the entire website will not be indexed by search engines.
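The difference between shutting out every spider and restricting none of them is a single character. The sketch below, using Python's `urllib.robotparser`, contrasts `Disallow: /` (every path starts with `/`, so every page is blocked) with an empty `Disallow:` (no restriction). The `allowed` helper and the `example.com` URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Parse robots_txt from a string and check whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # misunderstanding 3: shuts out every spider
ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty value: nothing is blocked

home = "https://www.example.com/"
print(allowed(BLOCK_ALL, home))  # False
print(allowed(ALLOW_ALL, home))  # True
```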

robots.txt usage tips

1. Always add a robots.txt file to your site, even one that blocks nothing; otherwise every spider request for the missing file is logged as a 404 (Not Found) error, as explained under Misunderstanding 1.

2. Website administrators should keep spiders away from certain directories on the server to protect server performance. For example, most web servers store programs in the "cgi-bin" directory, so adding "Disallow: /cgi-bin" to robots.txt is a good idea: it stops spiders from crawling the program files and saves server resources. Files that typical websites do not need spiders to crawl include back-end administration files, program scripts, attachments, database files, encoding files, style sheets, template files, and navigation and background images.

The following is the robots.txt file in VeryCMS:

User-agent: *
Disallow: /admin/       # Back-end administration files
Disallow: /require/     # Program files
Disallow: /attachment/  # Attachments
Disallow: /images/      # Images
Disallow: /data/        # Database files
Disallow: /template/    # Template files
Disallow: /css/         # Style sheets
Disallow: /lang/        # Encoding files
Disallow: /script/      # Script files
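As a rough check of what the VeryCMS rules block, the sketch below loads them into Python's `urllib.robotparser` and tests a few URLs; the URLs and the `example.com` host are assumptions. The string writes the descriptions as `#` comments and keeps the rules on consecutive lines, because in the original robots.txt convention (and in this parser) a blank line ends a record, and Disallow lines that follow it without a new User-agent line are ignored.

```python
from urllib.robotparser import RobotFileParser

# The VeryCMS rules, with "#" comments and no blank lines inside the record.
VERYCMS_ROBOTS = """\
User-agent: *
Disallow: /admin/       # back-end administration files
Disallow: /require/     # program files
Disallow: /attachment/  # attachments
Disallow: /images/      # images
Disallow: /data/        # database files
Disallow: /template/    # template files
Disallow: /css/         # style sheets
Disallow: /lang/        # encoding files
Disallow: /script/      # script files
"""

parser = RobotFileParser()
parser.parse(VERYCMS_ROBOTS.splitlines())

for path in ("/admin/login.php", "/css/style.css", "/index.html", "/article/1.html"):
    url = "https://www.example.com" + path
    print(path, "allowed" if parser.can_fetch("Baiduspider", url) else "blocked")
```

Run as written, the first two paths come out blocked and the last two allowed.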

3. If your website has dynamic pages and you have created static copies of them so that search spiders can crawl them more easily, configure robots.txt to stop spiders from crawling the dynamic versions, so that these pages are not treated as duplicate content.
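A sketch of this setup, under the hypothetical assumption that the dynamic pages are served as `/article.php?id=12` and their static copies as `/article/12.html`. Python's `urllib.robotparser` matches rules as path prefixes, so a single rule covers every query string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: dynamic pages at /article.php?id=N, static copies at /article/N.html.
ROBOTS_TXT = """\
User-agent: *
Disallow: /article.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

dynamic = "https://www.example.com/article.php?id=12"
static = "https://www.example.com/article/12.html"
print(parser.can_fetch("Googlebot", dynamic))  # False: the prefix /article.php matches
print(parser.can_fetch("Googlebot", static))   # True: only the static copy gets crawled
```

Some search engines also accept wildcard patterns in paths; the standard-library parser used here implements only prefix matching.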

4. The robots.txt file can also link directly to the sitemap file, like this:

Sitemap: http://www.***.com/sitemap.xml

The search engines that currently support this include Google, Yahoo, Ask, and MSN; Chinese search engines are evidently not among them. The advantage is that webmasters do not need to submit their sitemap files through each search engine's webmaster tools or similar sections: the spider crawls the robots.txt file, reads the sitemap path from it, and then crawls the pages the sitemap links to.
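The spider-side step of reading the sitemap path can be sketched with Python's `urllib.robotparser`, whose `site_maps()` method (Python 3.8 and later) returns the Sitemap URLs listed in the file, or `None` if there are none. The `example.com` sitemap URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The Sitemap line does not belong to any User-agent group, so it is read wherever it appears.
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
```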

5. Proper use of the robots.txt file can also prevent errors on access. For example, visitors should not land on the shopping cart page directly from search results, and there is no reason for the cart to be included anyway, so you can block it in robots.txt to keep searchers from entering the shopping cart page directly.
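On the crawler side, a rule like this acts as a filter on the URLs a spider has discovered. A minimal sketch in Python, assuming a hypothetical `/cart/` path for the shopping cart and a hypothetical helper named `crawlable`:

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
"""


def crawlable(urls: Iterable[str], robots_txt: str, agent: str) -> List[str]:
    """Keep only the discovered URLs that robots_txt allows `agent` to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


found = [
    "https://www.example.com/products/42.html",
    "https://www.example.com/cart/",
    "https://www.example.com/cart/checkout?step=2",
]
print(crawlable(found, ROBOTS_TXT, "Googlebot"))
# ['https://www.example.com/products/42.html']
```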

The above is the detailed content of "What is robots.txt?". For more information, please follow other related articles on the PHP Chinese website!

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
