WordPress: Hide Specific Articles from Search Engines, or Show Them Only to Search Engines
Hide specific articles from search engines
This question arises from the following situation:
As search engines improve, they increasingly penalize scraped and pseudo-original content. Baidu in particular has launched its origin algorithm and will de-index (K) sites identified as scrapers. If your site gets labeled a scraper site, all your effort may be wasted.
I believe many webmasters would prefer original content and do not want to rely entirely on scraping other people's articles. But a new site, especially one run by an individual webmaster, fills out its content very slowly. When building a site we must please not only search engines but also readers: if readers cannot find substantial, rich information on your site, their experience will be poor. In fact, even famous, long-established websites carry a considerable proportion of scraped or adapted content, which is in keeping with the Internet's spirit of sharing. Major TV stations and newspapers reprint and excerpt one another all the time. As long as the material is well selected and meets readers' actual needs, it has value.
The key is: do not use scraped articles to cheat search traffic for your website. That should be consistent with Internet ethics and consensus. If only original content takes part in the search engines' game, and non-original content is kept out of their indexes, then the interests of search engines, site owners, and users can all be satisfied.
So the problem boils down to one point: how do you effectively and reliably hide "some articles" from search engines?
I do not know whether this is a common problem, but for a website that hopes to serve its audience with rich content while fearing being judged a scraper site, it is a real, critical, core issue that bears directly on the site's survival and growth.
I have been studying this recently. As I see it, there are several ways to block search engines:
1. Use robots.txt
2. A WP site can detect visitor characteristics such as the user agent (I thought of this after reading your blog post)
3. Wrap links in JS
4. Redirect, for example via short links or server-side PHP redirects
Comparing the methods above:
The first method: robots.txt is like posting a notice on the door: "Hey, spider, some of the content in here is off-limits to you." It is the proverbial gentleman's agreement. The search engine is certainly able to see what sits behind the sealed door, but it agrees not to index it. Bear in mind, though, that a spider may still have a motive to peek, for example to judge whether a site carries a large amount of scraped content.
This method has the lowest implementation cost and should cover most situations. Baidu's ethics in this regard seem trustworthy: it honors Taobao's robots.txt and does not index Taobao's content, and it loudly objected when 360 indexed Baidu's own content.
This method raises a further question: on a WP-built site, how do you efficiently block "some articles" from search engines? Three ideas (a robots.txt sketch follows this list):
1. Add a marker to article titles: for example, append a special character to the title of every article to be hidden. Is that feasible? Can robots.txt match it with a wildcard rule such as Disallow: /*special-character*?
2. Identify articles by tag: this seems the most convenient to operate, but tag archives are dynamically generated URLs; can they be filtered in robots.txt?
3. Put such articles into a specific directory: the robots.txt rule is then easy to write, but how do you manage that conveniently when editing WP content?
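As a rough illustration of ideas 1 and 3, a robots.txt along these lines could work. The /private/ prefix and the -noindex marker are assumptions for the example (a special character in a title typically ends up in the post slug, which is what the wildcard matches); wildcard Disallow rules are documented by both Google and Baidu, but robots.txt remains purely advisory:

# Keep all crawlers out of a hypothetical /private/ section
User-agent: *
Disallow: /private/
# Wildcard rule: block any URL containing a hypothetical slug marker
Disallow: /*-noindex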
The second method is like checking the ID of everyone who comes through the door: if the visitor is a search engine, access is denied. This method is specific to WP, and its advantage is that engines can be treated very differently. Baidu, for instance, takes a hard line on scraped content while Google does not, so some articles could be closed to Baidu yet left open to Google. Another big advantage is that the check can be integrated into the WP environment, for example automated through a plugin or theme; a minimal sketch follows.
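A minimal sketch of such differentiated treatment, assuming user-agent sniffing (easily spoofed, so only a heuristic). The function name my_block_baidu_only and the choice to block only Baiduspider are illustrative, not part of the final code further below:

// Hypothetical example: deny Baiduspider, let Googlebot and humans through.
// Hooked site-wide for brevity; the real code below checks per-post meta.
function my_block_baidu_only() {
  $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
  if (stripos($ua, 'Baiduspider') !== false) {
    status_header(404); // tell Baidu the page does not exist
    exit;
  }
}
add_action('wp', 'my_block_baidu_only');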
The third method is like swapping the house number on the door: the search engine only knows how to follow the number mechanically, while the browser uses JS to point visitors at the real entrance. However, search engines are getting better at executing and analyzing JS, and judging from some of Google's statements, they frown on showing people and crawlers different content.
This method is widely used to hide Taobao affiliate links. It does not stay effective for long and is fiddly to operate; it suits individual static pages rather than a database-driven structure like WP's (a rough sketch follows).
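Purely to illustrate the idea, a link can be printed with a dead href and its real target stashed in a data attribute that JS reads on click. The helper name js_wrapped_link and the data-target attribute are assumptions; note that a crawler that parses HTML attributes can still recover the URL:

<?php
// Hypothetical helper: print a link whose real target is only wired up by JS.
function js_wrapped_link($real_url, $text) {
  echo '<a href="#" class="js-link" data-target="'
    . htmlspecialchars($real_url) . '">'
    . htmlspecialchars($text) . '</a>';
}
?>
<script>
// On click, read the real URL from the data attribute and navigate there.
document.addEventListener('click', function (e) {
  var a = e.target.closest('a.js-link');
  if (a) { e.preventDefault(); location.href = a.getAttribute('data-target'); }
});
</script>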
The fourth method is like encrypting the house number: only when a visitor knocks (clicks) is it replaced with the correct address. Ordinary visitors will certainly click, but search engines will not simulate that action.
This method is relatively thorough and "safe". Its disadvantages are:
1. Like the third method, it is somewhat complicated to operate. It suits individual static pages or selected links within a page, but not the WP environment.
2. Too many redirects consume server computing resources; if a large number of articles must be redirected at once, the server may be overwhelmed. (A sketch of such a redirect endpoint follows.)
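For illustration only, a standalone redirect endpoint might look like this; the filename go.php, the $links map, and the ?id= parameter are all assumptions for the example:

<?php
// go.php - hypothetical redirect endpoint. Pages link to /go.php?id=key,
// so the real target URLs never appear in the page's HTML.
$links = array(
  'promo1' => 'https://example.com/real-target-1',
  'promo2' => 'https://example.com/real-target-2',
);
$id = isset($_GET['id']) ? $_GET['id'] : '';
if (isset($links[$id])) {
  header('Location: ' . $links[$id], true, 302); // temporary redirect
} else {
  header('HTTP/1.1 404 Not Found'); // unknown key: pretend nothing is here
}
exit;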
Implementation code
So how do you actually make WordPress hide specific articles from search engines? Without further ado, here is the PHP code. Drop it into the functions.php of your current theme (save the file with UTF-8 encoding):
// Note: if your WordPress site has page caching enabled, this feature will not work.
function ludouse_add_custom_box() {
  if (function_exists('add_meta_box')) {
    add_meta_box('ludou_allow_se', 'Search Engines', 'ludou_allow_se', 'post', 'side', 'low');
    add_meta_box('ludou_allow_se', 'Search Engines', 'ludou_allow_se', 'page', 'side', 'low');
  }
}
add_action('add_meta_boxes', 'ludouse_add_custom_box');

function ludou_allow_se() {
  global $post;

  // Add a nonce field for verification
  wp_nonce_field('ludou_allow_se', 'ludou_allow_se_nonce');

  $meta_value = get_post_meta($post->ID, 'ludou_allow_se', true);
  if ($meta_value)
    echo '<input name="ludou-allow-se" type="checkbox" checked="checked" value="1" /> Block search engines';
  else
    echo '<input name="ludou-allow-se" type="checkbox" value="1" /> Block search engines';
}

// Save the option
function ludouse_save_postdata($post_id) {
  // Make sure the nonce field is present
  if (!isset($_POST['ludou_allow_se_nonce']))
    return $post_id;

  $nonce = $_POST['ludou_allow_se_nonce'];

  // Verify that the nonce is valid
  if (!wp_verify_nonce($nonce, 'ludou_allow_se'))
    return $post_id;

  // Skip autosaves
  if (defined('DOING_AUTOSAVE') && DOING_AUTOSAVE)
    return $post_id;

  // Check user permissions
  if ('page' == $_POST['post_type']) {
    if (!current_user_can('edit_page', $post_id))
      return $post_id;
  } else {
    if (!current_user_can('edit_post', $post_id))
      return $post_id;
  }

  // Update the setting
  if (!empty($_POST['ludou-allow-se']))
    update_post_meta($post_id, 'ludou_allow_se', '1');
  else
    update_post_meta($post_id, 'ludou_allow_se', '0');
}
add_action('save_post', 'ludouse_save_postdata');

// For posts/pages marked as off-limits:
// block search engines and return a 404.
function do_ludou_allow_se() {
  // This feature only applies to single posts and pages
  if (is_singular()) {
    global $post;
    $is_robots = 0;
    $ludou_allow_se = get_post_meta($post->ID, 'ludou_allow_se', true);

    if (!empty($ludou_allow_se)) {
      // Keyword array for matching crawler user agents.
      // It is a bit crude; refine it yourself.
      $bots = array(
        'spider', 'bot', 'crawl', 'Slurp', 'yahoo-blogs', 'Yandex',
        'Yeti', 'blogsearch', 'ia_archive', 'Google', 'baidu'
      );

      $useragent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
      if (!empty($useragent)) {
        foreach ($bots as $lookfor) {
          if (stristr($useragent, $lookfor) !== false) {
            $is_robots = 1;
            break;
          }
        }
      }

      // If this post/page is closed to search engines, return a 404.
      // You could of course change this to a 403.
      if ($is_robots) {
        status_header(404);
        exit;
      }
    }
  }
}
add_action('wp', 'do_ludou_allow_se');
How to use
Once the code above has been added to your current theme's functions.php, it works with no further setup. On the edit screen for posts and pages in the WordPress back end, a checkbox appears at the bottom of the right-hand column:
If the current post/page should not be crawled by search engines, simply tick it. Once ticked, a search engine requesting this post/page receives a 404 status and no content. If you would rather not return 404s to search engines, worrying that too many dead links could hurt SEO, change the following in the code:
status_header(404);
exit;
to:
echo "<meta name=\"robots\" content=\"noindex,noarchive\" />\n";
and then change:
add_action('wp', 'do_ludou_allow_se');
to:
add_action('wp_head', 'do_ludou_allow_se');
This prints the meta declaration directly into the head section of the page:
<meta name="robots" content="noindex,noarchive" />
This tells search engines not to index the page or show a cached snapshot. Note that for this to work, your theme's header.php must call the following inside the <head> tag:
<?php wp_head(); ?>
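For reference, a minimal sketch of where that call sits in a typical theme's header.php (the surrounding markup is illustrative):

<head>
  <meta charset="UTF-8" />
  <title><?php wp_title(); ?></title>
  <?php wp_head(); // WordPress, including our hooked function, prints into the head here ?>
</head>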
Set articles to be viewed only by search engines
Some articles are published purely for SEO: you want them crawled by search engines but hidden from ordinary visitors. How do you do that in WordPress?
Implementation code
If page caching is not enabled on your WordPress site, this is not hard to achieve. We can take the code from "Hide specific articles from search engines" above and modify it slightly. Add the following PHP code to your current theme's functions.php (save the file with UTF-8 encoding):
// Add the option to the post/page edit screens
function ludouseo_add_custom_box() {
  add_meta_box('ludou_se_only', 'Search Engine Only', 'ludou_se_only', 'post', 'side', 'low');
  add_meta_box('ludou_se_only', 'Search Engine Only', 'ludou_se_only', 'page', 'side', 'low');
}
add_action('add_meta_boxes', 'ludouseo_add_custom_box');

function ludou_se_only() {
  global $post;

  // Add a nonce field for verification
  wp_nonce_field('ludou_se_only', 'ludou_se_only_nonce');

  $meta_value = get_post_meta($post->ID, 'ludou_se_only', true);
  if ($meta_value)
    echo '<input name="ludou-se-only" type="checkbox" checked="checked" value="1" /> Only allow search engines to view';
  else
    echo '<input name="ludou-se-only" type="checkbox" value="1" /> Only allow search engines to view';
}

// Save the option
function ludouseo_save_postdata($post_id) {
  // Make sure the nonce field is present
  if (!isset($_POST['ludou_se_only_nonce']))
    return $post_id;

  $nonce = $_POST['ludou_se_only_nonce'];

  // Verify that the nonce is valid
  if (!wp_verify_nonce($nonce, 'ludou_se_only'))
    return $post_id;

  // Skip autosaves
  if (defined('DOING_AUTOSAVE') && DOING_AUTOSAVE)
    return $post_id;

  // Check user permissions
  if ('page' == $_POST['post_type']) {
    if (!current_user_can('edit_page', $post_id))
      return $post_id;
  } else {
    if (!current_user_can('edit_post', $post_id))
      return $post_id;
  }

  // Update the setting
  if (!empty($_POST['ludou-se-only']))
    update_post_meta($post_id, 'ludou_se_only', '1');
  else
    delete_post_meta($post_id, 'ludou_se_only');
}
add_action('save_post', 'ludouseo_save_postdata');

function do_ludou_se_only() {
  // This feature only applies to single posts and pages
  if (is_singular()) {
    global $post;
    $is_robots = 0;
    $ludou_se_only = get_post_meta($post->ID, 'ludou_se_only', true);

    if (!empty($ludou_se_only)) {
      // Keyword array for matching search-engine user agents.
      // It is a bit crude; refine it yourself.
      $bots = array(
        'spider', 'bot', 'crawl', 'Slurp', 'yahoo-blogs', 'Yandex',
        'Yeti', 'blogsearch', 'ia_archive', 'Google'
      );

      $useragent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
      if (!empty($useragent)) {
        foreach ($bots as $lookfor) {
          if (stristr($useragent, $lookfor) !== false) {
            $is_robots = 1;
            break;
          }
        }
      }

      // If the visitor is not a search engine, show an error message.
      // Logged-in users are not affected.
      if (!$is_robots && !is_user_logged_in()) {
        wp_die('You do not have permission to view this article!');
      }
    }
  }
}
add_action('wp', 'do_ludou_se_only');
How to use
Once the code above has been added to your current theme's functions.php, it works with no further setup. On the edit screen for posts and pages in the WordPress back end, a checkbox appears at the bottom of the right-hand column:
If the current post/page should be viewable only by search engines, simply tick it. Once ticked, ordinary visitors who open this post/page will see the error message "You do not have permission to view this article!" (search engines and logged-in users are unaffected).