How to use the WordPress automatic collection plug-in
WP-AutoPost is currently the most useful WordPress automatic collection and publishing plug-in. Its biggest feature is that it can collect content from any website and automatically publish it to your WordPress site.
Most other WordPress collection plug-ins can only collect from feeds, and feed-based collection has major disadvantages. First, you must find a full-text feed, but very few full-text feeds exist online now; most provide only article abstracts. Even if you collect the abstracts from a feed, readers still need to click the link to view the original text, which amounts to providing external links to other websites.
WP-Robot, which is widely used for English-language spam sites, has only a little over 20 collection sources, so its article sources are fairly narrow and limited.
WP-AutoPost has none of these disadvantages. It can collect content from any website and publish it automatically, and the collection process is fully automatic with no manual intervention. It provides content filtering, HTML tag filtering, keyword replacement, automatic linking, automatic tagging, automatic downloading of remote images to the local server, automatic addition of article prefixes and suffixes, and automatic translation of collected articles into various languages with the Microsoft translation engine before publication.
WP-AutoPost Chinese free download address: https://www.xuewangzhan.net/cj/11379.html (Official website address: http://wp-autopost.org/zh)
1. Installing WP-AutoPost
Installing WP-AutoPost is the same as installing any other WordPress plug-in. Upload it directly to the plug-in directory and activate it; no additional settings or code modifications are needed.
2. Create a collection task
Click "New Task" and enter a task name to create a new task. Once the task is created, it appears in the task list, where you can open it and configure further settings.
3. Basic settings function
Under the basic settings tab, the following settings are available:
Task name: You can modify the task name.
Category directory: The category in which the articles collected by this task are published.
Author: The author under which the collected articles are published. This must be a registered user in WordPress.
Update time interval: How often to check whether this collection task has new articles to update.
Character set: The character set encoding of the target website. The default is UTF8. If the target web page is not encoded in UTF8, the captured page will be garbled; setting the correct character set solves this problem.
Download remote pictures: If the collected articles contain pictures, you can choose whether to download the remote pictures to the local server. If you do, you can further choose whether to save the image information to the WordPress media library.
Automatic tags: Choose whether to use automatic tags.
Tag list: When automatic tags are enabled, a tag is added automatically if the article contains a keyword in this list.
Match complete words: This setting applies to English articles. Do not enable it for Chinese articles.
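To see why the character set setting matters, here is a minimal Python sketch, independent of the plug-in itself. It assumes a target page encoded in GB2312 (an assumption chosen for illustration) and shows the same bytes decoded with the wrong and the right character set.

```python
# Bytes as they might arrive from a page served in GB2312 rather than UTF-8
# (the encoding and the sample text are assumptions for illustration).
raw = "新浪科技".encode("gb2312")

# Decoding with the wrong character set produces garbled replacement characters.
garbled = raw.decode("utf-8", errors="replace")

# Decoding with the character set the page actually uses restores the text.
correct = raw.decode("gb2312")

print(garbled)
print(correct)
```

The first line printed is unreadable, while the second is the original text, which is the effect that choosing the correct character set in the task settings is meant to achieve.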
4. Article source settings
Under this tab, set the article list URL of the article source and the matching rules for specific articles.
We take the collection of "Sina Internet News" as an example. The article list URL is http://roll.tech.sina.com.cn/internet_worldlist/index.shtml, so enter this URL as the manually specified article list URL.
After that, set the matching rules for the specific article URLs under the article list URL.
5. Article URL matching rules
Setting the article URL matching rules is simple. Two matching modes are provided: URL wildcard matching and CSS selector matching. URL wildcard matching is usually the simpler of the two.
1. Use URL wildcard matching
By clicking through the articles on the list URL http://roll.tech.sina.com.cn/internet_worldlist/index.shtml, we can see that every article URL has the following structure:
http://tech.sina.com.cn/i/2013-06-27/16328485884.shtml
Therefore, just replace the changing numbers or letters in the URL with the wildcard (*), for example: http://tech.sina.com.cn/i/(*)/(*).shtml
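The idea behind wildcard matching can be sketched in Python as follows. This is not the plug-in's own code: the function name wildcard_to_regex is hypothetical, and the sketch assumes each (*) stands for one or more characters other than a slash, which the plug-in may or may not enforce.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Turn a URL pattern containing (*) wildcards into a compiled regex."""
    # Escape each literal piece, then join the pieces with a group that
    # matches one or more non-slash characters (an assumption of this sketch).
    parts = [re.escape(piece) for piece in pattern.split("(*)")]
    return re.compile("^" + "[^/]+".join(parts) + "$")

rule = wildcard_to_regex("http://tech.sina.com.cn/i/(*)/(*).shtml")

# The first URL is the article from the text; the second is a made-up
# variation with a different path segment.
candidates = [
    "http://tech.sina.com.cn/i/2013-06-27/16328485884.shtml",
    "http://tech.sina.com.cn/t/2013-06-27/16328485884.shtml",
    "http://roll.tech.sina.com.cn/internet_worldlist/index.shtml",
]
matched = [url for url in candidates if rule.match(url)]
print(matched)
```

Only the first candidate is kept, because it is the only one whose fixed parts agree with the pattern.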
2. Use CSS selectors for matching
To match with a CSS selector, we only need to set the CSS selector of the article URL. It is easy to work out by viewing the source code of the list URL http://roll.tech.sina.com.cn/internet_worldlist/index.shtml and finding the hyperlink code of a specific article.
The article's hyperlink a tag is inside the tag with class "contList", so the CSS selector of the article URL only needs to be set to .contList a.
After completing the settings, if you are not sure whether they are correct, click the test button. If they are correct, all article names and their web page addresses under the list URL will be listed.
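As a rough illustration of what a selector like .contList a picks out, the Python sketch below collects every link nested inside an element with class "contList", using only the standard library. The class name ContListLinks and the sample HTML (including the second article URL and both titles) are invented for this example; real pages are messier, and the plug-in's own matching may work differently.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class ContListLinks(HTMLParser):
    """Collect (title, href) for every <a> inside an element with class contList."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # nesting depth inside the contList element (0 = outside)
        self.href = None   # href of the <a> currently being read, if any
        self.text = []     # text pieces of the current link
        self.links = []    # collected (title, href) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
            if tag == "a" and attrs.get("href"):
                self.href, self.text = attrs["href"], []
        elif "contList" in (attrs.get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if not self.depth or tag in VOID_TAGS:
            return
        if tag == "a" and self.href:
            self.links.append(("".join(self.text).strip(), self.href))
            self.href = None
        self.depth -= 1

    def handle_data(self, data):
        if self.href:
            self.text.append(data)

sample = """
<div class="contList"><ul>
<li><a href="http://tech.sina.com.cn/i/2013-06-27/16328485884.shtml">Example article A</a></li>
<li><a href="http://tech.sina.com.cn/i/2013-06-27/16328485885.shtml">Example article B</a></li>
</ul></div>
<div class="footer"><a href="http://example.com/about">About</a></div>
"""

parser = ContListLinks()
parser.feed(sample)
for title, href in parser.links:
    print(title, href)
```

The footer link is skipped because it is not inside the contList element, which mirrors the list of article names and addresses that the test button displays.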
6. Article crawling settings
Under this tab, we set the matching rules for the article title and the article content. There are two ways to set them; the CSS selector method is recommended because it is simpler and more accurate.
We only need to set the article title CSS selector and the article content CSS selector to capture the article title and content accurately.
In the article source settings we took the collection of "Sina Internet News" as an example, and we continue with that example here. The selectors are easy to work out by viewing the source code of an article under the list URL http://roll.tech.sina.com.cn/internet_worldlist/index.shtml, for example the specific article http://tech.sina.com.cn/n/i/2013-06-10/06308430630.shtml.
The article title is inside the tag with the id "artibodyTitle", so the article title CSS selector only needs to be set to #artibodyTitle.
Similarly, the article content is inside the tag with the id "artibody", so the article content CSS selector only needs to be set to #artibody.
After completing the settings, if you are not sure whether they are correct, click the test button and enter a test address. If the settings are correct, the article title and article content will be displayed so that you can check them.
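The effect of id selectors such as #artibodyTitle and #artibody can be approximated with the short Python sketch below. The helper names TextById and select_text and the sample page are hypothetical, and the sketch gathers plain text only, whereas the plug-in presumably keeps the article's HTML.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class TextById(HTMLParser):
    """Gather the text found inside the element carrying a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # nesting depth inside the target element (0 = outside)
        self.chunks = []    # text pieces found inside the target element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

def select_text(html, element_id):
    """Return the whitespace-normalized text of the element with this id."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())

# Invented sample page that reuses the two ids from the text.
page = """
<h1 id="artibodyTitle">Example headline</h1>
<div id="artibody">
<p>First paragraph.</p>
<p>Second paragraph.</p>
</div>
<div id="sidebar">Unrelated links</div>
"""

title = select_text(page, "artibodyTitle")
content = select_text(page, "artibody")
print(title)
print(content)
```

Text outside the two selected elements, such as the sidebar, never reaches the result, which is why a precise selector gives cleaner articles.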
7. Capture the article pagination content
If an article is long and split across multiple pages, the entire content can still be captured. To do this, set the article paging link CSS selector, which you can find by viewing the source code of a specific article URL. For example, if an article's paging link a tags are inside the tag with class "page-link", set the article paging link CSS selector to .page-link a. If you check "Also paginate when publishing", the published article will be paginated as well. If your WordPress theme does not support the pagination tag, do not check this option.
8. Article content filtering function
The article content filtering function filters out content that you do not want to publish (such as advertising code or copyright information). Set two keywords, and the content between them is deleted. Keyword 2 can be empty, in which case everything after keyword 1 is deleted. For example, if a test crawl shows content in the article that we do not want to publish, we switch to the HTML view, find the HTML code of that content, and set two keywords to filter it out. To filter out several pieces of content, add multiple sets of keywords.
9. HTML tag filtering function
The HTML tag filtering function can filter out HTML tags in collected articles, such as hyperlinks (a tags).
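The keyword-pair content filter and the hyperlink filter described above can be sketched in Python as follows. The function names remove_between and strip_links are hypothetical, and two behaviors are assumptions of this sketch rather than documented plug-in behavior: the keywords themselves are deleted along with the content between them, and removing a tags keeps the link text.

```python
import re

def remove_between(html, keyword1, keyword2=""):
    """Delete from keyword1 through keyword2, or to the end if keyword2 is empty."""
    start = html.find(keyword1)
    if start == -1:
        return html  # keyword 1 not present: nothing to filter
    if not keyword2:
        return html[:start]
    end = html.find(keyword2, start + len(keyword1))
    if end == -1:
        return html  # no closing keyword: leave the text untouched
    return html[:start] + html[end + len(keyword2):]

def strip_links(html):
    """Remove <a ...> and </a> tags while keeping the text between them."""
    return re.sub(r"</?a\b[^>]*>", "", html, flags=re.IGNORECASE)

# Invented article fragment containing an ad block and a copyright line.
article = (
    '<p>Body text with a <a href="http://example.com">link</a>.</p>'
    '<div class="ad">Buy now</div>'
    "<p>Copyright notice</p>"
)

no_ad = remove_between(article, '<div class="ad">', "</div>")
no_footer = remove_between(no_ad, "<p>Copyright")   # keyword 2 left empty
cleaned = strip_links(no_footer)
print(cleaned)
```

Applying several keyword pairs in sequence, as here, corresponds to adding multiple sets of filter settings in the plug-in.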