
Briefly introduce the Baidu News Open Protocol XML document production method

黄舟 (Original)
2017-03-14 16:06:16

Open protocol overview
Using this open protocol will bring more traffic to your website!
The "Internet News Open Protocol" is the news source inclusion standard for search engines, formulated by Baidu News Search. A website can publish its news as an XML page that follows this open protocol (independent of the format in which the news was originally released) for search engines to index, so that the Baidu search engine is told about newly published news proactively and promptly.
Adopting the "Internet News Open Protocol" is equivalent to having search engines subscribe to your website's news. Through Baidu, the world's largest Chinese search engine, netizens will reach the news on your website more widely and more often, which brings potential traffic to your website.

The open protocol is very simple! You can use it easily with our help.
Open protocol content
A web page in XML format produced in compliance with the "Internet News Open Protocol" lists the relevant information about the news published by the website in a standard format.
XML web page example: [image]

XML tag description:
Tags marked with an asterisk are required; tags without an asterisk are optional.
*f6249141b7466b2eaa7903c29c6b8d20: Marks the beginning and end of the entire XML file content.
*6d88e32f12c595d0a92c0477538a6c33: Site address.
*f23cbc012a16cf9c5773f9cfa7d6c5ad: Email address of the person in charge. We will contact you at this address when necessary.
*b468d8d4db106fadee2beda28bd37f72: Update period, in minutes. Search engines will visit the page on this cycle, so that the news on the page appears in Baidu News more promptly.
*5083cbefc9e5095dae6431462e2af988: Marks the beginning and end of each news item. Each pair contains a single news article; news topic pages are excluded.
*b2386ffb911b14667cb8f0f91ea547a7: News title.
*2cdf5bf648cf2f33323966d7f58a7f3f: News URL, in one-to-one correspondence with a single news article. If a paginated article has multiple URLs, each URL counts as a separate news article.
8b55addfb40ddf4a384b1010d729e503: Summary of the news content.
*28f128881ce1cdc57a572953e91f7d0f: Complete news text (plain text only, without HTML or other markup characters). The purpose of this item is to make the news appear more often and more accurately in the search results.
*dc0870658837139040642baa5555a380: Related picture in the news text, given as an absolute address. If the news article has no related pictures, it can be empty; if it contains multiple pictures, repeat this tag. The purpose of this item is to display related images of the article in the search results.
57b8c5035cd2c39d678d4faa860a5063: Headline image produced for news that may become a headline, given as an absolute address.
835bab4a15487168a884a7f648a001c5: One or more keywords reflecting the subject of the news, separated by spaces. This item is for reference only, and the search results do not depend entirely on the content of this tag.
c58a1130350e5f417b7f5c3a9765ab7e: News category. It can follow the website's own classification system, preferably the first-level category.
48fe722b397613e801e59f453d6c9330: News author, which can be an institution or an individual.
e02da388656c3265154666b7c71a8ddc: News source, that is, the original media outlet or other institution.
*986e6b71e5a3a4a0e77dc3e4175cc787: News release time, consistent with the release time on the news HTML page. Please be accurate to the minute; if your website does not record hours and minutes, just provide the year, month and day.
Recommended time format (year, month, day, hour, minute, second): 2005年11月09日10:37:00 or Wed, 09 Nov 2005 10:37:00 GMT
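The two recommended formats can be produced with Python's standard library. This is a minimal sketch: the helper names `pub_date_chinese` and `pub_date_rfc` are hypothetical, and it assumes the second form is the usual RFC 2822 style GMT date as emitted by `email.utils.format_datetime`. The weekday name is computed from the date itself.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def pub_date_chinese(dt: datetime) -> str:
    """Format as year, month, day, then hour:minute:second."""
    # Built with an f-string so the result does not depend on the locale.
    return f"{dt.year:04d}年{dt.month:02d}月{dt.day:02d}日{dt:%H:%M:%S}"


def pub_date_rfc(dt: datetime) -> str:
    """Format as an RFC 2822 style date in GMT."""
    # usegmt=True requires a datetime whose tzinfo is UTC, so convert first.
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


published = datetime(2005, 11, 9, 10, 37, 0, tzinfo=timezone.utc)
print(pub_date_chinese(published))  # 2005年11月09日10:37:00
print(pub_date_rfc(published))      # Wed, 09 Nov 2005 10:37:00 GMT
```

Pass timezone-aware datetimes to `pub_date_rfc`; a naive datetime would be interpreted as local time before conversion.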
Open protocol usage
Before use, you need to know the following points:
Whether your website is already a Baidu news source or has not yet been included in Baidu News Search, you can use this open protocol.
All content you provide through the open protocol should comply with the "News Source Inclusion Standards" below.
The "Internet News Open Protocol" only assists and usefully supplements the original way of collecting news sources; it does not completely replace it.
News source inclusion standards:
Baidu hopes to diversify its news sources and encourages original news content. A regular, legal media website that has a large amount of valuable news content, updates it promptly, and runs on a stable, fast server meets Baidu's basic principles for including news sources. Baidu News Search includes news reports and media commentary on current affairs, entertainment, sports, finance, science and education, culture, and social life; market information and reviews for digital products, real estate, automobiles, and so on; and industry trends and organizational work updates. All of this is Chinese-language information written or edited by professionals. It excludes personal information, forums, blogs, advertisements, humorous jokes, emotional stories, erotica, photos, stills, celebrity profiles, recipes, downloads, multimedia and similar types, as well as Internet information in other languages. You bear all legal responsibility for the content you provide, must ensure its authenticity and legality, and must not infringe the rights of any third party.
Let’s get started!

Step One: Create an XML file
Be sure to read the news source inclusion standards of Baidu News Search before creating an XML file, and pay special attention to the following:
1. News source websites included in Baidu News Search must strictly abide by the country's "Internet News Information Service Management Regulations" and respect the copyrights of the creators and source websites when publishing and reprinting news.
2. Website types that are not suitable for inclusion in Baidu News Search include forums, blogs, company websites, etc.
3. Baidu News Search does not include personal information, advertisements, tenders, tutorials, humorous jokes, emotional stories, erotica, photos, stills, celebrity profiles, recipes, downloads, multimedia and similar types, or Internet information in other languages.
4. Baidu News Search hopes to include high-quality Chinese news, and does not include English or other non-Chinese news.
5. Please create the XML file according to the open protocol content published above.

Other instructions:
Supported encodings are GB2312, GB18030, UTF-8, and BIG5. GB18030 or UTF-8 is recommended.
You can put all the news released by the website in a given period in one XML file, or split it across multiple XML files by channel or column.
Please keep each XML file continuously and automatically updated according to the update period. The update period can be adjusted at any time according to your needs.
Each XML file can hold at most the 100 latest news items; there is no need to keep earlier news.
Please sort the news by release time with the latest at the top; otherwise some news may be missed.
XML tag content cannot contain any code other than literal text. The special characters in the following table must be converted into the escape sequences defined by XML. Otherwise an error will occur and the search engine will not be able to obtain the news on the page.

Character           Symbol    Escaped as (HTML entity)    Escaped as (character code)
Ampersand (and)     &         &amp;                       &#38;
Single quote        '         &apos;                      &#39;
Double quote        "         &quot;                      &#34;
Greater-than sign   >         &gt;                        &#62;
Less-than sign      <         &lt;                        &#60;

The "&" inside an escape sequence does not need to be escaped again.
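The five replacements can be applied with Python's standard library. In this minimal sketch, `xml.sax.saxutils.escape` handles `&`, `<`, and `>` by default, and the extra mapping covers the two quote characters. The helper name `escape_for_xml` is hypothetical.

```python
from xml.sax.saxutils import escape

# escape() covers &, < and > on its own; quotes need an explicit mapping.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_for_xml(raw_text: str) -> str:
    """Escape the five special characters in raw (not yet escaped) text."""
    return escape(raw_text, QUOTE_ENTITIES)


print(escape_for_xml('Tom & "Jerry" <live>'))
# Tom &amp; &quot;Jerry&quot; &lt;live&gt;
```

Apply the helper once, to raw text only. Running it over text that is already escaped would turn the `&` of each escape sequence into `&amp;` a second time.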

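We recommend using CDATA sections. A CDATA section begins with "<![CDATA[" and ends with "]]>". Text that contains code or special characters can be placed inside a CDATA section, and its special characters then no longer need to be escaped.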
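Step One as a whole can be sketched with Python's standard `xml.dom.minidom`. The element names used here (`document`, `webSite`, `webMaster`, `updatePeri`, `item`, `title`, `link`, `text`, `pubDate`) are assumptions, because the tag names are not legible in this copy of the article; the function name `build_feed`, the article dictionary keys, and the `example.com` values are also hypothetical. Check the official protocol before relying on any of them.

```python
from datetime import datetime
from xml.dom import minidom

MAX_ITEMS = 100  # the article allows at most 100 latest news items per file


def build_feed(site, webmaster, update_minutes, articles):
    """Return the feed as UTF-8 bytes. Element names are assumed, not confirmed."""
    doc = minidom.Document()
    root = doc.createElement("document")
    doc.appendChild(root)

    def add(parent, name, value, cdata=False):
        node = doc.createElement(name)
        # Text nodes are escaped on output; CDATA sections are written verbatim.
        # minidom raises ValueError if a CDATA section contains "]]>".
        maker = doc.createCDATASection if cdata else doc.createTextNode
        node.appendChild(maker(value))
        parent.appendChild(node)

    add(root, "webSite", site)
    add(root, "webMaster", webmaster)
    add(root, "updatePeri", str(update_minutes))

    # Latest news first, and never more than MAX_ITEMS entries.
    newest_first = sorted(articles, key=lambda a: a["published"], reverse=True)
    for article in newest_first[:MAX_ITEMS]:
        item = doc.createElement("item")
        root.appendChild(item)
        when = article["published"]
        add(item, "title", article["title"])
        add(item, "link", article["url"])
        add(item, "text", article["body"], cdata=True)
        add(item, "pubDate",
            f"{when.year:04d}年{when.month:02d}月{when.day:02d}日{when:%H:%M:%S}")
    return doc.toxml(encoding="UTF-8")


articles = [
    {"title": "Older story", "url": "http://example.com/news/1.html",
     "body": "Plain text with < and & characters.",
     "published": datetime(2017, 3, 14, 9, 0, 0)},
    {"title": "Newer story", "url": "http://example.com/news/2.html",
     "body": "More plain text.",
     "published": datetime(2017, 3, 14, 16, 6, 16)},
]
print(build_feed("example.com", "editor@example.com", 15, articles).decode("utf-8"))
```

The news body goes into a CDATA section so that its special characters need no escaping, while short fields such as the title are left to the serializer, which escapes them automatically.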
Step Two: Validate the XML file
The addresses below provide several tools to help you validate the structure of your XML file:
http://www.php.cn/
http://www.php.cn/
A validated XML file makes the information you provide more standard and ensures that the news you publish is not missed by search engines.
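A quick local check can also be run before submitting. The sketch below uses Python's standard `xml.etree.ElementTree` and only tests that the file is well-formed XML; it does not verify the required tags of the protocol. The function name `check_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_bytes):
    """Return (True, "") if the XML parses, else (False, error description)."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        line, column = err.position  # location reported by the parser
        return False, f"line {line}, column {column}: {err}"
    return True, ""


good = b"<document><item><title>Tom &amp; Jerry</title></item></document>"
bad = b"<document><item><title>Tom & Jerry</title></item></document>"
print(check_well_formed(good))
print(check_well_formed(bad))
```

The second document fails because its `&` is not escaped, which is the kind of error that would stop a search engine from reading the page.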
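Step Three: Submit the XML URL
Before submitting, upload the XML file to your website server and enter the XML file's URL and other information in the corresponding boxes below. The search engine will visit that URL directly; if the URL changes, you need to submit it again.
If your website meets the news source inclusion standards, Baidu News Search will test and observe the data you submit for one week. If the XML file basically follows the requirements of the "Internet News Open Protocol" but has problems, we will contact you at the email address provided on the XML page.
Note:
1. We will review the XML file you submit. Baidu News Search does not guarantee that all of the content you submit will be included.
2. The site name and address are required. A single site can submit at most 5 different XML file addresses per day.
3. After submitting an address, please check the information in the pop-up window to confirm whether the submission succeeded.
Step Four: Check the XML file status
You can enter the address of the XML file you submitted in the box below to check the file's processing progress and feedback.
Note: The address you enter must be complete, that is, exactly the same as the one you submitted.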

