search
HomeBackend DevelopmentPHP TutorialHow to obtain WeChat public account article page
How to obtain WeChat public account article pageMay 17, 2018 am 09:45 AM
articlemethodObtain

Let me analyze how to use PHP to write a method of collecting WeChat public account articles and explain the code in detail. Friends who need it can learn from it.

There are several problems in collecting historical messages of public accounts through Sogou search:

1. There is a verification code;

2. The historical message list only has the latest 10 group messages;

3. The article address has a validity period;

4. It is said that batch collection requires changing IP;

There are no such problems through the method in my previous article, although the collection system The construction is not as simple as traditional collectors, just write the rules and crawl them. However, the efficiency of batch collection after being set up once is still acceptable. Moreover, the collected article addresses are permanently valid, and all historical messages of a public account can be collected.
Let’s start with the link address of a public account article:

1. Copy the link address from the menu in the upper right corner of WeChat:

http:/ /mp.weixin.qq.com/s/fF34bERZ0je_8RWEJjoZ5A

2. The address obtained from the historical message list:

http://mp. weixin.qq.com/s?__biz=MjM5NDAwMTA2MA==&mid=2695729619&idx=1&sn=8be0b6bd0210cee0d492ebdf20f7371f&chksm=83d74818b4a0c10ef286b33bb7deb73226125f866ddb5 b2781166066a69afef3705eabdb3b85&scene=4#wechat_redirect

##3. Complete real address:

https://mp.weixin.qq.com/s?__biz=MjM5NDAwMTA2MA==&mid=2695729619&idx=1&sn=8be0b6bd0210cee0d492ebdf20f7371f&chksm=83d74818b4a0c10ef286b33bb7deb7322612 5f866ddb5b2781166066a69afef3705eabdb3b85&scene=37&key=c81d77271180a0e6ce32be2d9dcaa2a7436aeba2c1d47a20d02194d1c944a8286a8eded93495eeadd05da412bbfaa 638a379750aeaa4cf5c00e4d7851c5710d9b9736b80e3c72770a57a515c23ff2400&ascene=3&uin=MzUyOTIyNQ==&devicetype=iOS10.1.1&version=16050120&nettype =WIFI&fontScale=100&pass_ticket=FGRyGfXLPEa4AeOsIZu7KFJo6CiXOZex83Y5YBRglW4=&wx_header=1


The above three addresses are the addresses of the same article. If obtained from different locations, three completely different results will be obtained.

Like the historical message page, WeChat has a mechanism for automatically supplementing parameters. The first address is obtained by copying the link and seems to be a disguised encoding. In fact, it’s useless and we won’t consider it. The second address is the link address obtained from the json article list of historical messages through the method introduced in the previous article. We can save this address to the database. Afterwards, the article content can be obtained from the server through this address. After the parameters are added to the third link, the purpose is to allow the reading js in the article page to obtain the json result of the reading likes. In the method of our previous article, the article page is opened and displayed by the client. Because of these parameters, the js in the article page automatically obtains the reading volume, so we can obtain the reading volume of this article through the proxy service. .

The content of this article is to study in detail how to obtain article content and other useful information based on the method introduced in the previous article of this column.

(list of articles saved in my database, some fields)

1. Get the source code of the article:

You can read the article source code into a variable through the PHP function file_get_content(). Since the source code of the WeChat article can be opened from the browser, I will not paste it here to avoid wasting page space.

<?
//$content_url 变量的值为文章地址
$html = file_get_contents($content_url);
?>

2. Useful information in the source code:

1) Original content:

The original content is contained in a

tag and is obtained through the php code:

<?
preg_match_all("/id=\"js_content\">(.*)<script/iUs",$html,$content,PREG_PATTERN_ORDER);
$content = "<p id=&#39;js_content&#39;>".$content[1][0];
?>

The beginning of the regular pattern identifies

, and the end identifies

Also note: This matching rule may change after a period of time. This article will be kept updated as much as possible. If you make a collection system based on my article and it fails one day, don’t forget to come back and check if the article has been updated.

2) Content processing:


Through the above method, we obtained the html of the article content, but after you display the article content, you will find that the pictures and videos cannot be displayed normally. . Because this HTML still needs some processing:

First of all, the picture, the src attribute in the How to obtain WeChat public account article page tag in the WeChat article has all been replaced with the src attribute. It will only be replaced when displayed. So we also have two options, replace the source code directly, or use js to replace it during display. Let me first introduce the method of directly replacing html:

<?
//$content变量的值是前面获取到的文章内容html
$content = str_replace("src","src",$content);
?>

然后是视频,视频的显示不正常,经过长期测试后发现只要替换一个页面地址就能解决,过程就不说了,直接说结果:

<?
//$content变量的值是前面获取到的文章内容html
$content = str_replace("preview.html","player.html",$content);
?>

通过这两个替换之后,文章内容html中的图片和视频就都正常了。

3) 公众号相关信息:

通过本专栏之前的文章,介绍了我们使用微信客户端,任意打开一个公众号的历史消息页之后。系统从数据库中识别biz的值,发现数据库中没有记录,就会插入一条新的纪录。之后的采集队列就会定期根据这个biz来获取这个公众号的历史消息列表。

但是我们只获得了这个公众号的biz,公众号的名称,头像这两个重要信息还是没有获取到。主要原因是历史消息页面中没有这两个信息。但是我们可以从文章页面中获取到。

在微信文章页面html的底部,有一些js的变量赋值的代码,通过正则匹配之后我们就可以获得这两个公众号的信息:

<?
//$html变量的值是前面获取到的文章全部html
preg_match_all(&#39;/var nickname = \"(.*?)\";/si&#39;,$html,$m);
$nickname = $m[1][0];//公众号昵称
preg_match_all(&#39;/var round_head_img = \"(.*?)\";/si&#39;,$html,$m);
$head_img = $m[1][0];//公众号头像
?>

通过这两个正则匹配,我们就能获取到公众号的头像和昵称,然后根据文章地址中的biz,可以保存到对应的微信号数据表中。

3、文章的保存和处理

前面的代码已经将文章内容获取到变量中了。如何保存其实每个人也许都有自己的想法。我这里介绍一下我的保存内容的方法:

将文章内容的html以数据库id为文件名保存成html文件,以biz字段为目录。

<?
$dir = "./".$biz."/";
$filename = $dir.$id.".html";
if(!is_dir($dir)) {
  mkdir($cache_dir);
  chmod($cache_dir,0777);
}
$file = fopen($filename, "w");
fwrite($file, $content);
fclose($file);
?>

以上代码是一个标准的php建立文件夹保存文件的代码,大家可以根据自己的实际情况安排保存方法。

在这之后我们就可以在自己的服务器上得到一个html文件,内容就是公众号的文章内容。我们可以从浏览器中打开看一下。这时你也许会发现图片防盗链了!无法正常显示!包括数据库中保存的文章封面图,公众号的头像都是防盗链的。

别急,这个问题很好解决,只需要将图片也保存到自己的服务器,无非是将来会占用自己的服务器空间和带宽。

图片防盗链的原理是当图片在网页中显示的时候,图片服务器会检测到引用这张图片的服务器域名,当发现服务器域名不包含http://qq.com或http://qpic.cn的时候就会被替换成防盗链图片。

但是如果检测不到引用页面的域名就会正常显示,所以我们通过php的函数file_get_content()就可以将图片的二进制代码获取过来,然后根据自己的想法起个文件名保存到自己的服务器上。在这里再介绍一个保存图片的方法,我目前使用了腾讯云的“万象优图”,通过它们提供的api将图片保存到云空间,这样的好处是读取图片时直接在图片的链接地址加上希望得到的图片尺寸大小参数,就可以直接得到一张缩略图。比存在自己的服务器方便得多。阿里云也应该有同样的产品,好像名叫对象存储。

另外,我采集公众号内容的目的是制作成一个新闻app,在app中将html代码显示出来之后,因为app同样没有域名,防盗链服务器也同样不会认为图片被盗链了。这样就可以直接显示图片出来。

以上就是我总结的公众号文章内容的采集与存储方法,希望能够帮到你。

相关推荐:

php微信生成微信公众号二维码扫描进入公众号带参数

php创建微信公众号管理系统

thinkphp5微信公众号token认证

The above is the detailed content of How to obtain WeChat public account article page. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
演示win7调整屏幕亮度的方法演示win7调整屏幕亮度的方法Jul 08, 2023 pm 07:49 PM

不同的电脑系统在调整屏幕亮度的操作方法上会有些不同,最近就有使用win7系统的网友不知道win7怎么调整屏幕亮度,看久了电脑眼睛比较酸痛。下面小编就教下大家win7调整屏幕亮度的方法。具体的操作步骤如下:1、点击win7电脑左下角的“开始”,在弹出的开始菜单中选择“控制面板”打开。2、在打开的控制面板中找到“电源选项”打开。3、也可以用鼠标右键电脑右下角的电源图标,在弹出的菜单中,点击“调整屏幕亮度”,如下图所示。两种方法都可以用。4、在打开的电源选项窗口的最下面可以看到屏幕亮度调整的滚动条,直

win10监控摄像头打开照片的方法win10监控摄像头打开照片的方法Jul 10, 2023 pm 09:41 PM

如果我们手头没有手机,只有电脑,但我们必须拍照,我们可以使用电脑内置的监控摄像头拍照,那么如何打开win10监控摄像头,事实上,我们只需要下载一个相机应用程序。打开win10监控摄像头的具体方法。win10监控摄像头打开照片的方法:1.首先,盘快捷键Win+i打开设置。2.打开后,进入个人隐私设置。3.然后在相机手机权限下打开访问限制。4.打开后,您只需打开相机应用软件。(如果没有,可以去微软店下载一个)5.打开后,如果计算机内置监控摄像头或组装了外部监控摄像头,则可以拍照。(因为人们没有安装摄

基于Java的机器视觉实践和方法介绍基于Java的机器视觉实践和方法介绍Jun 18, 2023 am 11:21 AM

随着科技的不断发展,机器视觉技术在各个领域得到了广泛应用,如工业自动化、医疗诊断、安防监控等。Java作为一种流行的编程语言,其在机器视觉领域也有着重要的应用。本文将介绍基于Java的机器视觉实践和相关方法。一、Java在机器视觉中的应用Java作为一种跨平台的编程语言,具有跨操作系统、易于维护、高度可扩展等优点,对于机器视觉的应用具有一定的优越性。Java

win7怎么调屏幕亮度的两种简单方法win7怎么调屏幕亮度的两种简单方法Jul 08, 2023 pm 06:33 PM

目前有很多屏幕亮度调整软件,我们可以通过使用软件进行快速调整或者通过显示器上自带的亮度功能进行调整。以下是详细的Win7屏幕亮度调整方式,您可以通过教程中的方法进行快速调整即可。Win7系统电脑怎么调节屏幕亮度教程:1、依次点击“计算机—右键—控制面板”,如果没有也可以在搜索框中进行搜索。2、点击控制面板下的“硬件和声音”,或者点击“外观和个性化”都可以。3、点击“NVIDIA控制面板”,有些显卡可能是AMD或者Intel的,请根据实际情况选择。4、调节图示中亮度滑块即可。5、还有一种方法,就是

Go 语言中的方法是怎样定义和使用的?Go 语言中的方法是怎样定义和使用的?Jun 10, 2023 am 08:16 AM

Go语言是近年来备受青睐的编程语言,因其简洁、高效、并发等特点而备受开发者喜爱。其中,方法(Method)也是Go语言中非常重要的概念。接下来,本文就将详细介绍Go语言中方法的定义和使用。一、方法的定义Go语言中的方法是带有接收器(Receiver)的函数,它是一个与某个类型绑定的函数。接收器可以是值类型或者指针类型。用于接收者的参数可以在方法名

PHP文件下载方法及常见问题解答PHP文件下载方法及常见问题解答Jun 09, 2023 pm 12:37 PM

PHP是一个广泛使用的服务器端编程语言,它的许多功能和特性可以将其用于各种任务,包括文件下载。在本文中,我们将了解如何使用PHP创建文件下载脚本,并解决文件下载过程中可能出现的常见问题。一、文件下载方法要在PHP中下载文件,我们需要创建一个PHP脚本。让我们看一下如何实现这一点。创建下载文件的链接通过HTML或PHP在页面上创建一个链接,让用户能够下载文件。

图文详解如何下载win10系统方法图文详解如何下载win10系统方法Jul 16, 2023 pm 01:25 PM

如今微软的Windows系统已经更新换代到了Windows10版本。很多以前还在使用Windows7系统的用户都想体验这个新版本Windows10系统。下面小编就来说说如何下载win10系统下载的方法,大家快来看看。1、首先下载一个小白重装系统软件,然后点击在线重装,下载win10系统。2、然后就开始系统镜像的下载了。3、系统镜像下载完成就是环境部署了。然后win10系统就下载完成啦。4、重启之后开始安装系统,安装完成就能进入桌面咯。以上就是如何下载win10系统的方法介绍啦,希望能帮助到大家。

Vue 中的 createApp 方法是什么?Vue 中的 createApp 方法是什么?Jun 11, 2023 am 11:25 AM

随着前端开发的快速发展,越来越多的框架被用来构建复杂的Web应用程序。Vue.js是流行的前端框架之一,它提供了许多功能和工具来简化开发人员构建高质量的Web应用程序。createApp()方法是Vue.js中的一个核心方法之一,它提供了一种简单的方式来创建Vue实例和应用程序。本文将深入探讨Vue中createApp方法的作用,其如何使用以及使用时需要了解

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

Hot Tools

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

VSCode Windows 64-bit Download

VSCode Windows 64-bit Download

A free and powerful IDE editor launched by Microsoft

MinGW - Minimalist GNU for Windows

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

SublimeText3 Linux new version

SublimeText3 Linux new version

SublimeText3 Linux latest version

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),