Alarm troubleshooting for WeChat development

Y2J (original article), 2017-05-16 11:07:51

Summary description

The WeChat public platform now provides interface alarms to all developers. When the number of failed attempts by the WeChat server to push messages to a developer reaches a preset threshold, an alarm message is sent to the designated WeChat alarm group (configure it under Public Platform -> Developer Center -> Interface Alarm). Developers should actively watch these alarms, fix faults immediately, and keep the service quality of their WeChat official accounts high.

To troubleshoot effectively using the error sample at the end of each alarm (which provides the openid and timestamp), developers need to add detailed logs that record this key information, so that problems can be located quickly.
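As a minimal sketch of such logging, the Python functions below extract from each incoming push the fields an alarm can be matched against and write them as one log line. They assume the standard XML push format, in which FromUserName carries the sender's openid and CreateTime is a Unix timestamp; the function names, the logger name, and the log line layout are hypothetical.

```python
import logging
import time
import xml.etree.ElementTree as ET

logger = logging.getLogger("wechat.callback")  # hypothetical logger name


def describe_push(xml_body: str, local_ip: str, elapsed_ms: float) -> str:
    """Build one log line holding the key fields of a WeChat push."""
    root = ET.fromstring(xml_body)
    openid = root.findtext("FromUserName", default="-")     # sender's openid
    create_time = root.findtext("CreateTime", default="-")  # Unix timestamp
    msg_type = root.findtext("MsgType", default="-")
    event = root.findtext("Event")  # present only for event pushes
    kind = f"{msg_type}/{event}" if event else msg_type
    return (f"openid={openid} create_time={create_time} type={kind} "
            f"ip={local_ip} elapsed_ms={elapsed_ms:.1f}")


def log_push(xml_body: str, local_ip: str, started: float) -> None:
    """Log a push after handling it; `started` is a time.monotonic() value."""
    elapsed_ms = (time.monotonic() - started) * 1000
    logger.info(describe_push(xml_body, local_ip, elapsed_ms))
```

Logging the local machine's IP next to the openid and timestamp lets you match the IP in an alarm's error sample to the exact server and request that failed.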

There are currently two types of alarms:

1. General alarms, which all developers need to pay attention to.

2. Third-party platform alarms. Only developers who have applied to become a public account third-party platform developer on the WeChat Open Platform (open.weixin.qq.com) need to pay attention to these alarms.

The following are examples of specific alarms and troubleshooting guidelines.

Alarm content description

a) appid: the official account's appid
b) nickname: the official account's nickname
c) Time: the time the anomaly first occurred, provided in every alarm (for example, the time of the first timeout or of the first response failure)
d) Content: a specific description of the error
e) Number of times: the number of failures
f) Error sample: information that helps troubleshoot the problem, such as the developer IP and the pushed message type at the first timeout. For a response failure, the error sample also includes the developer's response at the first failure.

In general, the IP, time, and message type provided in the alarm are enough to quickly locate the cause of the problem on the developer's side.
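When alarms arrive in volume, it can help to turn the error sample into structured fields that can be matched against your own logs. A minimal Python sketch, assuming samples always use the bracketed key=value layout seen in the examples below; the function name is hypothetical.

```python
import re

# One "[key=value]" pair; spaces around the key and the value are ignored.
_PAIR = re.compile(r"\[\s*([^=\]]+?)\s*=\s*([^\]]*?)\s*\]")


def parse_error_sample(sample: str) -> dict:
    """Split an alarm error sample into a dict with lower-case keys."""
    return {key.lower(): value for key, value in _PAIR.findall(sample)}
```

For example, "[IP=203.205.140.29][Event= UnSubscribe]" becomes {"ip": "203.205.140.29", "event": "UnSubscribe"}. Lower-casing the keys makes "IP" and "ip", which both appear in the samples, look the same.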

Alarm example 1: Timeout alarm

Appid: wxxxxxx
Nickname: WxNickName
Time: 2014-12-01 20:12:00
Content: After the WeChat server pushed a message or event to the official account, the developer did not return within 5 seconds
Number of times: 1272 times in 5 minutes
Error example: [IP=203.205.140.29][Event= UnSubscribe]

This alarm means that when the WeChat server pushed unfollow events to the developer, the developer did not return a result within 5 seconds. This happened 1272 times in the 5 minutes from 2014-12-01 20:12:00 to 2014-12-01 20:17:00. The first timeout in that window occurred at 2014-12-01 20:12:00; the developer's IP was 203.205.140.29 and the event type was an unfollow event.

Alarm example 2: Response failure

Appid: wxxxx
Nickname: WxNickName
Time: 2014-12-01 20:12:00
Content: After the WeChat server pushed a message or event to the official account, the response received was illegal
Number of times: 1320 times in 5 minutes
Error example: [Event=Click] [ip=58.248.9.218][response_length= 10][response_content=Error 500:]

This alarm means that when the WeChat server pushed custom menu click events to the developer, the developer returned an invalid result. This happened 1320 times in the 5 minutes from 2014-12-01 20:12:00 to 2014-12-01 20:17:00. The first response failure in that window occurred at 2014-12-01 20:12:00; the developer's IP was 58.248.9.218, the event type was a menu click event, and the third party returned 10 bytes of content: "Error 500:".

Alarm example 3: Connection timeout

Appid: wxxxx
Nickname: WxNickName
Time: 2015-02-04 20:13:09
Content: A timeout occurred when the WeChat server connected to the public account developer server. The timeout period is 5 seconds
Number of times: 7289 times in 5 minutes
Error example: [IP=180.150.190.135][Msg=Text]

This alarm means that when the WeChat server pushed text messages from followers to the developer, it could not connect to the server address configured by the developer. This happened 7289 times in the 5 minutes from 2015-02-04 20:13:09 to 2015-02-04 20:18:00. The first connection timeout in that window occurred at 2015-02-04 20:13:09; the developer's IP was 180.150.190.135 and the message type was a text message sent by a user.

Troubleshooting methods for various alarms

1. DNS failure

This error means that the WeChat server failed to resolve DNS when pushing messages to developers. If you encounter this alarm, please confirm:

a) Whether the configured URL and domain name are correct;
b) Whether the domain name has changed or expired; if so, update it.

If neither applies, please contact the WeChat public platform.

2. DNS timeout

This error does not currently occur.

3. Connection timeout

This error means that the WeChat server and the developer server did not successfully connect within 3 seconds. The alarm message will provide the time when the first connection failure occurred and the IP address of the connection. If this alarm occurs, the developer is asked to confirm:

a) Whether the IP is incorrect.
b) Whether the machine at that IP is overloaded or has too many connections.
c) If a third party hosts the server, whether the hosting provider has a fault.
d) Whether the network operator is faulty.
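Items a) and d) can be checked by timing a TCP connection to the IP from the alarm, ideally from a machine outside your own network. A minimal Python sketch, assuming you know the port your callback URL is served on (for example 80 or 443); the 3-second default mirrors the connection timeout described above, and the function name is hypothetical.

```python
import socket
import time
from typing import Optional


def connect_time(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None on failure or timeout."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (socket.timeout included) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None
```

Running connect_time("<alarm IP>", 443) in a loop gives a rough picture: frequent None results point at the network or an overloaded machine, while connect times creeping toward 3 seconds suggest the machine is close to its connection limit.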

4. Request timeout

The WeChat server pushes messages or events to the developer server, but the developer does not return within 5 seconds. When the request times out, the alarm message will provide the time when the request timeout occurred for the first time, the developer IP and the message type. Developers please confirm:

a) Whether the IP is wrong
b) Whether the machine at that IP received the request of the message type given in the alarm
c) Whether processing the request takes too long
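Item c) is easiest to answer if every handler records its own duration. A minimal Python sketch of a timing decorator, assuming handlers are plain functions; the 60% warning threshold, the logger name, and the decorator name are hypothetical choices, while the 5-second budget comes from the request timeout described above.

```python
import functools
import logging
import time

logger = logging.getLogger("wechat.timing")  # hypothetical logger name

REPLY_BUDGET_S = 5.0  # WeChat stops waiting for a reply after 5 seconds
WARN_FRACTION = 0.6   # hypothetical: warn once 60% of the budget is used


def timed_handler(func):
    """Record each call's duration and warn when it nears the 5-second limit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.monotonic() - start
            wrapper.last_elapsed = elapsed
            if elapsed >= REPLY_BUDGET_S * WARN_FRACTION:
                logger.warning("%s took %.2fs of a %.1fs budget",
                               func.__name__, elapsed, REPLY_BUDGET_S)
    wrapper.last_elapsed = None
    return wrapper
```

Warning before the hard limit is reached gives you a signal in your own logs before WeChat starts counting timeouts.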

5. Response failure

If the developer does not reply in the reply message format specified in the wiki, or a network error occurs, the response fails and a response failure alarm is issued. The alarm message provides the time of the first response failure, the developer's IP, the message type, and the content of the response. Developers please confirm:

a) Whether the IP is incorrect
b) Whether a network error occurred on the IP
c) Whether the business logic failed to reply according to the wiki specification, or entered an exception path.
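For item c), a passive text reply is an XML document that swaps the sender and recipient of the incoming message. The Python sketch below assumes the text-reply layout (ToUserName, FromUserName, CreateTime, MsgType, Content) and that answering with the plain string "success" tells WeChat no reply is needed; check both against the current wiki before relying on them. The function names are hypothetical.

```python
import time
import xml.etree.ElementTree as ET
from typing import Optional

NO_REPLY = "success"  # acknowledge the push without sending a message back


def cdata(value: str) -> str:
    # Split any "]]>" so user text cannot close the CDATA section early.
    return "<![CDATA[" + value.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def text_reply(incoming_xml: str, content: str, now: Optional[int] = None) -> str:
    """Build a passive text reply to the user who sent `incoming_xml`."""
    incoming = ET.fromstring(incoming_xml)
    user = incoming.findtext("FromUserName", default="")    # the user's openid
    account = incoming.findtext("ToUserName", default="")   # the official account
    create_time = int(time.time()) if now is None else now
    return ("<xml>"
            f"<ToUserName>{cdata(user)}</ToUserName>"
            f"<FromUserName>{cdata(account)}</FromUserName>"
            f"<CreateTime>{create_time}</CreateTime>"
            f"<MsgType>{cdata('text')}</MsgType>"
            f"<Content>{cdata(content)}</Content>"
            "</xml>")
```

Building the reply from the parsed request, rather than by string concatenation in each handler, keeps every code path, including error paths, on the same well-formed format.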

6. MarkFail (automatic blocking)

The WeChat backend counts each developer's failures in real time. When pushes to a developer fail in large numbers, the WeChat server automatically blocks that developer, stops pushing any messages to it for 1 minute, and sends an alarm to the WeChat alarm group. This is the highest-level alarm. When you receive it, handle the backend failure and restore service as soon as possible. In practice, connection timeout, request timeout, or response failure alarms will almost always arrive before this one; fix those faults immediately to avoid being blocked by the WeChat server, which seriously disrupts the official account's service.

7. Pushing component_verify_ticket timed out & 8. Pushing component_verify_ticket failed & 9. Pushing component message timed out & 10. Pushing component message failed

Only developers of public account third-party platforms receive the four alarms above; other official account developers do not need to pay attention to them. Because a third-party platform serves many official accounts, its service quality is held to stricter requirements and alarm standards, so these four events are reported separately. Troubleshoot them the same way as items 4 and 5 above. For how to apply for and develop a public account third-party platform, see the WeChat Open Platform (open.weixin.qq.com).

FAQ

1. How to troubleshoot DNS failures?

1. Ping the domain name in the URL configured on your MP to confirm that it resolves to the correct IP. If it does not resolve, or resolves to the wrong IP, check the configuration in your domain name provider's management console.
2. If step 1 returns the correct IP but DNS failure alarms continue, verify again using the DNS server 182.254.116.116. On Linux, run "dig @182.254.116.116 <domain name>"; on Windows, change the DNS server address in the network settings and ping the domain name again. If the IP obtained is wrong or cannot be obtained, please contact the WeChat team.
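Step 1 can be scripted so it runs on a schedule. The Python sketch below resolves the domain with the machine's configured resolver and compares the result with the IPs you expect; it does not query 182.254.116.116 directly (use dig for that, as in step 2). The function names and the expected-IP set are hypothetical inputs.

```python
import socket


def resolve_ips(hostname: str) -> set:
    """Resolve hostname with the system resolver; return the set of addresses."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def check_dns(hostname: str, expected_ips: set) -> str:
    """Report whether hostname resolves to at least one expected IP."""
    try:
        resolved = resolve_ips(hostname)
    except socket.gaierror as exc:
        return f"FAIL {hostname}: cannot resolve ({exc})"
    if not resolved & expected_ips:
        return f"FAIL {hostname}: got {sorted(resolved)}, expected {sorted(expected_ips)}"
    return f"OK {hostname}: {sorted(resolved)}"
```

A FAIL result here corresponds to the first branch of step 1 (fix the records at your DNS provider); an OK result while alarms continue points you to step 2.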

2. How to solve the connection timeout problem?

1. Check whether there is a problem with the network environment.
(1) Obtain the IPs of the WeChat callback servers through the public platform interface: api.weixin.qq.com/cgi-bin/getcallbackip?access_token=ACCESS_TOKEN
(2) Run a ping test from your server to the WeChat callback servers to check network quality. If there are network problems, contact your server provider to resolve them.

2. Check the access-layer servers' connection count and load, the nginx configuration, and the number of allowed connections. Look in the nginx error log for "Connection reset by peer" or "Connection timed out" entries; if they appear, nginx has too many connections and is overloaded.
3. It is recommended to build a testing tool to perform heartbeat checks on the system, and conduct real-time monitoring and alarming of system load, number of connections, number of processes, and processing time.
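The getcallbackip response from step 1 can be parsed to get the addresses to ping, and the same list lets you confirm that a request's source IP belongs to WeChat. A minimal Python sketch, assuming the response is JSON with an ip_list array whose entries are single addresses or CIDR ranges, and that errors come back as errcode/errmsg; fetching the URL is left to your HTTP client, and the function names are hypothetical.

```python
import ipaddress
import json
from urllib.parse import urlencode

CALLBACK_IP_API = "https://api.weixin.qq.com/cgi-bin/getcallbackip"


def callback_ip_url(access_token: str) -> str:
    return f"{CALLBACK_IP_API}?{urlencode({'access_token': access_token})}"


def parse_callback_networks(body: str) -> list:
    """Turn the JSON response into ip_network objects (single IPs become /32)."""
    data = json.loads(body)
    if "ip_list" not in data:
        raise ValueError(f"API error {data.get('errcode')}: {data.get('errmsg')}")
    return [ipaddress.ip_network(entry, strict=False) for entry in data["ip_list"]]


def is_wechat_ip(addr: str, networks: list) -> bool:
    """True if addr falls inside any of the callback networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)
```

Keeping the entries as networks rather than strings means CIDR ranges are handled correctly without expanding them address by address.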
For nginx configuration, the official documentation at nginx.org/en/docs/ is a good reference; focus on the connection-count and logging settings. Some important nginx settings for reference:

```nginx
# Main context
worker_processes  16;              # match the number of CPU cores
error_log  logs/error.log  info;   # error log and its level
worker_rlimit_nofile 102400;       # max open file handles per worker

events {
    worker_connections  102400;    # max simultaneous connections per worker
}

http {
    # Request log. Key fields: $request_time = total request time,
    # $upstream_response_time = backend processing time
    log_format  main  '$remote_addr  - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" "$host"  "$cookie_ssl_edition" '
                      '"$upstream_addr"   "$upstream_status"  "$request_time"  '
                      '"$upstream_response_time" ';
    access_log  logs/access.log  main;
}
```
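With the main log format above, slow requests can be pulled out of access.log to see whether the backend ($upstream_response_time) is what consumes the 5-second budget. A minimal Python sketch, assuming the log uses exactly that format, so $request is the first quoted field and $request_time and $upstream_response_time are the last two; the 3-second threshold and the function name are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

_QUOTED = re.compile(r'"([^"]*)"')  # nginx escapes quotes inside values as \x22


def slow_requests(lines: Iterable[str],
                  threshold_s: float = 3.0) -> Iterator[Tuple[str, float, str]]:
    """Yield (request, request_time, upstream_response_time) for slow entries."""
    for line in lines:
        fields = _QUOTED.findall(line)
        if len(fields) < 4:
            continue  # not a line in the "main" format
        request, request_time, upstream_time = fields[0], fields[-2], fields[-1]
        try:
            total = float(request_time)
        except ValueError:
            continue  # unparsable timing field
        if total >= threshold_s:
            yield request, total, upstream_time
```

Usage: pass an open file, e.g. slow_requests(open("logs/access.log")). If request_time and upstream_response_time are nearly equal, the time is spent in the backend rather than in nginx.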

3. How to solve the request timeout problem?

Every module should keep complete logs from which the time each request spends in that module can be found. Combined with the information in the WeChat alarm, this makes it easy to locate the server with the problem. Common causes are:

1) The machine load is too high, so processing takes longer.
2) The machine processes messages abnormally, and messages are lost.
3) The machine itself is faulty.

For processing exceptions, fix the bugs as soon as possible. For a faulty machine, take it out of service as soon as possible. For high machine load, here are two feasible options.

Option 1: Optimize performance and expand capacity. Check the load (CPU, memory, I/O, network; see the appendix for details) and choose an optimization method that targets the actual bottleneck.

Option 2: Process asynchronously. If a message pushed by the WeChat server cannot be processed in real time, store it first, return success to the WeChat server immediately, and process the message later in the background. If you need to reply to the user, call the Customer Service Message API.
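Option 2 can be sketched with an in-process queue: the web handler stores the raw push and answers "success" at once, and a background worker does the slow work and replies through the Customer Service Message API. A minimal Python sketch, assuming the text payload shape {"touser", "msgtype": "text", "text": {"content"}} for that API; in production the queue would be durable storage, and send and process are hypothetical callables you supply (for example, send POSTs the payload to https://api.weixin.qq.com/cgi-bin/message/custom/send?access_token=ACCESS_TOKEN).

```python
import logging
import queue
import threading
import xml.etree.ElementTree as ET

logger = logging.getLogger("wechat.async")  # hypothetical logger name
inbox = queue.Queue()  # stand-in for durable storage such as a database table


def handle_push(xml_body: str) -> str:
    """Web layer: store the message and acknowledge immediately."""
    inbox.put(xml_body)
    return "success"


def custom_text_payload(openid: str, content: str) -> dict:
    return {"touser": openid, "msgtype": "text", "text": {"content": content}}


def worker(send, process) -> None:
    """Background loop: handle stored pushes; a None item stops the loop."""
    while True:
        xml_body = inbox.get()
        try:
            if xml_body is None:
                return
            root = ET.fromstring(xml_body)
            answer = process(root)  # slow business logic
            if answer:
                send(custom_text_payload(root.findtext("FromUserName"), answer))
        except Exception:
            logger.exception("failed to process pushed message")
        finally:
            inbox.task_done()


def start_worker(send, process) -> threading.Thread:
    thread = threading.Thread(target=worker, args=(send, process), daemon=True)
    thread.start()
    return thread
```

Because the acknowledgement no longer depends on how long the business logic runs, slow processing stops showing up as request timeouts; the trade-off is that replies arrive as customer service messages rather than passive replies.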

4. How to solve the access_token storage and usage problem?

Third parties often report service interruptions caused by the access_token. When the public platform investigates, it usually finds that the third party is refreshing the access_token far too often, exceeding the interface call frequency limit and leaving it without a valid access_token. Here is a simple scheme for storing and using the access_token:

1) The central control server calls the WeChat API on a schedule (every hour is recommended) to refresh the access_token, and stores the new access_token in MySQL (or other storage).
2) Other worker servers read the access_token from MySQL (or other storage) each time they call the WeChat API, and may cache it in memory for a short period (1 minute is recommended).

The public platform guarantees that after the access_token is refreshed, the old access_token remains usable for 5 minutes, so third-party calls to the WeChat API do not fail while the access_token is being updated.
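The two-step scheme above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: TokenStore is a hypothetical in-memory stand-in for MySQL or other shared storage, fetch_token is a callable you supply that calls the WeChat token API, and the 1-hour refresh and 1-minute cache follow the recommendations above.

```python
import threading
import time
from typing import Callable, Optional


class TokenStore:
    """Hypothetical stand-in for the shared MySQL (or other) storage."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._token: Optional[str] = None

    def save(self, token: str) -> None:
        with self._lock:
            self._token = token

    def load(self) -> Optional[str]:
        with self._lock:
            return self._token


def run_refresher(fetch_token: Callable[[], str], store: TokenStore,
                  stop: threading.Event, interval_s: float = 3600.0) -> None:
    """Central control server: refresh once, then every interval_s until stopped."""
    while True:
        store.save(fetch_token())  # the only place that calls the token API
        if stop.wait(interval_s):
            return


class CachedToken:
    """Worker server: read the shared token, caching it in memory for ttl_s."""

    def __init__(self, store: TokenStore, ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._store, self._ttl_s, self._clock = store, ttl_s, clock
        self._token: Optional[str] = None
        self._loaded_at = 0.0

    def get(self) -> Optional[str]:
        now = self._clock()
        if self._token is None or now - self._loaded_at >= self._ttl_s:
            self._token = self._store.load()
            self._loaded_at = now
        return self._token
```

Since the old token stays valid for 5 minutes after a refresh, a worker's cached copy being up to 1 minute stale is harmless, and only the central server ever counts against the token API's frequency limit.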
