WeChat Public Account Development Tutorial Part 6 - Revealing the Content Length Limit of Text Messages

WBOY, original: 2016-07-20 11:12:18

Many readers have probably run into this problem: when the content of a text message reply is too long, WeChat does not respond. So what is the maximum text message length WeChat allows? How should the length of the text be calculated? And why do some people report that the maximum supported length is only a little over 1,300? This article answers these questions.

The message length limit in the interface document is 2048 bytes

The interface document states it clearly: the length of the reply message content must not exceed 2048 bytes. So why do many people find in testing that WeChat stops responding once the message is a little over 1,300 bytes long? I think the problem is that they are not calculating the number of bytes in the text correctly.

How to correctly calculate the number of bytes occupied by text

To calculate the number of bytes occupied by a piece of text (a string), the first thing that comes to mind is the getBytes() method of the String class. It returns the byte array corresponding to the string, and the length of that array is the number of bytes the string occupies. For example:

public static void main(String []args)  {
	// Output: 4
	System.out.println("柳峰".getBytes().length);
}

In the example above, two Chinese characters are calculated to occupy 4 bytes, that is, 2 bytes per Chinese character. Is that really the case? In fact, we have overlooked something: the number of bytes a Chinese character occupies depends on the encoding. Because the example does not specify an encoding, getBytes() uses the platform's default encoding, which the JVM normally takes from the operating system's settings. Let's first look at three conclusions I arrived at:

1) If the example above runs on a platform whose default encoding is ISO-8859-1, the result is 2;

2) If the example above runs on a platform whose default encoding is GB2312 or GBK, the result is 4;

3) If the example above runs on a platform whose default encoding is UTF-8, the result is 6.
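
Which of these cases applies on a given machine can be checked directly. The following is a minimal sketch and not part of the original tutorial: the class name DefaultCharsetCheck and both helper methods are made up for illustration, and the sample text is 你好 written as Unicode escapes so that the file compiles regardless of source-file encoding. Note that on Java 18 and later the default charset is UTF-8 unless it is overridden, so the result there is typically 6.

```java
import java.nio.charset.Charset;

final class DefaultCharsetCheck {
    /** Byte count under whatever charset this JVM uses by default. */
    static int defaultByteLength(String text) {
        return text.getBytes().length;
    }

    /** Byte count under an explicitly chosen charset, for comparison. */
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String sample = "\u4f60\u597d"; // two Chinese characters
        // The charset that the no-argument getBytes() falls back to
        System.out.println("Default charset: " + Charset.defaultCharset());
        // Platform-dependent result
        System.out.println(defaultByteLength(sample));
    }
}
```

If the printed charset is not the one the receiving side expects, the byte count from the no-argument getBytes() should not be trusted.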

If these conclusions are correct, the result of 4 in the first example means that String.getBytes() defaults to GB2312 or GBK on our platform. Let's confirm this with another example that names each encoding explicitly:

public static void main(String []args) throws UnsupportedEncodingException  {
	// Output: 2
	System.out.println("柳峰".getBytes("ISO8859-1").length);
	// Output: 4
	System.out.println("柳峰".getBytes("GB2312").length);
	// Output: 4
	System.out.println("柳峰".getBytes("GBK").length);
	// Output: 6
	System.out.println("柳峰".getBytes("UTF-8").length);
}

This example confirms the three conclusions above. With ISO-8859-1, every character, Chinese or English, is encoded as a single byte (ISO-8859-1 cannot actually represent Chinese characters, so each one is replaced by a one-byte '?'); with GB2312 or GBK, one Chinese character occupies two bytes; and with UTF-8, one Chinese character occupies three bytes.
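
The one-byte result for ISO-8859-1 is a count of substituted bytes, not of real character data, which a round trip makes visible. This is a small sketch under my own naming (Latin1LossDemo, roundTripLatin1 and roundTripUtf8 are hypothetical), again using 你好 as escaped sample text:

```java
import java.nio.charset.StandardCharsets;

final class Latin1LossDemo {
    /** Encodes text as ISO-8859-1 and decodes the bytes with the same charset. */
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Encodes text as UTF-8 and decodes the bytes with the same charset. */
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sample = "\u4f60\u597d"; // two Chinese characters
        // Each unmappable character was replaced by '?', so the text is lost
        System.out.println(roundTripLatin1(sample));
        // UTF-8 can represent the characters, so the text survives intact
        System.out.println(roundTripUtf8(sample).equals(sample));
    }
}
```

So a byte count taken with ISO-8859-1 says nothing about how much space the Chinese text needs in an encoding that can actually carry it.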

The encoding method used by the WeChat platform and the calculation of the number of bytes occupied by the string

So which encoding should be used when returning messages to the WeChat server? UTF-8, of course, because the doPost method already uses the following code to prevent garbled Chinese characters:

// Set both the request and response encoding to UTF-8 (prevents garbled Chinese characters)
request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");

To verify what I said, I wrote an example to test:

private static String getMsgContent() {
	StringBuffer buffer = new StringBuffer();
	// 70 Chinese characters per full line; 682 Chinese characters in total plus 1 English exclamation mark
	buffer.append("不知道什么时候开始喜欢这里每个夜里都会来这里看你你长得多么美丽叫我不能不看你看不到你我就迷失了自己好想牵你的手走过风风雨雨有什么困难我都陪你");
	buffer.append("不知道什么时候开始喜欢这里每个夜里都会来这里看你你长得多么美丽叫我不能不看你看不到你我就迷失了自己好想牵你的手走过风风雨雨有什么困难我都陪你");
	buffer.append("不知道什么时候开始喜欢这里每个夜里都会来这里看你你长得多么美丽叫我不能不看你看不到你我就迷失了自己好想牵你的手走过风风雨雨有什么困难我都陪你");
	buffer.append("不知道什么时候开始喜欢这里每个夜里都会来这里看你你长得多么美丽叫我不能不看你看不到你我就迷失了自己好想牵你的手走过风风雨雨有什么困难我都陪你");
	buffer.append("不知道什么时候开始喜欢这里每个夜里都会来这里看你你长得多么美丽叫我不能不看你看不到你我就迷失了自己好想牵你的手走过风风雨雨有什么困难我都陪你");
	buffer.append("不知道什么时候开始喜欢这里每个夜里都会来这里看你你长得多么美丽叫我不能不看你看不到你我就迷失了自己好想牵你的手走过风风雨雨有什么困难我都陪你");
	buffer.append("不知道什么时候开始喜欢这里每个夜里都会来这里看你你长得多么美丽叫我不能不看你看不到你我就迷失了自己好想牵你的手走过风风雨雨有什么困难我都陪你");
	buffer.append("不知道什么时候开始喜欢这里每个夜里都会来这里看你你长得多么美丽叫我不能不看你看不到你我就迷失了自己好想牵你的手走过风风雨雨有什么困难我都陪你");
	buffer.append("不知道什么时候开始喜欢这里每个夜里都会来这里看你你长得多么美丽叫我不能不看你看不到你我就迷失了自己好想牵你的手走过风风雨雨有什么困难我都陪你");
	buffer.append("不知道什么时候开始喜欢这里每个夜里都会来这里看你你长得多么美丽叫我不能不看你看不到你我就迷失了自己好想牵!");
	return buffer.toString();
}

public static void main(String []args) throws Exception  {
	// Occupies 1365 bytes when encoded as gb2312
	System.out.println(getMsgContent().getBytes("gb2312").length);
	// Occupies 2047 bytes when encoded as utf-8
	System.out.println(getMsgContent().getBytes("utf-8").length);
}

The content returned by getMsgContent() is the longest text that a WeChat text message can carry. In other words, with UTF-8 encoding the text message content can be at most 2047 bytes. This matches the statement in the WeChat public platform interface document that the reply content must not exceed 2048 bytes, with one caveat: a length of exactly 2048 bytes does not work either. To see this, add one more English symbol to the content in getMsgContent(); WeChat will then not respond.

We can also see that if GB2312 is used to count the bytes of the text returned by getMsgContent(), the result is 1365. This is why many people say that WeChat text messages seem to support only a little over 1,300 bytes rather than the 2048 bytes given in the interface document. They overlooked the encoding and counted bytes with String.getBytes() instead of getBytes("utf-8").
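
Both figures follow from the per-character widths: 682 × 3 + 1 = 2047 bytes in UTF-8 and 682 × 2 + 1 = 1365 bytes in GB2312 or GBK. The sketch below reproduces the arithmetic without the long literal. LimitArithmetic and sample are names invented here, the repeated character is 你 written as a Unicode escape, and the GBK line assumes the runtime ships that charset, so it is guarded.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LimitArithmetic {
    /** Builds count copies of ch followed by one ASCII exclamation mark. */
    static String sample(char ch, int count) {
        StringBuilder sb = new StringBuilder(count + 1);
        for (int i = 0; i < count; i++) {
            sb.append(ch);
        }
        return sb.append('!').toString();
    }

    public static void main(String[] args) {
        String text = sample('\u4f60', 682);
        // 682 * 3 + 1 = 2047
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);
        // GBK is an optional charset, so check before using it
        if (Charset.isSupported("GBK")) {
            // 682 * 2 + 1 = 1365
            System.out.println(text.getBytes(Charset.forName("GBK")).length);
        }
    }
}
```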

A Java helper method for calculating the number of bytes a string occupies in UTF-8

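/**
 * Calculates the number of bytes a string occupies when encoded as UTF-8.
 * 
 * @param content the text to measure; may be null
 * @return the UTF-8 byte length, or 0 if content is null
 */
public static int getByteSize(String content) {
	int size = 0;
	if (null != content) {
		try {
			// A Chinese character occupies 3 bytes when encoded as utf-8
			size = content.getBytes("utf-8").length;
		} catch (UnsupportedEncodingException e) {
			// Not expected in practice: every Java platform must support UTF-8
			e.printStackTrace();
		}
	}
	return size;
}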
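
One possible use of such a byte count is to shorten an over-long reply before sending it. The sketch below is my own illustration, not something the tutorial prescribes: TextLimiter, truncateUtf8 and MAX_TEXT_BYTES are hypothetical names, and 2047 is taken from the test result above rather than from any official constant. It cuts on character boundaries so that a multi-byte character is never split.

```java
import java.nio.charset.StandardCharsets;

final class TextLimiter {
    /** Assumed practical limit, taken from the test in this article. */
    static final int MAX_TEXT_BYTES = 2047;

    /** Number of bytes text occupies when encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Returns the longest prefix of text whose UTF-8 encoding fits in maxBytes. */
    static String truncateUtf8(String text, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            // UTF-8 width of one code point: 1 to 4 bytes
            int width = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + width > maxBytes) {
                break; // the next character would not fit
            }
            bytes += width;
            i += Character.charCount(cp); // skip both halves of a surrogate pair
        }
        return text.substring(0, i);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < 1000; n++) {
            sb.append('\u4f60'); // 1000 Chinese characters, 3000 bytes in UTF-8
        }
        String safe = truncateUtf8(sb.toString(), MAX_TEXT_BYTES);
        // 682 characters fit: 682 * 3 = 2046 bytes
        System.out.println(safe.length() + " chars, " + utf8Length(safe) + " bytes");
    }
}
```

Whether silent truncation is acceptable depends on the application; splitting the reply into several messages or rejecting it may be better choices.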

That's it for this chapter. I hope what you take away from this article is not only the number 2047, but also a better understanding of character encodings.
