
Dewu client live broadcast room APM stress testing practice

WBOY (reposted), 2023-04-12 21:01:01

1. Background

With the rapid development of the live streaming industry, more and more companies are entering this field, and the stability and user experience of the live broadcast room have become key factors in the competition between platforms. A live broadcast room involves many complex technical links, such as video transmission, network communication, and data processing, so performance stress testing of the live broadcast room is particularly important. In client-side live room stress testing, APM (application performance monitoring) stress testing is a commonly used method. By monitoring and diagnosing application performance in real time, it helps quickly locate and resolve performance bottlenecks, improving both the stability and the user experience of the live broadcast room.

The importance of APM stress testing

  1. Detect system stability: APM stress testing helps testers evaluate the performance and stability of the live broadcast room under high concurrency. This ensures that the system keeps operating properly and does not crash or malfunction.
  2. Improve user experience: APM metrics collected under load show whether the live broadcast room can keep handling operations smoothly. If the metrics degrade, users may encounter freezes and delays in the live broadcast room, which hurts the user experience.
  3. Discover system bottlenecks: APM stress testing can help testers and developers discover system bottlenecks and problems, so that targeted optimization and improvements can be made. For example, if a problem with database read and write performance is discovered during the APM stress test, the system performance can be improved by upgrading the database or taking other optimization measures.
  4. Optimize system performance: Through APM stress testing, developers can identify system performance issues and perform targeted optimizations. For example, load balancing technology can be used to disperse traffic, caching technology can be used to reduce database load, or asynchronous processing can be used to improve the concurrency capability of the system.

In short, APM stress testing is essential for keeping the live broadcast room stable, improving user experience, discovering system bottlenecks, and optimizing system performance.

2. Common stress testing methods in live broadcast rooms

  1. Load test: simulate a large number of users accessing the live broadcast room to test its performance and stability under high concurrency. Tools such as JMeter or LoadRunner can simulate user requests to evaluate the live broadcast room under different loads.
  2. Bandwidth test: the live broadcast room needs enough bandwidth to support real-time transmission of high-definition video, so a bandwidth test is required to confirm that capacity. Network speed test tools can measure the actual bandwidth and how stable it is.
  3. Performance test: simulate user access in different scenarios, such as many users watching a live stream at the same time or sending bullet comments (danmaku) at the same time, and test how the live broadcast room performs in each. Performance testing tools such as WebLOAD can simulate the concurrent requests.
  4. Security testing: the live broadcast room must protect users' information and privacy, so security testing is needed to make sure it has no security vulnerabilities. Tools such as Burp Suite can be used for penetration testing to evaluate its security.
  5. Reliability test: By simulating different faults and abnormal situations, test the performance and recovery ability of the live broadcast room under abnormal circumstances. Tools such as Chaos Monkey can be used to simulate abnormal situations to evaluate the reliability and recovery capabilities of the live broadcast room.
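JMeter and LoadRunner are the usual tools for the load test described in item 1. The core idea can still be sketched in plain Python: many simulated users call the same action concurrently, and you collect successes, failures, and a tail latency. `request_fn` is a hypothetical stand-in for one user action, such as entering a room. The user and request counts are arbitrary.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(request_fn, users: int, requests_per_user: int):
    """Simulate `users` concurrent users, each calling request_fn(user_id) repeatedly.

    Returns (success_count, failure_count, p95_latency_seconds).
    """
    def one_user(user_id: int):
        latencies, failures = [], 0
        for _ in range(requests_per_user):
            start = time.perf_counter()
            try:
                request_fn(user_id)
                latencies.append(time.perf_counter() - start)
            except Exception:
                failures += 1
        return latencies, failures

    all_latencies, total_failures = [], 0
    # One worker thread per simulated user
    with ThreadPoolExecutor(max_workers=users) as pool:
        for latencies, failures in pool.map(one_user, range(users)):
            all_latencies.extend(latencies)
            total_failures += failures

    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # Older Python versions need at least two data points, hence the guard.
    p95 = statistics.quantiles(all_latencies, n=20)[-1] if len(all_latencies) > 1 else 0.0
    return len(all_latencies), total_failures, p95
```

Dedicated tools add ramp-up, distributed load generation, and reporting on top of this basic loop.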

In summary, load testing, bandwidth testing, performance testing, security testing, and reliability testing together give a comprehensive evaluation of the live broadcast room's performance, stability, security, and reliability. This ensures that the room meets users' needs and expectations.

The main stress testing methods used in the Dewu live broadcast room are load testing and performance testing.

3. Implementation method

First, the goal of our stress test is IM performance stress testing of the live broadcast room. The main purpose is to find out whether performance problems such as lag, crashes, or OOM (out-of-memory) errors occur when a live broadcast room on the client receives a large number of IM messages over a long period. We run a round of stress testing before each release. This exposes performance problems in the live broadcast room offline and in advance, and keeps them from reaching production.
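The article does not say which metrics or thresholds decide that a run "has a problem." The sketch below is a hypothetical check over samples taken at a fixed interval during a run. It treats CPU that stays above a limit for several consecutive samples as a lag signal, and memory that only ever grows as an OOM risk. The 80% limit and the 3-sample window are illustrative assumptions, not Dewu's values.

```python
def detect_issues(samples, cpu_limit: float = 80.0, sustained: int = 3):
    """samples: list of (cpu_percent, memory_mb) tuples taken at a fixed interval.

    Returns a list containing 'high_cpu' and/or 'memory_growth'.
    """
    issues = []

    # Sustained high CPU: `sustained` consecutive samples above the limit
    run = 0
    for cpu, _ in samples:
        run = run + 1 if cpu > cpu_limit else 0
        if run >= sustained:
            issues.append("high_cpu")
            break

    # Memory that never decreases and ends higher than it started
    mems = [mem for _, mem in samples]
    if len(mems) > 1 and all(b >= a for a, b in zip(mems, mems[1:])) and mems[-1] > mems[0]:
        issues.append("memory_growth")

    return issues
```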

In terms of specific stress testing methods, we hope to meet the following conditions:

  1. Try to cover as many IM message types as possible
  2. The stress test is highly automated, minimizing manual operation
  3. Low maintenance cost
  4. The stress test depends on the server as little as possible, so message stress testing can run directly on the client side

Based on the above requirements, our live broadcast business team went through roughly three stages while exploring how to stress test:

4. Stress testing stages

4.1 The first stage

The first stage of the live broadcast room stress test used a relatively simple method. A script simulates users sending comments, likes, and other IM messages to the room under test. You write the corresponding Python code yourself and send the IM messages to a given live broadcast room. Below is part of the Python script:

import time

import requests


class APIUtils:
    """ Only for use in the test environment """

    @staticmethod
    def token(user_id: int):
        # Fetch a login token for the given test user
        resp = requests.get('https://xxxx.com', params={'user_id': user_id})
        return resp.json().get('token')

    @staticmethod
    def change_rc_im(user_id: int):
        # Query the streamer's IM config list, then post back the id of its first entry;
        # errors in this step are ignored
        try:
            im_info = requests.post(
                'http://xxxx.com',
                headers={'userId': '1'},
                data={'kolUserId': user_id}
            )
            im_id = im_info.json().get('data', {}).get('list', [{}])[0].get('id', 0)
            requests.post(
                'http://xxxx.com',
                headers={'userId': '1'},
                data={'kolUserId': user_id, 'id': im_id}
            )
        except Exception:
            pass

        time.sleep(3)

        # Switch IM on for the streamer from 1 s from now until 600 * 6 = 3600 s from now
        data = {
            "startTime": int(time.time()) + 1,
            "endTime": int(time.time()) + 600 * 6,
            "kolUserId": user_id,
            "imSwitch": 1,
            "id": 0
        }
        requests.post('xxxx.com',
                      headers={'userId': '1'}, data=data)

    @staticmethod
    def get_topic(user_id: int, room_id: int):
        """ Get the IM chat room ID (topic) of a live room """
        headers = {
            'POIZON-USERID': str(user_id),
            'POIZON-ISGUEST': 'false',
            'platform': 'iPhone',
            'v': '4.78.0'
        }
        # Any request or parsing error propagates to the caller
        resp = requests.get('xxxx.com',
                            headers=headers, params={'roomId': room_id})
        return resp.json().get('data').get('room').get('imInfo').get('chatRoomId')
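The excerpt stops before the part that actually sends messages. The sketch below shows the general shape of that loop under stated assumptions: it cycles through a list of IM type numbers and hands each payload to a send function. The payload fields `topic`, `type`, and `seq`, the type numbers, and `send_fn` are all hypothetical. In the real script, the send step would use the token and chat room topic obtained through `APIUtils`.

```python
import itertools


def build_im_batch(topic: str, im_types, count: int):
    """Build `count` IM payloads for one chat room, cycling through im_types."""
    cycle = itertools.cycle(im_types)
    return [{"topic": topic, "type": next(cycle), "seq": seq} for seq in range(count)]


def send_batch(batch, send_fn) -> int:
    """Send each payload with send_fn; return how many were sent without an error."""
    sent = 0
    for payload in batch:
        try:
            send_fn(payload)
            sent += 1
        except Exception:
            # A failed send is skipped so that one bad message does not stop the run
            pass
    return sent
```

This also shows the maintenance problem described next: the `im_types` list has to be kept in sync with the client by hand.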

The main flow is shown below:

[Figure: main flow of the first-stage stress test]


The stress test implemented this way is relatively simple and covers some important IM messages, but it also has several obvious shortcomings:

  1. To stress test a live broadcast room, you need to know its room ID or IM topic. Getting this information means capturing packets or checking the streaming records, which is quite troublesome.
  2. Every time the client code adds an IM message, someone has to update the Python script by hand to add the corresponding IM number. This raises the bar for maintenance. Maintainers must be able to write Python, and they must proactively learn which IM messages each version adds and update the script's IM message types accordingly. This undoubtedly increases the maintenance cost.

4.2 The second stage

In this stage, we focused on solving the problems left over from the first stage. Obtaining the room ID only requires the client to provide an interface that lists the rooms currently broadcasting. The remaining question was how to make the stress testing process more convenient to operate. Our answer was visualization, since nothing is simpler than starting a stress test with a single mouse click. We used Vue3 to build a simple IM message operation page, where you select the room and the IM numbers you want to send. While building this tool, we also enriched the logic for sending IM messages. Messages can be customized by priority and by scope (a single room or site-wide), and the tool also gained some support for IM mock debugging along the way.

[Figure: the Vue3 IM message operation page]
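The page lets you set a message's priority and choose whether it targets one room or the whole site. A minimal data model for such a request might look like the sketch below. The field names, the `Scope` values, and the convention that a lower number means higher priority are assumptions for illustration. The article does not show the page's actual data structures.

```python
import heapq
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    ROOM = "room"       # delivered to one live room only
    SITE_WIDE = "site"  # delivered to every live room


@dataclass(order=True)
class MockIM:
    priority: int                                    # lower value = sent earlier
    im_type: int = field(compare=False)              # IM number selected on the page
    scope: Scope = field(compare=False, default=Scope.ROOM)
    room_id: int = field(compare=False, default=0)   # ignored for SITE_WIDE messages


def send_order(messages):
    """Return the messages in the order they would be sent, highest priority first."""
    heap = list(messages)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

Only `priority` takes part in comparisons, so the heap orders messages by priority alone.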

Building on this, the page calls an interface to tell the backend which room to stress test. The backend then runs the first-stage script against that room.

[Figure: second-stage flow, from the page to the backend to the first-stage script]
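On the backend side, the handoff is essentially "validate the request, then run the script." The sketch below shows that step as a plain function so it stays framework-agnostic. The request fields `roomId` and `imTypes` and the `runner` callback, which would wrap the first-stage script, are hypothetical names.

```python
def handle_stress_request(payload: dict, runner) -> dict:
    """Validate a stress-test request from the page and hand it to the script runner."""
    room_id = payload.get("roomId")
    im_types = payload.get("imTypes") or []
    if type(room_id) is not int or room_id <= 0:
        return {"ok": False, "error": "invalid roomId"}
    if not im_types:
        return {"ok": False, "error": "no IM types selected"}
    # runner would invoke the first-stage script against this room
    runner(room_id, im_types)
    return {"ok": True, "roomId": room_id, "count": len(im_types)}
```

In a real service, this function would sit behind an HTTP route, and `runner` would usually start the script asynchronously so that the page does not wait for the whole run.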

This approach saves the trouble of obtaining the room ID manually. However, the mock-IM functions we added while building this visual mock platform have little to do with stress testing. The stress testing itself is essentially the same as the script-based method.

4.3 The third stage

This stage solves the problem mentioned above: keeping message type coverage up to date as features iterate. To further reduce manual intervention, we also use UI scripts on the Teslalab automation platform to run our stress test on a schedule, achieving truly automated stress testing. Each step is explained below.

4.3.1 Message type coverage

Each IM message type on the client has a corresponding Java class. Whenever a new IM message type is added, a new entity class represents it. All of these classes inherit from the base class BaseLiveChatMessage, so we added an abstract method to BaseLiveChatMessage that generates mock data for the message type.

[Figure: the abstract mock method declared in BaseLiveChatMessage]

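So when new IM data is added, the new class that inherits BaseLiveChatMessage must override this method to generate its own mock message. This neatly solves the maintainability problem, because a class that does not override the mock method will not compile.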

Below is the mock code for the warning message and the lottery message:

[Figure: mock code for the warning message]

[Figure: mock code for the lottery message]
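The same "every message type must provide its own mock" contract can be illustrated in Python with an abstract base class. One difference matters: Kotlin rejects a missing override at compile time, while Python raises `TypeError` when you try to instantiate the incomplete class. `WarningMessage` and `ForgotMockMessage` are hypothetical classes. Only the `mock(index)` shape mirrors the `mock(0)` call in the article's pseudocode.

```python
from abc import ABC, abstractmethod


class BaseLiveChatMessage(ABC):
    @abstractmethod
    def mock(self, index: int) -> "BaseLiveChatMessage":
        """Return a mock instance of this message type."""


class WarningMessage(BaseLiveChatMessage):
    def __init__(self, text: str = ""):
        self.text = text

    def mock(self, index: int) -> "WarningMessage":
        # Each message class decides what realistic mock data looks like
        return WarningMessage(f"mock warning #{index}")


class ForgotMockMessage(BaseLiveChatMessage):
    pass  # does not override mock(), so it cannot be instantiated
```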

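With this foundation in place, we added an IMTest test class to the test project. Its main logic scans all subclasses of BaseLiveChatMessage, obtains their constructors via reflection, and calls the mock method to get a mock message entity for each IM class. The pseudocode is as follows: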

// Get the subclasses of BaseLiveChatMessage (scanned once, then cached)
if (allSubClass == null) {
    allSubClass = ClassUtils.getAllSubClass(BaseApplication.getInstance(), BaseLiveChatMessage::class.java)
    val iterator = allSubClass?.iterator()
    while (iterator?.hasNext() == true) {
        val next = iterator.next()
        try {
            // Keep only classes that declare their own mock(Int)
            next.getDeclaredMethod("mock", Int::class.java)
        } catch (e: NoSuchMethodException) {
            // Skip classes without a mock implementation (assumes a mutable collection)
            iterator.remove()
        }
    }
}
// ....
allSubClass?.forEach {
    // Instantiate each message class through its cached no-arg constructor
    val o = constructorMap[it]?.newInstance() as? BaseLiveChatMessage ?: return@forEach
    val message = o.mock(0)
    justPostIM(message) // send the IM message
}
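For comparison, the sketch below is a Python counterpart of the scan above. `cls.__subclasses__()` takes the place of the project-specific `ClassUtils.getAllSubClass`, `inspect.isabstract` filters out classes with no concrete `mock`, and the `post` callback stands in for `justPostIM`. The message classes are hypothetical. Note that `__subclasses__()` only sees classes that have already been imported.

```python
import inspect
from abc import ABC, abstractmethod


class BaseLiveChatMessage(ABC):
    @abstractmethod
    def mock(self, index: int) -> "BaseLiveChatMessage":
        """Return a mock instance of this message type."""


class CommentMessage(BaseLiveChatMessage):
    def mock(self, index: int) -> "CommentMessage":
        return CommentMessage()


class GiftMessageBase(BaseLiveChatMessage):
    """Intermediate class without mock(): still abstract, so the scan skips it."""


class LotteryMessage(GiftMessageBase):
    def mock(self, index: int) -> "LotteryMessage":
        return LotteryMessage()


def concrete_subclasses(base: type) -> list:
    """Recursively collect every non-abstract subclass of `base`."""
    found, stack, seen = [], list(base.__subclasses__()), set()
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue
        seen.add(cls)
        stack.extend(cls.__subclasses__())
        if not inspect.isabstract(cls):
            found.append(cls)
    return found


def post_all_mocks(base: type, post) -> int:
    """Instantiate each concrete subclass, call mock(0), and post the result."""
    count = 0
    for cls in concrete_subclasses(base):
        post(cls().mock(0))
        count += 1
    return count
```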

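From here, local stress testing only requires controlling the sending frequency and the stress test duration, with no dependence on a server-side implementation.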
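Controlling frequency and duration boils down to a paced send loop. The sketch below is a hypothetical version, not the article's code. It sends `make_message(i)` at a fixed rate until the duration has elapsed, and schedules each send from the start time rather than accumulating intervals, which avoids drift. `clock` and `sleep` are injectable so the loop can be tested with a fake clock.

```python
import time


def run_stress(send, make_message, rate_per_sec: float, duration_sec: float,
               clock=time.monotonic, sleep=time.sleep) -> int:
    """Send make_message(i) at a fixed rate until duration_sec has elapsed.

    Returns the number of messages sent.
    """
    interval = 1.0 / rate_per_sec
    start = clock()
    sent = 0
    while True:
        now = clock()
        if now - start >= duration_sec:
            break
        due = start + sent * interval  # scheduled time of the next message
        if now < due:
            sleep(due - now)
            continue
        # If sending falls behind schedule, messages go out back-to-back until caught up
        send(make_message(sent))
        sent += 1
    return sent
```

For example, `run_stress(post_im, make_mock, rate_per_sec=50, duration_sec=1800)` would push about 90,000 messages over 30 minutes. Here `post_im` and `make_mock` are placeholders for the real send and mock calls.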


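At this point, the problems raised at the beginning of the article are essentially solved, and both the coverage and the maintainability of IM messages are guaranteed.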

4.3.2 Automation

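On top of this, we integrated the Teslab automated testing platform to make the stress test more automated. The platform can launch automated UI scripts on a schedule, which improves stress testing efficiency. The automation scripts are based on UiAutomator, whose syntax is very simple and whose maintenance cost is low.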
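The article does not describe how Teslab's schedule is configured. For a scheduler that starts the job at a fixed time of day, the core calculation is "the next occurrence of that time." The helper below is a hypothetical sketch of that calculation. The 02:00 start time in the usage note is only an example.

```python
from datetime import datetime, timedelta


def next_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Return the next time a daily stress-test job should start."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow
        candidate += timedelta(days=1)
    return candidate
```

For example, a job set for 02:00 and checked at 21:01 on 2023-04-12 would next start at 02:00 on 2023-04-13.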


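  1. The client has all IM stress test types built in. During IM stress testing, the client should support all kinds of IM messages, such as text messages, voice messages, image messages, and gift messages. It should also support various IM operations, such as likes, comments, and gifting, so that the stability and performance of IM functions are tested comprehensively.
  2. The live debug tool is connected to kylin, and the kylin component is already connected to the APM platform. To collect and record stress test metrics, we connected the live debug tool to the kylin component and the APM platform, so that stress test data can be collected and analyzed quickly. In this process, the kylin component receives the stress test data sent by the client and passes it to the APM platform for further processing and analysis.
  3. When the APM platform receives a live IM stress test record, it sends a Feishu notification to a fixed group. To find and fix potential performance problems promptly, the relevant people, such as developers and testers, need to receive stress test records in time. Sending the records to a fixed Feishu group lets them view and analyze the results right away.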
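The article does not show how the APM platform formats its Feishu notification. As an illustration, a Feishu group custom bot accepts a JSON body of the form `{"msg_type": "text", "content": {"text": ...}}` posted to its webhook URL. In the sketch below, the record fields (`room_id`, `minutes`, `messages`, `peak_cpu`, `peak_mem_mb`) and the webhook URL you pass in are hypothetical.

```python
import json
import urllib.request


def build_feishu_text(record: dict) -> dict:
    """Format one stress-test record as a Feishu custom-bot text message."""
    text = (
        f"[Live IM stress test] room {record['room_id']}\n"
        f"duration: {record['minutes']} min, messages sent: {record['messages']}\n"
        f"peak CPU: {record['peak_cpu']}%, peak memory: {record['peak_mem_mb']} MB"
    )
    return {"msg_type": "text", "content": {"text": text}}


def notify(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON to the group bot webhook and return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Keeping formatting separate from sending makes the message text easy to test without a network call.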

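In summary, the third-stage strategy initiates stress tests from the client. It makes IM stress testing convenient to use, supports stress testing on multiple devices, and keeps a record of stress test metrics. We still need to keep optimizing and improving it in practice to further raise stress testing efficiency and the reliability of the results.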

Stress test flowchart:

[Figure: stress test flowchart]

5. Stress test results

[Figure: stress test results]

6. Benefits

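Stress testing is only a means. What matters most is finding and solving problems, and the three stages of stress testing did uncover quite a few.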

Through the three stages of stress testing, the team discovered and solved several iOS issues. The most significant was that when a stress test ran for more than 20 minutes, CPU usage became abnormally high and the interface froze. Investigation showed that messages were being distributed to the business layer one by one, which consumed excessive CPU and refreshed the UI far too often (up to dozens of times per second). The team addressed this in two ways. First, a timer now distributes messages to the business layer in batches instead of one at a time. Second, the thread switch happens inside the timer, so there is only one thread switch per interval.
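The team's fix was in the iOS client and its code is not shown. The sketch below illustrates the same idea in Python. The IM thread only appends to a buffer, and a periodic timer hands everything buffered so far to `deliver` in a single call. That one call is where a real client would do its single hop to the main thread. `threading.Timer` stands in for the platform timer, and the 0.2 s interval is an arbitrary example.

```python
import threading


class BatchingDispatcher:
    """Buffer incoming IM messages and deliver them to the business layer in batches."""

    def __init__(self, deliver, interval: float = 0.2):
        self._deliver = deliver      # called once per non-empty batch
        self._interval = interval
        self._buffer = []
        self._lock = threading.Lock()
        self._timer = None
        self._running = False

    def on_message(self, msg):
        # Called on the IM thread for every message: just append, no UI work
        with self._lock:
            self._buffer.append(msg)

    def flush(self):
        # One delivery (one thread switch) per tick instead of one per message
        with self._lock:
            batch, self._buffer = self._buffer, []
        if batch:
            self._deliver(batch)

    def start(self):
        self._running = True
        self._schedule()

    def stop(self):
        self._running = False
        if self._timer is not None:
            self._timer.cancel()

    def _schedule(self):
        if not self._running:
            return
        self._timer = threading.Timer(self._interval, self._tick)
        self._timer.daemon = True
        self._timer.start()

    def _tick(self):
        self.flush()
        self._schedule()
```

With dozens of messages per second and a 0.2 s tick, the business layer sees at most five updates per second, however many messages arrive.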

The team also found an OOM caused by memory growing continuously during the stress test. Some IM messages trigger animations that take time to play, and only one such animation is played within a given period. Under high message concurrency, the pending animations kept piling up until memory overflowed. To solve this, the team optimized how these animations are executed, which avoided the memory overflow.
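The article does not say what the animation optimization was. One common way to stop this kind of unbounded growth is to cap the number of pending animations and discard the oldest ones when the cap is reached. The sketch below shows that approach, not necessarily Dewu's fix. The capacity of 20 is an arbitrary assumption.

```python
from collections import deque


class AnimationQueue:
    """Play one animation at a time while keeping at most `capacity` pending."""

    def __init__(self, capacity: int = 20):
        # A deque with maxlen discards its oldest entry when a new one is appended while full
        self._pending = deque(maxlen=capacity)
        self.dropped = 0

    def enqueue(self, animation):
        if len(self._pending) == self._pending.maxlen:
            self.dropped += 1
        self._pending.append(animation)

    def next_to_play(self):
        """Return the next animation to start, or None when nothing is pending."""
        return self._pending.popleft() if self._pending else None
```

Memory then stays bounded no matter how fast messages arrive. The trade-off is that some animations are skipped during bursts.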

In addition, the kylin component helped the team find several memory leaks, which were fixed promptly to keep the live streaming app stable and reliable. In short, the three stages of stress testing uncovered and resolved multiple problems. This improved the app's performance and stability and gave the team useful experience for its future technical work.

7. Conclusion

Performance stress testing is indeed an important means to ensure the stable and efficient operation of the live broadcast room, but we cannot regard it as the end point of code development. Good code should be maintainable by the entire team. The readability, maintainability and scalability of the code are equally important. Only by continuously focusing on code quality and team collaboration during the development and maintenance process can the live broadcast room continue to provide users with high-quality services.

While conducting performance stress testing in the live broadcast room, you also need to pay attention to the readability and maintainability of the code. We should establish a strict code review mechanism to monitor and control code quality to ensure the reliability and scalability of the code. At the same time, we focus on team collaboration and establish a communication and cooperation mechanism within the team so that team members can jointly maintain the live broadcast room and provide a better user experience.



Statement: This article is reproduced from 51cto.com.