Dewu client live broadcast room APM stress testing practice
As the live broadcast industry grows rapidly, more and more companies are entering the field, and the stability and user experience of the live broadcast room have become important factors in competition between platforms. Because a live broadcast room involves many complex technical links, such as video transmission, network communication, and data processing, performance stress testing of the live broadcast room is particularly important. In client-side live broadcast room stress testing, APM stress testing is a commonly used method: by monitoring and diagnosing application performance in real time, it helps locate and resolve performance bottlenecks quickly and improves the stability and user experience of the live broadcast room.
APM stress testing therefore matters for ensuring the stability of the live broadcast room, improving user experience, discovering system bottlenecks, and optimizing system performance.
In summary, stress testing methods such as load testing, bandwidth testing, performance testing, security testing, and reliability testing make it possible to comprehensively evaluate the performance, stability, security, and reliability of the live broadcast room, and so to ensure that it meets users' needs and expectations.
The main stress testing methods used in the Dewu live broadcast room are load testing and performance testing.
The goal of our stress test is an IM performance stress test based on the live broadcast room. Its main purpose is to check whether performance problems such as lag, crashes, or OOM occur when a live broadcast room on the client receives a large number of IM messages over a long period. We run a round of stress testing before each release to expose performance problems offline in advance and keep them from reaching production.
In terms of specific stress testing methods, we hope to meet the following conditions:
Based on the above requirements, our live broadcast business group went through roughly three stages while exploring stress testing methods:
The first stage used a relatively simple approach: a script simulates users sending IM messages such as comments and likes to the room under test. You need to write the Python code yourself and send the corresponding IM messages to a live broadcast room. The following is part of the Python script:
```python
import time

import requests


class APIUtils:
    """For the test environment only. All URLs are masked in the article."""

    @staticmethod
    def token(user_id: int):
        """Fetch a token for the given user."""
        resp = requests.get('https://xxxx.com', params={'user_id': user_id})
        return resp.json().get('token')

    @staticmethod
    def change_rc_im(user_id: int):
        """Look up the user's current IM record, post its id back, then submit a new record."""
        try:
            im_info = requests.post(
                'http://xxxx.com',
                headers={'userId': '1'},
                data={'kolUserId': user_id}
            )
            im_id = im_info.json().get('data', {}).get('list', [{}])[0].get('id', 0)
            requests.post(
                'http://xxxx.com',
                headers={'userId': '1'},
                data={'kolUserId': user_id, 'id': im_id}
            )
        except Exception:
            # Best effort: a failure here does not stop the setup below.
            pass
        time.sleep(3)
        # New record with imSwitch=1, valid from about now for the next hour.
        data = {
            "startTime": int(time.time()) + 1,
            "endTime": int(time.time()) + 600 * 6,
            "kolUserId": user_id,
            "imSwitch": 1,
            "id": 0
        }
        requests.post('xxxx.com', headers={'userId': '1'}, data=data)

    @staticmethod
    def get_topic(user_id: int, room_id: int):
        """Get the chat room ID of a live broadcast room."""
        headers = {
            'POIZON-USERID': str(user_id),
            'POIZON-ISGUEST': 'false',
            'platform': 'iPhone',
            'v': '4.78.0'
        }
        resp = requests.get('xxxx.com', headers=headers, params={'roomId': room_id})
        return resp.json().get('data').get('room').get('imInfo').get('chatRoomId')
```
The main process is shown below:
Stress testing implemented this way is relatively simple and can cover some important IM messages, but it also has several obvious shortcomings:
In this stage we focused on the problems left over from the previous stage. Obtaining the room ID only requires that the corresponding live broadcast list interface be provided to the client. The remaining question was how to make the stress testing process more convenient to operate. Here we thought of visualization: it would be very simple if a stress test could be started with a single mouse click. So, using front-end technology, we built a simple IM message operation page with Vue3, on which you can select the room and the number of IM messages to send. While building this tool we also enriched the logic for sending IM messages: messages can be customized by priority and sent either to a single room or site-wide, and along the way we added some support for IM mock debugging.
On this basis, the page calls an interface to tell the backend which room needs to be stress tested, and the backend then invokes the first-stage script to stress test that room.
This method saves the trouble of obtaining the room ID manually. The mock IM function added while building this visual mock platform has little to do with stress testing itself, so in essence this stage is the same as the script-based stress testing of the first stage.
4.3 The third phase
This phase solves the message type coverage problem mentioned above as features iterate. To further reduce manual intervention, we also use UI scripts on the Teslalab automation platform to run the stress test on a schedule, which makes the stress test truly automated. Each step is explained below.
4.3.1 Message type coverage
Each IM message type on the client has a corresponding Java class: every time an IM message type is added, an entity class is added for it. These classes all inherit from the base class BaseLiveChatMessage, so we add an abstract method to BaseLiveChatMessage that generates mock data for the message type.
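So whenever we add a new IM data type and inherit from BaseLiveChatMessage, we are forced to override this method to generate the type's own mock message. This solves the maintainability problem neatly, because the code will not compile unless the mock method is overridden.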
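The idea of forcing every message type to supply its own mock data can be illustrated with a small Python analogue. This is only a sketch, not the client's Java/Kotlin code: `ExampleTextMessage`, `ForgotToMock`, the `text` field, and the meaning of the integer argument (called `seed` here) are all hypothetical. Note one difference from the approach described above: Python's `abc` module rejects an incomplete subclass when it is instantiated, whereas the Java/Kotlin compiler rejects it at build time.

```python
from abc import ABC, abstractmethod


class BaseLiveChatMessage(ABC):
    """Stand-in for the client's base message class."""

    @abstractmethod
    def mock(self, seed: int) -> "BaseLiveChatMessage":
        """Return a message of this type filled with mock data."""


class ExampleTextMessage(BaseLiveChatMessage):
    """Hypothetical message type; the real fields are not given in the article."""

    def __init__(self, text: str = ""):
        self.text = text

    def mock(self, seed: int) -> "ExampleTextMessage":
        return ExampleTextMessage(text=f"mock text #{seed}")


class ForgotToMock(BaseLiveChatMessage):
    """A subclass that does not override mock(), so it cannot be instantiated."""


def try_create(cls):
    """Return an instance of cls, or None if it still has abstract methods."""
    try:
        return cls()
    except TypeError:
        return None
```

Here `try_create(ForgotToMock)` returns `None`, which mirrors the guarantee the article relies on: a message type without a mock implementation cannot slip into the stress test unnoticed.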
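Below is the mock code for the warning message and the lottery message:
With this foundation in place, we add an IMTest test class to the test project. Its main logic is to scan all subclasses that inherit from BaseLiveChatMessage, obtain their constructors through reflection, and call the mock interface to get the mock message entity for each IM class. The pseudocode is as follows: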
```kotlin
// Get all subclasses of BaseLiveChatMessage
if (allSubClass == null) {
    allSubClass = ClassUtils.getAllSubClass(
        BaseApplication.getInstance(),
        BaseLiveChatMessage::class.java
    )
    val iterator = allSubClass?.iterator()
    while (iterator?.hasNext() == true) {
        val next = iterator.next()
        try {
            // Check that the subclass declares mock(Int)
            next.getDeclaredMethod("mock", Int::class.java)
        } catch (e: NoSuchMethodException) {
            // Handling is not shown in the article
        }
    }
}
// ....
allSubClass?.forEach {
    // Instantiate through the cached constructor, then build the mock message
    val o = constructorMap[it]?.newInstance() as BaseLiveChatMessage
    val message: BaseLiveChatMessage = o.mock(0)
    justPostIM(message) // Send the IM
}
```
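After that, the stress test only needs to control the sending frequency and the test duration to run locally, with no dependency on the server.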
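Controlling the sending frequency and the test duration can be sketched as a simple paced loop. The Python below is an illustration under stated assumptions, not the client implementation: `run_stress_test`, its parameters, and the round-robin order are hypothetical, and `post` stands in for whatever delivers a mock message locally (the article's `justPostIM`). The clock and sleep functions are injectable so the loop can be exercised without waiting in real time.

```python
import time
from typing import Callable, Iterable, List


def run_stress_test(
    make_messages: Iterable[Callable[[], object]],
    post: Callable[[object], None],
    rate_per_sec: float,
    duration_sec: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Post mock messages round-robin at a fixed rate for a fixed duration."""
    factories: List[Callable[[], object]] = list(make_messages)
    if not factories or rate_per_sec <= 0:
        return 0
    interval = 1.0 / rate_per_sec
    start = clock()
    sent = 0
    while True:
        # Schedule each send relative to the start time to avoid drift.
        due = start + sent * interval
        if due - start >= duration_sec:
            break
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        # Cycle through the message factories so every type gets sent.
        post(factories[sent % len(factories)]())
        sent += 1
    return sent
```

Each factory could, for example, wrap one message class and call its mock method, so that every registered message type is exercised in turn at the chosen rate.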
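At this point, the problems raised at the beginning of the article have largely been solved, and the coverage and maintainability of IM messages are also guaranteed.
On this basis, to make the stress test more automated, we integrated the Teslab automated testing platform, which can launch automated UI scripts on a schedule and improves stress testing efficiency. The automation scripts are based on UiAutomator; the syntax is very simple and the maintenance cost is low.
In summary, the third-stage strategy, in which the stress test is initiated from the client, achieves the goals of convenient IM stress testing, support for multi-device stress testing, and recorded stress test metrics. We still need to keep optimizing and improving it in practice to further raise the efficiency of stress testing and the reliability of its results.
Stress test flow chart:
Stress testing is only a means; what matters most is finding and solving problems. The three stages of stress testing did uncover quite a few problems.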
Through the three stages of stress testing, the team found and fixed several iOS issues. The most important one appeared when the stress test ran for more than 20 minutes: CPU usage became abnormally high and the interface froze. Investigation showed that messages were being distributed to the business layer one by one, which consumed too much CPU and refreshed the UI too often (up to dozens of times per second). The team made two changes: messages are now distributed to the business layer in groups on a timer rather than one at a time, and thread switching is performed inside the timer so that only one thread switch happens within each period.
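The batching idea can be sketched in a few lines. The Python below is only an illustration of the technique, not the team's iOS code: `BatchDispatcher`, `enqueue`, and `flush` are hypothetical names, and the timer itself is left out, with `flush` assumed to be called once per timer tick at whatever interval the client chooses.

```python
import threading
from typing import Callable, List


class BatchDispatcher:
    """Buffers incoming messages and hands them over in batches."""

    def __init__(self, deliver: Callable[[List[object]], None]):
        self._deliver = deliver
        self._pending: List[object] = []
        self._lock = threading.Lock()

    def enqueue(self, message: object) -> None:
        """Called for every incoming IM message; only appends to the buffer."""
        with self._lock:
            self._pending.append(message)

    def flush(self) -> int:
        """Called once per timer tick; delivers everything buffered so far."""
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            # One delivery (and so one UI update) per tick, however many
            # messages arrived since the previous tick.
            self._deliver(batch)
        return len(batch)
```

With this shape, the cost of handing messages to the business layer grows with the number of timer ticks rather than with the number of messages, which is the effect the two changes above aim for.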
The team also found OOM caused by memory growing continuously during the stress test. Some IM messages trigger animations that take time to run and can only be executed once within a given period, so under concurrency the pending work keeps accumulating until memory overflows. The team optimized how these animations are executed to avoid the overflow.
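The article does not say how the animation execution was optimized. One common way to keep such a backlog from growing without limit, shown here purely as an assumption and not as the team's fix, is to cap the number of pending animations and discard the oldest ones when the cap is reached. `AnimationQueue` and its methods are hypothetical names.

```python
from collections import deque
from typing import Deque, Optional


class AnimationQueue:
    """Holds pending animations, keeping at most `capacity` of them."""

    def __init__(self, capacity: int):
        self._pending: Deque[str] = deque(maxlen=capacity)
        self.dropped = 0

    def submit(self, animation: str) -> None:
        """Queue an animation; when full, the oldest pending one is discarded."""
        if len(self._pending) == self._pending.maxlen:
            self.dropped += 1
        self._pending.append(animation)

    def next(self) -> Optional[str]:
        """Return the next animation to play, or None if nothing is pending."""
        return self._pending.popleft() if self._pending else None
```

A bounded queue trades completeness for stability: under a burst of messages some animations are skipped, but memory use stays flat.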
In addition, with the kylin component the team found several memory leaks and fixed them promptly. In short, the three stages of stress testing helped the team find and solve multiple problems, which improved the performance and stability of the application and added to the team's technical experience.
Performance stress testing is indeed an important means to ensure the stable and efficient operation of the live broadcast room, but we cannot regard it as the end point of code development. Good code should be maintainable by the entire team. The readability, maintainability and scalability of the code are equally important. Only by continuously focusing on code quality and team collaboration during the development and maintenance process can the live broadcast room continue to provide users with high-quality services.
While conducting performance stress testing in the live broadcast room, you also need to pay attention to the readability and maintainability of the code. We should establish a strict code review mechanism to monitor and control code quality to ensure the reliability and scalability of the code. At the same time, we focus on team collaboration and establish a communication and cooperation mechanism within the team so that team members can jointly maintain the live broadcast room and provide a better user experience.