Let AI write effective UI automation with real-time debugging

Reposted by WBOY, 2024-03-15 15:46:11

About the author

Thales Fu, senior R&D manager at Ctrip, is committed to finding better ways to combine AI and engineering to solve real-life problems.

Introduction

In today's fast-iterating software development cycles, automated user interface (UI) testing has become key to improving efficiency and ensuring product quality. However, as applications grow more complex, the limitations of traditional UI automation become increasingly apparent. AI-driven UI automation has arrived, but it still struggles with accuracy and reliability. Against this background, this article proposes a new perspective: real-time debugging can significantly improve the effectiveness of UI automation scripts written by AI.

This is not just a technical challenge; it bears directly on how to deliver software faster while still ensuring its quality. This article explores how real-time debugging helps AI understand and execute UI test scripts more accurately, and how this approach can bring revolutionary changes to software development.

1. Current status of UI automation

UI automation has developed considerably, from simple record-and-playback tools to today's sophisticated scripting frameworks. Despite steady technical progress, traditional UI automation still struggles with application interfaces that change rapidly, and as applications become more complex and dynamic, traditional approaches may no longer be enough. Engineers are therefore looking for more flexible and reliable ways to make UI automation efficient, and a new generation of UI automation tools and technologies is emerging.

According to industry surveys, writing test scripts by hand is inefficient, and the scripts need substantial rework whenever the application is updated. Research suggests that maintaining UI automation scripts can account for 60% to 70% of the total testing effort. In an agile environment, rewriting and retesting existing automation scripts can take more than 100 hours per application update. These maintenance costs highlight how inefficient and resource-hungry traditional UI automation is.

2. Introducing behavior-driven development (BDD)

Behavior-driven development (BDD) is an agile software development practice that encourages more effective communication among a project's developers, testers, and non-technical stakeholders. Cucumber is a popular tool for applying BDD; it lets team members write explicit, executable test cases in natural language.

Cucumber uses a domain-specific language (DSL) called Gherkin, which is highly readable and lets non-technical people understand the purpose and content of a test. Test scenarios are written as a series of Given-When-Then statements that clearly describe how the system should respond under specific conditions.

For example, the shopping cart feature of an online store might have the following Gherkin scenario:

(Figure: Gherkin scenario for the shopping cart feature)

This approach uses natural-language descriptions to promote better communication and understanding between technical and non-technical teams. The natural-language scenarios also serve as project documentation, helping new team members quickly understand what the product does. Non-technical staff can take part directly in writing and verifying test cases, which keeps development work closely aligned with business needs.

BDD has limitations, however. Although scenarios are written in natural language, the implementation behind each step (the step definition) must still be written by engineers in a programming language, so implementing the test logic can involve substantial coding. As the application grows and changes, maintaining and updating these steps becomes tedious; in particular, when the UI changes frequently, the affected step definitions must be updated to match. Flexibility and adaptability are limited as well: Cucumber tests depend on predefined steps and structures, and some complex scenarios require creative workarounds to get past the framework's constraints.
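To make the step-definition burden concrete, here is a minimal sketch in plain Python of how a BDD runner binds Gherkin step text to code. It does not use Cucumber or any real BDD library: the `step` decorator, the regex registry, `run_scenario`, and the shopping-cart steps are all hypothetical, written so the example runs on its own.

```python
from __future__ import annotations

import re
from typing import Callable

# Hypothetical registry: (compiled regex for the step text, implementing function).
STEPS: list[tuple[re.Pattern[str], Callable[..., None]]] = []


def step(pattern: str) -> Callable[[Callable[..., None]], Callable[..., None]]:
    """Register a step definition; regex capture groups become arguments."""
    def register(func: Callable[..., None]) -> Callable[..., None]:
        STEPS.append((re.compile(pattern), func))
        return func
    return register


# Shared state standing in for the application under test.
cart: dict[str, int] = {}


@step(r'^the shopping cart is empty$')
def cart_is_empty() -> None:
    cart.clear()


@step(r'^I add (\d+) "([^"]+)" to the cart$')
def add_item(quantity: str, name: str) -> None:
    cart[name] = cart.get(name, 0) + int(quantity)


@step(r'^the cart contains (\d+) items?$')
def check_count(expected: str) -> None:
    assert sum(cart.values()) == int(expected), cart


SCENARIO = """
Scenario: Add an item to the cart
  Given the shopping cart is empty
  When I add 2 "coffee mugs" to the cart
  Then the cart contains 2 items
"""


def run_scenario(text: str) -> int:
    """Run every Given/When/Then/And/But line; return how many steps ran."""
    executed = 0
    for line in text.strip().splitlines():
        keyword, _, body = line.strip().partition(" ")
        if keyword not in {"Given", "When", "Then", "And", "But"}:
            continue  # e.g. the "Scenario:" header line
        for pattern, func in STEPS:
            match = pattern.match(body)
            if match:
                func(*match.groups())
                executed += 1
                break
        else:
            # Every natural-language step must have hand-written code behind it.
            raise LookupError(f"No step definition for: {body}")
    return executed


print(run_scenario(SCENARIO), cart)  # 3 {'coffee mugs': 2}
```

When the UI or a business rule changes, it is these step functions, not the natural-language scenario, that have to be rewritten.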

3. Current application of AI in UI automation

In recent years, AI has been increasingly applied to UI automation, especially since the arrival of large models such as GPT, which can generate code. The industry has also begun experimenting with using large models to turn Gherkin test case descriptions directly into test code.

However, test code generated by today's large models falls short of expectations. There are two main problems: first, the generated script may contain syntax errors that prevent it from running; second, it may not accurately cover the checkpoints the test case requires. In our practice, the first-attempt success rate was no more than 5%.

When generation fails, people have to step in and do remedial work: debugging, revising the test case and regenerating, or editing the generated script directly.

These tasks themselves take a great deal of manpower, which defeats the original purpose of having AI generate test scripts automatically.

4. Letting AI write effective test scripts fully automatically

To solve this problem, we rethought the entire process by which AI generates test scripts.

We also looked at the work people were doing in the process. Humans were the ones debugging and modifying the code, so could AI take over that part as well? The idea is to let the system run the generated code by itself and let the AI debug and fix the errors in the code it produced.

We therefore redesigned the system so that the AI performs these tasks autonomously instead of humans. In the end, across all test cases for Ctrip's hotel order details page, 83.3% were generated successfully without any human involvement, and during script generation, bugs were found in 8% of the cases. Generating the cases three times in a row gave success rates of 84.3%, 81.4%, and 83.3%, showing that the system is stable and effective.
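The core idea, letting the system run what the AI generated and hand any failure back to the AI for repair, can be sketched as a bounded generate, execute, and repair loop. This is an illustrative sketch, not Ctrip's implementation: `generate` and `execute` are hypothetical stand-ins for the large-model call and the on-device run, and the default budget of three attempts is an arbitrary assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    """Outcome of running a generated script (hypothetical type)."""
    ok: bool
    log: str  # error message or output that is fed back to the model


def generate_with_self_debug(
    case_text: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], RunResult],
    max_attempts: int = 3,
) -> tuple[Optional[str], int]:
    """Generate, run, and repair a script; return (script or None, attempts used)."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        # On retries the model also sees the failing script and its error log.
        script = generate(case_text, feedback)
        result = execute(script)
        if result.ok:
            return script, attempt
        feedback = f"Previous script:\n{script}\nFailed with:\n{result.log}"
    return None, max_attempts


# Stand-ins: the first draft has a typo in the control ID,
# and the "model" fixes it once it sees the error message.
def fake_generate(case_text: str, feedback: Optional[str]) -> str:
    return "click('price_layer')" if feedback else "click('price_layr')"


def fake_execute(script: str) -> RunResult:
    if "price_layr" in script:
        return RunResult(False, "No element with id 'price_layr'")
    return RunResult(True, "passed")


script, attempts = generate_with_self_debug(
    "Then the fee details include Black Diamond VIP", fake_generate, fake_execute
)
print(script, attempts)  # click('price_layer') 2
```

Capping the number of attempts keeps a case the model cannot fix from looping forever; such cases could then be routed to human review.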

Here is a concrete test case and the code generated for it:

(Figure: the test case)

First, scroll down the order details page to the user rights module, then tap the booking optimization area to open the price floating layer.

Then, check whether the fee details include Black Diamond VIP.

The final generated test code is as follows:

(Figure: the final generated test code)

5. System Implementation

The core architecture of the system is shown in the diagram below. At its heart is a LangChain program that calls the large model and is equipped with multiple tools in two main categories: tools for obtaining page information, and debugging tools.

LangChain calls the page information tools as needed to fetch page data and determine which control the current operation targets, so that it can generate the code. It then uses the debugging tool to actually execute that code on the phone and, based on the debugging feedback, judges whether the code it generated is correct.
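A minimal sketch of the two tool categories follows. LangChain provides its own tool and agent abstractions, but their signatures vary between versions, so this sketch uses plain Python to show only the shape of the idea: each tool has a name, a category, and a description the model reads, and a dispatcher runs whichever tool the model picks. All tool names and return values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    category: str  # "page_info" or "debug", the two categories described above
    description: str  # shown to the model so it can choose a tool
    func: Callable[[str], str]


def build_registry(tools: list[Tool]) -> dict[str, Tool]:
    """Index tools by name so a model's tool call can be looked up."""
    return {tool.name: tool for tool in tools}


def describe_tools(registry: dict[str, Tool]) -> str:
    """Render the tool list that would be included in the model's prompt."""
    return "\n".join(
        f"- {tool.name} [{tool.category}]: {tool.description}"
        for tool in registry.values()
    )


def dispatch(registry: dict[str, Tool], tool_name: str, argument: str) -> str:
    """Run one tool call chosen by the model and return the observation text."""
    tool = registry.get(tool_name)
    if tool is None:
        # Returning the error as text lets the model correct its own tool choice.
        return f"Unknown tool '{tool_name}'. Available: {', '.join(registry)}"
    return tool.func(argument)


# Hypothetical tools; real ones would query the control library or the device.
registry = build_registry([
    Tool("list_page_controls", "page_info",
         "List the controls visible on the current page with their descriptions.",
         lambda page: "btn_booking_optimization: booking optimization area"),
    Tool("debug_statement", "debug",
         "Run one generated statement on the phone and report the result.",
         lambda statement: f"OK: {statement}"),
])

print(describe_tools(registry))
print(dispatch(registry, "list_page_controls", "order_details"))
print(dispatch(registry, "debug_statement", "click('btn_booking_optimization')"))
```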

(Figure: core architecture of the system)

5.1 Prompts

With the basic architecture in place, we need a prompt to glue these tools together and tell the AI how it should work. Structurally, our prompt has four parts: first, how the AI should think and work; second, an instruction to debug each statement it generates with the Debug tool; third, the output format; and finally, the full text of the test case the AI is to handle.

The part on how the AI should think and work expands into the following sequence: identify which modules are on the page and which module the current step operates on; identify which controls and components that module contains and which one to operate; decide which action to perform and what special syntax is available for it; and finally generate the statement.
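The four-part prompt structure and the step-by-step reasoning order above could be assembled along these lines. The article does not publish its prompt, so the section titles and every sentence of wording here are assumptions; only the order of the parts follows the text.

```python
# Hypothetical wording of the reasoning steps listed in the article.
THINKING_STEPS = [
    "Identify which modules are on the current page.",
    "Decide which module the current step operates on.",
    "List the controls and components in that module.",
    "Decide which control or component to operate.",
    "Decide which action to perform on it.",
    "Check which special syntax is available for that action.",
    "Generate the statement.",
]


def build_prompt(case_text: str, output_format: str) -> str:
    """Assemble the four prompt parts in the order the article describes."""
    thinking = "\n".join(
        f"{number}. {text}" for number, text in enumerate(THINKING_STEPS, start=1)
    )
    sections = [
        "How to work:\n" + thinking,
        "Debugging:\nRun every statement you generate with the Debug tool and "
        "fix it until it succeeds before moving on to the next step.",
        "Output format:\n" + output_format,
        "Test case:\n" + case_text,
    ]
    return "\n\n".join(sections)


prompt = build_prompt(
    "Then the fee details include Black Diamond VIP",
    "Return only the final test code in a single code block.",
)
print(prompt)
```

Keeping the test case last means the model reads the working rules before the task they apply to.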

5.2 Debugging Tools

The debugging tool essentially connects to the phone remotely via adb. Once connected, we can send AI-generated instructions to the phone to run, read back the results, and pass them to the AI so that it can judge whether the instructions it generated are correct.
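A debugging tool of this kind could be a thin wrapper around the `adb` command line. The sketch assumes the generated instruction is a device shell command (such as `input tap x y`), which is a simplification, since the article does not say what form its instructions take. `adb -s <serial> shell <command>` is standard adb usage; the helper names and the feedback format are hypothetical. The `runner` parameter exists so the function can be exercised without a phone attached.

```python
from __future__ import annotations

import subprocess
from typing import Callable


def adb_argv(serial: str, shell_command: str) -> list[str]:
    """argv for running one shell command on a specific connected device."""
    return ["adb", "-s", serial, "shell", shell_command]


def format_feedback(result: subprocess.CompletedProcess) -> str:
    """Turn the device's response into text the model can reason about."""
    status = "SUCCESS" if result.returncode == 0 else f"FAILED (exit {result.returncode})"
    return (
        f"{status}\nstdout:\n{(result.stdout or '').strip()}"
        f"\nstderr:\n{(result.stderr or '').strip()}"
    )


def run_on_device(
    serial: str,
    shell_command: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout: float = 30.0,
) -> str:
    """Send one generated command to the phone and return feedback for the AI."""
    try:
        result = runner(
            adb_argv(serial, shell_command),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A hang is also useful feedback: the model may have targeted the wrong control.
        return f"FAILED (timed out after {timeout}s)"
    return format_feedback(result)


# Typical use after `adb connect 192.168.1.20:5555` (address is hypothetical):
#   feedback = run_on_device("192.168.1.20:5555", "input tap 540 1200")
```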

5.3 Page information acquisition tool

The ultimate purpose of the page information tool is to help the AI identify what the BDD test case says to operate on, and the ID of the specific control involved; only with that ID can it generate the subsequent program instructions. To get IDs, we need a library of controls and components whose core content is each control's or component's ID together with its description. With these two pieces of information, the AI can read the BDD test case and infer from the descriptions which control it needs.

To this end, we built a page control library. Besides the ID and description of each control on a page, the library records the relationships between pages and components and between components and controls, so that the AI can query it step by step.
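The page, component, and control hierarchy can be modeled with a few dataclasses and queried in the same two steps: first find the component, then the control inside it. In the real system the AI reads the descriptions and decides; here a case-insensitive keyword match stands in for that judgment. All IDs and descriptions are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Control:
    control_id: str
    description: str


@dataclass
class Component:
    name: str
    description: str
    controls: list[Control] = field(default_factory=list)


@dataclass
class Page:
    name: str
    components: list[Component] = field(default_factory=list)


def find_component(page: Page, keyword: str) -> Optional[Component]:
    """Step 1: pick the component whose name or description mentions the keyword."""
    key = keyword.lower()
    for component in page.components:
        if key in component.name.lower() or key in component.description.lower():
            return component
    return None


def find_control_id(page: Page, component_kw: str, control_kw: str) -> Optional[str]:
    """Step 2: within that component, pick the control whose description matches."""
    component = find_component(page, component_kw)
    if component is None:
        return None
    key = control_kw.lower()
    for control in component.controls:
        if key in control.description.lower():
            return control.control_id
    return None


# Hypothetical entries for the order details page used in the example test case.
order_details = Page("order_details", [
    Component("user_rights", "User rights module", [
        Control("btn_booking_optimization",
                "Booking optimization area; tapping it opens the price floating layer"),
    ]),
    Component("price_layer", "Price floating layer showing fee details", [
        Control("list_fee_details", "Fee detail entries such as membership benefits"),
    ]),
])

print(find_control_id(order_details, "user rights", "booking optimization"))
```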

The control library itself is generated by a job that statically analyzes our code. In practice, however, which controls a page displays depends on the scenario and state, and some controls are hidden in certain scenarios. The page information tool therefore intersects the controls currently present on the page with those returned from the control library, yielding the controls actually displayed on the current page together with their descriptions.
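The intersection step could look like the following. The sketch assumes the current screen is captured with Android's `uiautomator dump`, whose XML output lists `node` elements with a `resource-id` attribute; the article does not say how the tool reads the live page. The package name, IDs, and library entries are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def visible_ids(dump_xml: str) -> set[str]:
    """Collect short resource IDs of all nodes in a UI hierarchy dump."""
    root = ET.fromstring(dump_xml)
    ids = set()
    for node in root.iter("node"):
        resource_id = node.get("resource-id")
        if resource_id:
            # "com.example.app:id/btn_ok" -> "btn_ok"
            ids.add(resource_id.split(":id/")[-1])
    return ids


def controls_on_screen(library: dict[str, str], dump_xml: str) -> dict[str, str]:
    """Keep only library controls (ID -> description) that are actually on screen."""
    on_screen = visible_ids(dump_xml)
    return {cid: desc for cid, desc in library.items() if cid in on_screen}


# Hypothetical dump with nested <node> elements, as `uiautomator dump` produces.
DUMP = """<hierarchy rotation="0">
  <node resource-id="com.example.hotel:id/order_root" text="">
    <node resource-id="com.example.hotel:id/btn_booking_optimization" text="" />
    <node resource-id="" text="Order details" />
  </node>
</hierarchy>"""

LIBRARY = {
    "btn_booking_optimization": "Booking optimization area",
    "tag_black_diamond": "Black Diamond VIP badge (hidden for other tiers)",
}

print(controls_on_screen(LIBRARY, DUMP))
```

Only `btn_booking_optimization` survives, because the badge exists in the library but is not on this screen.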

5.4 Splitting the AI further

(Figure: the generation workflow; the yellow part marks the work previously done by humans)

With this work done, AI can essentially handle the yellow part of the figure above, the work previously done by humans. The generation success rate rose from 5% to 55%, but 55% is still not good enough.

Analyzing the failed cases further, we found that the main problem was AI hallucination: even though the prompt was fairly detailed, the AI sometimes did not follow the instructions and sometimes simply made things up.

We concluded that the AI was being given too much responsibility and had too many things to consider. The problem was not a shortage of tokens: when it has to do too many things, it loses track of some of them and cannot meet the requirements accurately. We therefore decided to split the work, again using LangChain's tool mechanism. Since the AI completes tasks through tools, why can't a tool itself be an AI?

You can even split it again.
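Making a tool itself an AI, and then splitting that tool again, amounts to nesting narrowly scoped model calls. In the sketch below, `fake_model` returns canned answers in place of a real large model, and the three roles (module picker, control picker, statement writer) are hypothetical; the article does not describe its exact split. Each agent sees only its own short task, which is the point of the design.

```python
from __future__ import annotations

from typing import Callable

# A "model" is any function from prompt text to answer text (stand-in for an LLM call).
Model = Callable[[str], str]


def make_agent_tool(role: str, instructions: str, model: Model) -> Callable[[str], str]:
    """Wrap a narrowly scoped model call so that a parent agent can call it as a tool."""
    def tool(task: str) -> str:
        prompt = f"You are the {role}. {instructions}\nTask: {task}"
        return model(prompt)
    return tool


def fake_model(prompt: str) -> str:
    """Canned answers so the sketch runs without a real model."""
    if prompt.startswith("You are the module picker"):
        return "user_rights"
    if prompt.startswith("You are the control picker"):
        return "btn_booking_optimization"
    if prompt.startswith("You are the statement writer"):
        return f"click('{prompt.rsplit('control=', 1)[-1]}')"
    return ""


pick_module = make_agent_tool("module picker", "Name the module this step operates on.", fake_model)
pick_control = make_agent_tool("control picker", "Return the ID of the control to operate.", fake_model)
write_statement = make_agent_tool("statement writer", "Write one statement for the step.", fake_model)


def locate_control(step: str) -> str:
    """The locator is itself split again into two smaller agents."""
    module = pick_module(step)
    return pick_control(f"{step} module={module}")


def handle_step(step: str) -> str:
    """Top-level flow: each agent handles only its own small task."""
    control_id = locate_control(step)
    return write_statement(f"{step} control={control_id}")


print(handle_step("Tap the booking optimization area"))  # click('btn_booking_optimization')
```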

These splits reduce and simplify what each AI has to consider, which makes each one more accurate. The final generation success rate rose to over 80%.

6. Follow-up development

Our work now lets AI generate automated test code without human involvement at a success rate of about 80%. This is exciting, but many problems remain to be solved.

1) Calling large models is still expensive. Is there a way to do the work at lower cost?

2) Some operations and verifications are still hard to handle. An 80% success rate leaves a lot of room for improvement, and for now people still need to review the generated results.

3) There is also room for improvement in other areas, which we will continue to work on.

This article was reproduced from 51cto.com.