About the author
Thales Fu, senior R&D manager at Ctrip, is committed to finding better ways to combine AI and engineering to solve real-life problems.
Introduction
In today's rapidly iterating software development cycles, automated user interface (UI) testing has become key to improving efficiency and ensuring product quality. However, as applications grow increasingly complex, traditional UI automation methods are revealing their limitations. AI-driven UI automation has arrived, but it still faces accuracy and reliability challenges. Against this backdrop, this article proposes an innovative approach: real-time debugging can significantly improve the effectiveness of UI automation scripts written by AI.
This is not just a technical challenge; it bears on how to speed up software delivery while ensuring quality. This article explores how real-time debugging helps AI understand and execute UI test scripts more accurately, and how this method can bring revolutionary changes to software development.
1. Current status of UI automation
UI automation has developed considerably, from the early simple record-and-playback tools to today's complex scripting frameworks. Despite continuous technical advances, traditional UI automation methods still struggle with rapidly changing application interfaces. As applications become more complex and dynamic, traditional approaches may no longer be enough, so engineers are looking for more flexible and reliable solutions, and a new generation of UI automation tools and technologies is emerging to meet this need.
Industry surveys show that manually writing test scripts is inefficient and that reworking them after updates takes a great deal of time. Research indicates that maintaining UI automation test scripts may account for 60% to 70% of the entire testing effort. In an agile development environment, rewriting and re-testing existing automation scripts can take more than 100 hours per application update. This high maintenance cost highlights the inefficiency and resource consumption of traditional UI automation methods.
2. The introduction of behavior-driven development (BDD)
Behavior-driven development (BDD) is an agile software development practice that encourages more effective communication between a project's developers, testers, and non-technical stakeholders. Cucumber is a popular tool for implementing the BDD methodology; it lets team members write explicit, executable test cases in natural language.
Cucumber uses a domain-specific language (DSL) called Gherkin, which is extremely easy to read and allows non-technical people to understand the purpose and content of the test. Test scenarios are written in the form of a series of Given-When-Then statements, which clearly describe how the system should respond under specific conditions.
For example, the shopping cart function of an online shopping website may have the following Gherkin scenario:
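The article does not reproduce the scenario at this point; an illustrative example (the product and step wording are invented here) might look like this:

```gherkin
Feature: Shopping cart
  Scenario: Add an item to the cart
    Given I am logged in and viewing the product "Wireless Mouse"
    When I click the "Add to Cart" button
    Then the cart icon should show 1 item
    And the cart total should equal the product price
```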
This approach leverages natural language to promote better communication and understanding between technical and non-technical teams. The natural-language test scenarios also serve as project documentation, helping new team members quickly understand the project's functionality, and they let non-technical personnel participate directly in writing and verifying test cases, ensuring that development work stays closely aligned with business needs.
But BDD also has limitations. Although test scenarios are written in natural language, the implementation (the step definitions) behind each step still has to be written in a programming language by technical personnel, so implementing the test logic can involve complex coding. As applications grow and change, maintaining and updating the corresponding test steps becomes tedious; when the UI changes frequently, the related step definitions must be updated accordingly. There are also flexibility and adaptability limitations: Cucumber test scripts rely on predefined steps and structures, which can constrain a test, and for some complex scenarios, implementing specific test logic may require creative workarounds for the framework's limitations.
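To make the "step definitions still require code" point concrete, here is a minimal sketch of Cucumber-style step definitions in plain Python. The registry and decorator are hand-rolled for illustration only; real projects would use a framework such as behave or pytest-bdd, and the step wording is invented.

```python
import re

# Registry mapping Gherkin step patterns to their implementations.
step_registry = {}

def step(pattern):
    """Register a function as the implementation of a Gherkin step."""
    def decorator(func):
        step_registry[pattern] = func
        return func
    return decorator

@step("I add {count} items to the cart")
def add_items(context, count):
    context["cart"] = context.get("cart", 0) + int(count)

@step("the cart should contain {count} items")
def check_cart(context, count):
    assert context["cart"] == int(count)

def run_step(text, context):
    # Naive matcher: treat each registered pattern as a parse template
    # where {count} captures digits.
    for pattern, func in step_registry.items():
        regex = re.escape(pattern).replace(r"\{count\}", r"(\d+)")
        match = re.fullmatch(regex, text)
        if match:
            return func(context, *match.groups())
    raise KeyError(f"No step definition for: {text}")

context = {}
run_step("I add 2 items to the cart", context)
run_step("the cart should contain 2 items", context)
```

The natural-language scenario stays readable, but every step still needs a programmer to write and maintain its definition — which is exactly the maintenance burden discussed above.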
3. Current application of AI in UI automation
In recent years, AI technology has been integrated into UI automation, especially after the emergence of large models represented by GPT, because these models have built-in code generation capabilities. The industry has begun trying to use large models to generate test code directly from Gherkin test case descriptions.
However, the test code generated by current large models does not fully meet expectations. There are two main problems: first, the generated script may fail to run because of syntax errors; second, it may not accurately cover the checkpoints the test case requires. In our practice, the first-try success rate was no more than 5%.
When generation fails, humans must step in with remedial work: debugging, modifying the use case and regenerating, or directly editing the generated script. These tasks themselves require a lot of manpower, which contradicts the original intent of our system: generating test scripts automatically through AI.
4. AI fully automatically writes effective test scripts
To solve this problem, we rethought the entire process by which AI generates test scripts. We also reconsidered the human role: since humans were doing the debugging and modification work in the system, could AI take over that part? Let the system run the generated code by itself, and let the AI debug and fix the erroneous code it produces.
We therefore adjusted the system design so that the AI performs these tasks autonomously instead of humans. In the end, across all the use cases of Ctrip's hotel order details page, 83.3% of the cases were generated successfully without any human participation, and bugs were discovered in 8% of the cases during the script generation process. We ran the generation three times in a row, with success rates of 84.3%, 81.4%, and 83.3%, showing the system is stable and effective.
The specific test cases and codes are as follows:
First, slide the order details page down to the user rights module, then click the booking optimization area to pop up the price floating layer. Then check whether the fee details include "Black Diamond VIP".
The final generated test code is as follows:
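The actual generated script is not reproduced here. As a stand-in, the sketch below mimics what such a generated test might look like against a hypothetical driver API; every control ID and method name is invented for illustration, and a mock driver replaces the real device connection.

```python
# Mock of a hypothetical mobile-UI driver; all IDs/methods are invented.
class MockDriver:
    def __init__(self, page_text):
        self.page_text = page_text  # control ID -> visible text
        self.actions = []           # record of performed UI actions

    def swipe_to(self, control_id):
        self.actions.append(("swipe_to", control_id))

    def click(self, control_id):
        self.actions.append(("click", control_id))

    def text_of(self, control_id):
        return self.page_text.get(control_id, "")

def generated_test(driver):
    # Step 1: slide down to the user rights module.
    driver.swipe_to("user_rights_module")
    # Step 2: click the booking optimization area to open the price layer.
    driver.click("booking_optimization_area")
    # Step 3: check that the fee details mention "Black Diamond VIP".
    details = driver.text_of("fee_details_layer")
    assert "Black Diamond VIP" in details, "checkpoint failed"

driver = MockDriver(
    {"fee_details_layer": "Room fee, taxes, Black Diamond VIP discount"}
)
generated_test(driver)
```

The shape is the important part: a short sequence of navigation actions followed by an assertion on the checkpoint named in the use case.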
5. System Implementation
The core architecture diagram of the entire system is as follows. The core of the system is a LangChain-based program. It accesses the large model and is equipped with multiple tools, which fall into two main categories: tools for obtaining page information, and debugging tools.
LangChain automatically uses the page-information tools as needed to fetch page data and determine which specific control the current operation requires, then generates code. It then uses the debugging tool to actually execute the code on the phone and judges, from the debugging feedback, whether the code it generated is correct.
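The generate-debug loop described above can be sketched as follows. This is not Ctrip's implementation and does not use the real LangChain API: `get_page_info`, `debug_run`, and `fake_llm` are stand-ins for the two tool categories and the large model, and all names and behavior are invented for illustration.

```python
def get_page_info():
    """Tool category 1: return controls visible on the current page."""
    return {"btn_submit": "Submit order button"}

def debug_run(code):
    """Tool category 2: pretend to execute generated code on the device
    and return feedback (here: check the code references a real control)."""
    return "ok" if "btn_submit" in code else "error: control not found"

def fake_llm(page_info, feedback=None):
    """Stand-in model: first attempt uses a wrong ID; after an error,
    it consults the page info and picks an existing control."""
    if feedback and feedback.startswith("error"):
        control = next(iter(page_info))
        return f'click("{control}")'
    return 'click("btn_missing")'

def generate_with_debugging(max_retries=3):
    page_info = get_page_info()
    code = fake_llm(page_info)
    for _ in range(max_retries):
        feedback = debug_run(code)
        if feedback == "ok":
            return code  # debugging confirmed the code works
        code = fake_llm(page_info, feedback)  # let the model fix it
    raise RuntimeError("could not produce working code")

print(generate_with_debugging())
```

The point of the loop is that the model never has to be right on the first try: real execution feedback replaces human debugging.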
5.1 Prompts
With the basic architecture in place, we need prompts to glue these tools together and let the AI understand how it should work. Structurally, our prompt contains several parts: first, tell the AI how it should think and work; second, tell it to verify each generated statement through the debug tool; then specify the output format; and finally give it the complete text of the use case to handle.
Telling the AI how to think and work expands into the following steps: look at which modules are on the page and which module the current step should operate on; then which controls and components are in that module and which of them to operate; then which action to perform and which special syntax is available; and finally generate the statement.
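A prompt skeleton following that four-part structure might look like the template below. The wording is entirely invented; Ctrip's actual prompt is not published.

```python
# Illustrative prompt template mirroring the four parts described above:
# working method, debugging requirement, output format, and the use case.
PROMPT_TEMPLATE = """\
You are a UI test script generator.

How to work:
1. List the modules on the page and pick the one this step operates on.
2. List the controls/components in that module and pick the target.
3. Decide which action to perform and which special syntax applies.
4. Generate one statement.

Debugging:
After generating each statement, call the Debug tool to execute it on
the device and use the feedback to correct the statement if needed.

Output format:
Return only the final, debugged statement, nothing else.

Use case:
{use_case}
"""

prompt = PROMPT_TEMPLATE.format(
    use_case="Check that the fee details include Black Diamond VIP"
)
```

Keeping the use case as the last section makes it easy to substitute a new BDD scenario without touching the instructions.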
5.2 Debugging Tools
The essence of the debugging tool is a remote connection to the phone through adb. Once connected, we can send the AI-generated instructions to the phone to run, read back the results, and hand them to the AI so it can judge whether the instructions it generated are correct.
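A minimal sketch of such a tool, using Python's subprocess module to drive the standard adb CLI (`adb -s <serial> shell <command>`). The device serial and the tap coordinates are placeholders, and `run_on_device` needs a connected device, so only the command construction is exercised here.

```python
import subprocess

def build_adb_command(serial, instruction):
    """Compose an adb shell command for a given device and instruction.

    `instruction` is an adb shell command such as 'input tap 540 1200';
    `serial` is whatever `adb devices` reports for the target phone.
    """
    return ["adb", "-s", serial, "shell", instruction]

def run_on_device(serial, instruction):
    """Execute the instruction on the phone and return the output,
    which is then fed back to the AI for judgment.

    Requires an actual connected device, so it is not invoked here.
    """
    result = subprocess.run(
        build_adb_command(serial, instruction),
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout + result.stderr

cmd = build_adb_command("emulator-5554", "input tap 540 1200")
```

Feeding `run_on_device`'s combined stdout/stderr back to the model is what lets it distinguish a successful action from an error.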
5.3 Page information acquisition tool
The ultimate purpose of the page information acquisition tool is to help the AI map what the BDD use case says to operate onto the ID of the specific control; only with the ID can the subsequent program instructions be generated. To obtain the ID, we need a control and component library whose core is each control's and component's ID together with its description. With these two pieces of information, the AI can guess, after reading the BDD use case, which control is needed based on its description.
To achieve this, we built a page control library. Besides the ID and description of each control on the page, the library also records the relationships between pages and components and between components and controls, so the AI can query it step by step.
The control library itself is generated by a job that statically analyzes the code. In practice, however, the controls displayed on a page vary with the scene state, and some controls are hidden in certain scenarios. The page information acquisition tool therefore intersects the controls currently present on the page with the controls queried from the library, yielding the controls actually displayed on the current page together with their descriptions.
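The library structure and the intersection step can be sketched as below. The nesting (page → component → control) follows the relationships described above, but all IDs, descriptions, and the dict-based layout are invented for illustration.

```python
# Hypothetical control library: page -> component -> {control ID: description}.
CONTROL_LIBRARY = {
    "order_detail_page": {
        "user_rights_module": {
            "booking_optimization_area": "Area showing price optimizations",
            "black_diamond_banner": "Black Diamond VIP banner",
        },
        "fee_module": {
            "fee_details_layer": "Floating layer listing all fees",
        },
    },
}

def controls_of(page):
    """Flatten the library: all control IDs and descriptions on a page."""
    flat = {}
    for component in CONTROL_LIBRARY.get(page, {}).values():
        flat.update(component)
    return flat

def visible_controls(page, on_screen_ids):
    """Intersect library controls with those actually on screen right now,
    so hidden controls are filtered out before the AI sees the list."""
    library = controls_of(page)
    return {cid: desc for cid, desc in library.items() if cid in on_screen_ids}

# Suppose the device currently reports only two controls on screen:
shown = visible_controls(
    "order_detail_page",
    {"booking_optimization_area", "fee_details_layer"},
)
```

The AI only receives `shown`, i.e. controls that both exist in the library and are currently rendered, which keeps it from picking a hidden control.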
5.4 Further splitting the AI
After completing these tasks, the AI can basically do the yellow part of the figure above, which used to be human work. The generation success rate rose from 5% to 55%, but 55% is still not enough.
We analyzed the failed cases further and found the main problem was AI hallucination: although the prompts were fairly detailed, the AI sometimes did not follow them and sometimes produced nonsense on its own.
Our conclusion was that the AI had been given too much responsibility and had too many things to consider. It is not that it lacks tokens; rather, when it has to do too many things, it forgets some of them and cannot accurately complete the requirements. So we considered splitting the work, again using LangChain's capabilities: since the AI can complete functions through tools, why can't a tool itself be an AI?
These sub-AIs can even be split again. Through these splits, each AI has less to consider and simpler work to do, and processes it more accurately. The final generation success rate rose to more than 80%.
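The splitting idea can be sketched as follows. Each "sub-AI" here is a plain function standing in for a focused LLM call (in LangChain terms, a tool that is itself backed by a model); the matching logic and all names are invented for illustration.

```python
def control_picker_ai(step_text, controls):
    """Sub-AI 1: pick the control a step refers to.
    A real system would call a model; here we do crude keyword matching."""
    for cid, desc in controls.items():
        if any(word in desc.lower() for word in step_text.lower().split()):
            return cid
    return None

def code_writer_ai(control_id, action):
    """Sub-AI 2: turn a chosen control and action into one statement."""
    return f'{action}("{control_id}")'

def orchestrator_ai(step_text, action, controls):
    """Top-level AI: instead of reasoning about everything at once,
    it delegates each narrow decision to a specialized sub-AI."""
    control_id = control_picker_ai(step_text, controls)
    return code_writer_ai(control_id, action)

controls = {"fee_details_layer": "floating layer listing all fees"}
stmt = orchestrator_ai("check the fees floating layer", "read_text", controls)
```

Each sub-AI now has one small, well-defined job, which is the property that reduced hallucination in practice.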
6. Follow-up development
Through this work, AI can now generate automated test code at roughly an 80% success rate without human participation. That is exciting, but many problems remain to be solved.
1) The cost of calling large models is still considerable. Is there a way to do the same work more cheaply?
2) Some operations and verifications remain difficult to handle. An 80% success rate still leaves a lot of room for improvement, and for now humans still need to review the generated results.
3) Other aspects also have room for improvement and deserve continued work.
The above is the detailed content of Let AI write effective UI automation with real-time debugging.
