Home  >  Article  >  Technology peripherals  >  Optimize the splicing performance of Ctrip’s transfer transportation plan

Optimize the splicing performance of Ctrip’s transfer transportation plan

WBOY
WBOYforward
2023-04-25 18:31:091239browse

About the author

In short, Ctrip back-end development manager, focusing on technical architecture, performance optimization, transportation planning and other fields.

1. Background introduction

Due to the limitations of transportation planning and transportation capacity resources, there may be no direct transportation between the two places queried by the user, or during major holidays, direct transportationare all sold out. However, users can still reach their destination through two-way or multi-way transfers such as trains, planes, cars, ships, etc. In addition, transfer transportation is sometimes more advantageous in terms of price and time consumption. For example, from Shanghai to Yuncheng, connecting via train may be faster and cheaper than a direct train.

Optimize the splicing performance of Ctrip’s transfer transportation plan

Figure 1 Ctrip train transfer transportation list

When providing transfer transportation solutions, it is very important One link is to stitch together two or more journeys of trains, planes, cars, ships, etc. to form a feasible transfer plan. The first difficulty in splicing transit traffic is that the splicing space is huge. Just considering Shanghai as a transit city, nearly 100 million combinations can be generated. Another difficulty lies in the requirement of real-time performance, because the production line data changes at any time and needs to be constantly updated. Query data on trains, planes, cars, and ships. Transit traffic splicing requires a lot of computing resources and IO overhead, so optimizing its performance is particularly important.

This article will combine examples to introduce the principles, analysis and optimization methods followed in the process of optimizing the performance of transfer traffic splicing, aiming to provide readers with valuable reference and inspiration.

2. Optimization Principles

Performance optimization requires balancing and making trade-offs among various resources and constraints on the premise of meeting business needs, and following some major principles Helps eliminate uncertainty and approach the optimal solution to the problem. Specifically, the following three principles are mainly followed in the transfer traffic splicing optimization process:

2.1 Performance optimization is a means rather than an end

Although this article is about performance optimization , but it still needs to be emphasized at the beginning: don’t optimize for the sake of optimization. There are many ways to meet business needs, and performance optimization is just one of them. Sometimes the problem is very complex and there are many restrictions. Without significantly affecting the user experience, reducing the impact on users by relaxing restrictions or adopting other processes is also a good way to solve performance problems. In software development, there are many examples of significant cost reductions achieved by sacrificing a small amount of performance. For example, the HyperLogLog algorithm used for cardinality statistics (duplication removal) in Redis can count 264 data in only 12K space with a standard error of 0.81%.

Back to the problem itself, since production line data needs to be queried frequently and massive splicing operations are performed, if each user is required to immediately return the freshest transfer plan when querying, The cost will be very high. To reduce costs, a balance needs to be struck between response time and data freshness. After careful consideration, we choose to accept minute-level data inconsistencies. For some unpopular routes and dates, there may not be a good transfer plan when querying for the first time. In this case, just guide the user to refresh the page.

2.2 Incorrect optimization is the root of all evil

Donald Knuth mentioned in "Structured Programming With Go To Statements": "Programmers waste a lot of time Thinking and worrying about the performance of non-critical paths and trying to optimize this part of performance will have a very serious negative impact on the debugging and maintenance of the overall code, so in 97% of cases, we should forget small optimization points." In short, before the real performance problem is discovered, excessive and ostentatious optimization at the code level will not only not improve performance, but may lead to more errors. However, the author also emphasized: "For the remaining critical 3%, we should not miss the opportunity to optimize." Therefore, you need to always pay attention to performance issues, not make decisions that will affect performance, and make correct optimizations when necessary.

2.3 Quantitatively analyze performance and clarify the direction of optimization

As mentioned in the previous section, before optimizing, you must first quantify the performance and identify bottlenecks, so that the optimization can be more targeted. Quantitative analysis of performance can use time-consuming monitoring, Profiler performance analysis tools, Benchmark benchmark testing tools, etc., focusing on areas that take a particularly long time or have a particularly high execution frequency. As Amdahl's law states: "The degree of system performance improvement that can be obtained by using a faster execution method for a certain component in the system depends on the frequency with which this execution method is used, or the proportion of the total execution time."

In addition, it is also important to note that performance optimization is a protracted battle. As the business continues to develop, the architecture and code are constantly changing, so it is even more necessary to continuously quantify performance, continuously analyze bottlenecks, and evaluate optimization effects.

3. The road to performance analysis

3.1 Sort out the business process

Before performance analysis, we must first sort out the business process. The splicing of transfer transportation plans mainly includes the following four steps:

#a. Load the route map, such as Beijing to Shanghai via Nanjing transfer, only the information of the route itself is considered, and has nothing to do with the specific flight;

b. Check the production line data of trains, planes, cars and ships, including departure time, arrival time, departure station, arrival station, price and remaining ticket information, etc.;

c. To stitch together all feasible transfer transportation options, the main consideration is that the transfer time should not be too short to avoid being unable to complete the transfer; at the same time, it should not be too long to avoid waiting too long. After splicing out feasible solutions, you still need to improve the business fields, such as total price, total time taken, transfer information, etc.;

d. According to certain rules, from all the spliced ​​out From the feasible transfer solutions, some solutions that may be of interest to users are selected.

3.2 Quantitative analysis performance

(1) Increase time-consuming monitoring

Time-consuming Monitoring is the most intuitive way to observe the time-consuming situation of each stage from a macro perspective. It can not only view the time-consuming value and proportion of time-consuming at each stage of the business process, but also observe the time-consuming change trend over a long period of time.

Time-consuming monitoring can use the company’s internal indicator monitoring and alarm system to add time-consuming management to the main process of connecting transit transportation solutions. These processes include loading route maps, querying shift data and splicing it, filtering and saving splicing plans, etc. The time-consuming situation of each stage is shown in Figure 2. It can be seen that splicing (including production line data) takes the highest proportion of time-consuming, so it has become a key optimization target in the future.

Optimize the splicing performance of Ctrip’s transfer transportation plan

Figure 2 Time-consuming monitoring of transit traffic splicing

(2) Profiler performance analysis

Time-consuming management may invade business code and affect performance, so it should not be too many and is more suitable for monitoring main processes. The corresponding Profiler performance analysis tool (such as Async-profiler) can generate a more specific call tree and the CPU usage ratio of each function, thereby helping to analyze and locate critical paths and performance bottlenecks.

Optimize the splicing performance of Ctrip’s transfer transportation plan

Figure 3 Splicing call tree and CPU ratio

As shown in Figure 3, the splicing scheme (combineTransferLines) accounts for 53.80%, and query production line data (querySegmentCacheable, cache used) accounts for 21.45%. In the splicing scheme, calculating scheme score (computeTripScore, accounting for 48.22%), creating scheme entity (buildTripBO, accounting for 4.61%) and checking splicing feasibility (checkCombineMatchCondition, accounting for 0.91%) are the three largest links.

Optimize the splicing performance of Ctrip’s transfer transportation plan

Figure 4 Solution scoring call tree and CPU ratio

When we continued to analyze the calculation plan score (computeTripScore) with the highest proportion, we found that it was mainly related to the custom string formatting function (StringUtils.format), including direct calls (used to display plan score details), As well as an indirect call via getTripId (used to generate the ID of the scheme). The highest proportion of customized StringUtils.format is java/lang/String.replace. Java 8’s native string replacement is implemented through regular expressions, which is relatively inefficient (this problem has been improved after Java 9).

// 计算方案评分(computeTripScore) 中调用的StringUtils.format代码示例
StringUtils.format("AAAA-{0},BBBB-{1},CCCC-{2},DDDD-{3},EEEE-{4},FFFF-{5},GGGG-{6},HHHH-{7},IIII-{8},JJJJ-{9}",
aaaa, bbbb, cccc, dddd, eeee, ffff, gggg, hhhh, iiii, jjjj)


// getTripId 中调用StringUtils.format代码示例
StringUtils.format("{0}_{1}_{2}_{3}_{4}_{5}_{6}", aaaa, bbbb, cccc, dddd, eeee, ffff)


// 通过Java replace实现的自定义format函数
public static String format(String template, Object... parameters) {
for (int i = 0; i < parameters.length; i++) {
template = template.replace("{" + i + "}", parameters[i] + "");
}
return template;
}

(3)Benchmark benchmark test

With the help of Benchmark benchmark test tool, you can measure the performance of the code more accurately execution time. In Table 1, we use JMH (Java Microbenchmark Harness) to conduct time-consuming tests on three string formatting methods and one string splicing method. The test results show that string formatting using Java8's replace method has the worst performance, while using Apache's string splicing function has the best performance.

Table 1 Comparison of string formatting and splicing performance

##116.812

四、性能优化之路

通过以上的性能分析,我们发现拼接和查询产线数据是性能瓶颈,字符串格式化影响尤其大。因此,我们将致力于优化这些部分,以提高性能表现。

4.1 优化代码逻辑

优化代码逻辑是最简单且性价比最高的方法,可以是修正有问题的代码或替换为更好的实现。不同的实现,哪怕减上几纳秒,累加起来也是很可观的。借助一些经典算法或数据结构(如快速排序、红黑树等)可以在时间和空间复杂度方面带来显著优势。回到中转交通方案拼接性能优化本身,优化的代码逻辑主要包括:

(1)优化字符串拼接性能

如前面的JMH的结果所示,自定义的字符串格式化函数性能最差,因此作为重点优化目标。优化前后的对比如下所示:

// 优化前,通过Java replace实现的format函数
public static String format(String template, Object... parameters) {
for (int i = 0; i < parameters.length; i++) {
template = template.replace("{" + i + "}", parameters[i] + "");
}
return template;
}
// 优化后,通过Apache replace实现的format函数
public static String format(String template, Object... parameters) {
for (int i = 0; i < parameters.length; i++) {
String temp = new StringBuilder().append('{').append(i).append('}').toString();
template = org.apache.commons.lang3.StringUtils.replace(template, temp, String.valueOf(parameters[i]));
}
return template;
}

根据JMH的测试结果,即使是优化后的格式化函数,其性能也不是最优的。在不显著影响可读性的前提下,应尽量使用性能更优的StringUtils.join函数。

// 优化前
StringUtils.format("{0}_{1}_{2}_{3}_{4}_{5}_{6}", aaaa, bbbb, cccc, dddd, eeee, ffff)


// 优化后
StringUtils.join("_", aaaa, bbbb, cccc, dddd, eeee, ffff)

为进一步提升性能,可以在computeTripScore 函数中添加一个开关,仅在调试模式下才展示评分细节,这将确保该字符串格式化函数仅在需要时才被调用。

if (Config.getBoolean("enable.score.detail", false)) {
scoreDetail = StringUtils.format("AAAA-{0},BBBB-{1},CCCC-{2},DDDD-{3},EEEE-{4},FFFF-{5},GGGG-{6},HHHH-{7},IIII-{8},JJJJ-{9}",
aaaa, bbbb, cccc, dddd, eeee, ffff, gggg, hhhh, iiii, jjjj);
}

优化后的CPU占比如图5所示,此时字符串格式化已经不再是性能瓶颈。

Optimize the splicing performance of Ctrip’s transfer transportation plan

图5 优化后的拼接调用树和CPU占比

(2)增加索引降低拼接时间复杂度

Optimize the splicing performance of Ctrip’s transfer transportation plan

图6 增加索引降低拼接时间复杂度

在中转拼接过程中,我们需要将第一程每个班次的到达时间与第二程每个班次的出发时间进行比较,以判断中转时间是否过短或过长。为简化说明,假设换乘时间间隔需要满足大于30分钟且小于6小时。以北京到上海经南京中转的两程火车为例,3月9日北京到南京有66个班次,南京到上海有275个班次,考虑到隔夜车,还需要算上3月10日南京到上海的275个班次,那么最多需要比较36300(66*275*2)次。

为避免频繁比较,参考了MySQL B+树索引的思想,将第二程南京到上海的所有火车班次数据构建成红黑树。其中,树的键为秒级时间戳,例如2023-03-09 11:29出发的G367键为1677247680,值为G367的班次数据。有了索引树,最多只需要10次比较,就可以找到最近的满足最小换乘时间要求的班次。同理,最多需要10次比较,就能找到满足最大换乘时间要求的最晚班次。两者之间的所有班次都满足耗时要求,直接进行拼接即可。改进后最多需要比较1320(66*(10+10))次,约为原来的1/27.5。

(3)使用多路归并求Top-K算法

在筛选方案时,会存在以下场景:有多个中转点,每个中转点都有数百个得分较高的方案(内部已按得分由高到低排序,通过小根堆实现)。最终需要将这些方案合并,并从中筛选出得分最高的K个方案。

最简单的方法是使用快速排序将所有的方案排序,然后选取前K个,时间复杂度约为O(nlog2n)。然而,这并没有利用到每个队列自身有序的特点。通过多路归并算法时间复杂度可降为O(nlog2k),具体步骤为:

a.  从每个队列中拿出第一个元素(得分最高的方案),放入大根堆中;

b. Take the largest element from the top of the big root heap and put it into the result set;

c. If there are remaining elements in the queue where the element is located, then Add the next element to the heap;

d. Repeat steps 2 and 3 until the result set contains K elements or all queues are empty.

Optimize the splicing performance of Ctrip’s transfer transportation plan

Figure 7 Multi-path merge Top-K algorithm

4.2 Build multi-level cache

Caching is a typical space-for-time strategy that can cache data and calculation results. Caching data can improve access efficiency, and caching results avoids repeated calculations. While caching brings performance improvements, it also introduces new problems:

  • The cache capacity is limited, and the data loading, updating, invalidation and replacement strategies need to be carefully considered;
  • Design of cache architecture: Generally speaking, memory caches (such as HashMap, Caffeine, etc.) have the highest performance, followed by distributed caches such as Redis, RocksDB is relatively slow, and the upper limit of capacity is just the opposite. , need to be carefully selected and used together;
  • How to solve the cache inconsistency problem and how long the inconsistency can be accepted.

In the process of splicing transfer transportation solutions, a large amount of basic data (such as stations, administrative areas, etc.) and massive dynamic data (such as shift data) need to be used. Based on the above factors and combined with the business characteristics of transit transportation splicing, the cache architecture is designed as follows:

  • Basic data (such as stations, administrative areas, etc.), due to the small amount of data, changes The frequency is low, the entire amount is saved in HashMap, and the entire amount is updated periodically;
  • # Some train, plane, car, and ship schedule data are cached in Redis to improve access efficiency and stability. The caching strategies adopted by different production lines are slightly different, but in general they are a combination of scheduled updates and search-triggered updates;
  • Hundreds of product queries may be queried in one splicing process. For line data, the cumulative millisecond delay of Redis is also very large. Therefore, it is hoped to build another layer of memory cache on top of Redis to improve performance. Through analysis, it was found that there are very obvious hot data in the splicing process. The proportion of queries on popular dates and routes is very high and the number is relatively limited. Therefore, this part of hotspot data can be saved in the memory cache and replaced with LFU (Least Frequently Used). The final production line data memory cache hit rate reaches more than 45%, which is equivalent to reducing nearly half of the IO overhead.
  • Because minute-level data inconsistency can be accepted, the splicing results are cached. During the validity period, if the next user queries the same route on the same departure date, the cached data can be used directly. . Because the spliced ​​transfer scheme data is relatively large, the splicing results are saved in RocksDB. Although the performance is not as good as Redis, the impact on a single query is acceptable.

Optimize the splicing performance of Ctrip’s transfer transportation plan

Figure 8 Multi-level cache structure

4.3 Preprocessing

Although in theory you can choose any city as a transit point between the two places, in fact most transit cities are unable to splice high-quality solutions. Therefore, some high-quality transfer points are first screened out through offline preprocessing, thereby reducing the solution space from thousands to tens. Compared with dynamically changing shifts, line data is relatively fixed and can be calculated once a day. In addition, offline preprocessing can use big data technology to process massive data and is relatively insensitive to time consumption.

4.4 Multi-threaded processing

In a splicing process, dozens of lines with different transfer points need to be processed. The splicing of each line is independent of each other, so multi-threading can be used, which minimizes processing time. However, due to the influence of the number of line shifts and cache hit rate, the splicing time of different lines is difficult to be consistent. Many times, when two threads are assigned the same number of tasks, even if one thread finishes executing quickly, it must wait for the other thread to finish executing before proceeding to the next step. To avoid this situation, the work-stealing mechanism of ForkJoinPool is used. This mechanism can ensure that after each thread completes its own task, it will share the unfinished work of other threads, improve concurrency efficiency, and reduce idle time.

But multi-threading is not omnipotent. When using it, you need to pay attention to:

  • The execution of subtasks needs to be independent of each other and independent of each other. Influence. If there are dependencies, you need to wait for the previous task to be executed before starting the next task, which will make multi-threading meaningless;
  • The number of CPU cores determines the upper limit of concurrency. Too many threads will reduce performance due to frequent context switching. Special attention needs to be paid to indicators such as the number of threads, CPU usage, and CPU Throttled time.

4.5 Delayed calculation

By deferring calculation to the necessary moment, it is possible to avoid a lot of redundant overhead. For example, after splicing the transfer plan, you need to build the plan entity and improve the business fields. This part also consumes resources. And not all spliced ​​solutions will be screened out, which means that these unscreened solutions still need to consume computing resources. Therefore, the construction of the complete solution entity object is delayed. Tens of thousands of solutions in the splicing process are first saved as lightweight intermediate objects, and the complete solution entity is only built for the hundreds of intermediate objects after screening.

4.6 JVM Optimization

The transit traffic splicing project is based on Java 8 and uses the G1 (Garbage-First) garbage collector and is deployed on 8C8G machines. G1 achieves high throughput while meeting pause time requirements as much as possible. The default parameters set by the system architecture department are already suitable for most scenarios and usually do not require special optimization.

However, there are too many line transfer schemes, resulting in packets that are too large, exceeding half of the Region size (8G, the default Region size is 2M), resulting in many large objects that should enter the young generation directly Entering the old age, in order to avoid this situation, change the Region size to 16M.

5. Summary

Through the above analysis and optimization, the changes in splicing time consumption are shown in Figure 9:

Optimize the splicing performance of Ctrip’s transfer transportation plan

Figure 9 Performance optimization effect of transfer transportation scheme splicing

Although each business and scenario has its own characteristics, performance optimization also requires specific analysis. However, the principles are the same, and you can still refer to the analysis and optimization methods described in this article. A summary of all analysis and optimization methods in this article is shown in Figure 10.

Optimize the splicing performance of Ctrip’s transfer transportation plan

Figure 10 Summary of splicing optimization of transfer transportation plan

##Implementation

Average time taken to execute 1000 times (us)

StringUtils.format implemented using Java8's replace

1988.982

StringUtils.format implemented using Apache replace

656.537

Java8 comes with String.format

##1417.474

Apache's StringUtils.join

The above is the detailed content of Optimize the splicing performance of Ctrip’s transfer transportation plan. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:51cto.com. If there is any infringement, please contact admin@php.cn delete