PHP distributed tracing experience sharing
In this article, we share some of our experience with PHP distributed tracing, in the hope that it helps others.
Since adopting microservices, we have run into many problems, and the biggest is troubleshooting faults. A service-oriented interface usually depends on several other services, so slowness in any dependency directly degrades the quality of the interface that calls it.
Slowness caused by such dependencies is very common online, but it is not easy to troubleshoot. Developers have to trace problems through large volumes of logs, which is not intuitive, and at some companies developers cannot see how requests actually execute online at all. Generally speaking, these low-probability failures point to hidden dangers in the system. When traffic increases, those dangers are amplified and can lead directly to large-scale online failures. Avoiding this takes a lot of work, and the most direct approach is to use a distributed tracing system for statistical analysis.
We often see experts talk about how to optimize and improve online performance, but there is an important step they rarely mention: how to find low-probability faults. Distributed tracing systems are common in large Internet companies, but small and medium-sized companies usually lack the technical strength to build one. From our point of view, even when traffic is very small, the system is still important to the company and needs to be strengthened. Only by being able to find problems can we solve them. This is the principle I have always followed.
Implementing a distributed tracing system involves real technical difficulty. It must provide performance capture, log writing, log collection and sorting, log transmission, log storage, log indexing, real-time log analysis, and a final merged display, and it must hold up under heavy traffic. For example, if each request generates 1 KB of logs per interface, a server handling 2000 QPS generates 2 MB of logs per second. If each request relies on 5 interfaces, that becomes 10 MB of logs per second. As the online business grows more complex and traffic increases, this value grows with it.
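The estimate above can be checked with a few lines of arithmetic. The sketch below is only an illustration in Python: the function name is made up for this example, decimal units (1 KB = 1000 bytes) are assumed to match the round figures in the text, and the per-day figure assumes the load is sustained around the clock.

```python
def log_bytes_per_second(qps: int, bytes_per_call: int, calls_per_request: int = 1) -> int:
    """Log volume written per second by one server."""
    return qps * bytes_per_call * calls_per_request


KB = 1000  # decimal units, matching the round figures in the text

single = log_bytes_per_second(qps=2000, bytes_per_call=1 * KB)
fan_out = log_bytes_per_second(qps=2000, bytes_per_call=1 * KB, calls_per_request=5)

print(single / 1e6, "MB/s")             # 2.0 MB/s
print(fan_out / 1e6, "MB/s")            # 10.0 MB/s
print(fan_out * 86400 / 1e9, "GB/day")  # 864.0 GB/day if the load never drops
```

Even at these modest numbers, a single server would write hundreds of gigabytes of trace logs per day, which is why storage and indexing dominate the design.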
Large Internet companies have many distributed tracing systems that can withstand traffic in the billions, but for a small company that architecture is a heavy burden. It depends on many components, such as a distributed messaging system, distributed storage, and distributed computing, and these alone need at least 6 servers, which is not cost-effective for an ordinary small company.
We have two open source distributed tracing systems. One is a stand-alone version for small and medium-sized Internet companies, which can support business systems (such as a payment system) with up to 20 million PV (2000w). The other is a distributed tracing system that supports billions of PV. The stand-alone version, Fiery, has just been open-sourced (https://github.com/weiboad/fiery). It is designed for small and medium-sized enterprises: the entire project is a single Jar package that works out of the box on any machine with a Java 8 runtime. The application does, of course, need some simple instrumentation. The C++ distributed version has many dependencies and demands a certain level of skill from operations staff; it will be released later depending on the situation. Because everything is completely open source, it is also fully usable for core trading systems that hold sensitive data.
Currently, there are several approaches to distributed tracing on the market. Some are used only inside a company, and some are services that are free at small scale and paid at large scale. Common distributed tracing systems record the performance of each block of code statistically. The approach we provide is not exactly the same as those on the market: through continuous experiments we have simplified a great deal, retaining only the functions we think are truly practical. We designed the system for distributed monitoring of key systems, such as payment and transaction systems.
We record the details, return value, performance, and other information of each request. Through table analysis, we can quickly see how dependent interfaces perform online (performance statistics for third-party or uninstrumented interfaces), and instrumented interfaces get their own independent performance ranking. By viewing the analysis table, we can quickly find the slowest interface, replay the request, and analyze why it was slow online. In practice, we have found that in many cases it is the slowness of the data resources PHP depends on that makes the PHP interface perform poorly, so the focus is on dependent resources. Users can add other information according to their own needs, which avoids a large number of useless logs and keeps the system lean.
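The ranking table described above can be pictured as a simple aggregation over per-call records. The Python sketch below is an illustration only: the record layout, the interface names, and the `rank_interfaces` helper are all assumptions made for this example and are not taken from Fiery.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-call records: (trace_id, interface, elapsed_ms)
records = [
    ("t1", "mysql:orders", 12.0),
    ("t1", "curl:pay-gateway", 480.0),
    ("t2", "mysql:orders", 15.0),
    ("t2", "curl:pay-gateway", 95.0),
    ("t3", "local:/order/create", 30.0),
]


def rank_interfaces(rows):
    """Group calls by interface and sort by mean latency, slowest first."""
    grouped = defaultdict(list)
    for trace_id, interface, elapsed_ms in rows:
        grouped[interface].append((elapsed_ms, trace_id))
    table = []
    for interface, calls in grouped.items():
        slowest_ms, slowest_trace = max(calls)
        table.append({
            "interface": interface,
            "count": len(calls),
            "mean_ms": mean(ms for ms, _ in calls),
            "max_ms": slowest_ms,
            "slowest_trace": slowest_trace,  # the request worth replaying
        })
    return sorted(table, key=lambda row: row["mean_ms"], reverse=True)


ranking = rank_interfaces(records)
for row in ranking:
    print(row)
```

Keeping the trace ID of the slowest call next to each interface is what makes it possible to jump from the table straight to a request replay.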
The open source Fiery release has three main parts: an intrusive PHP instrumentation library, a lightweight log monitoring and push module, and the server. Together, these three provide distributed tracing for a website with fewer than 20 million PV.
The instrumentation library generates a Traceid (UUID) at the entry point. The Traceid embeds the IP address of the entry server and the request time. All subsequent logs are marked with this UUID, and after log collection all related logs are stored together under it. The library is also responsible for receiving a Traceid passed in by another request, and for propagating and maintaining the RPCID at runtime. The RPCID is a hierarchical counter from which we can directly restore the order and nesting of the calls and display them to the developer. In addition, if an Exception occurs while PHP is running, the library captures and records it so that the server can compute deduplicated statistics. Finally, these logs are written to the local disk of the server. When multiple PHP processes write to the same file at the same time, log lines can occasionally end up out of order, so we now write to separate files named by the project name plus the process ID.
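To make the Traceid and RPCID ideas concrete, here is a minimal Python sketch. It is not Fiery's actual format or code: the 32-character hex layout (IPv4 address, millisecond timestamp, random suffix), the function names, and the `RpcId` class are all assumptions chosen for illustration.

```python
import os
import socket
import time


def make_trace_id(ip: str, now_ms: int) -> str:
    """Pack the entry server's IPv4 address and request time into a hex ID."""
    ip_hex = socket.inet_aton(ip).hex()  # 8 hex chars
    time_hex = format(now_ms, "012x")    # 12 hex chars, millisecond timestamp
    rand_hex = os.urandom(6).hex()       # 12 hex chars to avoid collisions
    return ip_hex + time_hex + rand_hex


def parse_trace_id(trace_id: str):
    """Recover (ip, time in ms) from an ID produced by make_trace_id."""
    ip = socket.inet_ntoa(bytes.fromhex(trace_id[:8]))
    return ip, int(trace_id[8:20], 16)


class RpcId:
    """Hierarchical counter: each outgoing call gets '<own id>.<n>'."""

    def __init__(self, current: str = "0"):
        self.current = current
        self._calls = 0

    def next_child(self) -> str:
        self._calls += 1
        return f"{self.current}.{self._calls}"


trace_id = make_trace_id("10.0.0.7", int(time.time() * 1000))
entry = RpcId()                     # the entry request is "0"
first = entry.next_child()          # "0.1", sent with the first downstream call
second = entry.next_child()         # "0.2"
nested = RpcId(first).next_child()  # "0.1.1", issued by the service that received "0.1"
print(trace_id, first, second, nested)
```

Because every service only appends a counter to the ID it received, sorting the collected log lines by RPCID is enough to rebuild the call tree for one Traceid.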
We have implemented a simple version of log capture and transmission for Fiery, to simplify the work of operations staff. There are many open source projects that provide similar functions, but they depend on other environments, which puts a certain burden on operations. We also have an experimental log capture and transmission service written in PHP. It is still experimental and is expected to have some defects; users are welcome to help debug and improve it.
We have done a lot of work on the Fiery server, which has Lucene and RocksDB built in to index and store requests, respectively. We have also done some work on in-memory statistics. At present the statistical dimensions are fixed: we only count responses for local interfaces, dependent interfaces, MySQL, and cURL. We also provide call relationship playback and deduplicated statistics for error log alerts. With these functions, performance faults and system anomalies at key points online can be discovered quickly. Currently there is only a stand-alone version; I can extend it into a simple distributed mode if needed in the future.
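The article does not say how the server deduplicates error logs. One common way to do it, shown here purely as an assumed sketch in Python, is to reduce each exception to a fingerprint (type plus the location where it was raised) and count occurrences per fingerprint. The field names and sample entries below are hypothetical.

```python
import hashlib
from collections import Counter


def fingerprint(exc_type: str, file: str, line: int) -> str:
    """Stable key for 'the same error': type plus where it was raised."""
    raw = f"{exc_type}|{file}|{line}"
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()[:12]


# Hypothetical exception log entries
errors = [
    {"type": "PDOException", "file": "Order.php", "line": 88, "trace_id": "a1"},
    {"type": "PDOException", "file": "Order.php", "line": 88, "trace_id": "a2"},
    {"type": "RuntimeException", "file": "Pay.php", "line": 17, "trace_id": "a3"},
]

counts = Counter()
sample_trace = {}
for e in errors:
    key = fingerprint(e["type"], e["file"], e["line"])
    counts[key] += 1
    sample_trace.setdefault(key, e["trace_id"])  # keep the first trace as an example

for key, n in counts.most_common():
    print(key, n, sample_trace[key])
```

With this grouping, an alert can report one line per distinct error together with its count and a sample Traceid, instead of one alert per occurrence.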
These services are not only for production. We have also used them internally for some interesting experiments. For example, we connected a QA testing environment: after a test run, the Traceid generated by a faulty interface is sent directly to the developers, who can use it to look up the whole call process, parameters, return values, and performance of that request, displayed visually for easy analysis. The same can be done for unit testing before going online. Some time ago, when I posted on Weibo to promote Fiery, someone suggested that it could be used as a honeypot to observe the process and details of an attack. Further functions still need to be explored and improved by everyone. This system is designed to fill a gap in the PHP ecosystem.