Three essential skills for enterprise-level operations and maintenance
Introduction: Hello everyone. Beijing traffic is heavy in the morning, and some guests may still be on the road, so let me start with a few thoughts. My topic today is "Agile Operations Trends and Management Practices". In the era of big data, operations and maintenance faces three major challenges: business, scale, and team. As a result, the operations and maintenance of enterprise IT systems tends to bring to mind reactive firefighting and taking the blame. How can we eliminate risks before failures occur? How can we move away from the cumbersome, inefficient, high-risk traditional approach of doing everything by hand? How can we show our value and presence in the company more directly? These questions have become especially pressing now that AIOps is so widely discussed.
Cheng Yongxin, Director and Deputy General Manager of Xinju Network
So, as operations and maintenance staff, how should we proactively seek change? Here I would like to share one idea, the "three pillars of enterprise-level agile operations and maintenance": visualization to present value, automation to free up efficiency, and intelligence to drive capability.
1. Visual presentation of value
In the software market there is a big difference between China and other countries. To be honest, visualization in foreign software is not as good as in domestic software. Why? Because foreign software is mostly built as tooling for the people who do the hands-on work, so whether it is open source or commercial, its visualization is not that strong. In China it is different. What the leaders of domestic companies want more is this: you say there is a problem, fine, show me where the problem is. At a purely technical level, you may explain for a long time and still not get the point across, and that is when visualization becomes even more important. On the one hand, operations staff need it to do their own work. On the other hand, the key is how to present that value to clients and leaders so that our work can be seen. This is very important.
At the visualization level, our approach is to connect the application, platform, and device layers vertically, bring the IT equipment of the entire network in horizontally, collect data across the whole network, and display it with the help of specific algorithms.
For example, Xinju Network presents the health of every device concretely through a self-developed score, so that users can spot problems, drill down, and resolve them right away. Its self-developed APM application instruments code automatically to give visibility into the transaction process, and it analyzes and stores the related data so that the call chain can be broken down and performance problems located quickly. The result is transparent visualization anytime, anywhere, across the entire network, with support for multiple screens such as the mobile app, the web work interface, and large-screen displays to present the work and its value.
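To make the idea of a device health score concrete, here is a minimal sketch. The talk does not describe how Xinju Network's score is actually computed, so the metric names, the weights, and the "100 minus weighted utilization" formula below are all assumptions chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights; the talk does not say how the real score is computed.
WEIGHTS = {"cpu_pct": 0.4, "mem_pct": 0.3, "disk_pct": 0.3}


@dataclass
class Device:
    name: str
    layer: str      # "application", "platform" or "device"
    metrics: dict   # utilization percentages in the range 0-100


def health_score(device: Device) -> float:
    """Return a 0-100 score: 100 minus the weighted utilization."""
    used = sum(WEIGHTS[key] * device.metrics.get(key, 0.0) for key in WEIGHTS)
    return round(100.0 - used, 1)


def worst_first(devices):
    """Order devices so the least healthy ones come first for drill-down."""
    return sorted(devices, key=health_score)


db = Device("db-01", "device", {"cpu_pct": 90, "mem_pct": 80, "disk_pct": 50})
web = Device("web-01", "application", {"cpu_pct": 20, "mem_pct": 30, "disk_pct": 10})
```

With these sample devices, `worst_first([web, db])` puts `db-01` ahead of `web-01`, which is the ordering a dashboard would use to draw attention to the device that needs it most.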
2. Automation liberates efficiency
It goes without saying that automation frees up efficiency. For those of us in operations, the best outcome is to spend more time reading, studying, and researching so that our own skills improve, rather than doing repetitive work all day long.
In fact, we have been doing automation all along, and anyone who has done it knows that there are many pitfalls along the way. How can we get past them? Automation has three main goals: first, to improve efficiency; second, to keep operations safe and controllable; and third, to reduce dependence on individuals. To achieve these, we must understand that any standardized process not built into the platform is useless, and any operations platform not driven by scenarios is hollow.
In this regard, our approach is:
The first step is to implement eight automated operations scenarios, including automated operations tasks and automated resource provisioning and deployment, covering more than 90% of day-to-day operations work. To make operations more agile, it has to be scenario-driven. What is my business scenario? Is it real? Does it actually help production operations, or is it just a personal preference, or a reaction to a one-off failure when leadership is under heavy pressure? This must be made clear.
The second step is to manage automated operations centrally so that the work is safe and efficient, and to capture it as knowledge assets so that knowledge is passed on effectively.
The third step is to open up automated operations capabilities through APIs: encapsulate them as APIs that other systems and terminals can call. For example, operations tasks can be carried out directly from a mobile phone or from a system topology view.
The fourth step is to build an automated operations PaaS platform that supports custom development of scenarios and drives the transformation of the operations team's capabilities. Standardization work must be built into the platform: standards that are not in the platform are useless, and operations that is not scenario-driven is hollow. Think about how many functions in traditional applications do not serve any real scenario, and how many companies have written a pile of standards: operating standards, operations and maintenance standards, security specifications, and so on. In the end, are these standards actually enforced on the operations platform? Do they take effect there? If the standards have no counterpart in your platform at all, they are just decoration.
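One way to picture "standards built into the platform" together with API access is a registry in which only registered scenarios can be invoked and every call is audited. The sketch below is an assumption for illustration only: the names `operation`, `call`, `restart_service`, and `AUDIT_LOG` are hypothetical and do not come from any product mentioned in this talk.

```python
from typing import Callable, Dict

# Registry mapping a scenario name to the function that carries it out.
# Only operations registered here can be invoked, so the standard lives
# in the platform rather than in a separate document.
_OPERATIONS: Dict[str, Callable[..., dict]] = {}
AUDIT_LOG: list = []


def operation(name: str):
    """Decorator that registers a function as a callable scenario."""
    def register(func):
        _OPERATIONS[name] = func
        return func
    return register


@operation("restart_service")
def restart_service(host: str, service: str) -> dict:
    # Placeholder body: a real platform would perform the change here.
    return {"host": host, "service": service, "status": "restarted"}


def call(name: str, caller: str, **params) -> dict:
    """Single entry point that any system or terminal would go through."""
    if name not in _OPERATIONS:
        raise KeyError(f"unknown operation: {name}")
    AUDIT_LOG.append({"caller": caller, "operation": name, "params": params})
    return _OPERATIONS[name](**params)
```

A mobile client or topology view would then issue something like `call("restart_service", caller="mobile-app", host="web-01", service="nginx")`, while a request for an unregistered operation is rejected before anything runs.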
3. Intelligence drives capability
The first thing to do in intelligence is log analysis. As the types of devices multiply, log analysis becomes more and more important. At this level we use our self-developed IVORY big data log analysis platform, which is built on a big data framework and a proprietary, patented drag-and-drop log-splitting algorithm. It aggregates massive volumes of logs in real time, searches and counts log entries accurately, and analyzes log content intelligently, so that big data analysis and mining can be done with simple operations.
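As a rough illustration of what splitting, aggregating, and searching logs involves, here is a minimal sketch. It is not the IVORY algorithm, which is patented and not described in this talk; the line format, the regular expression, and the function names are assumptions.

```python
import re
from collections import Counter

# Hypothetical line format: "<timestamp> <host> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+) (?P<host>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$")


def parse(line):
    """Split one raw line into named fields, or return None if it does not fit."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def count_by(lines, field):
    """Aggregate parsed lines by one field, for example 'level' or 'host'."""
    counts = Counter()
    for line in lines:
        record = parse(line)
        if record is not None:
            counts[record[field]] += 1
    return counts


def search(lines, keyword):
    """Return parsed records whose message contains the keyword."""
    hits = []
    for line in lines:
        record = parse(line)
        if record is not None and keyword.lower() in record["msg"].lower():
            hits.append(record)
    return hits


SAMPLE = [
    "2024-01-01T08:00:00 web-01 INFO request served",
    "2024-01-01T08:00:01 db-01 ERROR connection timeout",
    "2024-01-01T08:00:02 web-01 ERROR upstream timeout",
    "not a log line",
]
```

Calling `count_by(SAMPLE, "level")` groups the three well-formed lines by severity, and `search(SAMPLE, "timeout")` returns the two error records; the malformed last line is skipped rather than raising an error.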
In addition, any operations product built without operations experience is really just a framework. To truly achieve intelligence, tools alone are not enough. You also need to work from real operations scenarios to get the most value out of the data, so that intelligence drives operations capability and ultimately closes the loop from problem discovery, through analysis and decision-making, to resolution.
This is our overall agile operations and maintenance solution framework, a full-stack solution with "visualization, automation, and intelligence" at its core: "operations visualization" presents the value of operations transparently, anytime, anywhere, across the entire network; "operations automation" comprehensively improves the efficiency and quality of operations services; and "operations intelligence" lets intelligence drive operations capability. In this process, we build in layers so that problems at each level are solved at that level. That is what I wanted to share today. Thank you all!