Du Xiaoman's Chen Cunli: A 20-year veteran "commander" talks about operation and maintenance, performance and growth

王林 (reposted)
2023-06-09 09:56:57

Through interviews and solicited articles, we invite veterans of the operation and maintenance field to share their insights and debate ideas, in the hope of forming some forward-looking consensus and helping the industry move forward.

In this issue, we invite Chen Cunli, general manager of the System Operation and Maintenance Department at Du Xiaoman. He has spent most of his 20-year career in the Internet industry. During his time in Baidu's operation and maintenance department, his team members called him "Commander Chen" because of his distinctive leadership style. Today we invite "Commander Chen" to share his views.

This is the 5th issue of the "Operation and Maintenance Forum", which aims to be both down-to-earth and high-level. Let's start!

Question Preview

  • You joined Baidu very early and later moved with Du Xiaoman when it became independent. We understand that many of the people around you have followed you for a long time and have been through many business operation and maintenance challenges with you. I believe everyone is interested in how you keep a group of people together in a job as demanding as operation and maintenance. I would like to hear your thoughts.
  • Many people think that engineers are worthless if they don't write code. What do you think about this issue? Do you have any advice on how engineers who don't write code can continue to improve themselves?
  • You have experienced the ups and downs of many businesses, large and small, at Baidu and Du Xiaoman. Do you think the concepts and methods of business operation and maintenance differ across stages and sizes? Are there some principled methodologies to guide decision-making?
  • Do you think there are any common practices in the operation and maintenance industry that are actually wrong? Why?
  • There are some hot technical directions at the moment, including FinOps, observability, and ChatGPT. What do you think of their development? Are they hype or do they have real value? How should operation and maintenance personnel respond?
  • With the development of cloud, traditional operation and maintenance positions that only do Ops will disappear in the long run. Do you agree with this view? Do you have any suggestions on a transformation path for people in such positions?
  • Many people complain on Maimai that their company's performance ratings are unfair. Do you have any suggestions for them? In addition, as a manager, can you share how you design the performance appraisal mechanism?

Interview Record

Q: You joined Baidu very early and later moved with Du Xiaoman when it became independent. We understand that many of the people around you have followed you for a long time and have been through many business operation and maintenance challenges with you. I believe everyone is interested in how you keep a group of people together in a job as demanding as operation and maintenance. I would like to hear your thoughts.

Answer: I understand that you are complimenting me, and I am deeply grateful.

In 2000, I started my career in computer training, and then worked in a state-owned enterprise for 3 years. In 2004, I started my Internet career in Beijing. Looking back on more than 20 years of professional experience, many of my teams were built from scratch. More than a thousand colleagues have probably worked with me in operation and maintenance departments, and 300-400 of them have fought several tough battles alongside me. In 2018, at Du Xiaoman, I once again built the current team from scratch, and it has continued to this day. In fact, it is painful every time I leave my original team and colleagues to form a new team from scratch. But I see that many of my former colleagues are now doing very well in work and life. Some of them have successfully challenged the limits of the industry after leaving my team. Of course, they earn more than I do, and I am genuinely happy for them.

If I were to talk about the characteristics of leading a team, I would summarize three points:

  • First of all, we attach great importance to team culture. On every new employee's first day, we tell them that our team's vision is to become "the world's top technical support team." The dream of the team's core members is to "use technology to redefine service assurance and make service assurance simpler." We don't recruit you to fill holes. We recruit you to make changes and use technology to fix the unreasonable parts of real work. There is a short story that has had a great impact on me personally, and I will share it with you today. One morning in the north, a mother taking her child to school was waiting at a traffic light while an elderly street cleaner worked hard nearby. Wanting to teach her child a lesson, she said: "Look at that cleaner grandpa, working so hard every day. You have to study hard. If you don't study well, you will have to sweep the streets when you grow up." In the same scene, another mother said something that moved me deeply: "Child, look how hard the cleaner grandpa works every day. You must study hard and invent a sweeping machine someday, so that nobody has to sweep the streets by hand anymore." Some jobs always need someone to do them. When it is our turn, we should do them differently and use technology to change them, so that the work is not so hard for those who come after us.
  • Secondly, we pay great attention to developing talent, and we train people differently at each stage. We believe that work is done by people, and only by improving their capabilities can we take on different work. In 2015, I summarized a 5-7 year training mechanism for engineering talent. It divides growth into three stages. The first stage is for people who have just entered the workplace. In their first two years, they mainly acquire working methods, technical depth, and the experience of success. Every one of these is very important. They then enter the second stage, in which we spend 2-3 years broadening their overall perspective and practical ability. Computer engineering today covers a great deal, from networks to operating systems, from the kernel to applications, databases, and storage. An excellent engineer should have touched every one of these areas when doing architecture design and troubleshooting. If you only read the materials without practical experience, you will hit a wall everywhere. At this stage, we rotate people on a planned basis so they gain experience in each direction, and of course we also ask for their personal preferences. After rotation, their skills are usually no longer a concern, so they enter the third stage. In the third stage, we work with them to choose a direction that they like and are good at, and together we challenge the limits of the industry and grow together. Of course, more people leave at this stage, because they are more capable and can easily find challenging directions they like elsewhere, usually with very good pay. I often tell them: many of you will go further than I have, so don't forget us when that time comes. Be upright and positive in what you do, and don't embarrass the teams and people you have worked with.
  • Finally, we pay great attention to the diversity and collaboration of team members. Complex work usually cannot be completed by a single role working alone. We regard operation and maintenance as a form of technical assurance. To deliver it, we must work on operation and maintenance scenario analysis, capability improvement, and product innovation, so the corresponding product, R&D, operation and maintenance, and business operations roles are all essential. This is like a special operations team in the army, which needs signalers, medics, fire teams, sniper teams, and so on. We have to find the right people for the team's needs, make sure they collaborate efficiently, and build trust through practice and team building so that people are honest and open with each other.

Q: Many people think that engineers are worthless if they don’t write code. What do you think about this issue? Do you have any advice on how engineers who don’t write code can continue to improve themselves?

Answer: Military management is a useful reference for this topic. People gave me the nickname "Commander", probably because I often borrow military methods in my work. In my opinion, this question is the same as asking whether every soldier should go to the battlefield and shoot. Soldiers must know how to use basic weapons, and it is best if they practice regularly. Of course, wars are not won only by soldiers firing weapons. War is also fought on logistics, on the advancement of weapons, and on the justness of the cause. Those who do logistics, weapons research, or propaganda are all indispensable to the war. Whatever position they are in, they should fulfill the responsibilities of that position to the utmost and leave the rest to the commander. So, coming back to this issue, I think engineers must first understand how their position is placed within the company, and then try to match it with how they see themselves. If the two do not match, it is better to move to a position that does.

Q: You have experienced the ups and downs of many businesses, large and small, at Baidu and Du Xiaoman. Do you think the concepts and methods of business operation and maintenance differ across stages and sizes? Are there some principled methodologies to guide decision-making?

Answer: This is a good question. The difficulties you meet at different scales are completely different. Maintaining 10,000 machines is a completely different problem from maintaining 100 machines.

When maintaining 100 machines, we may not need a tool that quickly detects machine faults and repairs them automatically. At typical industry failure rates the work can be handled manually, and people find the workload just right: not exhausting, but enough to stay busy. When maintaining 10,000 machines, however, if we rely only on manual labor, we will be too busy even to inspect each machine. Add the work of coordinating maintenance windows with suppliers and business teams, and we will be so busy that we forget to eat. So my advice is this: if you want a good balance between life and work, a small company is a good choice. If you want to improve your technical capabilities and vision, you need to work at large scale and with heavy traffic, because that is what trains you.
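
To make the scale argument concrete, here is a rough back-of-the-envelope sketch. The 5% annual failure rate and the 4 hours of manual effort per incident are purely illustrative assumptions, not figures from the interview, and the function names are hypothetical.

```python
# Illustrative only: the failure rate and per-incident effort below are
# assumptions chosen for the example, not figures from the interview.
ASSUMED_ANNUAL_FAILURE_RATE = 0.05  # hypothetical: 5% of machines fail per year
ASSUMED_HOURS_PER_INCIDENT = 4.0    # hypothetical: diagnosis plus coordination


def expected_failures_per_day(fleet_size: int, annual_failure_rate: float) -> float:
    """Average number of machine failures per day across the fleet."""
    return fleet_size * annual_failure_rate / 365


def manual_hours_per_day(fleet_size: int, annual_failure_rate: float,
                         hours_per_incident: float) -> float:
    """Average hands-on hours per day if every failure is handled manually."""
    failures = expected_failures_per_day(fleet_size, annual_failure_rate)
    return failures * hours_per_incident


if __name__ == "__main__":
    for fleet_size in (100, 10_000):
        failures = expected_failures_per_day(fleet_size, ASSUMED_ANNUAL_FAILURE_RATE)
        hours = manual_hours_per_day(fleet_size, ASSUMED_ANNUAL_FAILURE_RATE,
                                     ASSUMED_HOURS_PER_INCIDENT)
        print(f"{fleet_size:>6} machines: {failures:.3f} failures/day, "
              f"{hours:.2f} manual hours/day")
```

Under these assumed numbers, 100 machines produce about 5 failures a year, roughly one every 73 days, which a person can absorb easily. 10,000 machines produce more than one failure every day and several hours of manual handling daily, before any routine inspection is counted, which is where automated detection and repair starts to pay off.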

Let's look at it from another angle. Businesses have different goals at different stages of development, and the corresponding operation and maintenance concepts and methods are also very different. For many companies, simply surviving the early stage is already an achievement. They want to deploy and go online quickly, because the business must compete for the market and can only keep developing if it survives first, so they rarely consider long-term planning. If the operation and maintenance team comes to the boss at this point and says that we should plan for business growth over the next ten years and build infrastructure accordingly, that is unrealistic. But if a business already has millions or even tens of millions of core users, it will very likely focus on the end-user experience. At that point, operation and maintenance must design the entire underlying architecture and facilities around the end-user experience, and any work that improves user experience will receive the boss's support. Of course, the boss will also pay attention to other issues, such as the cost of input versus output and whether it is sustainable (the ratio of business growth rate to resource input). It should also be noted that industries differ greatly. For example, finance and the Internet are very different.

In summary: technology serves the business. Any technology that helps the business develop will be supported with resources. Whatever the work is, it needs to be approached from the perspective of "how to make the company better." Only if the company does well can you do well, and only if your team does well can you do well.

Q: Do you think there are any common practices in the operation and maintenance industry that are actually wrong? Why?

Answer: I have not thought deeply about what the industry is doing wrong. Each company has its own practical problems, so it is difficult to evaluate.

However, one thing I would like to mention is that I have never limited myself to operation and maintenance work. Operation and maintenance is an area I am good at, and it is the foundation that helps the company maintain users' basic connection experience. But I usually prefer to pay attention to what the company's business urgently needs right now and what the company's core users need. We give priority to whatever they need, because from my perspective, when it comes to ensuring stable service, every company has accumulated a lot of debt and needs to pay it back slowly.

Q: Some of the hot technical directions at the moment include FinOps, observability, and ChatGPT. What do you think of their development? Are they hype or do they have real value? How should operation and maintenance personnel respond?

Answer: I personally think these directions are very good. If they are only talked about, it is just concept hype. Only by actually implementing them can they become advanced productivity. These approaches achieved good results at Baidu in the past, and they may be easier to implement in a large environment, because the corresponding volume of data and depth of talent are more sufficient. But if someone has only 100 machines and still talks about FinOps, it may well be hype, and the same applies to the other directions.

Q: With the development of cloud, the traditional operation and maintenance positions that only do Ops will disappear in the long run. Do you agree with this view? Do you have any suggestions for the transformation path of such friends?

Answer: Operation and maintenance positions will not disappear, and the need for them will become more and more important, but whether the work will be done by people is something you really need to think about carefully.

In a software project, operation and maintenance is a very critical link, but whether this link is handled by people or by machines depends on the development of technology. It is just like the street sweeping mentioned above. As long as there are streets and people living on them, the demand for street sweeping will not disappear and will remain strong, but the work may eventually be done by unmanned machines. Already today, manual sweeping has gradually been replaced by road sweepers driven by people. We must be aware of this. We must also be aware of another point: operation and maintenance is an extremely complex matter, far more complicated than road sweeping. From the way cloud services have matured over so many years, you can feel that this is a long process. I would rather suggest that this process, in which operation and maintenance revolutionizes itself, be led and designed by operation and maintenance people themselves, so that eventually we become the owners of the "operation and maintenance" product.

Q: Many friends complain about unfair company performance ratings on Maimai. Do you have any suggestions for them? In addition, as a manager, can you share how you design the performance appraisal mechanism?

Answer: This topic is relatively sensitive, and it is also one that operation and maintenance colleagues are eager to discuss. The following opinions are only my personal career experience and do not represent the views of any company.

Here is my personal view. Performance is earned by yourself. Whether your rating is good depends on how much outstanding contribution you have brought to the company and what qualitative changes you have made in your own work through your own efforts. Performance is usually ranked relatively, so it is relatively fair, and absolute fairness is difficult to achieve.

When we talk about performance, we might as well put ourselves in the shoes of the company's bosses. One team makes money for the company, and another spends money to maintain the basic user experience. Only by making more money can the company pay everyone, so the outcome is obvious.

Of course, this is also related to the different hardships that each person endures. Some people say that there are five kinds of hardship in life. The first is physical hardship, which mainly means working overtime, and many traditional operation and maintenance jobs involve it. The second is the hardship of thinking, which demands thorough planning and precise work. The third is the hardship of enduring loneliness, which requires a person to keep quietly learning a great deal. While others are eating, drinking and having fun, that person spends a lot of time constantly learning new knowledge. The fourth is the hardship of dignity: to keep customers company, you set aside your pride and serve everyone you meet as if they were your ancestors. The fifth I will leave for everyone to guess. Don't claim that you can endure every kind of hardship. Different roles come with different hardships. Having a good mentality is the foundation of good health.

Finally, I hope everyone can achieve good performance through their own efforts. The above opinions are just my personal experience and do not represent any company.

Extended reading

  • Operation and Maintenance Forum, Issue 4: Shao Haiyang again: 25-year Linux veteran talks about the Eight Honors and Eight Disgraces of DevOps
  • Operation and Maintenance Forum, Issue 3: Kuaimao Laiwei: How to stabilize the operation and maintenance job
  • Operation and Maintenance Forum, Issue 2: Zuoyebang's Nie An: How to transform operation and maintenance, listen to the OPaS ideas of Zuoyebang
  • Operation and Maintenance Forum, Issue 1: Jingyuan: Operation and Maintenance Geometry

About SRETalk

This public account covers SRE-related topics from all angles. The person in charge is Qin Xiaohui, founder and developer of Open-Falcon and Nightingale, author of the Geek Time column "Operation and Maintenance Monitoring System Practical Notes", and partner at Kuaimao Nebula (the startup focuses on unified monitoring and stability assurance; please get in touch if you have any needs).

Statement:
This article is reproduced from 51cto.com. In case of infringement, please contact admin@php.cn to have it removed.