After more than ten years in operations and maintenance, there are still countless moments when I feel like a novice...

Years ago, as a fresh computer science graduate, I browsed a great many job postings on recruitment websites. The dazzling array of technical positions left me confused: R&D engineer, operations engineer, test engineer...

In college my grades in the core courses were mediocre, I had no technical vision to speak of, and I had no clear idea which technical direction to pursue.

Then a senior student said to me: "Go into operations. You don't have to write code every day. You just need to know your way around Linux! It's much easier than development!"

I chose to believe him...

I have now been in the industry for more than ten years. I have suffered a lot, taken a lot of blame, killed servers, and lived through department layoffs. If someone told me today that operations is easier than development, I would block him without hesitation...

Basic operations work is simple, but the defining feature of operations work is its complexity

In my opinion, operations may be one of the most complex technical jobs. It means handling a huge number of technical details, integrating and configuring different platforms, and solving all kinds of complicated problems and failures. Operations staff therefore need a wide range of skills and knowledge to cope with changing technical and business needs:

Operations staff often have to run complex platforms. Enterprises rarely need to manage and monitor just one platform or system; the real estate is far more complicated. It spans servers, storage, networks, and applications, which may come from different vendors and use different protocols and technologies.

Complex configuration management is another difficulty. It covers a large number of tasks, such as system installation, configuration updates, and software installation and upgrades, and these tasks have to be coordinated and executed across the whole system.

Managing large-scale clusters is not simple either. Large enterprises manage thousands of servers, which is only feasible with powerful tools and automation. Operations staff need automated tools for configuration, updates, monitoring, and reporting.

Security cannot be ignored. Operations staff must protect the company's assets and data and keep its systems secure. This may involve firewalls, intrusion detection systems, security patch management, and more.

Operations also demands deep troubleshooting experience. Failures are routine in this line of work. When something goes wrong, operations staff must locate the fault quickly and take action to restore service.

Continuous learning is the most basic requirement for operations staff. The pace at which operations tools and technologies evolve is staggering. New technologies and tools keep emerging, and operations staff have to keep learning to keep up.

Operations is a high-risk profession: is an ops career even complete if you have never killed a server?

If we are talking about high-risk occupations, operations definitely counts as one. Even at large companies, outages caused by manual operations mistakes are common:

Pacific Petroleum Company cyber attack (2021): In May 2021, the U.S. Pacific Oil Company suffered a ransomware attack that caused its network and servers to fail and shut down. According to reports, the incident began when an employee accidentally opened a malicious link.

GitLab outage (2017): In January 2017, the code hosting provider GitLab suffered a serious data loss incident in which some user data was permanently lost. According to GitLab's later official statement, the cause was an engineer accidentally deleting data from a production database.

Walmart server outage (2019): In November 2019, the servers of the American retail giant Walmart went down several times within an hour, leaving the company's website, applications, and payment systems unable to work properly. The incident was reportedly caused by an employee's mistake during routine server maintenance.

Microsoft Azure cloud service outage (2020): In September 2020, Microsoft's Azure cloud service suffered a global outage that left many customers' applications and services unable to work properly. The incident was later attributed to a network configuration error.

Operations may also face force majeure, including natural disasters

Philippine typhoon (2013): In November 2013, the Philippines was struck by a severe typhoon, the strongest to hit the country since 1947. It left more than 6,000 people dead or missing and wreaked havoc on the country's infrastructure. It also took down the data centers and servers that many international businesses ran in the Philippines.

U.S. hurricane (2012): In October 2012, a severe hurricane hit the East Coast of the United States, causing large-scale power outages, communication interruptions, and flooding. It also caused data center and server outages at several well-known companies and service providers, including Amazon, Google, and Netflix.

The career path is unclear, and operations staff often feel lost in the workplace...

A lack of hard skills may be the biggest problem operations people face. As technology advances, the job requires continually learning new skills and tools to meet changing market demand. People who have worked in operations for many years may find that their skills have fallen behind the market, which leaves them confused and overwhelmed.

The poor environment is really not the fault of operations staff. Compared with other technical fields, the career path in operations is relatively vague. In some organizations, operations engineers are treated as little more than a "logistics department" and do not enjoy the same status and treatment as other technical teams. For example, they do not receive the recognition and rewards they deserve. This feeds negative feelings among operations staff and, to some extent, leaves them unsure of their career prospects.

Heads down on the road, with no time to look up at the sky. The essence of operations work is keeping systems stable and reliable, so operations engineers must stay highly vigilant and focused at all times. That makes the job very stressful, especially during system failures or emergencies. Worn out by the daily grind, they have no time to think about where their careers are going.

So we often ask ourselves: how can we develop our operations careers better?

The book "The Long View" by Brian Fetherstonhaugh describes the general patterns of career development. The principles it sets out may give us an answer:


Think in terms of the next 45 years. If you plan over a longer span, such as 45 years, you will not fixate on any single short-term win or loss. And with a clear career plan, it is easier to get through difficulties and persevere.

What we have to do is map out the development paths for operations technology and then become truly excellent in one specialized area

Transformation to DevOps: At some point, a so-called "DevOps is dead" argument became popular in technology circles. However, DevOps has never simply meant making developers do operations work and leaving operations staff with nowhere to go.

Operations work is already hard enough, so please stop creating panic for us.

Real DevOps needs an internal DevOps platform and a dedicated team to maintain it. It is not a pile of scattered open source tools that programmers have to handle themselves, and it is not a way of handing operations work to developers. A true DevOps team brings development and operations closely together, shares responsibility, and works collaboratively to improve IT performance and empower the business.

Moving from operations to DevOps requires mastering some key tools and techniques, such as continuous integration, continuous delivery, automated testing, and containerization. At the same time, the DevOps team should adopt agile development, iterative development, continuous delivery, and similar methods. In an enterprise that has built a mature DevOps culture, moving from operations into DevOps is a very good career path.

Transformation to AIOps: Similarly, AIOps has always been a good career path for operations staff. AIOps can automate routine, tedious, low-value tasks such as log analysis and first-pass troubleshooting, which frees up time and energy for more complex problems.

At the same time, operations work covers many areas, including infrastructure management, application deployment, monitoring, and troubleshooting. These tasks still require the professional knowledge and experience of human operations staff.

AIOps technology can improve the efficiency and accuracy of IT operations, but it will not completely replace human operations staff. Instead, the two work together to make the whole IT operations team more efficient and productive.
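To make "automating log analysis" a little more concrete, here is a minimal sketch of the kind of routine check such tooling takes over. It is not a real AIOps product: the log line format (a timestamp followed by a level such as `ERROR`), the function names `error_counts_per_minute` and `flag_anomalies`, and the simple z-score threshold are all assumptions made for illustration. Real systems use far richer models, and minutes with no errors at all are simply absent from the counts here.

```python
from collections import Counter
from statistics import mean, pstdev


def error_counts_per_minute(lines):
    """Count ERROR lines per minute.

    Assumes each line looks like '2023-06-09T21:53:07 ERROR message'
    (a hypothetical format chosen for this sketch).
    """
    counts = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2 and parts[1] == "ERROR":
            minute = parts[0][:16]  # keep 'YYYY-MM-DDTHH:MM'
            counts[minute] += 1
    return counts


def flag_anomalies(counts, threshold=2.0):
    """Return the minutes whose error count lies more than `threshold`
    population standard deviations above the mean count."""
    if len(counts) < 2:
        return []
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every minute looks the same, nothing stands out
    return sorted(m for m, c in counts.items() if (c - mu) / sigma > threshold)
```

A human still has to decide what a flagged minute means and what to do about it, which is exactly the division of labor described above: the script does the counting, the engineer does the diagnosis.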

Transformation to SRE: Keep learning software development skills, and master the automation tooling and the testing, deployment, and monitoring practices used in DevOps. Learn cloud computing and container technology: SREs need to understand cloud platforms and containers and to master basic cloud services and container management tools such as AWS, Docker, and Kubernetes. Build up data analysis skills, and at the same time help establish an SRE culture within the organization around core concepts such as reliability, automation, and experimentation.

Statement: This article is reproduced from 51CTO.COM.
