I thought I was familiar with Linux, but I never expected it to go so wrong in the production environment...
Having worked in operations and maintenance for many years, I have run into all kinds of problems: data loss, website malfunctions, accidentally deleted database files, hacker attacks, and more. I have also met many friends who thought they knew Linux inside out, never panicked when they saw a problem, and were full of confidence. Yet the famous scenes of them crashing in production (and nearly getting fired) are countless...
So today I will sort out some good Linux operating habits and share them with you, so that we can all operate safely and never crash!
When I first learned Linux, everything from the basics to services to clusters was done on virtual machines. Although the teacher told us there was no difference from real machines, our longing for a real environment grew by the day. Virtual machine snapshots, however, let us pick up all sorts of bad habits, and mine was that the moment I got permission to operate a server, I could not wait to try things out. On my first day at work, the boss gave me the root password. I could only use putty but wanted to use xshell, so I quietly logged in to the server and tried to switch it to xshell key login. I did not test the change and did not keep a spare ssh connection open, so after restarting the sshd service I was locked out of the server. Fortunately I had backed up the sshd_config file beforehand, and the computer room staff later just had to cp it back. Fortunately it was a small company, otherwise I would have been done for... I am glad I was lucky back then.
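The lockout in this story can usually be avoided by validating a changed configuration before restarting the service. Below is a minimal sketch, not a definitive procedure. The helper name apply_config is hypothetical, the caller supplies the validation command, and the sshd path and service name in the usage comment are assumptions that differ between distributions. For sshd, the -t flag checks a configuration file without starting the daemon.

```shell
# apply_config NEW TARGET VALIDATOR...
# Hypothetical helper: run VALIDATOR on NEW, back up TARGET, then install NEW.
# If validation fails, TARGET is left untouched.
apply_config() {
    new=$1; target=$2; shift 2
    # The candidate file is passed as the validator's last argument.
    if ! "$@" "$new"; then
        echo "validation failed, $target left unchanged" >&2
        return 1
    fi
    # Keep a copy of the working config before overwriting it.
    if [ -e "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi
    cp "$new" "$target"
}

# Example (as root, with a second SSH session kept open as a fallback):
#   apply_config ./sshd_config.new /etc/ssh/sshd_config /usr/sbin/sshd -t -f \
#       && systemctl reload sshd
```

Keeping the second session open matters as much as the check itself: a file can be syntactically valid and still lock you out.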
The second example is about file synchronization. Everyone knows rsync synchronizes quickly, but it also deletes files much faster than rm -rf. rsync can synchronize with one directory taken as the reference; if that first directory is empty, you can imagine the result: the directory that does hold the data gets wiped. I once wrote the two directories the wrong way round, through carelessness and lack of testing, and the key point is that there was no backup... Production data was deleted with no backup. You can work out the consequences yourself; the importance of testing and backups is self-evident.
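A cheap guard against this kind of accident is to refuse an empty source directory and to run rsync in dry-run mode first. The sketch below assumes the option involved was --delete, which the article does not name. The function names dir_has_files and safe_mirror are hypothetical.

```shell
# dir_has_files DIR: succeed only if DIR exists and contains at least one entry.
dir_has_files() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# safe_mirror SRC DEST: hypothetical wrapper that refuses to mirror an empty
# or missing SRC, then shows what rsync WOULD do without changing anything.
safe_mirror() {
    if ! dir_has_files "$1"; then
        echo "refusing to mirror: source '$1' is empty or missing" >&2
        return 1
    fi
    # -a archive mode, -v verbose, -n dry run. --delete removes files in DEST
    # that are absent from SRC, which is what makes swapped arguments
    # destructive.
    rsync -avn --delete "$1/" "$2/"
}
```

Only after reading the dry-run output would you repeat the command without -n.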
As for typing rm -rf / var by mistake, I believe that for people with fast hands, or when the connection is laggy, the probability is quite high. When you notice after the command has finished, your heart goes at least half cold. You may say you have typed it so many times without a mistake, so there is nothing to fear. I just want to say you will understand once it happens. Do not think those operation and maintenance accidents only happen to others. If you do not pay attention, the next one will be you.
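One way to blunt a stray space like the one in rm -rf / var is a wrapper that checks every argument before deleting anything. This is only a sketch: the names is_protected_path and safe_rm are hypothetical, and the list of protected directories is an illustrative assumption to adapt to your own servers.

```shell
# is_protected_path PATH: succeed if PATH is empty, "/", or one of a few
# top-level directories (the list is an illustrative assumption).
is_protected_path() {
    p=$1
    # Strip trailing slashes so "/var/" is treated like "/var".
    while [ "$p" != "/" ] && [ "${p%/}" != "$p" ]; do
        p=${p%/}
    done
    case "$p" in
        ""|/|/bin|/boot|/data|/etc|/home|/lib|/root|/usr|/var) return 0 ;;
    esac
    return 1
}

# safe_rm PATH...: hypothetical wrapper around rm -rf that checks every
# argument first and deletes nothing if any of them is protected.
safe_rm() {
    for arg in "$@"; do
        if is_protected_path "$arg"; then
            echo "refusing to delete protected path: '$arg'" >&2
            return 1
        fi
    done
    rm -rf -- "$@"
}
```

With this wrapper, the mistyped command is rejected as a whole because "/" appears as its own argument, so the harmless-looking second argument is never touched either.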
The operation and maintenance management at the last company I worked for was quite chaotic. To give the most typical example, operation and maintenance staff who had resigned, several rounds of them, still had the server root password. Usually when we in operations receive a task, we do a quick check and ask others for help if we cannot solve it. But when a problem becomes urgent, the customer service supervisor (who knows some Linux), the network administrator, and your boss all debug the same server together. After various comparisons, you find that the server configuration file differs from the version you last modified, so you change it back and then Google some more. You find the problem and solve it, but the others tell you they solved it too, and what each of you modified were different parameters... At that point I really do not know which change was the real fix. Of course, that is still a good outcome: the problem is solved and everyone is happy. But have you ever modified a file, found in testing that the change had no effect, and gone back to find the file had been modified again by someone else? It is really infuriating. Do not let multiple people operate on the same server at once.
Develop a habit: before modifying data, back it up first, for example a .conf configuration file. Also, when modifying a configuration file, it is recommended to comment out the original option, then copy it and modify the copy. Furthermore, if there had been a backup in the rsync example above, the mistake would have been harmless. Losing a database does not happen out of the blue, and with any backup at hand it does not have to be so miserable.
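The back-up-first habit is easier to keep when it is a single command. A minimal sketch, assuming a hypothetical helper named backup_file and a timestamp suffix of my own choosing:

```shell
# backup_file FILE: hypothetical helper that copies FILE to
# FILE.bak.<timestamp> (preserving mode and mtime) and prints the copy's path.
backup_file() {
    if [ ! -f "$1" ]; then
        echo "not a regular file: $1" >&2
        return 1
    fi
    backup="$1.bak.$(date +%Y%m%d%H%M%S)"
    cp -p -- "$1" "$backup" && printf '%s\n' "$backup"
}

# Example: back up first, edit second.
#   backup_file /etc/my.cnf && vi /etc/my.cnf
```

Because the edit is chained with &&, the editor only opens if the copy succeeded.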
There are plenty of examples online: all sorts of rm -rf /, all sorts of deleted primary databases, all sorts of operation and maintenance accidents... One small mistake can cause huge losses. If you really need to delete something, be cautious.
Backups already came up above, but I want to file them under data as well to stress again how important they are. I remember my teacher saying that when it comes to data, no amount of caution is too much. The company I work for runs a third-party payment website and an online loan platform. The third-party payment site is fully backed up every two hours, and the online loan platform every 20 minutes. I will not say more; draw your own conclusions.
In fact, this applies not only to data but to the whole server environment: stability matters more than anything else. We do not seek the fastest, but the most stable; what we want is availability. So do not use new software on a server without testing it. With nginx + php-fpm, for example, PHP hung in all sorts of ways in production, and the fix was to restart it or simply switch to apache.
These days private photo leaks and router backdoors are everywhere, so when it comes to data, keeping it confidential is not optional.
For ssh: change the default port (of course, a professional who wants to hack you will find it with a scan); prohibit root login; use an ordinary user with key authentication, sudo rules, and IP address and user restrictions; use anti-brute-force software similar to hostdeny (more than a few failed attempts and the source is blocked outright); and screen the users in /etc/passwd who are able to log in.
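Screening the accounts in /etc/passwd that can log in comes down to looking at the seventh field, the login shell. A rough sketch follows. The function name login_users is hypothetical, and the list of shells treated as "no login" is an assumption that should be adjusted per system.

```shell
# login_users [PASSWD_FILE]: print accounts whose shell is not a known
# "no login" shell. The list of such shells is an assumption; adjust it.
login_users() {
    awk -F: '$7 !~ /(nologin|false|sync|shutdown|halt)$/ { print $1 }' \
        "${1:-/etc/passwd}"
}

# Example: review the list and lock or remove anything unexpected.
#   login_users
```

An account with an empty shell field is printed too, since an empty field falls back to the system default shell.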
The firewall must be turned on in the production environment, following the principle of least access: drop everything, then open only the service ports that are required.
Do not use root for services that an ordinary user can start. Keep the permissions of every service to the minimum and control them at a fine granularity.
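The drop-everything-then-allow approach can be sketched with iptables, which is an assumption on my part since the article does not name a firewall tool. The hypothetical function print_input_rules only prints the commands so they can be reviewed before anyone runs them as root.

```shell
# print_input_rules PORT...: print iptables commands that accept loopback
# traffic, established connections and the listed TCP ports, then set the
# INPUT policy to DROP. Nothing is applied; the output is meant for review.
print_input_rules() {
    echo "iptables -A INPUT -i lo -j ACCEPT"
    echo "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
    # The DROP policy comes last so an open SSH session is not cut off
    # before its ACCEPT rule exists.
    echo "iptables -P INPUT DROP"
}

# Example: allow SSH and HTTPS only.
#   print_input_rules 22 443
```

Printing instead of applying keeps the ordering visible, and ordering is exactly what locks people out when the policy is set to DROP too early.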
Use third-party software to continuously detect changes in key system files and service configuration files, such as /etc/passwd, /etc/my.cnf, and /etc/httpd/conf/httpd.conf. Use a centralized log monitoring system to watch /var/log/secure, /var/log/messages, ftp upload and download records, and other alarm and error logs. For port scans, third-party software can also add a scanning host directly to hosts.deny as soon as it is detected. This information is very helpful for troubleshooting after a system has been compromised. Someone has said that what a company invests in security is directly proportional to what it has lost to security attacks. Security is a big topic and also very basic work. Getting the basics right improves system security considerably; the rest is up to the security experts.
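Dedicated file integrity tools do far more than this, but the core idea of change detection fits in a few lines: record checksums once, then compare later. The sketch below uses sha256sum as a stand-in, and the function names make_baseline and check_baseline as well as the baseline path in the example are hypothetical.

```shell
# make_baseline OUT FILE...: record SHA-256 checksums of FILEs in OUT.
make_baseline() {
    out=$1; shift
    sha256sum "$@" > "$out"
}

# check_baseline OUT: re-hash every recorded file and succeed only if all
# of them still match the recorded checksums.
check_baseline() {
    sha256sum -c "$1"
}

# Example: take the baseline after a known-good deployment, check from cron.
#   make_baseline /root/baseline.sha256 /etc/passwd /etc/my.cnf
#   check_baseline /root/baseline.sha256 || echo "key file changed"
```

The baseline file itself should live somewhere an intruder cannot rewrite, otherwise the check proves nothing.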
Many people get into operations and maintenance by starting with monitoring, and large companies generally have dedicated staff on 24-hour monitoring duty. System operation monitoring generally covers hardware usage (memory, hard disk, CPU, network card) and, on the OS side, login monitoring and key system file monitoring. Regular monitoring can help anticipate hardware failure and provides very practical input for tuning.
Service operation monitoring: this generally refers to the various applications (web, db, lvs, and so on). It tracks a set of indicators so that a performance bottleneck in the system can be discovered and resolved quickly.
Log monitoring: this is similar to the security log monitoring above, but generally covers error and alarm messages from hardware, the OS, and applications. While the system runs stably it seems useless, but once a problem occurs and you have not been monitoring, you will be left on the back foot.
In fact, with just over a year of operation and maintenance experience, anything I say about tuning is basically armchair talk, but I want to summarize it briefly and will update it when I understand more. Before optimizing a piece of software, such as nginx or apache, you need an in-depth understanding of how it works. Everyone says nginx is fast, so you must know why it is fast, what principles it relies on, and how it handles requests better than apache. You must be able to explain this to others in plain, easy-to-understand terms, and be able to read the source code when necessary. Otherwise, any document that treats parameters alone as the object of tuning is nonsense.
Once you are familiar with the underlying mechanism, you need a tuning framework and a tuning order. For example, when the database is a bottleneck, many people go straight to changing the database configuration file. My suggestion is to analyze the bottleneck first, check the logs, write down the tuning direction, and only then begin. Tuning the database server itself should be the last step; hardware and the operating system come first. Today's database servers are released only after extensive testing on the various operating systems, so the database should not be the first thing you touch.
Adjust only one parameter at a time. Everyone knows this: adjust too many at once and you will lose track of what had which effect.
To judge whether tuning has helped, and to test the stability and performance of a new software version, benchmarking is necessary. Benchmarking involves many factors, and whether a test comes close to real business demand depends on the tester's experience. For reference, the third edition of "High Performance MySQL" is quite good. My teacher once said that there are no one-size-fits-all parameters: any parameter change or tuning must fit the business scenario. So stop Googling for tuning recipes; they will have no lasting effect on your own growth or on the business environment.
Many an rm -rf /data is typed in the last few minutes before the end of the working day, at the peak of irritability, so are you not going to keep your state of mind in check? Some say you have to work even when you are irritable, but you can at least avoid handling critical data in that state. The more stressful the environment, the calmer you must be, otherwise you will lose even more. Most people have had an rm -rf /data/mysql experience, and you can imagine the feeling afterwards. But if there is no backup, what use is panicking? In this situation you have to calm down, think it through, and prepare for the worst. With mysql, if the physical files have been deleted, some tables still exist in memory, so cut off the business traffic but do not shut down the mysql database; this helps recovery a great deal. Use dd to copy the hard disk, and then attempt recovery from the copy. Of course, most of the time all you can do is turn to a data recovery company. Imagine that the data has been deleted and you then perform all sorts of operations, shut down the database, and try to repair it: not only may the deleted files be overwritten, the tables in memory will be gone as well.
The production environment is not child's play, and the database is not child's play either. You must be responsible for the data. The consequences of not backing up are very serious.
Many operation and maintenance staff are busy and stop looking once a problem is solved. I remember that last year a customer's website would not open. The PHP error output showed that the session and whos_online tables were damaged. My predecessor had fixed this with repair, so I repaired them the same way. However, within a few hours it recurred three or four times, so I went to Google for reasons why database tables get damaged inexplicably: first, bugs in myisam; second, bugs in mysql; third, mysql being killed in the middle of a write. In the end it turned out that memory was insufficient, which caused the OOM killer to kill the mysqld process, and there was no swap partition. The background monitoring had reported memory as sufficient. Upgrading the physical memory finally solved the problem.
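When a process such as mysqld keeps dying for no visible reason, the kernel log is worth checking before repairing tables yet again. A small sketch follows. The function name oom_kills is hypothetical, and the default log path assumes a RHEL-style syslog setup; on other systems the same messages may only be visible through dmesg or the journal.

```shell
# oom_kills [LOG_FILE]: print kernel log lines that report a process killed
# by the OOM killer. The default path is an assumption (RHEL-style syslog).
oom_kills() {
    grep -i 'killed process' "${1:-/var/log/messages}"
}

# Example: see whether mysqld was among the victims.
#   oom_kills | grep mysqld
```

A match here points at memory pressure rather than at the storage engine, which changes what the real fix is.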
Before any important operation, be sure to check which machine you are on, and try to avoid having too many windows open.
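With many terminal windows open, the check can be made part of the command itself. A minimal sketch, in which the function name confirm_host, the host name db-test-01 and the script name in the example are all hypothetical:

```shell
# confirm_host EXPECTED: succeed only if this machine's node name is EXPECTED.
confirm_host() {
    actual=$(uname -n)
    if [ "$actual" != "$1" ]; then
        echo "this is '$actual', not '$1'; aborting" >&2
        return 1
    fi
}

# Example: chain the check in front of the risky command.
#   confirm_host db-test-01 && ./drop_old_tables.sh
```

Typing the expected host name forces a moment of thought, and the && ensures nothing runs when the window turns out to be the wrong one.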