Server failure instance analysis-Safety-php.cn

Server failure instance analysis

王林

Jun 02, 2023 pm 03:12 PM

1. Something went wrong

Working in IT means dealing with failures and problems every day; we might as well be called firefighters, running around putting out one fire after another. This time, however, the scope of the failure was rather large: the host machine could not even be logged in to.

Fortunately, the monitoring system left some evidence.

The evidence showed that the machine's CPU, memory, and file handle usage kept climbing as business traffic grew... until monitoring could no longer collect any data.

What made things worse was that many Java processes were deployed on these hosts. For no reason other than saving costs, the applications were co-located. When a host misbehaves as a whole, it is hard to find the culprit.

Since remote login no longer worked, the impatient operations staff could only reboot the machine and then restart the applications. After a long wait, all processes were running normally again, but only a short while later the host crashed once more.

The business stayed down the whole time, which was both annoying and nerve-racking. After several attempts, the operations team gave up and launched the emergency plan: rollback!

There had been many recent releases, and some developers had even deployed on their own, so the operations team was confused: roll back which ones? Fortunately, someone had a bright idea and remembered the find command. The plan was to find all the recently updated jar packages and roll them back.

find /apps/deploy -mtime -3 | grep 'jar$'

If you don’t know the find command, it’s really a disaster. Fortunately someone knows.

We rolled back more than a dozen jar packages. Fortunately, none of them involved database schema changes, and the system finally ran normally.

2. Find the reason

There was no other way: check the logs and review the code.

To keep the review effective, its scope was set to the code changes of the last one or two weeks rather than just the last few days, because some code needs to run in production for a while before it shows its true colors.

Looking at a screen full of commit messages that said nothing but "OK", the technical manager's face turned green.

"xjjdog says that 80% of programmers can't write commit messages. I think 100% of you can't."

Everyone kept quiet, endured the pain, and went through the change history. After a lot of effort, we finally dug some suspicious code out of the mountain of shit. The CxO personally created a group chat, and everyone posted any code that might be causing the problem into it.

"The service was interrupted for nearly an hour, and the impact was very bad," the CxO said. "The problem must be solved completely. Investors are very concerned about this issue!"

OK, OK, OK. With the help of DingTalk, everyone's gestures became uniform.

3. Thread pool parameters

There was a lot of code, and everyone discussed the suspicious parts for a long time. We examined some complex code that used parallel streams nested inside lambda expressions, paying special attention to the use of thread pools.

In the end, everyone decided to go through the thread pool code once more. One snippet looked like this.

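// Rejection policy: drop the oldest queued task, then retry the new one
RejectedExecutionHandler handler = new ThreadPoolExecutor.DiscardOldestPolicy();
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        100, 200,                       // corePoolSize, maximumPoolSize
        60000, TimeUnit.MILLISECONDS,   // keepAliveTime and its unit
        new LinkedBlockingDeque<>(10),  // bounded work queue, capacity 10
        handler);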

It has to be said, the parameters look respectable, and the code even specifies a rejection policy.

Java's thread pools make concurrent programming much simpler. Still, a review has to go through the constructor parameters one by one:

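corePoolSize: the number of core threads. Once created, core threads stay alive by default, even when idle.
maximumPoolSize: the maximum number of threads
keepAliveTime: how long idle threads beyond the core size are kept before they are terminated
workQueue: the blocking queue that holds tasks waiting to run
threadFactory: the factory used to create threads
handler: the rejection policy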

Let’s introduce their relationship below.

If the number of threads is less than corePoolSize when a new task arrives, the pool creates a new thread to handle it. Once the number of threads has reached corePoolSize, new tasks are placed in the blocking queue as long as it is not full. When the queue is full, the pool creates additional threads until the number of threads reaches maximumPoolSize. If new tasks still arrive after that, the rejection policy is triggered.
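The order described above (core threads first, then the queue, then extra threads, then rejection) can be observed with a small experiment. The following is only a sketch: the class name `PoolGrowthDemo` and the tiny sizes (1 core thread, 2 maximum threads, a queue of 1) are assumptions chosen to make each step visible, not values from the incident.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {

    /** Submits four blocking tasks and records the pool state after each submit. */
    public static List<String> run() throws InterruptedException {
        // Hypothetical sizes: 1 core thread, 2 max threads, queue capacity 1.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await(); // keep the worker busy until we release it
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        List<String> log = new ArrayList<>();
        try {
            for (int i = 1; i <= 4; i++) {
                try {
                    pool.execute(blocker);
                    log.add("task" + i + ": threads=" + pool.getPoolSize()
                            + " queued=" + pool.getQueue().size());
                } catch (RejectedExecutionException e) {
                    log.add("task" + i + ": rejected");
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        run().forEach(System.out::println);
    }
}
```

With these sizes, the first task starts the core thread, the second waits in the queue, the third forces a second thread because the queue is full, and the fourth is rejected.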

Now for the rejection policy. The JDK has 4 built-in policies. The default is AbortPolicy, which throws a RejectedExecutionException. The other three are introduced below.

DiscardPolicy is more radical than AbortPolicy. It silently discards the task without even raising an exception.
CallerRunsPolicy runs the rejected task on the calling thread. In a web application, when the thread pool is saturated, new tasks end up running on the Tomcat request threads. In some cases this relieves pressure on the pool, but more often it simply blocks the calling thread.
DiscardOldestPolicy discards the task at the head of the queue and then tries to submit the new task again.
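To compare the four built-in policies side by side, here is a minimal sketch. The class name `RejectionPolicyDemo`, the task labels A, B, and C, and the pool sizes (one thread, queue of one) are all assumptions for illustration. Task A occupies the only thread, B waits in the queue, and C is the task that gets rejected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectionPolicyDemo {

    /** Returns the events observed when a third task hits a saturated pool. */
    public static List<String> run(RejectedExecutionHandler handler)
            throws InterruptedException {
        // Hypothetical sizes: exactly one thread and one queue slot.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1), handler);
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        Thread caller = Thread.currentThread();
        CountDownLatch release = new CountDownLatch(1);
        try {
            // A blocks the only worker thread until released.
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                events.add("A ran");
            });
            // B fills the only queue slot.
            pool.execute(() -> events.add("B ran"));
            // C finds the pool saturated, so the handler decides its fate.
            try {
                pool.execute(() -> events.add(
                        Thread.currentThread() == caller ? "C ran on caller" : "C ran"));
            } catch (RejectedExecutionException e) {
                events.add("C rejected");
            }
        } finally {
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Abort:         " + run(new ThreadPoolExecutor.AbortPolicy()));
        System.out.println("Discard:       " + run(new ThreadPoolExecutor.DiscardPolicy()));
        System.out.println("DiscardOldest: " + run(new ThreadPoolExecutor.DiscardOldestPolicy()));
        System.out.println("CallerRuns:    " + run(new ThreadPoolExecutor.CallerRunsPolicy()));
    }
}
```

Note that under DiscardOldestPolicy it is B, the task that was already queued, that never runs, and nothing reports its loss.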

This thread pool code was newly added, its parameter settings looked reasonable, and nothing stood out as a big problem. The only possible risk was the DiscardOldestPolicy rejection policy. Under heavy load, this policy silently drops tasks that are already queued, so anything waiting on those tasks never gets a result and the requests time out.

Of course, we could not let this risk go. To be honest, it was the most suspicious code anyone had found so far.

"Change DiscardOldestPolicy to the default AbortPolicy, repackage it and try it online." The technical guru said in the group.

4. What is the problem?

After the canary (grayscale) release went out, the host died again shortly afterwards. There was no doubt that this code was the cause, but why?

The thread pool size, with a minimum of 100 and a maximum of 200, is nothing excessive. The blocking queue has a capacity of only 10, so it should not cause trouble either. If you had said this thread pool caused the outage, I would not have believed it for the life of me.

But the business team reported that with this code the host died, and without it everything was fine. The technical experts scratched their heads, completely baffled.

In the end, someone could not stand it anymore and downloaded the business code to debug it.

When he opened IDEA, he was stunned for a moment, and then everything became clear. He finally understood why this code caused the problem.

The thread pool is actually created in the method!

Every incoming request created a new thread pool, and the pools were never shut down, until the system could no longer allocate resources.

It’s so domineering.

Everyone had been focused on how the thread pool's parameters were set, but no one had ever questioned where this code was placed.
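The effect of the code's location can be reproduced in miniature. The sketch below is not the incident code: the class name `PerRequestPoolDemo`, the `liveThreads` helper, and the small pool sizes (2 core, 4 maximum) are assumptions, and the tracking thread factory uses daemon threads only so that the demo JVM can exit. It counts how many threads are still alive after a number of simulated requests, once with a pool created per request and once with a single shared pool.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PerRequestPoolDemo {

    // Same shape as the reviewed snippet, with hypothetical smaller sizes.
    static ThreadPoolExecutor newPool(ThreadFactory factory) {
        return new ThreadPoolExecutor(2, 4, 60_000, TimeUnit.MILLISECONDS,
                new LinkedBlockingDeque<>(10), factory,
                new ThreadPoolExecutor.DiscardOldestPolicy());
    }

    /** Handles the given number of requests and returns how many pool threads remain alive. */
    public static long liveThreads(int requests, boolean poolPerRequest) throws Exception {
        List<Thread> created = new CopyOnWriteArrayList<>();
        // Tracking factory: remembers every thread; daemon so the demo can exit.
        ThreadFactory tracking = r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            created.add(t);
            return t;
        };
        ThreadPoolExecutor shared = newPool(tracking);
        Runnable work = () -> { };
        for (int i = 0; i < requests; i++) {
            // Anti-pattern: a fresh pool inside the request path, never shut down.
            // Fix: reuse one pool for every request.
            ThreadPoolExecutor pool = poolPerRequest ? newPool(tracking) : shared;
            pool.submit(work).get(); // wait for the request's task to finish
        }
        // Core threads do not time out by default, so each idle pool keeps its threads.
        return created.stream().filter(Thread::isAlive).count();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pool per request: " + liveThreads(50, true) + " live threads");
        System.out.println("shared pool:      " + liveThreads(50, false) + " live threads");
    }
}
```

In this sketch each per-request pool leaves one idle core thread behind for every request, while the shared pool never grows beyond its core size. With the real parameters of 100 core threads and real traffic, the same pattern would plausibly exhaust threads, memory, and file handles in the way the monitoring showed.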

Statement

This article is reproduced from 亿速云. If there is any infringement, please contact admin@php.cn to have it removed.
