Pacemaker Resource Agent的错误处理
1.前言 Pacemaker通过调用各个resource agent提供的操作(比如start,stop)实现对资源的控制,当这个方法执行出错时,Pacemaker会根据执行的操作和错误类型进行不同的错误处理。2. 错误类型
Pacemaker将错误分成3类:soft,hard和fatal,后两种属于环境或配置问题,如果没有人工干预是不可能自动修复的。一般的故障都采用OCF_ERR_GENERIC作为返回值,比如,服务进程crash,网络不通等,OCF_ERR_GENERIC属于soft类型。
B.3.How are OCF Return Codes Interpreted?
The first thing the cluster does is to check the return code against the expected result. If the result does not match the expected value, then the operation is considered to have failed and recovery action is initiated.There are three types of failure recovery:TableB.3.Types of recovery performed by the cluster
Type | Description | Action Taken by the Cluster |
---|---|---|
soft | A transient error occurred | Restart the resource or move it to a new location |
hard | A non-transient error that may be specific to the current node occurred | Move the resource elsewhere and prevent it from being retried on the current node |
fatal | A non-transient error that will be common to all cluster nodes (eg. a bad configuration was specified) | Stop the resource and prevent it from being started on any cluster node |
Assuming an action is considered to have failed, the following table outlines the different OCF return codes and the type of recovery the cluster will initiate when it is received.
B.4.OCF Return Codes
TableB.4.OCF Return Codes and their Recovery Types
RC | OCF Alias | Description | RT |
---|---|---|---|
0 | OCF_SUCCESS | Success. The command completed successfully. This is the expected result for all start, stop, promote and demote commands. | soft |
1 | OCF_ERR_GENERIC | Generic "there was a problem" error code. | soft |
2 | OCF_ERR_ARGS | The resource’s configuration is not valid on this machine. Eg. refers to a location/tool not found on the node. | hard |
3 | OCF_ERR_UNIMPLEMENTED | The requested action is not implemented. | hard |
4 | OCF_ERR_PERM | The resource agent does not have sufficient privileges to complete the task. | hard |
5 | OCF_ERR_INSTALLED | The tools required by the resource are not installed on this machine. | hard |
6 | OCF_ERR_CONFIGURED | The resource’s configuration is invalid. Eg. required parameters are missing. | fatal |
7 | OCF_NOT_RUNNING | The resource is safely stopped. The cluster will not attempt to stop a resource that returns this for any action. | N/A |
8 | OCF_RUNNING_MASTER | The resource is running in Master mode. | soft |
9 | OCF_FAILED_MASTER | The resource is in Master mode but has failed. The resource will be demoted, stopped and then started (and possibly promoted) again. | soft |
other | NA | Custom error code. | soft |
Although counterintuitive, even actions that return 0 (aka.OCF_SUCCESS) can be considered to have failed.
3. 错误处理
每个资源的操作(operation)有一个on-fail属性,用于控制如何进行出错处理。
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Pacemaker_Explained/index.html#_monitoring_resources_for_failure
Table5.3.Properties of an Operation
Field | Description |
---|---|
id | Your name for the action. Must be unique. |
name | The action to perform. Common values: monitor, start, stop |
interval | How frequently (in seconds) to perform the operation. Default value: 0, meaning never. |
timeout | How long to wait before declaring the action has failed. |
on-fail | The action to take if this action ever fails. Allowed values:* ignore - Pretend the resource did not fail* block - Don’t perform any further operations on the resource* stop - Stop the resource and do not start it elsewhere* restart - Stop the resource and start it again (possibly on a different node)* fence - STONITH the node on which the resource failed* standby - Move all resources away from the node on which the resource failedThe default for the stop operation is fence when STONITH is enabled and block otherwise. All other operations default to stop. |
enabled | If false, the operation is treated as if it does not exist. Allowed values: true, false |
但是,实际测试验证后,发现不管如何设置on-fail,效果都不会变,也就是说永远是缺省行为。
以下是让Resource Agent的各个操作返回OCF_ERR_GENERIC时资源管理器的处理:
操作 | 错误处理 | 对应的on-fail值 |
---|---|---|
start |
设置fail-count=1000000 在本节点上调用stop 在其它节点上start该资源 |
restart |
stop |
设置fail-count=1000000 阻止该资源的进一步操作,该资源成为unmanaged FAILED状态,如下 dummy(ocf::heartbeat:Dummy2):Started srdsdevapp69 (unmanaged) FAILED |
block |
monitor |
设置fail-count+=1 在本节点上依次调用stop,start,monitor。如果monitor依然出错,重复stop,start,monitor,直到fail-count达到migration-threshold后,保持资源为stop状态。 |
restart |
promote |
设置fail-count+=1 在本节点上依次调用demote,stop,start。 在其它节点上调用promote以提升其它节点上的资源为master |
restart |
demote |
设置fail-count+=1 在本节点上依次调用stop,start,demote。如果demote依然出错,重复stop,start,demote,直到fail-count达到migration-threshold后,保持资源为stop状态。 |
restart |
notify | 无视 | ignore |
注1:超时的处理与OCF_ERR_GENERIC相同
注2:Pacemaker不会对已经stop了的资源调用post stop notify。
注3:测试环境Pacemaker 1.1.7-6 ,CentOS 6.3
4.启示
上面关于错误处理的测试结果,可以给Resource Agent编写者提供几点启示:
- 1. 如非确实必要,不要让stop操作返回错误
- 2. monitor和start的判断要保持一致,即不应该出现start成功后立刻执行monitor却失败的情况,否则可能导致循环。
- 3. restart成功后执行demote不应该失败,否则可能导致循环。
- 4. migration-threshold设置为一个比较小的值(默认值是INFINITY,即100000),也可以减少上面的2和3的影响。

根据美国司法部的解释,蓝色警报旨在提供关于可能对执法人员构成直接和紧急威胁的个人的重要信息。这种警报的目的是及时通知公众,并让他们了解与这些罪犯相关的潜在危险。通过这种主动的方式,蓝色警报有助于增强社区的安全意识,促使人们采取必要的预防措施以保护自己和周围的人。这种警报系统的建立旨在提高对潜在威胁的警觉性,并加强执法机构与公众之间的沟通,以共尽管这些紧急通知对我们社会至关重要,但有时可能会对日常生活造成干扰,尤其是在午夜或重要活动时收到通知时。为了确保安全,我们建议您保持这些通知功能开启,但如果

Android中的轮询是一项关键技术,它允许应用程序定期从服务器或数据源检索和更新信息。通过实施轮询,开发人员可以确保实时数据同步并向用户提供最新的内容。它涉及定期向服务器或数据源发送请求并获取最新信息。Android提供了定时器、线程、后台服务等多种机制来高效地完成轮询。这使开发人员能够设计与远程数据源保持同步的响应式动态应用程序。本文探讨了如何在Android中实现轮询。它涵盖了实现此功能所涉及的关键注意事项和步骤。轮询定期检查更新并从服务器或源检索数据的过程在Android中称为轮询。通过

为了提升用户体验并防止数据或进度丢失,Android应用程序开发者必须避免意外退出。他们可以通过加入“再次按返回退出”功能来实现这一点,该功能要求用户在特定时间内连续按两次返回按钮才能退出应用程序。这种实现显著提升了用户参与度和满意度,确保他们不会意外丢失任何重要信息Thisguideexaminesthepracticalstepstoadd"PressBackAgaintoExit"capabilityinAndroid.Itpresentsasystematicguid

1.java复杂类如果有什么地方不懂,请看:JAVA总纲或者构造方法这里贴代码,很简单没有难度。2.smali代码我们要把java代码转为smali代码,可以参考java转smali我们还是分模块来看。2.1第一个模块——信息模块这个模块就是基本信息,说明了类名等,知道就好对分析帮助不大。2.2第二个模块——构造方法我们来一句一句解析,如果有之前解析重复的地方就不再重复了。但是会提供链接。.methodpublicconstructor(Ljava/lang/String;I)V这一句话分为.m

如何将WhatsApp聊天从Android转移到iPhone?你已经拿到了新的iPhone15,并且你正在从Android跳跃?如果是这种情况,您可能还对将WhatsApp从Android转移到iPhone感到好奇。但是,老实说,这有点棘手,因为Android和iPhone的操作系统不兼容。但不要失去希望。这不是什么不可能完成的任务。让我们在本文中讨论几种将WhatsApp从Android转移到iPhone15的方法。因此,坚持到最后以彻底学习解决方案。如何在不删除数据的情况下将WhatsApp

原因:1、安卓系统上设置了一个JAVA虚拟机来支持Java应用程序的运行,而这种虚拟机对硬件的消耗是非常大的;2、手机生产厂商对安卓系统的定制与开发,增加了安卓系统的负担,拖慢其运行速度影响其流畅性;3、应用软件太臃肿,同质化严重,在一定程度上拖慢安卓手机的运行速度。

1.启动ida端口监听1.1启动Android_server服务1.2端口转发1.3软件进入调试模式2.ida下断2.1attach附加进程2.2断三项2.3选择进程2.4打开Modules搜索artPS:小知识Android4.4版本之前系统函数在libdvm.soAndroid5.0之后系统函数在libart.so2.5打开Openmemory()函数在libart.so中搜索Openmemory函数并且跟进去。PS:小知识一般来说,系统dex都会在这个函数中进行加载,但是会出现一个问题,后

1.自动化测试自动化测试主要包括几个部分,UI功能的自动化测试、接口的自动化测试、其他专项的自动化测试。1.1UI功能自动化测试UI功能的自动化测试,也就是大家常说的自动化测试,主要是基于UI界面进行的自动化测试,通过脚本实现UI功能的点击,替代人工进行自动化测试。这个测试的优势在于对高度重复的界面特性功能测试的测试人力进行有效的释放,利用脚本的执行,实现功能的快速高效回归。但这种测试的不足之处也是显而易见的,主要包括维护成本高,易发生误判,兼容性不足等。因为是基于界面操作,界面的稳定程度便成了


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

EditPlus Chinese cracked version
Small size, syntax highlighting, does not support code prompt function

Dreamweaver CS6
Visual web development tools

WebStorm Mac version
Useful JavaScript development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

DVWA
Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software
