
DB2 HADR + TSA operations: TSA commands for adding resource groups

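Tivoli System Automation (TSA) is high-availability cluster management software. In the DB2 TSA + HADR high-availability solution, it lets a DB2 HADR primary/standby pair detect failures and switch roles automatically. This article covers the commonly used TSA commands, how to add CDC or DSG to a TSA cluster, and how to analyze TSA errors.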

Common commands:
lsrpdomain/lsrpnode - query domain and node information:

[db2inst1@p0-pbd-pbd-db2 ~]$ lsrpdomain
Name        OpState RSCTActiveVersion MixedVersions TSPort GSPort
hadr_domain Online  3.2.4.4           No            12347  12348
[db2inst1@p0-pbd-pbd-db2 ~]$ lsrpnode
Name           OpState RSCTVersion
p0-pbd-pbd-db2 Online  3.2.4.4
p0-pbd-pbd-db1 Online  3.2.4.4
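
For a scripted health check, lsrpnode output like the above can be filtered for nodes that are not Online. This is only a minimal sketch: the function name offline_nodes is hypothetical, and it assumes the three-column layout shown above with a single header line.

```shell
# Print the names of nodes whose OpState is not "Online".
# Reads lsrpnode-style output on stdin; the first line is treated as the header.
offline_nodes() {
    awk 'NR > 1 && NF >= 2 && $2 != "Online" { print $1 }'
}

# Usage on a cluster node (not executed here):
#   lsrpnode | offline_nodes
```

An empty result means every listed node reports Online.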

lssam - query resource status:
[db2inst1@p0-pbd-pbd-db2 ~]$ lssam
Online IBM.ResourceGroup:cdc_I2KFK38-rg Nominal=Online
        '- Online IBM.Application:cdc-I2KFK38-rs
                |- Offline IBM.Application:cdc-I2KFK38-rs:p0-pbd-pbd-db1
                '- Online IBM.Application:cdc-I2KFK38-rs:p0-pbd-pbd-db2
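
When many resource groups are defined, it can help to list only those whose observed state differs from the nominal state. The sketch below assumes the lssam line layout shown above (state, then IBM.ResourceGroup:name, then Nominal=value) and plain text without color escape sequences; rg_state_mismatch is a hypothetical helper name.

```shell
# Report resource groups whose observed state differs from their nominal state.
# Reads lssam-style output on stdin.
rg_state_mismatch() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i !~ /^IBM\.ResourceGroup:/) continue
            observed = $(i - 1)            # word just before the group name
            nominal = ""
            for (k = i + 1; k <= NF; k++)
                if ($k ~ /^Nominal=/) nominal = substr($k, 9)
            if (observed != nominal)
                print $i, "observed=" observed, "nominal=" nominal
        }
    }'
}

# Usage on a cluster node (not executed here):
#   lssam | rg_state_mismatch
```

Because only the single word before the group name is compared, a two-word state such as "Pending online" never equals the nominal value and is reported as well.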


lsrg -Ab -V -g <resource group> - query the status and attributes of a resource group:

[db2inst1@p0-pbd-pbd-db2 ~]$ lsrg -Ab -V -g cdc_I2KFK38-rg
Starting to list resource group information.
lsrg: Executed on Thu Aug 31 09:50:58 2023 at "p0-pbd-pbd-db2", master node "p0-pbd-pbd-db2".

Displaying Resource Group information:
All Attributes
For Resource Group "cdc_I2KFK38-rg".


Resource Group 1:
        Name                             = cdc_I2KFK38-rg
        MemberLocation                   = Collocated
        Priority                         = 0
        AllowedNode                      = ALL
        NominalState                     = Online
        ExcludedList                     = {}
        Subscription                     = {}
        Owner                            =
        Description                      =
        InfoLink                         =
        Requests                         = {}
        Force                            = 0
        ActivePeerDomain                 = hadr_domain
        OpState                          = Online
        TopGroup                         = cdc_I2KFK38-rg
        MoveStatus                       = [None]
        ConfigValidity                   =
        LockState                        = 0
        AutomationDetails[CompoundState] = Satisfactory
                          [DesiredState] = Online
                         [ObservedState] = Online
                          [BindingState] = Bound
                       [AutomationState] = Idle
                          [ControlState] = Startable
                           [HealthState] = Not Applicable
Completed listing resource group information.
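
To read a single attribute such as LockState or OpState in a script, the "Name = value" lines of this output can be parsed. This is a rough sketch that assumes the layout shown above; rg_attr is a hypothetical helper, and it returns only the first word of the value.

```shell
# Print the value of one top-level attribute from "lsrg -Ab -V -g <group>" output.
# Reads the lsrg output on stdin; $1 is the attribute name, e.g. LockState.
rg_attr() {
    awk -v key="$1" '$1 == key && $2 == "=" { print $3; exit }'
}

# Usage on a cluster node (not executed here):
#   lsrg -Ab -V -g cdc_I2KFK38-rg | rg_attr LockState
```

The bracketed AutomationDetails sub-fields start with a different first word, so this helper does not match them.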


chrg -o online(offline) <resource group> - start or stop a resource group and change its Nominal State as well
rgreq -o start(stop) <resource group> - start or stop a resource group without changing its Nominal State
rgreq -o lock(unlock) <resource group> - lock or unlock a resource group.

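Locking a resource group stops TSA from automatically starting or stopping it in response to the resource groups it depends on. You can wait until the group it depends on has switched over and is confirmed Online, and only then unlock it, so that the resource group comes up cleanly. For example, a DB2 HADR host may run many CDC instances and DSG replication instances whose processes have been added to TSA resource groups that depend on the HADR primary. Whichever machine holds the primary also runs those CDC and DSG processes, which ensures they can read the latest logs.
During a high-availability switchover drill, lock the CDC and DSG resource groups before shutting down the HADR standby machine. If they are left unlocked and the log gap between the former standby and the primary is large, then once the new primary comes up, CDC and DSG fail with errors because they cannot find the latest log.
The following one-liner locks every CDC and DSG resource group: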
lsrg |egrep -i "dsg|cdc" | grep -v db2inst1|awk '{print "rgreq -o lock " $1}' | sh
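
Since the one-liner above pipes generated commands straight into sh, a slightly safer variation is to print the commands first, review them, and only then execute them. The sketch below reuses the same filters; gen_rgreq_cmds is a hypothetical helper name, and it assumes lsrg prints one resource group name per line.

```shell
# Turn a list of resource group names (stdin) into rgreq commands.
# $1 is the rgreq action: lock or unlock.
# Keeps groups whose name contains "dsg" or "cdc" and skips the DB2 instance groups.
gen_rgreq_cmds() {
    action=$1
    grep -Ei 'dsg|cdc' | grep -v 'db2inst1' |
        awk -v a="$action" '{ print "rgreq -o " a " " $1 }'
}

# Usage on a cluster node (not executed here):
#   lsrg | gen_rgreq_cmds lock           # review the generated commands
#   lsrg | gen_rgreq_cmds lock | sh      # run them
#   lsrg | gen_rgreq_cmds unlock | sh    # after the new primary is confirmed Online
```

The grep -v db2inst1 filter matters because a DB2 group such as db2_db2inst1_db2inst1_TKYLCDC-rg also matches "cdc" case-insensitively.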

lsrsrc IBM.Application - list the attributes of every resource, including the CDC/Db2 monitor scripts and their timeouts.
resetrsrc -s 'Name =="db2_db2inst1_0-rs"' IBM.Application - reset the state of a resource.

Sample lsrsrc IBM.Application output:
resource 57:
        Name                  = "db2_db2inst1_p0-pbd-pbd-db2_0-rs"
        ResourceType          = 0
        AggregateResource     = "0x2028 0xffff 0xe38eb1e1 0xa0a9fe1d 0x96244eb2 0x54fb9408"
        StartCommand          = "/usr/sbin/rsct/sapolicies/db2/db2V105_start.ksh db2inst1 0"
        StopCommand           = "/usr/sbin/rsct/sapolicies/db2/db2V105_stop.ksh db2inst1 0"
        MonitorCommand        = "/usr/sbin/rsct/sapolicies/db2/db2V105_monitor.ksh db2inst1 0"

resource 58:
        Name                  = "db2_db2inst1_p0-pbd-pbd-db2_0-rs"
        ResourceType          = 1
        AggregateResource     = "0x3fff 0xffff 0x00000000 0x00000000 0x00000000 0x00000000"
        StartCommand          = "/usr/sbin/rsct/sapolicies/db2/db2V105_start.ksh db2inst1 0"
        StopCommand           = "/usr/sbin/rsct/sapolicies/db2/db2V105_stop.ksh db2inst1 0"
        MonitorCommand        = "/usr/sbin/rsct/sapolicies/db2/db2V105_monitor.ksh db2inst1 0"

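The cdc_tsa.sh script below adds a CDC instance to a resource group in the TSA cluster. For example, in the resource group cdc_I2KFK38-rg, I2KFK38 is the CDC instance name.
vi cdc_tsa.sh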

OsUser=cdcuser
InstName=test
ResourceName=cdc_${InstName}-rs
ResourceGroupName=cdc_${InstName}-rg
dependondb2ResourceName='IBM.ResourceGroup:db2_db2inst1_db2inst1_TKYLCDC-rg'

mkrsrc IBM.Application Name="${ResourceName}" ResourceType=1 StartCommand="/usr/sbin/rsct/sapolicies/cdc/${InstName}_start.sh" StopCommand="/usr/sbin/rsct/sapolicies/cdc/${InstName}_stop.sh" MonitorCommand="/usr/sbin/rsct/sapolicies/cdc/${InstName}_jiankong.sh" MonitorCommandPeriod=10 MonitorCommandTimeout=120 StartCommandTimeout=900 StopCommandTimeout=900 UserName="${OsUser}" RunCommandsSync=1 ProtectionMode=0 NodeNameList='{"p0-pbd-pbd-db2","p0-pbd-pbd-db1"}'

mkrg ${ResourceGroupName}

# Lock the resource group
rgreq -o lock ${ResourceGroupName}

# node2: take the resource group offline
chrg -o Offline ${ResourceGroupName}

# Bind the resource to the resource group
addrgmbr -g ${ResourceGroupName} IBM.Application:${ResourceName}

# Make the resource depend on the DB2 resource group
mkrel -p DependsOn -S IBM.Application:${ResourceName} -G ${dependondb2ResourceName} ${ResourceName}_DependsOn_db2-rel

# Bring the resource group online
chrg -o Online ${ResourceGroupName}

# Unlock the resource group
rgreq -o unlock ${ResourceGroupName}
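
The script above points MonitorCommand at ${InstName}_jiankong.sh, whose contents are not shown. An IBM.Application monitor command reports the resource state through its exit code, with 1 meaning Online and 2 meaning Offline. The following is a minimal sketch of that contract only: it assumes, purely for illustration, that the instance writes its PID to a file, and both the helper name opstate_from_pidfile and the PID file path are hypothetical.

```shell
# OpState values an IBM.Application MonitorCommand reports via its exit code.
OPSTATE_ONLINE=1
OPSTATE_OFFLINE=2

# Print 1 (Online) if the PID stored in the given file belongs to a live
# process that the current user may signal, otherwise print 2 (Offline).
opstate_from_pidfile() {
    pidfile=$1
    if [ -r "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "$OPSTATE_ONLINE"
    else
        echo "$OPSTATE_OFFLINE"
    fi
}

# In a real monitor script the last line would be (path is hypothetical):
#   exit "$(opstate_from_pidfile /path/to/instance.pid)"
```

A production monitor script would normally check the actual CDC or DSG process rather than a PID file, but it has to end with the same exit codes.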


TSA troubleshooting:
Diagnostic logs:
1) /var/log/messages
2) /var/ct/hadr_domain/log/mc
drwxr-x--- 2 root root   6 Jul 23 14:42 IBM.ConfigRM
drwxr-xr-x 2 root root 4096 Jul 23 14:42 IBM.GblResRM 
drwxr-xr-x 2 root root 4096 Jul 23 14:42 IBM.RecoveryRM
drwxr-xr-x 2 root root 4096 Jul 23 14:42 IBM.StorageRM
drwxr-xr-x 2 root root  78 Jul 23 14:42 IBM.TestRM

As shown above, each resource manager daemon has its own directory. For TSA, focus on GblResRM and RecoveryRM.
1) IBM.GblResRM – The “eyes and hands” of the cluster.
Responsible for start, stop, monitor and cleanup of IBM.Application resources. In the context of DB2, it is responsible for managing all DB2 defined entities.

Basically passive. It invokes monitor commands for resources based on defined intervals and services IBM.RecoveryRM requests.

2) IBM.RecoveryRM – The “brain” of the cluster.
Inputs are RMC events from other resource managers (IBM.GblResRM for IBM.Application resources, IBM.ConfigRM for hosts and network adapters, etc.), commands from users, and the resource model.

Output is commands issued to other resource managers to start/stop/cleanup resources.

Structured as a rule engine that determines how to respond to incoming events, and an optimizer component (called a “binder”) to determine resource placement if resources need to move between hosts.

Use rpttr -o dtic trace.29.sp to format the trace files of each Resource Manager.
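
A resource manager directory usually holds more than one trace file, so formatting can be wrapped in a loop. The sketch below assumes the files match trace*.sp, as the trace.29.sp example suggests; format_traces and the RPTTR override variable are hypothetical names introduced here so the formatter can be swapped for a dry run.

```shell
# Format every trace*.sp file in a directory with "rpttr -o dtic",
# writing the readable text next to each trace file as <file>.txt.
# Setting RPTTR to another command replaces rpttr, e.g. for a dry run.
format_traces() {
    dir=$1
    for f in "$dir"/trace*.sp; do
        [ -e "$f" ] || continue            # glob matched nothing
        ${RPTTR:-rpttr} -o dtic "$f" > "$f.txt"
    done
}

# Usage on a cluster node, typically as root (not executed here):
#   format_traces /var/ct/hadr_domain/log/mc/IBM.RecoveryRM
#   format_traces /var/ct/hadr_domain/log/mc/IBM.GblResRM
```

Reading the formatted RecoveryRM and GblResRM traces side by side by timestamp shows which decision led to which start, stop, or monitor action.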

