One-Command StarRocks Management: Simplifying Cluster Start, Stop, and Status Checks
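In day-to-day operations, starting, stopping, and checking the status of a StarRocks cluster has always been tedious.
Imagine it is the eve of Double 11 and the StarRocks cluster is going through a routine stress test. At 3 a.m. monitoring raises an alert: CPU is spiking and queries are timing out. The on-call engineer's first instinct is "try a restart".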
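- 5 FEs and 8 BEs, spread across 8 servers;
Log in to each server and run:
/fe/bin/start_fe.sh --daemon
/be/bin/start_be.sh --daemon
- SSH into each machine in turn, type the password, run stop → start;
- On the 4th machine the path is mistyped, the process never comes up, and the metadata ends up inconsistent;
- Then come rollback, diagnosis, and repair. How much time does all of that take? And that is with only 8 servers, never mind a larger fleet.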
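Working machine by machine like this is slow and error-prone. Being able to start or stop the whole cluster with one command, and check node status at any time, is far more efficient.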
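Today I am sharing a Linux shell script for one-command management of StarRocks. It provides:
- One-command start / stop of the cluster
- A unified view of FE / BE node status
- Configurable node lists, so the cluster is easy to extend
- Advantage: consistency, since every operation goes through the same script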
Table of Contents
- One-Command StarRocks Management: Simplifying Cluster Start, Stop, and Status Checks
- Code
- Dependencies
- Start the Cluster:
- Stop the Cluster:
- Check the Cluster Status:
- Summary
The code has been submitted to StarRocks issue #615415. I do not know whether it will be accepted, but I hope it will.
🙏 🙏 🙏 🙏 🙏
Code
#!/bin/bash
# =================================================================
# StarRocks Cluster Management Script
#
# Author: tomxjc305
# Date: 2025-08-02
# Description: This script automates the start, stop, and status
# check operations for a StarRocks cluster with dynamic configuration.
# =================================================================

# --- Configuration Section ---
# Define the FE and BE nodes here. For more dynamic configuration the lists
# could instead come from environment variables or a configuration file, e.g.:
# export STARROCKS_FE_NODES="192.168.5.128"
# export STARROCKS_BE_NODES="192.168.5.128 192.168.5.129 192.168.5.130"
# As written, the arrays below are hard-coded and take precedence.
STARROCKS_FE_NODES=("192.168.5.128")  # List of Frontend (FE) nodes
STARROCKS_BE_NODES=("192.168.5.130" "192.168.5.129" "192.168.5.128")  # List of Backend (BE) nodes

# MySQL (StarRocks) connection settings for the FE node
MASTER_FE_HOST="192.168.5.128"
MYSQL_QUERY_PORT="9030"

# Installation path of StarRocks on all nodes
STARROCKS_FE_DIR="/data/starrocks/fe"
STARROCKS_BE_DIR="/data/starrocks/be"

# MySQL (StarRocks) client connection info
# Note: this password is also passed to sshpass as the SSH password for
# root on every node (see the main execution flow below).
MYSQL_USER="root"
MYSQL_PASSWORD="123456"

# --- Main Logic ---
# Get the operation parameter (start, stop, status) from the user
ACTION=$1

# Check if the user has provided an operation parameter
if [[ -z "$ACTION" ]]; then
    echo "Error: Operation not specified."
    echo "Usage: $0 {start|stop|status}"
    exit 1
fi

# Function: Start component
# Parameter 1: Host IP
# Parameter 2: Component type (fe/be)
# Parameter 3: Host password
start_component() {
    local host=$1
    local role=$2
    local password=$3
    local cmd=""

    echo "--- [Start] Connecting to $host ($role)... ---"
    if [[ "$role" == "fe" ]]; then
        cmd="cd $STARROCKS_FE_DIR && ./bin/start_fe.sh --daemon"
    elif [[ "$role" == "be" ]]; then
        cmd="cd $STARROCKS_BE_DIR && ./bin/start_be.sh --daemon"
    else
        echo "Warning: Unknown component type '$role' found on $host. Skipping."
        return 1
    fi

    echo "Executing command: $cmd"
    if sshpass -p "$password" ssh -o StrictHostKeyChecking=no root@"$host" "$cmd"; then
        echo "Successfully started $role on $host."
    else
        echo "Error: Failed to start $role on $host."
    fi
}

# Function: Stop component
# Parameter 1: Host IP
# Parameter 2: Component type (fe/be)
# Parameter 3: Host password
stop_component() {
    local host=$1
    local role=$2
    local password=$3
    local cmd=""

    echo "--- [Stop] Connecting to $host ($role)... ---"
    if [[ "$role" == "fe" ]]; then
        cmd="cd $STARROCKS_FE_DIR && ./bin/stop_fe.sh"
    elif [[ "$role" == "be" ]]; then
        cmd="cd $STARROCKS_BE_DIR && ./bin/stop_be.sh"
    else
        echo "Warning: Unknown component type '$role' found on $host. Skipping."
        return 1
    fi

    echo "Executing command: $cmd"
    if sshpass -p "$password" ssh -o StrictHostKeyChecking=no root@"$host" "$cmd"; then
        echo "Successfully stopped $role on $host."
    else
        echo "Error: Failed to stop $role on $host."
    fi
}

# Function: Check cluster status
check_cluster_status() {
    echo "--- [Status Check] Connecting to master FE node ($MASTER_FE_HOST) to query cluster status... ---"
    echo ""
    echo ">>> Querying all Frontend nodes:"
    # Query FE status through the MySQL protocol (called directly, no eval needed)
    if ! mysql -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" -h "$MASTER_FE_HOST" -P "$MYSQL_QUERY_PORT" -e 'SHOW FRONTENDS;'; then
        echo "Error: Failed to query FE status. Please check if the master FE node is running or if the network is reachable."
    fi
    echo ""
    echo ">>> Querying all Backend nodes:"
    # Query BE status through the MySQL protocol
    if ! mysql -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" -h "$MASTER_FE_HOST" -P "$MYSQL_QUERY_PORT" -e 'SHOW BACKENDS;'; then
        echo "Error: Failed to query BE status."
    fi
    echo ""
}

# --- Main Execution Flow ---
case "$ACTION" in
    start)
        echo "========== Starting StarRocks Cluster =========="
        # Start FE nodes first, then BE nodes
        for host in "${STARROCKS_FE_NODES[@]}"; do
            start_component "$host" "fe" "$MYSQL_PASSWORD"
        done
        for host in "${STARROCKS_BE_NODES[@]}"; do
            start_component "$host" "be" "$MYSQL_PASSWORD"
        done
        echo "========== Cluster start operation completed =========="
        ;;
    stop)
        echo "========== Stopping StarRocks Cluster =========="
        # Stop BE nodes first, then FE nodes
        for host in "${STARROCKS_BE_NODES[@]}"; do
            stop_component "$host" "be" "$MYSQL_PASSWORD"
        done
        for host in "${STARROCKS_FE_NODES[@]}"; do
            stop_component "$host" "fe" "$MYSQL_PASSWORD"
        done
        echo "========== Cluster stop operation completed =========="
        ;;
    status)
        echo "========== Checking StarRocks Cluster Status =========="
        check_cluster_status
        echo "========== Cluster status check completed =========="
        ;;
    *)
        echo "Error: Invalid operation '$ACTION'."
        echo "Usage: $0 {start|stop|status}"
        exit 1
        ;;
esac

exit 0
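The script's comments mention configuring the node lists through environment variables, while the arrays themselves are hard-coded. A minimal sketch of how an exported, space- or comma-separated value could be turned into a host list is shown here. The helper `parse_nodes` and the variables `fe_list` / `be_list` are hypothetical names introduced for illustration, not part of the script.

```shell
#!/bin/bash
# Sketch only: parse_nodes, fe_list and be_list are hypothetical names.

# Print one host per line from a space- or comma-separated list.
parse_nodes() {
    printf '%s\n' "$1" | tr ', ' '\n\n' | sed '/^$/d'
}

# Use the exported value if present, otherwise fall back to the
# addresses used in the script (given here as plain strings).
fe_list="${STARROCKS_FE_NODES:-192.168.5.128}"
be_list="${STARROCKS_BE_NODES:-192.168.5.128 192.168.5.129 192.168.5.130}"

for host in $(parse_nodes "$fe_list"); do
    echo "FE node: $host"
done
for host in $(parse_nodes "$be_list"); do
    echo "BE node: $host"
done
```

In bash 4 or later, the same helper could fill an array with something like `mapfile -t nodes < <(parse_nodes "$be_list")`, which would keep the rest of the script's `"${...[@]}"` loops unchanged.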
Dependencies
Download and build sshpass:
wget https://sourceforge.net/projects/sshpass/files/sshpass/1.09/sshpass-1.09.tar.gz
tar -xzf sshpass-1.09.tar.gz
cd sshpass-1.09
./configure && make && sudo make install
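Since the script calls `sshpass`, `ssh`, and the `mysql` client, it may help to verify they are installed before touching any node. The following is a small sketch of such a check; `require_commands` is a hypothetical helper name, not something the script defines.

```shell
#!/bin/bash
# Sketch only: require_commands is a hypothetical helper.

# Return 0 if every listed command is on PATH; report missing ones on stderr.
require_commands() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Error: required command '$cmd' not found in PATH." >&2
            missing=1
        fi
    done
    return "$missing"
}

# The management script relies on these three tools.
if require_commands sshpass ssh mysql; then
    echo "All dependencies found."
else
    echo "Install the missing tools before running the script."
fi
```

A check like this could sit near the top of the script, right after the usage check, so a missing tool is reported once instead of failing on every host.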
Run the script as follows:
Start the Cluster:
./starrocks_cluster_manager.sh start
Stop the Cluster:
./starrocks_cluster_manager.sh stop
Check the Cluster Status:
./starrocks_cluster_manager.sh status
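The status command prints the raw `SHOW FRONTENDS` and `SHOW BACKENDS` tables, which still have to be read by eye. As a rough sketch, the output could be piped through a small filter that counts nodes not reporting as alive. This assumes the mysql client is run in batch mode (`-B`, tab-separated with a header row) and that the result has a column named `Alive` holding `true` for healthy nodes. The function name `count_not_alive` and the sample rows are hypothetical, and the real output has many more columns.

```shell
#!/bin/bash
# Sketch only: count_not_alive is a hypothetical helper.

# Read tab-separated rows with a header line on stdin and print how many
# rows have a value other than "true" in the column named "Alive".
count_not_alive() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Alive") col = i; next }
        col && $col != "true" { dead++ }
        END { print dead + 0 }
    '
}

# Hypothetical, heavily trimmed sample standing in for real query output.
printf 'IP\tAlive\n192.168.5.128\ttrue\n192.168.5.129\tfalse\n192.168.5.130\ttrue\n' \
    | count_not_alive
```

Against a live cluster, the input would come from something like `mysql -B -u root -p... -h 192.168.5.128 -P 9030 -e 'SHOW BACKENDS;' | count_not_alive`, and a non-zero count could be turned into a non-zero exit code for alerting.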
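Summary
I feel the script is still not general enough and there is plenty of room for improvement. For now, though, it already lets me start the cluster I maintain with a single command.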
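One possible improvement, sketched here under assumptions: the script prints "completed" and exits with 0 even when a node fails to start, so a failure like the mistyped path in the opening scenario could go unnoticed. A loop that counts failures and returns a non-zero status might look like this. `run_on_hosts` and `fake_start` are hypothetical names, and `fake_start` is only a stand-in so the sketch runs without SSH.

```shell
#!/bin/bash
# Sketch only: run_on_hosts and fake_start are hypothetical names.

# Call "$1 <host>" for every remaining argument, count the failures,
# and return non-zero if any host failed.
run_on_hosts() {
    local action="$1" failed=0 host
    shift
    for host in "$@"; do
        if ! "$action" "$host"; then
            echo "Error: $action failed on $host" >&2
            failed=$((failed + 1))
        fi
    done
    echo "Hosts failed: $failed"
    [ "$failed" -eq 0 ]
}

# Stand-in for a real start function: pretends one address fails.
fake_start() {
    [ "$1" != "192.168.5.129" ]
}

run_on_hosts fake_start 192.168.5.128 192.168.5.129 192.168.5.130 \
    || echo "At least one node did not start."
```

For this to work with the real functions, they would need to return a non-zero status on failure rather than only printing an error message.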
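With this script, day-to-day StarRocks cluster operations go from tedious manual logins to a single command, which greatly improves efficiency. The larger the cluster, the more time it saves.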