[Blog 627] Lossless changes to a gobgp service: the graceful restart feature

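Scenario

When our BGP gateway is advertising BGP routes to its peers and we need to ship a new feature, stopping the gateway and starting the new version causes the routes to be briefly withdrawn and then re-advertised, which shows up as network jitter.

Expected behavior: lossless change

While the BGP gateway service is being changed, we want the routes it has already advertised to stay in place for a period of time after the gateway goes down. Only if the gateway still has not come back when that period ends should the peer network devices withdraw the routes.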

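The graceful restart feature

  • When the BGP service exits abnormally, graceful restart kicks in and the routes are not withdrawn immediately
  • When the BGP service is terminated by SIGTERM, the routes are withdrawn immediately

Explanation:

With this configuration, if the graceful restart capability has been negotiated with a peer, the peer starts the graceful restart helper procedure when gobgpd dies involuntarily or when a SIGINT or SIGKILL signal is sent to gobgpd. Note that when SIGTERM is sent to gobgpd, peers that negotiated graceful restart do not start the helper procedure, because gobgpd sends them a NOTIFICATION message before it dies. (In the demo below, Ctrl+C also ends with a clean stop and an immediate withdrawal, because the signal handler shown at the end of this post catches SIGINT as well as SIGTERM.)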
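The rule described above can be written down as a tiny decision function seen from the helper (the peer that stays up). This is only a sketch under simplifying assumptions: the type and function names are made up for illustration and are not part of gobgp, and it ignores the notification-enabled option.

```go
package main

import "fmt"

// downReason is a hypothetical type describing how the helper saw the
// session to the restarting speaker end. It is not a gobgp type.
type downReason int

const (
	// tcpFailure: the connection went away without a NOTIFICATION,
	// e.g. the remote process was killed with kill -9.
	tcpFailure downReason = iota
	// notificationReceived: the remote side sent a NOTIFICATION before
	// closing, e.g. it was stopped cleanly.
	notificationReceived
)

// keepRoutes reports whether the helper retains the peer's routes as stale.
// grNegotiated means both sides advertised graceful restart in their OPEN.
func keepRoutes(grNegotiated bool, reason downReason) bool {
	if !grNegotiated {
		return false // without the capability, routes are withdrawn at once
	}
	return reason == tcpFailure
}

func main() {
	fmt.Println("kill -9, GR negotiated:       ", keepRoutes(true, tcpFailure))
	fmt.Println("clean stop, GR negotiated:    ", keepRoutes(true, notificationReceived))
	fmt.Println("kill -9, GR not negotiated:   ", keepRoutes(false, tcpFailure))
}
```

Only the first case keeps the routes; the other two lead to an immediate withdrawal.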

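Graceful restart demo: gobgp graceful restart example

Scenario:

Node 192.168.128.132 establishes a BGP session with node 192.168.128.134, and 132 advertises a route to 134. We then simulate 132 exiting so that 134 retains the route, which is the graceful restart behavior.

BGP config file on node 192.168.128.132: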

[global.config]
  as = 65001
  router-id = "192.168.128.132"

[[neighbors]]
  [neighbors.config]
    neighbor-address = "192.168.128.134"
    peer-as = 65001
  [neighbors.graceful-restart.config]
    enabled = true
    restart-time = 30
  [[neighbors.afi-safis]]
    [neighbors.afi-safis.config]
      afi-safi-name = "ipv4-unicast"
    [neighbors.afi-safis.mp-graceful-restart.config]
      enabled = true
    [neighbors.afi-safis.long-lived-graceful-restart.config]
      enabled = true
      restart-time = 30

BGP config file on node 192.168.128.134:

[global.config]
  as = 65001
  router-id = "192.168.128.134"

[[neighbors]]
  [neighbors.config]
    neighbor-address = "192.168.128.132"
    peer-as = 65001
  [neighbors.graceful-restart.config]
    enabled = true
    long-lived-enabled = true
    restart-time = 30
    notification-enabled = true
  [[neighbors.afi-safis]]
    [neighbors.afi-safis.config]
      afi-safi-name = "ipv4-unicast"
    [neighbors.afi-safis.mp-graceful-restart.config]
      enabled = true
    [neighbors.afi-safis.long-lived-graceful-restart.config]
      enabled = true
      restart-time = 30
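To make the restart-time and per-address-family settings above more concrete, here is a rough sketch of how a graceful restart capability is laid out on the wire according to RFC 4724: capability code 64, a 4-bit flags field plus a 12-bit restart time in seconds, then one (AFI, SAFI, flags) tuple per address family. The function and type names are invented for this illustration; this is not how gobgp itself builds the message.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	capGracefulRestart = 64     // capability code from RFC 4724
	maxRestartTime     = 0x0fff // restart time is a 12-bit field (seconds)
	flagRestarting     = 0x8000 // R bit: the sender has just restarted
	flagForwarding     = 0x80   // F bit: forwarding state was preserved
)

// family is a hypothetical helper type for one address family entry.
type family struct {
	AFI       uint16
	SAFI      uint8
	Forwarded bool
}

// encodeGRCapability returns the capability as code, length, value.
func encodeGRCapability(restarting bool, restartTime uint16, families []family) ([]byte, error) {
	if restartTime > maxRestartTime {
		return nil, errors.New("restart time does not fit in 12 bits")
	}
	if 2+4*len(families) > 255 {
		return nil, errors.New("too many address families for a one-byte length")
	}
	head := restartTime
	if restarting {
		head |= flagRestarting
	}
	value := make([]byte, 2, 2+4*len(families))
	binary.BigEndian.PutUint16(value, head)
	for _, f := range families {
		var flags byte
		if f.Forwarded {
			flags = flagForwarding
		}
		value = append(value, byte(f.AFI>>8), byte(f.AFI), f.SAFI, flags)
	}
	// Prepend the capability header: code, then the length of the value.
	return append([]byte{capGracefulRestart, byte(len(value))}, value...), nil
}

func main() {
	// restart-time = 30, ipv4-unicast (AFI 1, SAFI 1), as in the configs above.
	b, err := encodeGRCapability(false, 30, []family{{AFI: 1, SAFI: 1}})
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", b) // prints "40 06 00 1e 00 01 01 00"
}
```

The 12-bit field is why a restart time cannot exceed 4095 seconds; the 30 used in this demo is well inside that range.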

Start both BGP servers:

sudo ./gobgpd -f bgp-graceful.conf -l debug -p -r

Advertise a route on 132 with ./gobgp global rib -a ipv4 add 192.168.3.0/24 origin igp. The gobgpd log on 132 then shows:

                  Key=192.168.128.134 Topic=config
INFO[0000] Add a peer configuration                      Key=192.168.128.134 Topic=Peer
DEBU[0000] IdleHoldTimer expired                         Duration=0 Key=192.168.128.134 Topic=Peer
DEBU[0000] state changed                                 Key=192.168.128.134 Topic=Peer new=BGP_FSM_ACTIVE old=BGP_FSM_IDLE reason=idle-hold-timer-expired
DEBU[0005] try to connect                                Key=192.168.128.134 Topic=Peer
DEBU[0005] state changed                                 Key=192.168.128.134 Topic=Peer new=BGP_FSM_OPENSENT old=BGP_FSM_ACTIVE reason=new-connection
DEBU[0005] state changed                                 Key=192.168.128.134 Topic=Peer new=BGP_FSM_OPENCONFIRM old=BGP_FSM_OPENSENT reason=open-msg-received
INFO[0005] Peer Up                                       Key=192.168.128.134 State=BGP_FSM_OPENCONFIRM Topic=Peer
DEBU[0005] state changed                                 Key=192.168.128.134 Topic=Peer new=BGP_FSM_ESTABLISHED old=BGP_FSM_OPENCONFIRM reason=open-msg-negotiated
DEBU[0005] Now syncing, suppress sending updates. start deferral timer  Duration=360 Key=192.168.128.134 Topic=Server
DEBU[0005] received update                               Key=192.168.128.134 Topic=Peer attributes="[]" nlri="[]" withdrawals="[]"
DEBU[0005] EOR received                                  AddressFamily=ipv4-unicast Key=192.168.128.134 Topic=Peer
INFO[0005] sync finished                                 Topic=Server
DEBU[0005] sent update                                   Key=192.168.128.134 State=BGP_FSM_ESTABLISHED Topic=Peer attributes="[]" nlri="[]" withdrawals="[]"
DEBU[0012] create Destination                            Nlri=192.168.3.0/24 Topic=Table
DEBU[0012] sent update                                   Key=192.168.128.134 State=BGP_FSM_ESTABLISHED Topic=Peer attributes="[{Origin: i}  {Nexthop: 192.168.128.132} {LocalPref: 100}]" nlri="[192.168.3.0/24]" withdrawals="[]"

On 134, check the routes learned from neighbor 132:

luzejia@luzejia-virtual-machine:~/Desktop$ ./gobgp neighbor 192.168.128.132 adj-in
ID  Network              Next Hop             AS_PATH              Age        Attrs
0   192.168.3.0/24       192.168.128.132                           00:00:02   [{Origin: i} {LocalPref: 100}]

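Stop the BGP server on 132 with Ctrl+C. The log shows it cleaning up on the way out, which tells the peer 134 that this is a normal exit: there is no need to start graceful restart, and the routes can be withdrawn right away.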

sudo ./gobgpd -f bgp-graceful.conf -l debug -p -r
INFO[0000] gobgpd started                               
INFO[0000] Finished reading the config file              Topic=Config
INFO[0000] Add Peer                                      Key=192.168.128.134 Topic=config
INFO[0000] Add a peer configuration                      Key=192.168.128.134 Topic=Peer
DEBU[0000] IdleHoldTimer expired                         Duration=0 Key=192.168.128.134 Topic=Peer
DEBU[0000] state changed                                 Key=192.168.128.134 Topic=Peer new=BGP_FSM_ACTIVE old=BGP_FSM_IDLE reason=idle-hold-timer-expired
DEBU[0001] Accepted a new passive connection             Key=192.168.128.134 Topic=Peer
DEBU[0001] stop connect loop                             Key=192.168.128.134 Topic=Peer
DEBU[0001] state changed                                 Key=192.168.128.134 Topic=Peer new=BGP_FSM_OPENSENT old=BGP_FSM_ACTIVE reason=new-connection
DEBU[0001] peer has restarted, skipping wait for EOR     Key=192.168.128.134 State=BGP_FSM_OPENSENT Topic=Peer
DEBU[0001] state changed                                 Key=192.168.128.134 Topic=Peer new=BGP_FSM_OPENCONFIRM old=BGP_FSM_OPENSENT reason=open-msg-received
INFO[0001] Peer Up                                       Key=192.168.128.134 State=BGP_FSM_OPENCONFIRM Topic=Peer
DEBU[0001] state changed                                 Key=192.168.128.134 Topic=Peer new=BGP_FSM_ESTABLISHED old=BGP_FSM_OPENCONFIRM reason=open-msg-negotiated
INFO[0001] sync finished                                 Key=192.168.128.134 Topic=Server
DEBU[0001] received update                               Key=192.168.128.134 Topic=Peer attributes="[]" nlri="[]" withdrawals="[]"
DEBU[0001] EOR received                                  AddressFamily=ipv4-unicast Key=192.168.128.134 Topic=Peer
DEBU[0001] sent update                                   Key=192.168.128.134 State=BGP_FSM_ESTABLISHED Topic=Peer attributes="[]" nlri="[]" withdrawals="[]"
DEBU[0008] create Destination                            Nlri=192.168.3.0/24 Topic=Table
DEBU[0008] sent update                                   Key=192.168.128.134 State=BGP_FSM_ESTABLISHED Topic=Peer attributes="[{Origin: i}  {Nexthop: 192.168.128.132} {LocalPref: 100}]" nlri="[192.168.3.0/24]" withdrawals="[]"
^CINFO[0021] stopping gobgpd server                       
INFO[0021] Delete a peer configuration                   Key=192.168.128.134 Topic=Peer
INFO[0021] Peer Down                                     Key=192.168.128.134 Reason=dying State=BGP_FSM_ESTABLISHED Topic=Peer
DEBU[0021] freed fsm.h                                   Key=192.168.128.134 State=BGP_FSM_ESTABLISHED Topic=Peer

On 134 the route has been withdrawn:

luzejia@luzejia-virtual-machine:~/Desktop$ ./gobgp neighbor 192.168.128.132 adj-in
neighbor 192.168.128.132's BGP session is not established

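The log on 134 shows why: it received a NOTIFICATION (cease, peer deconfigured) from 132, treated it as a peer down, and withdrew the route: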

INFO[0028] Peer Down                                     Key=192.168.128.132 Reason="notification-received code 6(cease) subcode 3(peer deconfigured)" State=BGP_FSM_ESTABLISHED Topic=Peer
DEBU[0028] state changed                                 Key=192.168.128.132 Topic=Peer new=BGP_FSM_IDLE old=BGP_FSM_ESTABLISHED reason="notification-received code 6(cease) subcode 3(peer deconfigured)"
DEBU[0028] Removing withdrawals                          Key=192.168.3.0/24 Topic=Table
DEBU[0033] IdleHoldTimer expired                         Duration=5 Key=192.168.128.132 Topic=Peer
DEBU[0033] state changed                                 Key=192.168.128.132 Topic=Peer new=BGP_FSM_ACTIVE old=BGP_FSM_IDLE reason=idle-hold-timer-expired

Now kill the BGP server on 132 with kill -9 instead:

luzejia@luzejia-virtual-machine:~/bgp$ sudo ./gobgpd -f bgp-graceful.conf -l debug -p -r
INFO[0000] gobgpd started                               
INFO[0000] Finished reading the config file              Topic=Config
INFO[0000] Add Peer                                      Key=192.168.128.134 Topic=config
INFO[0000] Add a peer configuration                      Key=192.168.128.134 Topic=Peer
DEBU[0000] IdleHoldTimer expired                         Duration=0 Key=192.168.128.134 Topic=Peer
DEBU[0000] state changed                                 Key=192.168.128.134 Topic=Peer new=BGP_FSM_ACTIVE old=BGP_FSM_IDLE reason=idle-hold-timer-expired
DEBU[0005] try to connect                                Key=192.168.128.134 Topic=Peer
DEBU[0005] state changed                                 Key=192.168.128.134 Topic=Peer new=BGP_FSM_OPENSENT old=BGP_FSM_ACTIVE reason=new-connection
DEBU[0005] state changed                                 Key=192.168.128.134 Topic=Peer new=BGP_FSM_OPENCONFIRM old=BGP_FSM_OPENSENT reason=open-msg-received
INFO[0005] Peer Up                                       Key=192.168.128.134 State=BGP_FSM_OPENCONFIRM Topic=Peer
DEBU[0005] state changed                                 Key=192.168.128.134 Topic=Peer new=BGP_FSM_ESTABLISHED old=BGP_FSM_OPENCONFIRM reason=open-msg-negotiated
DEBU[0005] Now syncing, suppress sending updates. start deferral timer  Duration=360 Key=192.168.128.134 Topic=Server
DEBU[0005] received update                               Key=192.168.128.134 Topic=Peer attributes="[]" nlri="[]" withdrawals="[]"
DEBU[0005] EOR received                                  AddressFamily=ipv4-unicast Key=192.168.128.134 Topic=Peer
INFO[0005] sync finished                                 Topic=Server
DEBU[0005] sent update                                   Key=192.168.128.134 State=BGP_FSM_ESTABLISHED Topic=Peer attributes="[]" nlri="[]" withdrawals="[]"
DEBU[0012] create Destination                            Nlri=192.168.3.0/24 Topic=Table
DEBU[0012] sent update                                   Key=192.168.128.134 State=BGP_FSM_ESTABLISHED Topic=Peer attributes="[{Origin: i}  {Nexthop: 192.168.128.132} {LocalPref: 100}]" nlri="[192.168.3.0/24]" withdrawals="[]"
Killed

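On 134 the route from 132 is still present and carries the S flag, meaning stale (retained). This shows that graceful restart has started: the route is not withdrawn for now while 134 waits for the peer to restart: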

luzejia@luzejia-virtual-machine:~/Desktop$ ./gobgp neighbor 192.168.128.132 adj-in
   ID  Network              Next Hop             AS_PATH              Age        Attrs
S  0   192.168.3.0/24       192.168.128.132                           00:00:21   [{Origin: i} {LocalPref: 100}]

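The log on 134 shows that it recognized the peer as being in graceful restart and did not withdraw the route immediately. The route was still withdrawn once the restart timer expired (from INFO[0071] to WARN[0101], which matches the configured restart-time of 30 seconds):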

DEBU[0053] From same AS, ignore                          Key=192.168.128.132 Path="{ 192.168.3.0/24 | src: { 192.168.128.132 | as: 65001, id: 192.168.128.132 }, nh: 192.168.128.132 }" Topic=Peer
INFO[0071] peer graceful restart                         Key=192.168.128.132 State=BGP_FSM_ESTABLISHED Topic=Peer
INFO[0071] Peer Down                                     Key=192.168.128.132 Reason=graceful-restart State=BGP_FSM_ESTABLISHED Topic=Peer
DEBU[0071] state changed                                 Key=192.168.128.132 Topic=Peer new=BGP_FSM_IDLE old=BGP_FSM_ESTABLISHED reason=graceful-restart
DEBU[0071] Implicit withdrawal of old path, since we have learned new path from the same peer  Key=192.168.3.0/24 Path="{ 192.168.3.0/24 | src: { 192.168.128.132 | as: 65001, id: 192.168.128.132 }, nh: 192.168.128.132 }" Topic=Table
DEBU[0076] IdleHoldTimer expired                         Duration=5 Key=192.168.128.132 Topic=Peer
DEBU[0076] state changed                                 Key=192.168.128.132 Topic=Peer new=BGP_FSM_ACTIVE old=BGP_FSM_IDLE reason=idle-hold-timer-expired
DEBU[0085] try to connect                                Key=192.168.128.132 Topic=Peer
DEBU[0085] failed to connect                             Error="dial tcp 0.0.0.0:0->192.168.128.132:179: connect: connection refused" Key=192.168.128.132 Topic=Peer
WARN[0101] graceful restart timer expired                Key=192.168.128.132 State=BGP_FSM_ACTIVE Topic=Peer
DEBU[0101] stop connect loop                             Key=192.168.128.132 Topic=Peer
DEBU[0101] state changed                                 Key=192.168.128.132 Topic=Peer new=BGP_FSM_IDLE old=BGP_FSM_ACTIVE reason=restart-timer-expired
DEBU[0101] Removing withdrawals                          Key=192.168.3.0/24 Topic=Table
DEBU[0106] IdleHoldTimer expired                         Duration=5 Key=192.168.128.132 Topic=Peer
DEBU[0106] state changed                                 Key=192.168.128.132 Topic=Peer new=BGP_FSM_ACTIVE old=BGP_FSM_IDLE reason=idle-hold-timer-expired
DEBU[0115] try to connect                                Key=192.168.128.132 Topic=Peer
DEBU[0115] failed to connect                             Error="dial tcp 0.0.0.0:0->192.168.128.132:179: connect: connection refused" Key=192.168.128.132 Topic=Peer

After the graceful restart timeout has passed, the route is withdrawn as expected:

luzejia@luzejia-virtual-machine:~/Desktop$ ./gobgp neighbor 192.168.128.132 adj-in
neighbor 192.168.128.132's BGP session is not established

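Summary:

  • When the BGP service exits abnormally, graceful restart kicks in and the routes are not withdrawn immediately
  • When the BGP service is terminated by SIGTERM, the routes are withdrawn immediately

Note:

  • Immediate withdrawal on SIGTERM does not happen by itself: your program has to catch SIGTERM and call the gobgp server's stop interface. The stop call is what announces to the peer that this is a normal shutdown, so the peer knows it does not need graceful restart to retain the routes and can withdraw them directly.

An example of catching and handling SIGTERM, based on the gobgpd source: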

package main

import (
	"fmt"
	"io"
	"net/http"
	_ "net/http/pprof"
	"os"
	"os/signal"
	"runtime"
	"syscall"

	"github.com/coreos/go-systemd/v22/daemon"
	"github.com/jessevdk/go-flags"
	"github.com/kr/pretty"
	"github.com/sirupsen/logrus"
	"golang.org/x/net/context"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"

	"github.com/osrg/gobgp/v3/internal/pkg/version"
	"github.com/osrg/gobgp/v3/pkg/config"
	"github.com/osrg/gobgp/v3/pkg/server"
)

func main() {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGINT)
	......
	logger.Info("gobgpd started")
	bgpServer := server.NewBgpServer(
		server.GrpcListenAddress(opts.GrpcHosts),
		server.GrpcOption(grpcOpts),
		server.LoggerOption(&builtinLogger{logger: logger}),
	)
	go bgpServer.Serve()

	for sig := range sigCh {
		// Any signal other than SIGHUP stops the server cleanly, so the
		// peers are told this is a normal exit.
		if sig != syscall.SIGHUP {
			stopServer(bgpServer, opts.UseSdNotify)
			return
		}

		// SIGHUP: reload the config file and apply the changes.
		logger.WithFields(logrus.Fields{
			"Topic": "Config",
		}).Info("Reload the config file")
		newConfig, err := config.ReadConfigFile(opts.ConfigFile, opts.ConfigType)
		if err != nil {
			logger.WithFields(logrus.Fields{
				"Topic": "Config",
				"Error": err,
			}).Warningf("Can't read config file %s", opts.ConfigFile)
			continue
		}
		currentConfig, err = config.UpdateConfig(context.Background(), bgpServer, currentConfig, newConfig)
		if err != nil {
			logrus.WithFields(logrus.Fields{
				"Topic": "Config",
				"Error": err,
			}).Warningf("Failed to update config %s", opts.ConfigFile)
			continue
		}
	}
}

func stopServer(bgpServer *server.BgpServer, useSdNotify bool) {
	logger.Info("stopping gobgpd server")
	bgpServer.Stop()
	if useSdNotify {
		daemon.SdNotify(false, daemon.SdNotifyStopping)
	}
}