
Partial Derivatives with Respect to a Matrix

Let $X = (x_{ij})_{m \times n}$, and let $f(X) = f(x_{11}, x_{12}, \ldots, x_{1n}, x_{21}, \ldots, x_{mn})$ be a scalar function of the $mn$ entries of $X$ whose partial derivatives

$$
\frac{\partial f}{\partial x_{ij}} \quad (i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, n)
$$

all exist. The derivative of $f(X)$ with respect to the matrix $X$ is defined as

$$
\frac{df(X)}{dX} = \left( \frac{\partial f}{\partial x_{ij}} \right)_{m \times n} =
\begin{bmatrix}
\frac{\partial f}{\partial x_{11}} & \cdots & \frac{\partial f}{\partial x_{1n}} \\
\vdots & \ddots & \vdots \\
\frac{\partial f}{\partial x_{m1}} & \cdots & \frac{\partial f}{\partial x_{mn}}
\end{bmatrix}
$$
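The definition can be checked numerically. The following is a minimal sketch, assuming matrices are stored as plain Python lists of rows; the helper `matrix_derivative` and the test function `sum_of_squares` are hypothetical names introduced here for illustration. Each entry of the result approximates one partial derivative $\partial f / \partial x_{ij}$ by a central difference, so the output has the same $m \times n$ shape as $X$.

```python
def matrix_derivative(f, X, h=1e-6):
    """Approximate df/dX entry by entry with central differences.

    f maps a matrix (list of rows) to a float; X is a list of rows.
    The result has the same m x n shape as X.
    """
    m, n = len(X), len(X[0])
    D = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Perturb only the (i, j) entry, keeping all others fixed.
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            D[i][j] = (f(plus) - f(minus)) / (2 * h)
    return D


# Hypothetical example: f(X) = sum of squared entries, so df/dX = 2X.
def sum_of_squares(X):
    return sum(x * x for row in X for x in row)


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
D = matrix_derivative(sum_of_squares, X)
```

For this choice of $f$, every entry of `D` should be close to twice the corresponding entry of `X`.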

(1) Let $\mathbf{x} = (\xi_1, \xi_2, \cdots, \xi_n)^\top$ and let $f(\mathbf{x})$ be a function of $n$ variables. Find $\frac{df}{d\mathbf{x}^\top}$, $\frac{df}{d\mathbf{x}}$, and $\frac{d^2f}{d\mathbf{x}^2}$.

The derivative with respect to the row vector $\mathbf{x}^\top$ is a row vector:

$$
\frac{df}{d\mathbf{x}^\top} = \begin{pmatrix} \frac{\partial f}{\partial \xi_1}, & \frac{\partial f}{\partial \xi_2}, & \cdots, & \frac{\partial f}{\partial \xi_n} \end{pmatrix}
$$

The derivative with respect to the column vector $\mathbf{x}$ is a column vector. This is the gradient:

$$
\nabla f(\mathbf{x}) = \frac{df}{d\mathbf{x}} = \begin{pmatrix} \frac{\partial f}{\partial \xi_1} \\ \frac{\partial f}{\partial \xi_2} \\ \vdots \\ \frac{\partial f}{\partial \xi_n} \end{pmatrix}
$$

The second derivative is the Hessian matrix, which is symmetric whenever the second partial derivatives are continuous:

$$
H(\mathbf{x}) = \nabla^2 f(\mathbf{x}) = \frac{\partial^2 f}{\partial \mathbf{x}\, \partial \mathbf{x}^\top} =
\begin{bmatrix}
\frac{\partial^2 f}{\partial \xi_1^2} & \frac{\partial^2 f}{\partial \xi_1 \partial \xi_2} & \cdots & \frac{\partial^2 f}{\partial \xi_1 \partial \xi_n} \\
\frac{\partial^2 f}{\partial \xi_2 \partial \xi_1} & \frac{\partial^2 f}{\partial \xi_2^2} & \cdots & \frac{\partial^2 f}{\partial \xi_2 \partial \xi_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial \xi_n \partial \xi_1} & \frac{\partial^2 f}{\partial \xi_n \partial \xi_2} & \cdots & \frac{\partial^2 f}{\partial \xi_n^2}
\end{bmatrix}
$$
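As a small worked illustration (the function below is a hypothetical example, not taken from the original text), take $n = 2$ and $f(\mathbf{x}) = \xi_1^2 + 3\xi_1\xi_2 + 2\xi_2^2$. Differentiating term by term gives the gradient and the Hessian:

$$
\nabla f(\mathbf{x}) = \begin{pmatrix} 2\xi_1 + 3\xi_2 \\ 3\xi_1 + 4\xi_2 \end{pmatrix},
\qquad
H(\mathbf{x}) = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}
$$

The two off-diagonal entries of $H$ agree, as expected for a function with continuous second partial derivatives.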

(2) Let $\mathbf{a} = (a_1, a_2, \cdots, a_n)^\top$ be a constant vector, let $\mathbf{x} = (\xi_1, \xi_2, \cdots, \xi_n)^\top$ be the vector variable, and let $f(\mathbf{x}) = (\mathbf{x}, \mathbf{a}) = \mathbf{a}^\top \mathbf{x}$ be their inner product. Find $\frac{\partial f}{\partial \mathbf{x}}$.

Solution: Since $f(\mathbf{x}) = \sum_{i=1}^{n} a_i \xi_i$, we have $\frac{\partial f}{\partial \xi_j} = a_j$ $(j = 1, 2, \cdots, n)$, so

$$
\frac{\partial f}{\partial \mathbf{x}} = \begin{pmatrix} \frac{\partial f}{\partial \xi_1} \\ \frac{\partial f}{\partial \xi_2} \\ \vdots \\ \frac{\partial f}{\partial \xi_n} \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \mathbf{a}
$$
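A quick numerical check of this result is sketched below, assuming vectors are plain Python lists. The helper names `inner` and `numerical_gradient` and the sample values of `a` and `x0` are assumptions made for this illustration. Because $f$ is linear in $\mathbf{x}$, the central-difference gradient should match $\mathbf{a}$ up to rounding error at any point.

```python
def inner(a, x):
    """Inner product a^T x of two equal-length vectors."""
    return sum(ai * xi for ai, xi in zip(a, x))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for k in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[k] += h
        minus[k] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad


a = [2.0, -1.0, 0.5]      # constant vector (sample values)
x0 = [0.3, 1.7, -4.0]     # arbitrary evaluation point
grad = numerical_gradient(lambda x: inner(a, x), x0)
```

The gradient does not depend on the choice of `x0`, which reflects the fact that $\partial f / \partial \mathbf{x} = \mathbf{a}$ is constant.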

(3) Let $A = (a_{ij})_{m \times n}$ be a constant matrix, let $X = (x_{ij})_{n \times m}$ be the matrix variable, and let $f(X) = \operatorname{tr}(AX)$. Find $\frac{\partial f}{\partial X}$.

Analysis: Write $C = AX$, an $m \times m$ matrix:

$$
\begin{pmatrix} c_{11} & \cdots & c_{1m} \\ \vdots & \ddots & \vdots \\ c_{m1} & \cdots & c_{mm} \end{pmatrix} =
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}
\begin{pmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nm} \end{pmatrix}
$$

Expanding the diagonal entries gives:

$$
\begin{aligned}
c_{11} &= a_{11}x_{11} + a_{12}x_{21} + \cdots + a_{1n}x_{n1}, \\
c_{22} &= a_{21}x_{12} + a_{22}x_{22} + \cdots + a_{2n}x_{n2}, \\
&\ \ \vdots \\
c_{mm} &= a_{m1}x_{1m} + a_{m2}x_{2m} + \cdots + a_{mn}x_{nm}
\end{aligned}
$$

Pattern: each entry of $X$ appears exactly once among the diagonal entries, and $x_{ks}$ is always multiplied by $a_{sk}$, the entry of $A$ with the two indices swapped.

Solution: Since $AX = \left( \sum_{k=1}^{n} a_{ik} x_{kj} \right)_{m \times m}$,

we have $f(X) = \operatorname{tr}(AX) = \sum_{s=1}^{m} \sum_{k=1}^{n} a_{sk} x_{ks}$.

Each $x_{ij}$ appears in exactly one term of this sum, with coefficient $a_{ji}$, so

$$
\left( \frac{\partial f}{\partial x_{ij}} \right)_{n \times m} = (a_{ji})_{n \times m} \quad (i = 1, 2, \cdots, n,\ j = 1, 2, \cdots, m)
$$

Therefore

$$
\frac{\partial f}{\partial X} = \left( \frac{\partial f}{\partial x_{ij}} \right)_{n \times m} = (a_{ji})_{n \times m} = A^\top
$$
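The identity $\partial \operatorname{tr}(AX) / \partial X = A^\top$ can also be confirmed numerically. The sketch below assumes matrices are lists of rows; the helpers `matmul`, `trace`, and `trace_ax_derivative` and the sample matrices are hypothetical, chosen only for this illustration. It perturbs each entry of $X$ in turn and compares the resulting central differences with the transpose of $A$.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def trace_ax_derivative(A, X, h=1e-6):
    """Central-difference approximation of d tr(AX) / dX (same shape as X)."""
    n, m = len(X), len(X[0])
    D = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            D[i][j] = (trace(matmul(A, plus)) - trace(matmul(A, minus))) / (2 * h)
    return D


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]      # constant matrix, m x n = 2 x 3
X = [[0.5, -1.0],
     [2.0, 0.0],
     [1.5, 3.0]]           # matrix variable, n x m = 3 x 2
D = trace_ax_derivative(A, X)
A_T = [list(col) for col in zip(*A)]   # transpose of A, shape 3 x 2
```

Here `D` and `A_T` should agree entry by entry up to rounding error, regardless of the values stored in `X`.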

(4) Let $\mathbf{x} = (\xi_1, \xi_2, \cdots, \xi_n)^\top$, let $A = (a_{ij})_{n \times n}$ be a constant matrix, and let $f(\mathbf{x}) = \mathbf{x}^\top A \mathbf{x}$ be a function of $n$ variables. Find the derivative $\dfrac{df}{d\mathbf{x}}$.

Solution: Since

$$
\begin{aligned}
f(\mathbf{x}) &= \mathbf{x}^\top A \mathbf{x} \\
&= (\xi_1, \xi_2, \cdots, \xi_n)
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix} \xi_1 \\ \xi_2 \\ \vdots \\ \xi_n \end{pmatrix} \\
&= (\xi_1, \xi_2, \cdots, \xi_k, \cdots, \xi_n)
\begin{pmatrix}
\sum_{j=1}^{n} a_{1j} \xi_j \\
\sum_{j=1}^{n} a_{2j} \xi_j \\
\vdots \\
\sum_{j=1}^{n} a_{kj} \xi_j \\
\vdots \\
\sum_{j=1}^{n} a_{nj} \xi_j
\end{pmatrix} \\
&= \xi_1 \sum_{j=1}^{n} a_{1j} \xi_j + \cdots + \xi_k \sum_{j=1}^{n} a_{kj} \xi_j + \cdots + \xi_n \sum_{j=1}^{n} a_{nj} \xi_j
\end{aligned}
$$

differentiating with respect to $\xi_k$ (the product rule applies to the $k$-th term, and every other term contributes only through its $a_{ik}\xi_k$ summand) gives

$$
\begin{aligned}
\frac{\partial f(\mathbf{x})}{\partial \xi_k}
&= \xi_1 a_{1k} + \cdots + \xi_{k-1} a_{k-1,k} + \left( \sum_{j=1}^{n} a_{kj} \xi_j + \xi_k a_{kk} \right) + \xi_{k+1} a_{k+1,k} + \cdots + \xi_n a_{nk} \\
&= \sum_{i=1}^{n} a_{ik} \xi_i + \sum_{j=1}^{n} a_{kj} \xi_j, \quad k = 1, 2, \cdots, n
\end{aligned}
$$

Therefore

$$
\begin{aligned}
\frac{df}{d\mathbf{x}}
&= \begin{pmatrix} \frac{\partial f}{\partial \xi_1} \\ \frac{\partial f}{\partial \xi_2} \\ \vdots \\ \frac{\partial f}{\partial \xi_n} \end{pmatrix}
= \begin{pmatrix} \sum_{j=1}^{n} a_{1j} \xi_j \\ \sum_{j=1}^{n} a_{2j} \xi_j \\ \vdots \\ \sum_{j=1}^{n} a_{nj} \xi_j \end{pmatrix}
+ \begin{pmatrix} \sum_{i=1}^{n} a_{i1} \xi_i \\ \sum_{i=1}^{n} a_{i2} \xi_i \\ \vdots \\ \sum_{i=1}^{n} a_{in} \xi_i \end{pmatrix} \\
&= A\mathbf{x} + A^\top \mathbf{x} = (A + A^\top)\mathbf{x}
\end{aligned}
$$

In particular, when $A$ is symmetric, $\dfrac{df}{d\mathbf{x}} = 2A\mathbf{x}$.
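The formula $(A + A^\top)\mathbf{x}$ can be compared against a finite-difference gradient. This is a minimal sketch with hypothetical helper names (`quadratic_form`, `analytic_gradient`, `numerical_gradient`) and sample values; the matrix `A` is deliberately non-symmetric so that the $A^\top \mathbf{x}$ term matters.

```python
def quadratic_form(A, x):
    """Evaluate f(x) = x^T A x for a square matrix A (list of rows)."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def analytic_gradient(A, x):
    """Gradient from the closed form (A + A^T) x."""
    n = len(x)
    return [sum((A[k][j] + A[j][k]) * x[j] for j in range(n)) for k in range(n)]


def numerical_gradient(A, x, h=1e-6):
    """Central-difference approximation of the gradient of x^T A x."""
    grad = []
    for k in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[k] += h
        minus[k] -= h
        grad.append((quadratic_form(A, plus) - quadratic_form(A, minus)) / (2 * h))
    return grad


A = [[1.0, 2.0],
     [4.0, 3.0]]           # non-symmetric sample matrix
x = [1.0, -2.0]
exact = analytic_gradient(A, x)     # [-10.0, -6.0]
approx = numerical_gradient(A, x)
```

Using $2A\mathbf{x}$ here would give $(-6, -4)^\top$ instead, which shows why the symmetric shortcut only applies when $A = A^\top$.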
