
Improving Neural Networks

Improve NN

Table of contents

  • Improve NN
    • train/dev/test set
    • Bias/Variance
    • basic recipe
    • Regularization
      • Logistic Regression
      • Neural network
      • other ways
    • optimization problem
      • Normalizing inputs
      • vanishing/exploding gradients
      • weight initialize
      • gradient check
        • Numerical approximation
        • grad check

train/dev/test set

0.7/0/0.3 (train/test only) or 0.6/0.2/0.2 (train/dev/test) -> small datasets, about 100-10000 examples

0.98/0.01/0.01 … -> big data
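The ratios above can be turned into index splits with a few lines of NumPy. This is a minimal sketch, not a prescribed method: the function name `split_indices`, the `seed` argument and the choice to shuffle first are assumptions made for illustration.

```python
import numpy as np

def split_indices(m, train_frac, dev_frac, seed=0):
    """Shuffle range(m) and cut it into train/dev/test index arrays.

    The test set receives whatever remains after train and dev.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(m)
    n_train = int(round(m * train_frac))
    n_dev = int(round(m * dev_frac))
    train = perm[:n_train]
    dev = perm[n_train:n_train + n_dev]
    test = perm[n_train + n_dev:]
    return train, dev, test

# Small dataset: 0.6/0.2/0.2
small_train, small_dev, small_test = split_indices(1_000, 0.6, 0.2)
# Big dataset: 0.98/0.01/0.01
big_train, big_dev, big_test = split_indices(1_000_000, 0.98, 0.01)
```

With a million examples, 1% still leaves 10000 examples each for the dev and test sets, which is why the ratios can shift so far toward training data.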

Bias/Variance

Bias measures the learning ability of a single model, while variance measures how stable the same model is across different datasets.


high variance -> high dev set error (dev error well above train error)

high bias -> high train set error

basic recipe

high bias -> bigger network / train longer / more advanced optimization algorithms / NN architectures

high variance -> more data / regularization / NN architecture
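The diagnosis and the recipe can be combined into one small helper. The sketch below is illustrative only: the function name `diagnose`, the baseline error `base_err` and the gap threshold `tol` are assumptions, not values given in these notes.

```python
def diagnose(train_err, dev_err, base_err=0.0, tol=0.02):
    """Map train/dev error rates to the remedies listed in the recipe.

    base_err: assumed achievable error (for example human-level error).
    tol: assumed gap that counts as "high"; tune it for the task.
    """
    advice = []
    if train_err - base_err > tol:
        advice.append("high bias: bigger network, train longer, "
                      "better optimizer, other architecture")
    if dev_err - train_err > tol:
        advice.append("high variance: more data, regularization, "
                      "other architecture")
    return advice or ["no obvious bias or variance problem"]

print(diagnose(0.15, 0.16))   # train error is high, dev is close to train
print(diagnose(0.01, 0.11))   # train error is low, dev is far above train
print(diagnose(0.15, 0.30))   # both problems at once
```

Checking bias first mirrors the order of the recipe: there is little point in collecting more data while the network still cannot fit the training set.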

Regularization

Logistic Regression

L2 regularization:

$$
\min_{w,b} J(w,b), \qquad J(w,b)=\frac{1}{m}\sum_{i=1}^m\mathcal{L}(\hat y^{(i)},y^{(i)})+\frac{\lambda}{2m}\Vert w\Vert_2^2
$$
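As a rough illustration of the regularized cost, the sketch below evaluates J(w, b) for logistic regression. It assumes the loss is binary cross-entropy and that examples are stored as columns of `X`; the notes do not state either, and the function name `l2_regularized_cost` is hypothetical.

```python
import numpy as np

def l2_regularized_cost(w, b, X, y, lam):
    """Cross-entropy cost plus (lam / 2m) * ||w||_2^2.

    X: (n, m) inputs, y: (1, m) labels in {0, 1}, w: (n, 1), b: scalar.
    """
    m = X.shape[1]
    z = w.T @ X + b
    y_hat = 1.0 / (1.0 + np.exp(-z))          # sigmoid predictions
    eps = 1e-12                                # guards against log(0)
    loss = -(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
    data_term = loss.sum() / m
    penalty = lam / (2 * m) * np.sum(w ** 2)   # only w is penalized, not b
    return data_term + penalty

X = np.array([[0.5, -1.0], [1.5, 2.0]])
y = np.array([[1, 0]])
w = np.array([[1.0], [2.0]])
print(l2_regularized_cost(w, 0.0, X, y, lam=0.0))
print(l2_regularized_cost(w, 0.0, X, y, lam=4.0))
```

The two printed values differ only by the penalty term, so raising `lam` makes large weights more expensive without touching the data term.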

Neural network

Frobenius norm:

$$
\Vert w^{[l]}\Vert^2_F=\sum_{i=1}^{n^{[l]}}\sum_{j=1}^{n^{[l-1]}}\left(w_{i,j}^{[l]}\right)^2
$$

Dropout regularization (inverted dropout, shown for layer 3):

    d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob
    a3 = np.multiply(a3, d3)
    a3 /= keep_prob
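The two pieces above can be tried out in isolation. The following is a minimal sketch: the helper names `frobenius_sq` and `inverted_dropout`, the explicit random generator and the all-ones activations are assumptions chosen to make the effect easy to see.

```python
import numpy as np

def frobenius_sq(W):
    """Squared Frobenius norm: the sum of all squared entries of W."""
    return float(np.sum(W ** 2))

def inverted_dropout(a, keep_prob, rng):
    """Zero each activation with probability 1 - keep_prob, then rescale."""
    d = rng.random(a.shape) < keep_prob   # boolean mask, True means "keep"
    a = np.multiply(a, d)                 # drop the masked units
    a = a / keep_prob                     # keep the expected value unchanged
    return a, d

rng = np.random.default_rng(0)
a3 = np.ones((50, 200))
a3_drop, d3 = inverted_dropout(a3, keep_prob=0.8, rng=rng)
print(frobenius_sq(np.array([[1.0, 2.0], [3.0, 4.0]])))
print(d3.mean(), a3_drop.mean())
```

Dividing by `keep_prob` is what makes this the inverted variant: the mean activation stays close to its original value, so no extra scaling is needed when dropout is switched off at test time.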

other ways

  • early stopping
  • data augmentation
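Early stopping can be sketched as a rule over the per-epoch dev-set loss. The `patience` parameter and the function name `early_stopping_epoch` are assumptions for illustration; the notes only name the technique.

```python
def early_stopping_epoch(dev_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of dev losses.

    Training stops once the dev loss has not improved for `patience`
    consecutive epochs; the weights from best_epoch would be kept.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(dev_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Never triggered: training ran through every recorded epoch.
    return best_epoch, len(dev_losses) - 1

losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.9, 1.1]
print(early_stopping_epoch(losses))  # (2, 5)
```

In this example the dev loss bottoms out at epoch 2 and the rule stops at epoch 5, three epochs later, before the remaining overfitting epochs are run.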

optimization problem

speed up the training of your neural network

Normalizing inputs

  1. subtract mean

$$
\mu =\frac{1}{m}\sum _{i=1}^{m}x^{(i)}, \qquad x:=x-\mu
$$

  2. normalize variance

$$
\sigma ^2=\frac{1}{m}\sum_{i=1}^m\left(x^{(i)}\right)^2, \qquad x:=x/\sigma
$$

Here the square is element-wise and x has already had the mean subtracted.
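The two steps can be sketched in NumPy as follows. The column-per-example layout, the function names and the small constant added to sigma are assumptions for illustration.

```python
import numpy as np

def fit_normalizer(X_train):
    """Compute per-feature mu and sigma from X_train of shape (n, m)."""
    mu = X_train.mean(axis=1, keepdims=True)
    centered = X_train - mu                     # step 1: subtract mean
    sigma2 = (centered ** 2).mean(axis=1, keepdims=True)
    sigma = np.sqrt(sigma2) + 1e-8              # assumed guard against sigma == 0
    return mu, sigma

def apply_normalizer(X, mu, sigma):
    """Step 2: scale the centered data so each feature has unit variance."""
    return (X - mu) / sigma

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(2, 1000))
mu, sigma = fit_normalizer(X_train)
X_norm = apply_normalizer(X_train, mu, sigma)
print(X_norm.mean(axis=1), X_norm.std(axis=1))
```

The same `mu` and `sigma` computed on the training set would normally be reused for the dev and test sets, so that all three pass through an identical transformation.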

vanishing/exploding gradients

For a deep network with L layers, linear activations and no bias:

$$
y=w^{[L]}w^{[L-1]}\cdots w^{[2]}w^{[1]}x
$$

$$
w^{[l]}>I\rightarrow \left(w^{[l]}\right)^L\rightarrow\infty
$$

$$
w^{[l]}<I\rightarrow \left(w^{[l]}\right)^L\rightarrow 0
$$
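A small numerical experiment makes the effect concrete. The sketch below assumes every layer uses the same weight matrix, a scaled identity, which is a simplification chosen for illustration; the function name `deep_linear_output` is hypothetical.

```python
import numpy as np

def deep_linear_output(scale, L, n=2):
    """Apply the same weight matrix scale * I to a vector of ones L times."""
    W = scale * np.eye(n)
    y = np.ones((n, 1))
    for _ in range(L):
        y = W @ y
    return y

exploding = deep_linear_output(1.5, 50)   # weights slightly larger than I
vanishing = deep_linear_output(0.5, 50)   # weights slightly smaller than I
print(exploding[0, 0], vanishing[0, 0])
```

After 50 layers the first output has grown to 1.5^50 (on the order of 10^8) while the second has shrunk to 0.5^50 (below 10^-15), and gradients propagated back through the same product scale in the same way.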

weight initialize

$$
\mathrm{var}(w)=\frac{1}{n^{[l-1]}}
$$

$$
w^{[l]}=\texttt{np.random.randn(shape)}*\texttt{np.sqrt}\left(\frac{1}{n^{[l-1]}}\right)
$$
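The initialization rule can be applied to every layer in a loop. This is a sketch under assumptions: the `layer_dims` list, the parameter dictionary keys and the zero biases are illustrative choices, not something the notes specify.

```python
import numpy as np

def init_weights(layer_dims, seed=0):
    """Draw W[l] with variance 1 / n[l-1] and set b[l] to zero.

    layer_dims: e.g. [n_x, n_1, ..., n_L], the number of units per layer.
    """
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        n_prev, n_cur = layer_dims[l - 1], layer_dims[l]
        # Standard normal samples scaled by sqrt(1 / n_prev).
        params[f"W{l}"] = rng.standard_normal((n_cur, n_prev)) * np.sqrt(1.0 / n_prev)
        params[f"b{l}"] = np.zeros((n_cur, 1))
    return params

params = init_weights([500, 400, 1])
print(params["W1"].shape, params["W1"].var(), 1 / 500)
```

The empirical variance of `W1` comes out close to 1/500, so a weighted sum of 500 inputs of similar scale neither blows up nor collapses at initialization.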

gradient check

Numerical approximation

$$
f(\theta)=\theta^3
$$

$$
f'(\theta)\approx\frac{f(\theta+\varepsilon)-f(\theta-\varepsilon)}{2\varepsilon}
$$
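A worked example shows how accurate the two-sided difference is. The values θ = 1 and ε = 0.01 are chosen here for illustration, and the one-sided difference is included only as a comparison; neither appears in the notes.

```python
def f(theta):
    return theta ** 3

def two_sided(f, theta, eps):
    """Centered difference: error shrinks like eps ** 2."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

def one_sided(f, theta, eps):
    """Forward difference, for comparison: error shrinks like eps."""
    return (f(theta + eps) - f(theta)) / eps

theta, eps = 1.0, 0.01
exact = 3 * theta ** 2                      # analytic derivative of theta^3
print(two_sided(f, theta, eps) - exact)     # about 1e-4
print(one_sided(f, theta, eps) - exact)     # about 3e-2
```

For this cubic the centered estimate is 3θ² + ε², about 3.0001, against a true derivative of 3, while the forward estimate is about 3.0301. That gap is why the two-sided form is the one used for gradient checking.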

grad check

$$
d\theta_{approx}[i]=\frac{J(\theta_1,\ldots,\theta_i+\varepsilon,\ldots)-J(\theta_1,\ldots,\theta_i-\varepsilon,\ldots)}{2\varepsilon}\approx d\theta[i]
$$

$$
check:\quad\frac{\Vert d\theta_{approx}-d\theta\Vert_2}{\Vert d\theta_{approx}\Vert_2+\Vert d\theta\Vert_2}<10^{-7}
$$
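The check can be sketched for any cost whose parameters are flattened into one vector. The function name `grad_check`, the default ε = 1e-7 and the toy cost used below are assumptions for illustration.

```python
import numpy as np

def grad_check(J, grad_J, theta, eps=1e-7):
    """Return the relative difference between numerical and analytic gradients.

    J: maps a 1-D parameter vector to a scalar cost.
    grad_J: the gradient implementation being verified.
    """
    theta = np.asarray(theta, dtype=float)
    d_theta = grad_J(theta)
    d_theta_approx = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += eps      # nudge only component i upward
        minus[i] -= eps     # and downward
        d_theta_approx[i] = (J(plus) - J(minus)) / (2 * eps)
    num = np.linalg.norm(d_theta_approx - d_theta)
    den = np.linalg.norm(d_theta_approx) + np.linalg.norm(d_theta)
    return num / den

J = lambda t: np.sum(t ** 3)
theta = np.array([1.0, 2.0, -0.5])
print(grad_check(J, lambda t: 3 * t ** 2, theta))   # correct gradient: tiny ratio
print(grad_check(J, lambda t: 2 * t ** 2, theta))   # buggy gradient: large ratio
```

A ratio far above the 10^-7 threshold, as with the deliberately wrong gradient in the last line, points to a bug in the analytic gradient. Because the loop evaluates J twice per parameter, the check is slow and is better suited to debugging than to every training step.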

