
Improving Neural Networks

news 2025/8/12 8:08:52

Improve NN

Table of Contents

Improve NN
- train/dev/test set
- Bias/Variance
- basic recipe
- Regularization
  - Logistic Regression
  - Neural network
  - other ways
- optimization problem
  - Normalizing inputs
  - vanishing/exploding gradients
  - weight initialize
  - gradient check
    - Numerical approximation
    - grad check

train/dev/test set

0.7/0/0.3 or 0.6/0.2/0.2 (train/dev/test) -> small data, 100-10000 examples

0.98/0.01/0.01 … -> big data
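A minimal sketch of a random split by ratio. It assumes the examples are indexed 0..m-1 and that shuffling before splitting is appropriate. The helper `split_indices` and its parameters are hypothetical names chosen for illustration:

```python
import numpy as np


def split_indices(m, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle m example indices and split them into train/dev/test index arrays."""
    if not np.isclose(sum(ratios), 1.0):
        raise ValueError("ratios must sum to 1")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(m)                 # random order of example indices
    n_train = int(round(ratios[0] * m))
    n_dev = int(round(ratios[1] * m))
    train = perm[:n_train]
    dev = perm[n_train:n_train + n_dev]
    test = perm[n_train + n_dev:]             # remainder goes to the test set
    return train, dev, test


train, dev, test = split_indices(10_000)                      # small data: 60/20/20
print(len(train), len(dev), len(test))                        # 6000 2000 2000
big = split_indices(1_000_000, ratios=(0.98, 0.01, 0.01))     # big data: 98/1/1
print([len(part) for part in big])                            # [980000, 10000, 10000]
```

With a million examples, 1% still leaves 10000 examples each for the dev and test sets.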

Bias/Variance

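Bias measures the learning capacity of a single model (how well it fits the training data). Variance measures how stable the same model is when trained on different datasets.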


high variance -> dev set error much higher than train set error

high bias -> high train set error (compared with the best achievable, e.g. human-level, error)
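The two rules can be turned into a rough heuristic in code. This sketch assumes error rates given as fractions and a base (optimal, e.g. human-level) error `base_err`. It also uses a hypothetical threshold `tol` for what counts as a "large" gap. The function name and threshold are illustrative, not a standard:

```python
def diagnose(train_err, dev_err, base_err=0.0, tol=0.02):
    """Return (high_bias, high_variance) from error rates given as fractions.

    base_err: assumed best achievable (e.g. human-level) error.
    tol: hypothetical threshold for what counts as a "large" gap.
    """
    high_bias = (train_err - base_err) > tol       # training set itself is fit poorly
    high_variance = (dev_err - train_err) > tol    # model fails to generalize to dev set
    return high_bias, high_variance


print(diagnose(0.01, 0.11))    # (False, True): high variance
print(diagnose(0.15, 0.16))    # (True, False): high bias
print(diagnose(0.15, 0.30))    # (True, True): high bias and high variance
print(diagnose(0.005, 0.01))   # (False, False): low bias and low variance
```

If the base error is well above zero (e.g. noisy labels), the same train error may no longer indicate high bias. That is why `base_err` is a parameter.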

basic recipe

high bias -> bigger network / train longer / more advanced optimization algorithms / different NN architecture

high variance -> more data / regularization / different NN architecture

Regularization

Logistic Regression

L2 regularization: $\min_{w,b}J(w,b)$, where

$J(w,b)=\frac{1}{m}\sum_{i=1}^m\mathcal{L}(\hat y^{(i)},y^{(i)})+\frac{\lambda}{2m}\Vert w\Vert_2^2,\qquad \Vert w\Vert_2^2=\sum_{j}w_j^2=w^Tw$
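The following NumPy sketch computes the regularized logistic-regression cost and its gradients. It assumes column-per-example shapes (`X` is `(n_x, m)`, `w` is `(n_x, 1)`) and cross-entropy for $\mathcal{L}$. The function name is hypothetical. The penalty term adds $\frac{\lambda}{m}w$ to the weight gradient, which is why L2 regularization is also called weight decay:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def l2_cost_and_grads(w, b, X, y, lam):
    """Cross-entropy cost plus (lambda / 2m) * ||w||_2^2, and gradients dw, db.

    Assumed shapes: X (n_x, m), y (1, m), w (n_x, 1), b scalar.
    """
    m = X.shape[1]
    a = sigmoid(w.T @ X + b)                       # predictions y_hat, shape (1, m)
    eps = 1e-12                                    # guards against log(0)
    cross_entropy = -np.mean(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps))
    l2_penalty = lam / (2 * m) * np.sum(w ** 2)    # (lambda / 2m) * ||w||_2^2
    cost = cross_entropy + l2_penalty
    dw = X @ (a - y).T / m + (lam / m) * w         # penalty contributes (lambda / m) * w
    db = np.mean(a - y)                            # b is usually left unregularized
    return cost, dw, db
```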

Neural network

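Frobenius norm: for a network with $L$ layers, the L2 penalty sums the squared Frobenius norms of all weight matrices

$J=\frac{1}{m}\sum_{i=1}^m\mathcal{L}(\hat y^{(i)},y^{(i)})+\frac{\lambda}{2m}\sum_{l=1}^{L}\Vert w^{[l]}\Vert_F^2,\qquad \Vert w^{[l]}\Vert^2_F=\sum_{i=1}^{n^{[l]}}\sum_{j=1}^{n^{[l-1]}}(w_{i,j}^{[l]})^2$

Dropout regularization (inverted dropout, shown for layer 3; applied during training only):
- `d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob` (keep each unit with probability keep_prob)
- `a3 = np.multiply(a3, d3)` (zero out the dropped units)
- `a3 /= keep_prob` (scale up so the expected value of a3 is unchanged)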
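The sketch below shows both techniques for a single call. It assumes activations stored as `(units, m)` arrays and weight matrices passed as a list. `frobenius_penalty` and `inverted_dropout` are hypothetical helper names, and the random mask uses NumPy's `Generator` API instead of `np.random.rand`:

```python
import numpy as np


def frobenius_penalty(weights, lam, m):
    """(lambda / 2m) * sum over layers of ||w[l]||_F^2."""
    return lam / (2 * m) * sum(np.sum(W ** 2) for W in weights)


def inverted_dropout(a, keep_prob, rng):
    """Inverted dropout on one layer's activations `a` (training time only)."""
    d = rng.random(a.shape) < keep_prob   # boolean mask: keep each unit with prob keep_prob
    a = np.multiply(a, d)                 # zero out the dropped units
    a = a / keep_prob                     # rescale so the expected activation is unchanged
    return a, d                           # d is reused in backprop: da = da * d / keep_prob


rng = np.random.default_rng(1)
a3 = np.ones((4, 10_000))
a3_drop, d3 = inverted_dropout(a3, keep_prob=0.8, rng=rng)
print(d3.mean(), a3_drop.mean())
```

The printed mask mean should be close to `keep_prob` and the mean activation close to 1. Dividing by `keep_prob` compensates for the dropped units, so no rescaling is needed at test time.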

other ways

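- early stopping: stop training around the iteration where the dev set error is lowest
- data augmentation: create extra training examples, e.g. by flipping or randomly cropping images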
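Early stopping can be sketched as tracking the best dev error and stopping once it stops improving. This is one common variant, not necessarily the author's. It assumes a hypothetical `patience` parameter and a made-up list of per-epoch dev errors for illustration:

```python
def early_stopping(dev_errors, patience=3):
    """Return (best_epoch, best_error), stopping after `patience` epochs without improvement.

    dev_errors: per-epoch dev-set errors (a precomputed list here; in practice
    each value comes from evaluating the model after that epoch).
    """
    best_epoch, best_err, bad_epochs = 0, float("inf"), 0
    for epoch, err in enumerate(dev_errors):
        if err < best_err:
            best_epoch, best_err, bad_epochs = epoch, err, 0   # new best: save weights here
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                                           # dev error stopped improving
    return best_epoch, best_err


print(early_stopping([0.30, 0.22, 0.18, 0.17, 0.19, 0.20, 0.21, 0.15]))  # (3, 0.17)
```

With patience 3 the loop stops at epoch 6 and returns epoch 3, even though the last (made-up) value is lower. A larger patience trades extra training time for the chance of catching such late improvements.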

optimization problem

speed up the training of your neural network

Normalizing inputs

subtract mean

$\mu =\frac{1}{m}\sum _{i=1}^{m}x^{(i)}\\ x:=x-\mu$

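normalize variance (computed on the mean-subtracted $x$; the square is element-wise, giving one variance per feature)

$\sigma ^2=\frac{1}{m}\sum_{i=1}^m x^{(i)}\odot x^{(i)}\\ x:=x/\sigma$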
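The sketch below implements both steps. It assumes data stored as `(n_x, m)` with one column per example. The helper names and the small `eps` guard are assumptions. The $\mu$ and $\sigma$ computed on the training set are reused for the test set, so both sets go through the same transformation:

```python
import numpy as np


def fit_normalizer(X):
    """Per-feature mean and standard deviation of training data X, shape (n_x, m)."""
    mu = X.mean(axis=1, keepdims=True)                              # mu = (1/m) sum x^(i)
    sigma = np.sqrt(((X - mu) ** 2).mean(axis=1, keepdims=True))    # per-feature std
    return mu, sigma


def normalize(X, mu, sigma, eps=1e-8):
    """Apply x := (x - mu) / sigma; eps guards against a zero-variance feature."""
    return (X - mu) / (sigma + eps)


rng = np.random.default_rng(0)
loc, scale = [[5.0], [-3.0]], [[10.0], [0.5]]     # two features on very different scales
X_train = rng.normal(loc, scale, size=(2, 1000))
mu, sigma = fit_normalizer(X_train)
X_train_norm = normalize(X_train, mu, sigma)
X_test = rng.normal(loc, scale, size=(2, 200))
X_test_norm = normalize(X_test, mu, sigma)        # reuse the training mu and sigma
```

After normalization both features have mean 0 and variance 1 on the training set. The cost surface is then less elongated, so gradient descent can use a larger learning rate.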

vanishing/exploding gradients

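Assuming linear activations $g(z)=z$ and $b^{[l]}=0$, a network with $L$ layers computes

$\hat y=w^{[L]}w^{[L-1]}\cdots w^{[2]}w^{[1]}x$

If every layer uses the same matrix $w^{[l]}$:

$w^{[l]}>I\ (\text{e.g. } 1.5I)\rightarrow (w^{[l]})^L=1.5^L I\rightarrow\infty \\ w^{[l]}<I\ (\text{e.g. } 0.5I)\rightarrow (w^{[l]})^L=0.5^L I\rightarrow0$

so activations (and gradients) grow or shrink exponentially with the depth $L$.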
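Here is a quick numerical illustration. It assumes linear activations, zero biases and every weight matrix equal to $a\cdot I$. The helper name is hypothetical:

```python
import numpy as np


def deep_linear_output(a, L, n=2):
    """Compute w[L] ... w[1] x with every w[l] = a * I, x = ones, g(z) = z and b = 0."""
    W = a * np.eye(n)
    y = np.ones((n, 1))
    for _ in range(L):        # apply the L layers one after another
        y = W @ y
    return y


print(deep_linear_output(1.5, 50)[0, 0])   # 1.5**50, about 6.4e8: explodes
print(deep_linear_output(0.5, 50)[0, 0])   # 0.5**50, about 8.9e-16: vanishes
```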

weight initialize

$\mathrm{Var}(w^{[l]}_{i,j})=\frac{1}{n^{[l-1]}}$, where $n^{[l-1]}$ is the number of inputs to layer $l$:

`w[l] = np.random.randn(n[l], n[l-1]) * np.sqrt(1 / n[l-1])`
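The sketch below initializes every layer this way. It assumes a hypothetical list `layer_dims` of layer sizes $[n^{[0]},\dots,n^{[L]}]$ and zero biases. `rng.standard_normal` (NumPy's `Generator` API) draws from the same standard normal distribution as `np.random.randn`:

```python
import numpy as np


def initialize_parameters(layer_dims, seed=0):
    """layer_dims = [n0, n1, ..., nL]; each W{l} has entries with variance 1 / n[l-1]."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        n_prev, n_cur = layer_dims[l - 1], layer_dims[l]
        # scale a standard normal by sqrt(1 / n_prev) so Var(w) = 1 / n_prev
        params[f"W{l}"] = rng.standard_normal((n_cur, n_prev)) * np.sqrt(1.0 / n_prev)
        params[f"b{l}"] = np.zeros((n_cur, 1))
    return params


params = initialize_parameters([1000, 500, 1])
print(params["W1"].shape, params["W1"].var())   # (500, 1000) and a variance near 1/1000
```

A layer with more inputs gets smaller weights. The sum $z=\sum_j w_j x_j$ then keeps roughly the same scale regardless of $n^{[l-1]}$.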

gradient check

Numerical approximation

$f(\theta)=\theta^3\\ f'(\theta)\approx\frac{f(\theta+\varepsilon)-f(\theta-\varepsilon)}{2\varepsilon},\qquad \text{error}=O(\varepsilon^2)$
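Here the formula is applied to $f(\theta)=\theta^3$ at $\theta=1$ with $\varepsilon=0.01$ (values chosen for illustration). Expanding the cubes gives exactly $3\theta^2+\varepsilon^2=3.0001$ for the two-sided difference. A one-sided difference, added here only for comparison, is off by about $0.03$:

```python
def f(theta):
    return theta ** 3


def two_sided_diff(f, theta, eps=0.01):
    """(f(theta + eps) - f(theta - eps)) / (2 * eps); error O(eps**2)."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)


def one_sided_diff(f, theta, eps=0.01):
    """(f(theta + eps) - f(theta)) / eps; error O(eps), shown for comparison."""
    return (f(theta + eps) - f(theta)) / eps


theta = 1.0                         # exact derivative: 3 * theta**2 = 3
print(two_sided_diff(f, theta))     # about 3.0001
print(one_sided_diff(f, theta))     # about 3.0301
```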

grad check

$d\theta_{approx}[i]=\frac{J(\theta_1,\dots,\theta_i+\varepsilon,\dots)-J(\theta_1,\dots,\theta_i-\varepsilon,\dots)}{2\varepsilon}\approx d\theta[i]=\frac{\partial J}{\partial\theta_i}\\ \text{check}:\frac{\Vert d\theta_{approx}-d\theta\Vert_2}{\Vert d\theta_{approx}\Vert_2+\Vert d\theta\Vert_2}<10^{-7}$
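The sketch below implements the check. It assumes all parameters have been flattened into one float vector `theta`, and that `J` and the analytic gradient are callables you supply (the function names are hypothetical). The toy cost $J(\theta)=\sum_i\theta_i^3$ stands in for a real network:

```python
import numpy as np


def grad_check(J, grad_J, theta, eps=1e-7):
    """Relative difference between analytic and two-sided numerical gradients of J at theta."""
    dtheta = grad_J(theta)
    dtheta_approx = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus = theta.copy()
        theta_plus[i] += eps                       # nudge only component i up
        theta_minus = theta.copy()
        theta_minus[i] -= eps                      # and down
        dtheta_approx[i] = (J(theta_plus) - J(theta_minus)) / (2 * eps)
    numerator = np.linalg.norm(dtheta_approx - dtheta)
    denominator = np.linalg.norm(dtheta_approx) + np.linalg.norm(dtheta)
    return numerator / denominator


def J(theta):                     # toy cost: sum of cubes
    return np.sum(theta ** 3)


def grad_ok(theta):               # correct gradient: 3 * theta**2
    return 3 * theta ** 2


def grad_buggy(theta):            # deliberately wrong gradient
    return 3 * theta ** 2 + 1e-3


theta = np.array([1.0, -2.0, 0.5])
print(grad_check(J, grad_ok, theta))      # far below 1e-7
print(grad_check(J, grad_buggy, theta))   # much larger
```

The check costs two evaluations of `J` per parameter. It is meant for debugging, not for every training iteration.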