
【Machine Learning】Suitable Learning Rate in Machine Learning

1. The cases of different learning rates:

        In the gradient descent algorithm, the weight parameter is updated as:

w = w - \alpha \frac{ \partial J(w,b) }{ \partial w }

        Here \alpha is the learning rate we need to determine. How should it be chosen, and what impact does a value that is too large or too small have? We will analyze this through the following graphs:

        We can use the same method as before to understand this equation: set b in J(w, b) to 0, so that the cost becomes a function of w alone and can be plotted as a two-dimensional coordinate graph.
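        To make the update rule concrete, here is a minimal sketch in Python (an illustration of my own, not code from the original post) that applies the update rule above to a toy quadratic cost J(w) = (w - 3)^2, whose minimum is at w = 3; the target 3, the starting point 10, and the rate 0.1 are all assumed values:

```python
# Minimal gradient descent on J(w) = (w - 3)^2, with b fixed at 0.
# The minimizer is w = 3; the starting point and learning rate are
# arbitrary illustrative choices.

def dJ_dw(w):
    return 2 * (w - 3)        # derivative of (w - 3)^2

alpha = 0.1                   # learning rate
w = 10.0                      # starting point

for step in range(50):
    w = w - alpha * dJ_dw(w)  # the update rule from the formula above

print(w)                      # prints a value very close to 3
```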

        Let's first observe the case of a smaller learning rate (starting from point F in the figure):

        In this case, the minimum point can almost certainly be found, meaning the iteration eventually converges, although the small steps make convergence slow.

        Next, consider the cases with larger learning rates:

        We can see that when the learning rate is large but still within a certain limit, convergence can also be achieved. The reason can be read from the formula: the learning rate itself stays fixed, but each time the point descends to a region with a smaller slope, the gradient term shrinks, so the step size shrinks with it, and the iteration keeps descending until it converges. But does this always happen? Consider the following situation:

        The difference from the case above is that a step may jump right over the optimal point, so the value the iteration converges to may not be optimal.

        Finally, there is the case of divergence: when the learning rate is too large, each update overshoots the minimum by more than the previous one, and the cost grows without bound.

        In summary, the situations look roughly like this:

        In the figure, the loss is a measure of the gap between the model's predictions and the actual labels, and an epoch is one complete pass through the training data in the gradient descent algorithm, which includes many iterations of parameter updates.
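        Since the figures are not reproduced here, the following sketch (again my own illustration, reusing the toy quadratic cost from above) reproduces the three situations numerically: a small rate converges slowly, a moderate rate converges quickly, and an overly large rate diverges:

```python
# Compare several learning rates on J(w) = (w - 3)^2.
# For this cost dJ/dw = 2(w - 3), so gradient descent converges for
# 0 < alpha < 1 and diverges for alpha > 1; the values below show both.

def J(w):
    return (w - 3) ** 2

def dJ_dw(w):
    return 2 * (w - 3)

for alpha in (0.01, 0.3, 1.1):
    w = 10.0
    for _ in range(30):
        w = w - alpha * dJ_dw(w)
    print(f"alpha={alpha}: w={w:.4f}, loss={J(w):.4f}")

# alpha=0.01 -> converges, but slowly (w is still far from 3 after 30 steps)
# alpha=0.3  -> converges quickly (w is essentially 3)
# alpha=1.1  -> diverges (the loss blows up)
```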

2. How to choose a suitable learning rate:

        In algorithm design, we should adjust the learning rate on the fly and decide the size of the adjustment by observing how well the model fits. After each iteration, evaluate the error function with the newly estimated parameters. If the error decreased compared with the previous iteration, the learning rate can be increased; if the error increased, restore the parameters from the previous iteration and cut the learning rate to 50% of its previous value. This is therefore an adaptive learning-rate scheme, sketched below. Deep learning frameworks such as Caffe and TensorFlow provide simple, direct ways to change the learning rate dynamically.
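        Here is a minimal sketch of this adaptive rule (sometimes called the "bold driver" heuristic) on the same toy cost; note that the 1.05 growth factor is an assumption on my part, since the text only fixes the 50% reduction:

```python
# Adaptive learning-rate adjustment as described above: grow the rate
# while the error keeps falling; on an error increase, discard the step
# and halve the rate. The 1.05 growth factor is an assumed value.

def J(w):
    return (w - 3) ** 2

def dJ_dw(w):
    return 2 * (w - 3)

w, alpha = 10.0, 1.5          # deliberately start with a too-large rate
prev_error = J(w)

for _ in range(100):
    w_new = w - alpha * dJ_dw(w)
    error = J(w_new)
    if error < prev_error:    # error decreased: keep the step, raise the rate
        w, prev_error = w_new, error
        alpha *= 1.05
    else:                     # error increased: reset and cut the rate by 50%
        alpha *= 0.5

print(f"w={w:.4f}, final alpha={alpha:.4f}")
```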

        The commonly used learning rates are 0.00001, 0.0001, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, and 10.
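        One straightforward way to use this list, sketched below on the same toy cost, is a coarse grid search: run a short trial with each candidate rate and keep the one that ends with the lowest error:

```python
# Coarse grid search over the commonly used learning rates: run a short
# trial with each candidate and keep the one with the lowest final loss.

def J(w):
    return (w - 3) ** 2

def dJ_dw(w):
    return 2 * (w - 3)

candidates = [0.00001, 0.0001, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]

def trial(alpha, steps=20):
    w = 10.0
    for _ in range(steps):
        w = w - alpha * dJ_dw(w)
        if abs(w) > 1e6:      # bail out early if the iteration diverges
            return float("inf")
    return J(w)

best = min(candidates, key=trial)
print("best learning rate:", best)  # 0.3 wins on this toy cost
```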
