
Machine Learning - Q&A Preparation (English) - Work in Progress

Chapter 1: Getting Started

  1. How would you define Machine Learning?
    Machine Learning is about building systems that can learn from data. Learning means getting better at some task, given some performance measure.

  2. Can you name four types of problems where it shines?
    To replace long lists of hand-tuned rules, as in spam filtering; to make better recommendations for customers; to build systems that adapt to fluctuating environments; to recognize speech and images.

  3. What is a labeled training set?
    A labeled training set is a training set that contains the desired solution (a.k.a. a label) for each instance.

  4. What are the two most common supervised tasks?
    Regression and classification.

  5. Can you name four common unsupervised tasks?
    Clustering, dimensionality reduction, visualization and association rule learning.

  6. What type of Machine Learning algorithm would you use to allow a robot to walk in various unknown terrains?
    Reinforcement learning.

  7. What type of algorithm would you use to segment your customers into multiple groups?
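    If you don’t know in advance how to define the groups, use clustering (unsupervised learning) to find groups of similar customers. If you already know which groups you want, feed labeled examples of each group to a classification algorithm (supervised learning) instead.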
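
    The clustering option can be sketched with a bare-bones k-means in pure Python. This is only an illustration: the customer features (visits per month, average basket value), the helper name `kmeans`, and the choice of the first k points as initial centroids are all assumptions made for this example, not part of the original answer.

```python
# Minimal k-means sketch (hypothetical data and helper name).

def kmeans(points, k, n_iters=20):
    """Group 2-D points into k clusters; returns (centroids, labels)."""
    # Deterministic initialization: use the first k points as centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        for i, (x, y) in enumerate(points):
            dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


# Hypothetical customers: (visits per month, average basket value).
customers = [(1, 10), (20, 95), (2, 12), (22, 100), (1, 8), (19, 105)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

    No labels are supplied anywhere: the groups emerge from the data alone, which is what separates this from the classification route.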

  8. Would you frame the problem of spam detection as a supervised learning problem or an unsupervised learning problem?
    It’s a supervised learning problem: the algorithm is fed many emails along with their labels (spam or not spam).

  9. What is an online learning system?
    An online learning system can learn incrementally, as opposed to a batch learning system. This makes it capable of adapting rapidly to both changing data and autonomous systems, and of training on very large quantities of data.

  10. What is out-of-core learning?
    Out-of-core algorithms can handle vast quantities of data that cannot fit in a computer’s main memory. An out-of-core learning algorithm chops the data into mini-batches and uses online learning techniques to learn from these mini-batches.
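
    A rough sketch of the idea, assuming a simple linear model trained by mini-batch gradient descent. The generator `stream_batches` stands in for reading chunks from disk, and the data (noise-free points on y = 3x + 2), the helper names, and the learning rate are all made up for illustration.

```python
import random


def stream_batches(n_samples, batch_size, seed=0):
    """Yield mini-batches one at a time, standing in for chunks read from disk."""
    rng = random.Random(seed)
    batch = []
    for _ in range(n_samples):
        x = rng.uniform(0.0, 1.0)
        batch.append((x, 3.0 * x + 2.0))  # noise-free target: y = 3x + 2
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def partial_fit(w, b, batch, lr=0.3):
    """One gradient step on the mean squared error of a single mini-batch."""
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return w - lr * grad_w, b - lr * grad_b


w, b = 0.0, 0.0
for batch in stream_batches(n_samples=20000, batch_size=20):
    # Only the current batch is held in memory; the model is updated incrementally.
    w, b = partial_fit(w, b, batch)

print(round(w, 3), round(b, 3))
```

    The full dataset is never materialized as a list, so memory use depends on the batch size rather than on the number of samples.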

  11. What type of learning algorithm relies on a similarity measure to make predictions?
    Instance-based learning.
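
    As an illustration, here is a small k-nearest-neighbors classifier, one common instance-based method. The similarity measure chosen here is (negative) squared Euclidean distance; the toy feature vectors, labels, and the function name `knn_predict` are assumptions for this sketch.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training instances."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # The "model" is the stored training set itself: rank instances by distance.
    neighbors = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors with their labels.
train = [
    ((1.0, 1.0), "ham"), ((1.2, 0.8), "ham"), ((0.9, 1.1), "ham"),
    ((5.0, 5.0), "spam"), ((5.2, 4.9), "spam"), ((4.8, 5.1), "spam"),
]
prediction = knn_predict(train, (4.9, 5.0))
print(prediction)
```

    There is no training step beyond memorizing the examples; all the work happens at prediction time, when the new instance is compared with the stored ones.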

  12. What is the difference between a model parameter and a learning algorithm’s hyperparameter?
    A model has one or more model parameters that determine what it will predict given a new instance, for example, the slope of a linear model. A hyperparameter is a parameter of the learning algorithm itself, not of the model, for example, the amount of regularization to apply.
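
    The distinction can be made concrete with a one-feature linear model without intercept, fitted with an L2 (ridge) penalty. In this sketch the slope is the model parameter, found by the learning algorithm, while `alpha` is the hyperparameter, fixed by us before training. The function name and data are hypothetical.

```python
def fit_ridge_slope(xs, ys, alpha):
    """Return the slope minimizing sum((y - slope*x)^2) + alpha * slope^2.

    `alpha` (regularization strength) is a hyperparameter set before training;
    the returned slope is the model parameter learned from the data.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)  # closed-form solution for this 1-D case


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

slope_unregularized = fit_ridge_slope(xs, ys, alpha=0.0)   # 28 / 14 = 2.0
slope_regularized = fit_ridge_slope(xs, ys, alpha=14.0)    # 28 / 28 = 1.0
print(slope_unregularized, slope_regularized)
```

    Changing `alpha` changes which slope the algorithm settles on, but `alpha` itself never appears in the prediction `slope * x`.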

  13. What do model-based learning algorithms search for? What is the most common strategy they use to succeed? How do they make predictions?
    They search for an optimal value for the model parameters such that the model will generalize well to new instances. We usually train such systems by minimizing a cost function that measures how bad the system is at making predictions on training data, plus a penalty for model complexity if the model is regularized. To make predictions, we feed the new instance’s features into the model’s prediction function, using the parameter values found by the learning system.
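
    Written out for one concrete case, assuming a linear model with a mean-squared-error cost and an L2 penalty (the symbols θ, α, m, and x_new are notation chosen for this sketch, not taken from the original answer):

```latex
J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\bigl(\theta^{\top} x^{(i)} - y^{(i)}\bigr)^{2}
            + \alpha \,\lVert \theta \rVert_{2}^{2},
\qquad
\hat{\theta} = \arg\min_{\theta} J(\theta),
\qquad
\hat{y} = \hat{\theta}^{\top} x_{\text{new}}
```

    The first term measures how bad the predictions are on the m training instances, the second is the complexity penalty (it vanishes when α = 0, i.e. no regularization), and the last expression is the prediction function applied to a new instance.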

  14. Can you name four of the main challenges in Machine Learning?
    Lack of data, poor data quality, nonrepresentative data, and excessively complex models that overfit the data.

  15. If your model performs great on the training data but generalizes poorly to new instances, what is happening? Can you name three possible solutions?
    The model is likely overfitting the training data. Possible solutions are getting more data, simplifying the model (choosing a simpler algorithm, reducing the number of parameters or features, or regularizing the model), and reducing the noise in the training data.

  16. What is a test set, and why would you want to use it?
    A test set is used to estimate the generalization error that a model will make on new instances, before the model is launched in production.

  17. What is the purpose of a validation set?
    A validation set is used to compare models. It makes it possible to select the best model and tune the hyperparameters without touching the test set.
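
    A minimal sketch of hold-out validation, assuming a one-feature ridge model whose only hyperparameter is the regularization strength `alpha`. The data points, candidate values, and helper names are invented for illustration.

```python
def fit_slope(data, alpha):
    """Closed-form ridge slope (no intercept) for (x, y) pairs."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + alpha)


def mse(slope, data):
    """Mean squared error of the model y = slope * x on `data`."""
    return sum((slope * x - y) ** 2 for x, y in data) / len(data)


# Hypothetical split: the model is fitted on `train` only.
train = [(1.0, 2.5), (2.0, 3.5), (3.0, 6.5)]
validation = [(4.0, 8.0), (5.0, 10.0)]
candidates = [0.0, 0.5, 5.0]

# Train one model per candidate and score each on the validation set.
scores = {alpha: mse(fit_slope(train, alpha), validation) for alpha in candidates}
best_alpha = min(scores, key=scores.get)
print(best_alpha, scores[best_alpha])
```

    Only after `best_alpha` has been chosen this way would the final model be evaluated once on the test set to estimate its generalization error.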

  18. What is the train-dev set, when do you need it, and how do you use it?
    The train-dev set is a part of the training set that’s held out (the model is not trained on it).
    It’s used when there is a risk of mismatch between the training data and the data used in the validation and test sets.
    The model is trained on the rest of the training set, and evaluated on both the train-dev set and the validation set. If the model performs well on the training set but not on the train-dev set, then the model is likely overfitting the training set. If the model performs well on both the training set and the train-dev set, but not on the validation set, then there is probably a significant data mismatch between the training data and the validation + test data.
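
    The decision rule above can be written as a small helper. This is a rule-of-thumb sketch: the function name `diagnose` and the `gap` threshold of 0.05 are arbitrary choices for the example, and real error gaps need to be judged against the noise in your own metrics.

```python
def diagnose(train_err, train_dev_err, val_err, gap=0.05):
    """Compare error rates on the three sets and name the likely problem.

    `gap` is an arbitrary threshold for what counts as a large jump in error.
    """
    if train_dev_err - train_err > gap:
        # Same data distribution, but unseen instances do much worse.
        return "overfitting"
    if val_err - train_dev_err > gap:
        # Generalizes within the training distribution, fails outside it.
        return "data mismatch"
    return "no obvious problem"


# Hypothetical error rates (fraction of misclassified instances).
print(diagnose(0.02, 0.15, 0.16))  # large train -> train-dev gap
print(diagnose(0.02, 0.03, 0.20))  # large train-dev -> validation gap
```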

  19. What can go wrong if you tune hyperparameters using the test set?
    If you do so, you risk overfitting the test set, and the generalization error you measure will be optimistic: you may launch a model that performs worse than you expect.

http://www.lryc.cn/news/58667.html
