
Item-Based Recommendations with Hadoop

Mahout implements Item-Based Collaborative Filtering on top of MapReduce; here I try running it.

  1. Install Hadoop

  2. Download Mahout and unpack it

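  3. Prepare the data
    Download the 1 Million MovieLens Dataset and unpack it to get ratings.dat (one UserID::MovieID::Rating::Timestamp record per line), then use

    sed 's/::\([0-9]\{1,\}\)::\([0-9]\{1\}\)::[0-9]\{1,\}$/,\1,\2/' ratings.dat
    to convert it into the required format (userID,itemID,rating).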

  4. Run
    mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD -i /path/to/input/file -o /path/to/desired/output -n 25
    Options:

MAHOUT-JOB: /home/laxe/apple/mahout/mahout-examples-0.11.0-job.jar
Job-Specific Options:
--input (-i) input Path to job input directory.
--output (-o) output The directory pathname for output.
--numRecommendations (-n) numRecommendations Number of recommendations per user.
--usersFile usersFile File of users to recommend for.
--itemsFile itemsFile File of items to recommend for.
--filterFile (-f) filterFile File containing comma-separated userID,itemID pairs. Used to exclude the item from the recommendations for that user (optional).
--userItemFile (-uif) userItemFile File containing comma-separated userID,itemID pairs (optional). Used to include only these items into recommendations. Cannot be used together with usersFile or itemsFile.
--booleanData (-b) booleanData Treat input as without pref values.
--maxPrefsPerUser (-mxp) maxPrefsPerUser Maximum number of preferences considered per user in final recommendation phase.
--minPrefsPerUser (-mp) minPrefsPerUser Ignore users with less preferences than this in the similarity computation (default: 1).
--maxSimilaritiesPerItem (-m) maxSimilaritiesPerItem Maximum number of similarities considered per item.
--maxPrefsInItemSimilarity (-mpiis) maxPrefsInItemSimilarity Max number of preferences to consider per user or item in the item similarity computation phase, users or items with more preferences will be sampled down (default: 500).
--similarityClassname (-s) similarityClassname Name of distributed similarity measures class to instantiate, alternatively use one of the predefined similarities ([SIMILARITY_COOCCURRENCE, SIMILARITY_LOGLIKELIHOOD, SIMILARITY_TANIMOTO_COEFFICIENT, SIMILARITY_CITY_BLOCK, SIMILARITY_COSINE, SIMILARITY_PEARSON_CORRELATION, SIMILARITY_EUCLIDEAN_DISTANCE])
--threshold (-tr) threshold Discard item pairs with a similarity value below this.
--outputPathForSimilarityMatrix (-opfsm) outputPathForSimilarityMatrix Write the item similarity matrix to this path (optional).
--randomSeed randomSeed Use this seed for sampling.
--sequencefileOutput Write the output into a Sequence File instead of a text file.
--help (-h) Print out help.
--tempDir tempDir Intermediate output directory.
--startPhase startPhase First phase to run.
--endPhase endPhase Last phase to run.

Specify HDFS directories while running on hadoop; else specify local file system directories.
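
For readers who prefer a script to the sed one-liner in step 3, here is a minimal Python sketch of the same conversion, together with a small parser for the job's text output. It makes two assumptions that the post itself does not state: ratings.dat lines look like UserID::MovieID::Rating::Timestamp, and each output line has the form userID<TAB>[itemID:score,itemID:score,...]. The function names to_mahout_csv and parse_recommendations are hypothetical helpers, not part of Mahout, and the sample values are only for illustration.

```python
import re

# Assumed MovieLens 1M layout: UserID::MovieID::Rating::Timestamp
# (single-digit rating, as in the sed expression from step 3).
RATING_LINE = re.compile(r"^(\d+)::(\d+)::(\d)::\d+$")


def to_mahout_csv(line):
    """Convert one ratings.dat line to 'userID,itemID,rating'.

    Returns None when the line does not match the assumed layout.
    """
    match = RATING_LINE.match(line.strip())
    if match is None:
        return None
    return ",".join(match.groups())


def parse_recommendations(line):
    """Parse one output line, assumed to be 'userID<TAB>[itemID:score,...]'.

    Returns (user_id, [(item_id, score), ...]).
    """
    user_id, _, payload = line.strip().partition("\t")
    items = []
    for pair in payload.strip("[]").split(","):
        if not pair:
            continue  # empty recommendation list
        item_id, _, score = pair.partition(":")
        items.append((int(item_id), float(score)))
    return int(user_id), items


if __name__ == "__main__":
    # Sample input line in the assumed ratings.dat layout.
    print(to_mahout_csv("1::1193::5::978300760"))
    # Made-up output line in the assumed recommendation format.
    print(parse_recommendations("1\t[1566:5.0,1036:4.5]"))
```

Unlike the sed command, which passes non-matching lines through unchanged, this sketch returns None for them so the caller can decide whether to skip or report them.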

References
Introduction to Item-Based Recommendations with Hadoop
Mahout distributed: Item-based recommendation
