Product Recommendation in a Spring Boot Project with the Mahout Recommendation Algorithms
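Project requirements
I am currently developing an e-commerce app, and one of the enhancement requests is a "Guess You Like" feature. In essence, it uses users' interactions with products to find users similar to the current user, then recommends products that those similar users have interacted with but the current user has not. After some research, I found that the Mahout recommendation engine fits my needs well.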
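The idea described above can be sketched in plain Java without Mahout. This is only a minimal illustration under stated assumptions: each user's behavior is reduced to the set of item IDs they interacted with, user similarity is the Jaccard overlap of those sets, and the names `GuessYouLikeSketch`, `jaccard`, `recommend`, and `sampleData` are hypothetical and not part of the project.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of the article's project or of Mahout.
final class GuessYouLikeSketch {

    /** Jaccard overlap of two item sets: |A ∩ B| / |A ∪ B| (0 when both are empty). */
    public static double jaccard(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? 0.0 : (double) intersection.size() / unionSize;
    }

    /**
     * Returns up to howMany item IDs that similar users touched but userId did not.
     * A candidate's score is the summed similarity of the users who touched it.
     */
    public static List<Long> recommend(Map<Long, Set<Long>> itemsByUser, long userId, int howMany) {
        Set<Long> own = itemsByUser.getOrDefault(userId, Collections.<Long>emptySet());
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> other : itemsByUser.entrySet()) {
            if (other.getKey() == userId) {
                continue; // skip the current user
            }
            double similarity = jaccard(own, other.getValue());
            if (similarity <= 0.0) {
                continue; // users with nothing in common contribute nothing
            }
            for (Long itemId : other.getValue()) {
                if (!own.contains(itemId)) {
                    scores.merge(itemId, similarity, Double::sum);
                }
            }
        }
        // Highest score first; ties are broken by item ID to keep the order deterministic.
        return scores.entrySet().stream()
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<Long, Double>comparingByKey()))
                .limit(howMany)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Made-up sample data: user ID -> IDs of items the user interacted with. */
    public static Map<Long, Set<Long>> sampleData() {
        Map<Long, Set<Long>> data = new HashMap<>();
        data.put(1L, new HashSet<>(Arrays.asList(101L, 102L, 103L)));
        data.put(2L, new HashSet<>(Arrays.asList(101L, 102L, 104L)));
        data.put(3L, new HashSet<>(Arrays.asList(103L, 105L)));
        data.put(4L, new HashSet<>(Arrays.asList(200L)));
        return data;
    }

    public static void main(String[] args) {
        System.out.println(recommend(sampleData(), 1L, 2)); // prints [104, 105]
    }
}
```

For user 1, user 2 shares two of four distinct items (similarity 0.5) and user 3 shares one of four (0.25), so item 104 ranks ahead of item 105. User 4 has nothing in common and is ignored.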
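Code
After going through several blog posts, I put together Mahout integration code that suits my use case and restructured it as a reusable component. The relevant code follows.
Dependencies: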
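<dependencyManagement>
<dependencies>
<!-- Spring Boot BOM: the import scope only takes effect inside dependencyManagement -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-dependencies</artifactId>
<version>2.7.5</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>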
<dependency>
<groupId>com.baomidou</groupId>
<artifactId>mybatis-plus-boot-starter</artifactId>
<version>3.4.3.4</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-aop</artifactId>
</dependency>
<!-- MariaDB database driver -->
<dependency>
<groupId>org.mariadb.jdbc</groupId>
<artifactId>mariadb-java-client</artifactId>
<version>2.7.5</version>
</dependency>
<dependency>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout-mr</artifactId>
<version>0.12.2</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-jcl</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-core</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-analyzers-common</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<artifactId>jersey-client</artifactId>
<groupId>com.sun.jersey</groupId>
</exclusion>
<exclusion>
<artifactId>jersey-core</artifactId>
<groupId>com.sun.jersey</groupId>
</exclusion>
<exclusion>
<artifactId>jersey-apache-client4</artifactId>
<groupId>com.sun.jersey.contribs</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>cn.hutool</groupId>
<artifactId>hutool-all</artifactId>
<version>5.7.18</version>
</dependency>
<!-- swagger3-->
<dependency>
<groupId>io.springfox</groupId>
<artifactId>springfox-boot-starter</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>com.github.xiaoymin</groupId>
<artifactId>knife4j-spring-boot-starter</artifactId>
<version>3.0.3</version>
<exclusions>
<exclusion>
<artifactId>swagger-models</artifactId>
<groupId>io.swagger</groupId>
</exclusion>
<exclusion>
<artifactId>swagger-annotations</artifactId>
<groupId>io.swagger</groupId>
</exclusion>
<exclusion>
<artifactId>guava</artifactId>
<groupId>com.google.guava</groupId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
Recommendation component code:
import com.example.mahoutdemo.config.FilterRescorer;
import com.example.mahoutdemo.utils.RecommendPreference;
import lombok.extern.slf4j.Slf4j;
import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericPreference;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.springframework.lang.NonNull;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
@Slf4j
public class Recommender {
private final DataModel dataModel;
private Recommender(DataModel dataModel) {
this.dataModel = dataModel;
}
public static Recommender build(List<? extends RecommendPreference> preferenceList) {
DataModel model = buildJdbcDataModel(preferenceList);
return new Recommender(model);
}
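/**
* A DataModel comes in two structures:
* GenericDataModel: user ID, item ID, and the user's preference value for the item (UserID, ItemID, PreferenceValue).
* GenericBooleanPrefDataModel: user ID and item ID only (UserID, ItemID); it records whether the user has viewed the item, without any rating.
* The system recommends from user behavior or ratings, so GenericDataModel is used.
* Despite the method name, the model is built in memory from the given list rather than read over JDBC.
* @param preferenceList user behaviors or ratings
* @return DataModel
*/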
private static DataModel buildJdbcDataModel(List<? extends RecommendPreference> preferenceList) {
FastByIDMap<PreferenceArray> fastByIdMap = new FastByIDMap<>();
Map<Long, List<RecommendPreference>> map = preferenceList.stream().collect(Collectors.groupingBy(RecommendPreference::getUserId));
Collection<List<RecommendPreference>> list = map.values();
for (List<RecommendPreference> preferences : list) {
GenericPreference[] array = new GenericPreference[preferences.size()];
for (int i = 0; i < preferences.size(); i++) {
RecommendPreference preference = preferences.get(i);
GenericPreference item = new GenericPreference(preference.getUserId(), preference.getItemId(), preference.getValue());
array[i] = item;
}
fastByIdMap.put(array[0].getUserID(), new GenericUserPreferenceArray(Arrays.asList(array)));
}
return new GenericDataModel(fastByIdMap);
}
/**
* Creates a user-based recommender.
*
* @param similarityType similarity algorithm type
* @return UserBaseRecommender
*/
public UserBaseRecommender userBaseRecommender(String similarityType) throws TasteException {
return new UserBaseRecommender(similarityType);
}
/**
* Creates an item-based recommender.
*
* @param similarityType similarity algorithm type
* @return ItemBaseRecommender
*/
public ItemBaseRecommender itemBaseRecommender(String similarityType) throws TasteException {
return new ItemBaseRecommender(similarityType);
}
public class UserBaseRecommender {
private final UserSimilarity similarity;
private UserNeighborhood neighborhood;
public UserBaseRecommender(String similarity) throws TasteException {
this.similarity = userSimilarity(similarity, dataModel);
}
private UserSimilarity userSimilarity(String type, DataModel m) throws TasteException {
return (UserSimilarity) getSimilarity(type, m);
}
/**
* Builds a fixed-size neighborhood: each user's neighbors are the N most similar users.
*
* @param num number of neighbors
* @return UserRecommenderBuilder
*/
public UserRecommenderBuilder nearestUserNeighborhood(int num) throws TasteException {
neighborhood = new NearestNUserNeighborhood(num, similarity, dataModel);
return new UserRecommenderBuilder(similarity,neighborhood);
}
/**
* Builds a threshold-based neighborhood: each user's neighbors are all users whose similarity is at or above the threshold.
*
* @param num minimum similarity
* @return UserRecommenderBuilder
*/
public UserRecommenderBuilder thresholdUserNeighborhood(double num) throws TasteException {
neighborhood = new ThresholdUserNeighborhood(num, similarity, dataModel);
return new UserRecommenderBuilder(similarity,neighborhood);
}
}
public class UserRecommenderBuilder {
private final UserSimilarity similarity;
private final UserNeighborhood neighborhood;
private UserRecommenderBuilder(@NonNull UserSimilarity similarity, @NonNull UserNeighborhood neighborhood) {
this.similarity = similarity;
this.neighborhood = neighborhood;
}
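/**
* Builds the user-based recommender.
*
* @param pref whether preference values (ratings) are used; false selects the boolean-preference variant
* @return SimpleRecommender
*/
public SimpleRecommender recommender(boolean pref) throws TasteException {
return new SimpleRecommender(pref
// user-based recommender that uses preference values
? model -> new GenericUserBasedRecommender(model, neighborhood, similarity)
// user-based recommender without preference values (boolean preferences)
: model -> new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity));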
}
}
public class ItemBaseRecommender {
private final ItemSimilarity similarity;
public ItemBaseRecommender(String similarity) throws TasteException {
this.similarity = itemSimilarity(similarity, dataModel);
}
public ItemSimilarity itemSimilarity(String type, DataModel m) throws TasteException {
return (ItemSimilarity) getSimilarity(type, m);
}
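/**
* Builds the item-based recommender.
*
* @param pref whether preference values (ratings) are used; false selects the boolean-preference variant
* @return SimpleRecommender
*/
public SimpleRecommender recommender(boolean pref) throws TasteException {
return new SimpleRecommender(pref
// item-based recommender that uses preference values
? model -> new GenericItemBasedRecommender(model, similarity)
// item-based recommender without preference values (boolean preferences)
: model -> new GenericBooleanPrefItemBasedRecommender(model, similarity));
}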
}
private static Refreshable getSimilarity(String type, DataModel m) throws TasteException {
switch (type) {
case RecommenderConstants.SIMILARITY_PEARSON:
return new PearsonCorrelationSimilarity(m);
case RecommenderConstants.SIMILARITY_COSINE:
return new UncenteredCosineSimilarity(m);
case RecommenderConstants.SIMILARITY_TANIMOTO:
return new TanimotoCoefficientSimilarity(m);
case RecommenderConstants.SIMILARITY_LOGLIKELIHOOD:
return new LogLikelihoodSimilarity(m);
case RecommenderConstants.SIMILARITY_CITY_BLOCK:
return new CityBlockSimilarity(m);
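// SpearmanCorrelationSimilarity implements only UserSimilarity, so it cannot be used with the item-based recommender
case RecommenderConstants.SIMILARITY_SPEARMAN:
return new SpearmanCorrelationSimilarity(m);
case RecommenderConstants.SIMILARITY_EUCLIDEAN:
default:
return new EuclideanDistanceSimilarity(m);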
}
}
public class SimpleRecommender {
private final RecommenderBuilder builder;
private final org.apache.mahout.cf.taste.recommender.Recommender recommender;
private SimpleRecommender(RecommenderBuilder builder) throws TasteException {
this.builder = builder;
this.recommender = builder.buildRecommender(dataModel);
}
public List<RecommendedItem> recommend(long userId, int howMany) throws TasteException {
return recommender.recommend(userId, howMany);
}
public List<RecommendedItem> recommend(long userId, int howMany, FilterRescorer rescorer) throws TasteException {
return recommender.recommend(userId, howMany, rescorer);
}
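/**
* Evaluator based on the root-mean-square (RMS) error between predicted and actual preference values:
* the square root of the mean of the squared differences.
*/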
public Evaluator rmsEvaluator() {
return new Evaluator(builder, new RMSRecommenderEvaluator());
}
/**
* Evaluator based on the average absolute difference between predicted and actual preference values.
*/
public Evaluator averageAbsoluteDifferenceEvaluator() {
return new Evaluator(builder, new AverageAbsoluteDifferenceRecommenderEvaluator());
}
}
public class Evaluator {
private final RecommenderBuilder builder;
private final RecommenderEvaluator recommenderEvaluator;
private Evaluator(RecommenderBuilder builder, RecommenderEvaluator recommenderEvaluator) {
this.builder = builder;
this.recommenderEvaluator = recommenderEvaluator;
}
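/**
* Evaluates the recommender's accuracy.
* The lower the returned value, the better the recommender; 0 is the lowest (best) score and means a perfect match.
* @param trainPt fraction of the data used for training
*/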
public double evaluate(double trainPt) throws TasteException {
return recommenderEvaluator.evaluate(builder, null, dataModel, trainPt, 1.0);
}
public IRStatistics statsEvaluator(int topn) throws TasteException {
RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
return evaluator.evaluate(builder, null, dataModel, null, topn, GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
}
}
}
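The two evaluators exposed by `SimpleRecommender` above differ only in how they aggregate prediction errors. The following plain-Java sketch shows the two metrics on a handful of made-up values. It is not Mahout's evaluator code (which also splits the data into training and test sets), and the class and method names are hypothetical.

```java
// Hypothetical illustration class; it mirrors the two metrics, not Mahout's evaluator code.
final class EvaluationMetricsSketch {

    /** Root-mean-square error: sqrt(mean((predicted - actual)^2)). */
    public static double rmse(double[] predicted, double[] actual) {
        double sumOfSquares = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares / predicted.length);
    }

    /** Average absolute difference: mean(|predicted - actual|). */
    public static double averageAbsoluteDifference(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / predicted.length;
    }

    public static void main(String[] args) {
        // Made-up predicted and actual ratings for four user-item pairs.
        double[] predicted = {4.0, 3.0, 5.0, 1.0};
        double[] actual = {5.0, 3.0, 4.0, 3.0};
        System.out.println("RMS error                   = " + rmse(predicted, actual));
        System.out.println("Average absolute difference = " + averageAbsoluteDifference(predicted, actual));
    }
}
```

With errors of -1, 0, 1 and -2, the average absolute difference is 1.0 while the RMS error is sqrt(1.5), roughly 1.22. Squaring the errors makes the RMS metric penalize the single large miss more heavily.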
public interface RecommendPreference {
long getUserId();
long getItemId();
float getValue();
}
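Since the component accepts either ratings or behavior-derived values, one possible way to produce `RecommendPreference` objects from raw behavior events is sketched below. Everything except the `RecommendPreference` interface itself is an assumption made for illustration: the `Action` types, their weights (1, 3, 5), and the `Event`, `SimplePreference`, and `BehaviorPreferenceSketch` names do not come from the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Repeated from the article so that the sketch compiles on its own.
interface RecommendPreference {
    long getUserId();
    long getItemId();
    float getValue();
}

// Everything below is hypothetical: the action types, their weights, and the class names.
final class BehaviorPreferenceSketch {

    /** Assumed behavior types; the weights are illustrative, not tuned values. */
    public enum Action {
        VIEW(1f), ADD_TO_CART(3f), PURCHASE(5f);

        public final float weight;

        Action(float weight) {
            this.weight = weight;
        }
    }

    /** One raw behavior event, e.g. a row from a behavior log table. */
    public static final class Event {
        final long userId;
        final long itemId;
        final Action action;

        public Event(long userId, long itemId, Action action) {
            this.userId = userId;
            this.itemId = itemId;
            this.action = action;
        }
    }

    /** Minimal immutable implementation of RecommendPreference. */
    public static final class SimplePreference implements RecommendPreference {
        private final long userId;
        private final long itemId;
        private final float value;

        public SimplePreference(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }

        @Override public long getUserId() { return userId; }
        @Override public long getItemId() { return itemId; }
        @Override public float getValue() { return value; }
    }

    /** Keeps the strongest action per (user, item) pair and converts it to a preference. */
    public static List<RecommendPreference> toPreferences(List<Event> events) {
        Map<String, RecommendPreference> strongest = new LinkedHashMap<>();
        for (Event event : events) {
            String key = event.userId + ":" + event.itemId;
            RecommendPreference current = strongest.get(key);
            if (current == null || event.action.weight > current.getValue()) {
                strongest.put(key, new SimplePreference(event.userId, event.itemId, event.action.weight));
            }
        }
        return new ArrayList<>(strongest.values());
    }

    /** Made-up sample events. */
    public static List<Event> sampleEvents() {
        return Arrays.asList(
                new Event(1L, 101L, Action.VIEW),
                new Event(1L, 101L, Action.PURCHASE),
                new Event(1L, 102L, Action.VIEW),
                new Event(2L, 101L, Action.ADD_TO_CART));
    }

    public static void main(String[] args) {
        for (RecommendPreference p : toPreferences(sampleEvents())) {
            System.out.println(p.getUserId() + "," + p.getItemId() + "," + p.getValue());
        }
    }
}
```

The resulting list has one entry per user-item pair and could then be passed to `Recommender.build(...)` from the component above. Taking the maximum weight is just one option; summing weights or decaying them over time would be equally plausible designs.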
Recommendation-related constants:
public interface RecommenderConstants {
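/**
* Pearson correlation similarity
* - Principle: a statistic that reflects the degree of linear correlation between two variables.
* - Range: [-1, 1]. The larger the absolute value, the stronger the correlation; negative correlation is of little use for recommendation.
* - Notes: 1. The number of overlapping items is not taken into account.
* 2. With only one overlapping item, the similarity cannot be computed (the divisor contains n-1).
* 3. If all overlapping values are equal, the similarity cannot be computed either (the standard deviation, used as a divisor, is 0).
* This similarity is neither the best nor the worst choice; it is mentioned often in early research mainly because it is easy to understand.
* Using the Pearson linear correlation coefficient assumes that the data are drawn in pairs from a normal distribution and are at least interval-scaled.
* Mahout extends the Pearson computation with an enum parameter (Weighting) that makes the number of overlapping items a factor in the similarity.
*/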
String SIMILARITY_PEARSON = "pearson";
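/**
* Euclidean distance similarity
* - Principle: the similarity s is defined from the Euclidean distance d as s = 1 / (1 + d).
* - Range: [0, 1]. The larger the value, the smaller d, i.e. the closer the two points and the greater the similarity.
* - Notes: like Pearson similarity, this measure ignores the number of overlapping items; likewise,
* Mahout adds an enum parameter (Weighting) so that the overlap count can influence the similarity.
*/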
String SIMILARITY_EUCLIDEAN = "euclidean";
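/**
* Cosine similarity
* Principle: the cosine of the angle that two points in a multidimensional space form with a chosen reference point.
* Range: [-1, 1]. The larger the value, the smaller the angle, the closer the two directions, and the greater the similarity.
* Notes: mathematically, if the attributes of both items are mean-centered, the cosine similarity equals the Pearson similarity.
* Mahout performs this centering, so its Pearson similarity is also the cosine similarity of the centered data.
* Newer versions of Mahout also provide the UncenteredCosineSimilarity class, which computes the cosine similarity of uncentered data.
*/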
String SIMILARITY_COSINE = "cosine";
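/**
* Spearman rank correlation similarity
* Principle: the Spearman rank correlation coefficient is usually regarded as the Pearson linear correlation coefficient computed on the ranked variables.
* Range: [-1.0, 1.0]; 1.0 when the rankings agree completely and -1.0 when they are completely reversed.
* Notes: very slow because it requires a lot of sorting. For typical recommender data sets, the Spearman rank correlation is not a suitable similarity measure.
*/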
String SIMILARITY_SPEARMAN = "spearman";
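/**
* Manhattan (city block) distance similarity
* Principle: an implementation of the Manhattan distance; like the Euclidean distance, it measures distance in a multidimensional data space.
* Range: [0, 1], as with the Euclidean measure; the larger the value, the smaller the distance and the greater the similarity.
* Notes: cheaper to compute than the Euclidean distance, so performance is comparatively better.
*/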
String SIMILARITY_CITY_BLOCK = "cityBlock";
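/**
* Log-likelihood ratio similarity
* Principle: based on the number of overlapping items, the number of non-overlapping items, and the number of items neither user has.
* Range: for details, see the paper "Accurate Methods for the Statistics of Surprise and Coincidence" (available on Baidu Wenku).
* Notes: handles preference data without ratings; it is a smarter computation than the Tanimoto coefficient.
*/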
String SIMILARITY_LOGLIKELIHOOD = "loglikelihood";
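/**
* Tanimoto coefficient similarity
* Principle: also known as the generalized Jaccard coefficient, an extension of the Jaccard coefficient; the equation is T(A, B) = A·B / (|A|^2 + |B|^2 - A·B).
* Range: [0, 1]; 1 for complete overlap, 0 for no overlap, and the closer to 1, the more similar.
* Notes: handles preference data without ratings.
*/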
String SIMILARITY_TANIMOTO = "tanimoto";
}
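To make the formulas in the comments above concrete, here is a small plain-Java sketch of four of the measures. These are the textbook definitions, written only for illustration; Mahout's own classes may normalize or weight the values differently, and the `SimilaritySketch` class and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class: textbook formulas, not Mahout's implementations.
final class SimilaritySketch {

    /** Euclidean similarity s = 1 / (1 + d), where d is the Euclidean distance. */
    public static double euclidean(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - y[i]) * (x[i] - y[i]);
        }
        return 1.0 / (1.0 + Math.sqrt(sum));
    }

    /** Uncentered cosine similarity: dot(x, y) / (|x| * |y|); NaN if either norm is 0. */
    public static double cosine(double[] x, double[] y) {
        double dot = 0.0;
        double normX = 0.0;
        double normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        return dot / Math.sqrt(normX * normY);
    }

    /** Pearson correlation: cosine similarity after subtracting each vector's mean. */
    public static double pearson(double[] x, double[] y) {
        return cosine(center(x), center(y));
    }

    private static double[] center(double[] v) {
        double mean = Arrays.stream(v).average().orElse(0.0);
        return Arrays.stream(v).map(value -> value - mean).toArray();
    }

    /** Tanimoto (Jaccard) coefficient of two item sets: |A ∩ B| / |A ∪ B|. */
    public static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? 0.0 : (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // Made-up ratings of two users for the same three items.
        double[] u = {1, 2, 3};
        double[] v = {3, 2, 1};
        System.out.println("euclidean = " + euclidean(u, v));
        System.out.println("cosine    = " + cosine(u, v));
        System.out.println("pearson   = " + pearson(u, v));
        // All-equal ratings have zero variance, so Pearson is undefined (NaN).
        System.out.println("pearson with constant ratings = " + pearson(new double[] {2, 2, 2}, u));
        Set<Long> a = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> b = new HashSet<>(Arrays.asList(2L, 3L, 4L));
        System.out.println("tanimoto  = " + tanimoto(a, b));
    }
}
```

The pair `{1, 2, 3}` and `{3, 2, 1}` shows why centering matters: the uncentered cosine is still clearly positive (10/14), while Pearson reports a perfect negative correlation of -1. The constant-ratings case reproduces note 3 of the Pearson comment, where a zero standard deviation leaves the similarity undefined.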
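Based on this component, I built a simple demo that reads data from the database and then obtains recommendations through the recommendation engine.
Source code: https://gitee.com/fenglifei/mahout-demo.git
This article draws on the following blog posts: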
https://blog.csdn.net/u013473512/article/details/78694958
https://blog.csdn.net/czp11210/article/details/49813833
https://blog.csdn.net/weixin_59823583/article/details/127078413