
Elasticsearch in Practice (4): Implementing Elasticsearch Metric Aggregations and Drill-Down Analysis with Spring Boot (Open API)

news 2025/7/7 20:01:36

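Contents

Series index
I. Metric Aggregations and Their Categories
- 1. What Is a Metric Aggregation
- 2. Metric Aggregations: Single-Value and Multi-Value Analysis
- 3. Overview
II. Single-Value Analysis API Design
- 1. Avg (Average)
  - (1) Avg over all documents (DSL)
  - (2) Avg over filtered documents
  - (3) Avg computed by a script
  - (4) Summary
- 2. Max (Maximum)
  - (1) All documents
  - (2) Filtered documents
- 3. Min (Minimum)
  - (1) All documents
  - (2) Filtered documents
- 4. Sum (Total)
  - (1) Summing documents
- 5. Cardinality (Distinct Count)
  - (1) All documents
  - (2) Filtered documents
III. Multi-Value Analysis API Design
- 1. Stats Aggregation
  - (1) All documents
  - (2) Filtered documents
- 2. Extended Stats Aggregation
  - (1) All documents
  - (2) Filtered documents
- 3. Percentiles Aggregation
  - (1) All documents
  - (2) Filtered documents
- 4. Percentile Ranks Aggregation
  - (1) All documents
  - (2) Filtered documents
IV. Java API Implementation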

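Series index

Elasticsearch in Practice (1): Implementing Unified Elasticsearch Search with Spring Boot
Elasticsearch in Practice (2): Implementing Elasticsearch Chinese-Character and Pinyin Autocompletion and Automatic Spelling Correction with Spring Boot
Elasticsearch in Practice (3): Implementing Elasticsearch Search Suggestions with Spring Boot
Elasticsearch in Practice (4): Implementing Elasticsearch Metric Aggregations and Drill-Down Analysis with Spring Boot
Elasticsearch in Practice (5): Implementing E-Commerce Log Tracking and Hot Search Terms in Elasticsearch with Spring Boot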

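I. Metric Aggregations and Their Categories

1. What Is a Metric Aggregation

Aggregation analysis is an important database feature: it runs aggregate computations over the data set returned by a query,
for example finding the maximum or minimum of a field (or of a computed expression), or computing a sum or an average.
Elasticsearch (ES), being both a search engine and a data store, also provides powerful aggregation capabilities.
Aggregations that compute such measures (maximum, minimum, sum, average and so on) over a data set are called metric aggregations in ES.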

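2. Metric Aggregations: Single-Value and Multi-Value Analysis

1. Single-value analysis outputs exactly one result:
min, max, avg, sum, cardinality (cardinality counts the distinct values of a field, comparable to DISTINCT in MySQL)
2. Multi-value analysis outputs several results:
stats, extended_stats, percentiles, percentile_ranks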

3. Overview

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.4/search-aggregations-metrics.html
Syntax:

"aggregations" : {
    "<aggregation_name>" : {                              <!-- name of the aggregation -->
        "<aggregation_type>" : {                          <!-- type of the aggregation -->
            <aggregation_body>                            <!-- body: which fields to aggregate on -->
        }
        [,"meta" : { [<meta_data_body>] } ]?              <!-- metadata -->
        [,"aggregations" : { [<sub_aggregation>]+ } ]?    <!-- sub-aggregations defined inside this aggregation -->
    }
    [,"<aggregation_name_2>" : { ... } ]*                 <!-- name of a further aggregation -->
}

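Open API design goals and principles:
1. Abstract the DSL calls and syntax as far as possible, with dynamically designed parameters.
2. Through result converters, the Open API supports hundreds of call combinations: query, constant_score, match/match_all/filter/sort/size/from/highlight/_source/includes.
3. Route the processing logic through shared calls to strengthen the API's business-handling capability.
4. Preserve the usage of the native API and its parameters.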

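II. Single-Value Analysis API Design

1. Avg (Average)

Computes the average of the prices extracted from the aggregated documents.

(1) Avg over all documents (DSL)

POST product_list_info/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "avg": { "field": "price" }
    }
  }
}

The request above computes the average over all documents.
"size": 0 means that only the aggregation result is returned and no document hits; to also return, say, 50 documents, set size to 50.
aggs: declares an aggregation.
result: a user-defined name; the aggregated value is returned under this name.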

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "aggs": {
      "result": {
        "avg": { "field": "price" }
      }
    }
  }
}
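The Open API body simply wraps the native DSL under "map" and adds the target index as "indexName". As a minimal sketch of that shape, the request could be assembled with plain Java maps; the class and method names here (AvgRequestSketch, avgRequest) are hypothetical and not part of the article's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the article's project.
class AvgRequestSketch {

    /** Builds {"indexName": ..., "map": {"size": 0, "aggs": {aggName: {"avg": {"field": field}}}}}. */
    static Map<String, Object> avgRequest(String indexName, String aggName, String field) {
        Map<String, Object> avg = new LinkedHashMap<>();
        avg.put("field", field);

        Map<String, Object> aggBody = new LinkedHashMap<>();
        aggBody.put("avg", avg);

        Map<String, Object> aggs = new LinkedHashMap<>();
        aggs.put(aggName, aggBody);

        // The native DSL: no hits, one avg aggregation
        Map<String, Object> dsl = new LinkedHashMap<>();
        dsl.put("size", 0);
        dsl.put("aggs", aggs);

        // The Open API wrapper: index name plus the DSL under "map"
        Map<String, Object> request = new LinkedHashMap<>();
        request.put("indexName", indexName);
        request.put("map", dsl);
        return request;
    }

    public static void main(String[] args) {
        // prints {indexName=product_list_info, map={size=0, aggs={result={avg={field=price}}}}}
        System.out.println(avgRequest("product_list_info", "result", "price"));
    }
}
```

Insertion-ordered maps keep the printed structure in the same order as the JSON above, which makes the two easy to compare.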

(2) Avg over filtered documents

POST product_list_info/_search
{
  "size": 0,
  "query": { "term": { "onelevel": "手机通讯" } },
  "aggs": {
    "result": {
      "avg": { "field": "price" }
    }
  }
}

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "query": { "term": { "onelevel": "手机通讯" } },
    "aggs": {
      "result": {
        "avg": { "field": "price" }
      }
    }
  }
}
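Conceptually, the query narrows the document set first and the aggregation then runs only over what is left. The following in-memory sketch mirrors that order with Java streams; the Product class, the sample data and the English category names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: an in-memory analogue of "term query + avg aggregation".
class FilteredAvgSketch {

    static final class Product {
        final String onelevel;   // top-level category, like the "onelevel" field
        final double price;

        Product(String onelevel, double price) {
            this.onelevel = onelevel;
            this.price = price;
        }
    }

    /** Averages price over products whose category equals the given value. */
    static double avgPrice(List<Product> products, String category) {
        return products.stream()
                .filter(p -> p.onelevel.equals(category))   // the "query" step
                .mapToDouble(p -> p.price)                  // the aggregated field
                .average()                                  // the "avg" step
                .orElse(Double.NaN);                        // no matching documents
    }

    public static void main(String[] args) {
        List<Product> products = Arrays.asList(
                new Product("phones", 1000.0),
                new Product("phones", 3000.0),
                new Product("computers", 5000.0));
        System.out.println(avgPrice(products, "phones"));   // prints 2000.0
    }
}
```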

(3) Avg computed by a script

The scripting language used by ES is Painless, a safe and efficient scripting language that runs on the JVM.

# All documents
POST product_list_info/_search?size=0
{
  "aggs": {
    "result": {
      "avg": {
        "script": { "source": "doc.evalcount.value" }
      }
    }
  }
}
Result: "value" : 599929.2282791147

The script source can also be written as:
"source": "doc['evalcount']"
"source": "doc.evalcount"

# With a filter
POST product_list_info/_search?size=0
{
  "query": { "term": { "onelevel": "手机通讯" } },
  "aggs": {
    "czbk": {
      "avg": {
        "script": { "source": "doc.evalcount" }
      }
    }
  }
}
Result: "value" : 600055.6935087288

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "aggs": {
      "czbk": {
        "avg": {
          "script": { "source": "doc.evalcount" }
        }
      }
    }
  }
}

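(4) Summary

The avg aggregation can be used in four ways:
1. Plain avg over all documents
2. Conditional avg over a subset of documents
3. Script-based avg over all documents
4. Script-based avg over a subset of documents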

2. Max (Maximum)

Computes the maximum of the numeric values extracted from the aggregated documents.

(1) All documents

POST product_list_info/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "max": { "field": "price" }
    }
  }
}

Result: "value" : 9.9999999E7

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "aggs": {
      "result": {
        "max": { "field": "price" }
      }
    }
  }
}

(2) Filtered documents

POST product_list_info/_search
{
  "size": 0,
  "query": { "term": { "onelevel": "手机通讯" } },
  "aggs": {
    "result": {
      "max": { "field": "price" }
    }
  }
}

Result: "value" : 2474000.0

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "query": { "term": { "onelevel": "手机通讯" } },
    "aggs": {
      "czbk": {
        "max": { "field": "price" }
      }
    }
  }
}

Result: "value" : 2474000.0

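3. Min (Minimum)

Computes the minimum of the numeric values extracted from the aggregated documents.

(1) All documents

POST product_list_info/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "min": { "field": "price" }
    }
  }
}

Result: "value": 0.0

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "aggs": {
      "result": {
        "min": { "field": "price" }
      }
    }
  }
}

(2) Filtered documents

POST product_list_info/_search
{
  "size": 1,
  "query": { "term": { "onelevel": "手机通讯" } },
  "aggs": {
    "czbk": {
      "min": { "field": "price" }
    }
  }
}

Result: "value": 0.0

With size set to 1, the response also includes one matching document in hits. To be sure of seeing a document whose price is 0, additionally sort by price in ascending order, since without a sort the returned hit is not necessarily the one with the minimum price.

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 1,
    "query": { "term": { "onelevel": "手机通讯" } },
    "aggs": {
      "result": {
        "min": { "field": "price" }
      }
    }
  }
}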

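4. Sum (Total)

(1) Summing documents

Note that this example restricts the documents with a constant_score filter, so the sum covers only the documents whose threelevel field matches 手机.

POST product_list_info/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": { "match": { "threelevel": "手机" } }
    }
  },
  "aggs": {
    "result": {
      "sum": { "field": "price" }
    }
  }
}

Result: "value" : 3.433611809E7

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "query": {
      "constant_score": {
        "filter": { "match": { "threelevel": "手机" } }
      }
    },
    "aggs": {
      "result": {
        "sum": { "field": "price" }
      }
    }
  }
}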

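5. Cardinality (Distinct Count)

Cardinality Aggregation. It is a single-value metric that, based on some value of each document (either a specific field or a value computed by a script), counts the number of distinct values (a de-duplicated count), comparable to COUNT(DISTINCT ...) in SQL. On large data sets the count returned by ES is approximate rather than exact.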
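To make "distinct count" concrete, here is a minimal in-memory sketch that counts distinct store names exactly with a HashSet. This is not how ES computes the result (its cardinality aggregation uses the approximate HyperLogLog++ algorithm); the class name and the sample values are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Illustrative only: an exact distinct count, the quantity that
// the cardinality aggregation approximates.
class CardinalitySketch {

    /** Returns the number of distinct values in the list. */
    static long distinctCount(List<String> values) {
        return new HashSet<>(values).size();
    }

    public static void main(String[] args) {
        List<String> storeNames = Arrays.asList(
                "store-a", "store-b", "store-a", "store-c", "store-b");
        System.out.println(distinctCount(storeNames)); // prints 3
    }
}
```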

(1) All documents

POST product_list_info/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "cardinality": { "field": "storename" }
    }
  }
}

Result: "value" : 103169

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "aggs": {
      "result": {
        "cardinality": { "field": "storename" }
      }
    }
  }
}

(2) Filtered documents

POST product_list_info/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": { "match": { "threelevel": "手机" } }
    }
  },
  "aggs": {
    "result": {
      "cardinality": { "field": "storename" }
    }
  }
}

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "query": {
      "constant_score": {
        "filter": { "match": { "threelevel": "手机" } }
      }
    },
    "aggs": {
      "result": {
        "cardinality": { "field": "storename" }
      }
    }
  }
}

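III. Multi-Value Analysis API Design

1. Stats Aggregation

Stats Aggregation. It is a multi-value metric that, based on some value of each document (either a specific numeric field or a value computed by a script), computes five statistics: min, max, sum, count and avg.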
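The same five values can be reproduced on a small in-memory data set with the JDK's DoubleSummaryStatistics, which is a convenient way to sanity-check what a stats response means. The class name and sample prices below are hypothetical.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Illustrative only: count, min, max, avg and sum over a handful of prices.
class StatsSketch {

    /** Collects the five basic statistics in a single pass. */
    static DoubleSummaryStatistics stats(double... prices) {
        return DoubleStream.of(prices).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = stats(10.0, 20.0, 30.0, 40.0);
        System.out.println("count=" + s.getCount());   // count=4
        System.out.println("min=" + s.getMin());       // min=10.0
        System.out.println("max=" + s.getMax());       // max=40.0
        System.out.println("avg=" + s.getAverage());   // avg=25.0
        System.out.println("sum=" + s.getSum());       // sum=100.0
    }
}
```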

(1) All documents

POST product_list_info/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "stats": { "field": "price" }
    }
  }
}

Response:

"aggregations" : {
  "result" : {
    "count" : 5072447,
    "min" : 0.0,
    "max" : 9.9999999E7,
    "avg" : 920.1537270512633,
    "sum" : 4.66743101232E9
  }
}

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "aggs": {
      "result": {
        "stats": { "field": "price" }
      }
    }
  }
}

(2) Filtered documents

POST product_list_info/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": { "match": { "threelevel": "手机" } }
    }
  },
  "aggs": {
    "result": {
      "stats": { "field": "price" }
    }
  }
}

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "query": {
      "constant_score": {
        "filter": { "match": { "threelevel": "手机" } }
      }
    },
    "aggs": {
      "result": {
        "stats": { "field": "price" }
      }
    }
  }
}

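2. Extended Stats Aggregation

Extended Stats Aggregation. It is a multi-value metric that returns four more results than stats: the sum of squares, the variance, the standard deviation, and the interval of the average plus/minus two standard deviations.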

(1) All documents

POST product_list_info/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "extended_stats": { "field": "price" }
    }
  }
}

Response:

"aggregations" : {
  "result" : {
    "count" : 5072447,
    "min" : 0.0,
    "max" : 9.9999999E7,
    "avg" : 920.1537270512633,
    "sum" : 4.66743101232E9,
    "sum_of_squares" : 2.0182209054045464E16,
    "variance" : 3.9779448262354884E9,
    "std_deviation" : 63070.950731977144,
    "std_deviation_bounds" : {
      "upper" : 127062.05519100555,
      "lower" : -125221.74773690302
    }
  }
}

sum_of_squares: sum of squares
variance: variance
std_deviation: standard deviation
std_deviation_bounds: the interval of the average plus/minus two standard deviations
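The extra fields follow from simple formulas, and the response above is consistent with them: 920.15 + 2 × 63070.95 ≈ 127062.05 for the upper bound, and sum_of_squares / count minus avg squared reproduces the variance. The sketch below applies the same formulas to a tiny data set; the class name and sample values are made up, and it uses the population variance, which is what the figures above match.

```java
// Illustrative only: the four extra values of extended_stats on an in-memory array.
class ExtendedStatsSketch {

    static final class Result {
        final double sumOfSquares, variance, stdDeviation, upper, lower;

        Result(double sumOfSquares, double variance, double stdDeviation,
               double upper, double lower) {
            this.sumOfSquares = sumOfSquares;
            this.variance = variance;
            this.stdDeviation = stdDeviation;
            this.upper = upper;
            this.lower = lower;
        }
    }

    /** Computes the extended statistics; values must be non-empty. */
    static Result extendedStats(double[] values) {
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (double v : values) {
            sum += v;
            sumOfSquares += v * v;
        }
        double avg = sum / values.length;
        // Population variance: mean of squares minus square of the mean
        double variance = sumOfSquares / values.length - avg * avg;
        double std = Math.sqrt(variance);
        // Bounds are the average plus/minus two standard deviations
        return new Result(sumOfSquares, variance, std, avg + 2 * std, avg - 2 * std);
    }

    public static void main(String[] args) {
        Result r = extendedStats(new double[] {2, 4, 4, 4, 5, 5, 7, 9});
        System.out.println(r.sumOfSquares);  // 232.0
        System.out.println(r.variance);      // 4.0
        System.out.println(r.stdDeviation);  // 2.0
        System.out.println(r.upper);         // 9.0
        System.out.println(r.lower);         // 1.0
    }
}
```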

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "aggs": {
      "result": {
        "extended_stats": { "field": "price" }
      }
    }
  }
}

(2) Filtered documents

POST product_list_info/_search
{
  "size": 1,
  "query": {
    "constant_score": {
      "filter": { "match": { "threelevel": "手机" } }
    }
  },
  "aggs": {
    "result": {
      "extended_stats": { "field": "price" }
    }
  }
}

Result:

"aggregations" : {
  "result" : {
    "count" : 12402,
    "min" : 0.0,
    "max" : 2474000.0,
    "avg" : 2768.595233833253,
    "sum" : 3.433611809E7,
    "sum_of_squares" : 6.445447222627729E12,
    "variance" : 5.120451870452684E8,
    "std_deviation" : 22628.41547800615,
    "std_deviation_bounds" : {
      "upper" : 48025.42618984555,
      "lower" : -42488.23572217905
    }
  }
}

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 1,
    "query": {
      "constant_score": {
        "filter": { "match": { "threelevel": "手机" } }
      }
    },
    "aggs": {
      "czbk": {
        "extended_stats": { "field": "price" }
      }
    }
  }
}

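3. Percentiles Aggregation

Percentiles Aggregation. It is a multi-value metric. Over the values of the specified field (or script), sorted in ascending order, it accumulates the share of documents at each value (as a percentage of all matching documents) and returns the value at which each requested percentage is reached. By default it returns the values at the percentiles [1, 5, 25, 50, 75, 95, 99].

These are the percentiles that people are most commonly interested in.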
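As a rough illustration of what a percentile is, the sketch below computes exact nearest-rank percentiles on a small sorted array. ES itself returns approximate values (the Java code in this article receives them as TDigest results), so this is an explanatory stand-in rather than the algorithm ES runs; the class name and sample prices are hypothetical.

```java
import java.util.Arrays;

// Illustrative only: exact nearest-rank percentiles on an in-memory array.
class PercentileSketch {

    /** Returns the smallest value such that at least p percent of the data is at or below it. */
    static double percentile(double[] values, double p) {
        if (values.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need non-empty data and 0 < p <= 100");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        // 1-based rank of the requested percentile
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        double[] prices = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        System.out.println(percentile(prices, 25)); // 30.0
        System.out.println(percentile(prices, 50)); // 50.0
        System.out.println(percentile(prices, 95)); // 100.0
    }
}
```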

(1) All documents

POST product_list_info/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "percentiles": { "field": "price" }
    }
  }
}

Response:

"aggregations" : {
  "result" : {
    "values" : {
      "1.0" : 0.0,
      "5.0" : 15.021825109603165,
      "25.0" : 58.669333121791,
      "50.0" : 139.7398105623917,
      "75.0" : 388.2363222057536,
      "95.0" : 3630.78148822216,
      "99.0" : 12561.562823894474
    }
  }
}

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "aggs": {
      "result": {
        "percentiles": { "field": "price" }
      }
    }
  }
}

(2) Filtered documents

POST product_list_info/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": { "match": { "threelevel": "手机" } }
    }
  },
  "aggs": {
    "result": {
      "percentiles": { "field": "price" }
    }
  }
}

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "query": {
      "constant_score": {
        "filter": { "match": { "threelevel": "手机" } }
      }
    },
    "aggs": {
      "result": {
        "percentiles": { "field": "price" }
      }
    }
  }
}

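4. Percentile Ranks Aggregation

There is a closely related metric called percentile_ranks. The percentiles metric tells us the lowest value below which a given percentage of documents fall; percentile_ranks answers the reverse question: given a specific value, what percentage of documents fall at or below it.

(1) All documents

Compute the percentage of documents priced within 15 yuan and the percentage priced within 30 yuan.

Tips:
The statistics change as the data changes.
The 15 and 30 here play the same role as a threshold of 200 in an SLA check; only the field being compared is different.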

POST product_list_info/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "percentile_ranks": {
        "field": "price",
        "values": [15, 30]
      }
    }
  }
}

Response:
Documents priced within 15 yuan account for 4.92% of the data.
Documents priced within 30 yuan account for 12.72% of the data.

"aggregations" : {
  "result" : {
    "values" : {
      "15.0" : 4.92128378837021,
      "30.0" : 12.724827959646579
    }
  }
}
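The percentile rank of a threshold can be pictured as "what share of the values is at or below it". The sketch below computes that share exactly for a small array; ES returns an approximation, and the class name and sample prices are invented for illustration.

```java
// Illustrative only: exact percentile rank of a threshold in an in-memory array.
class PercentileRankSketch {

    /** Returns the percentage (0 to 100) of values that are less than or equal to threshold. */
    static double percentileRank(double[] values, double threshold) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        int atOrBelow = 0;
        for (double v : values) {
            if (v <= threshold) {
                atOrBelow++;
            }
        }
        return 100.0 * atOrBelow / values.length;
    }

    public static void main(String[] args) {
        double[] prices = {5, 10, 15, 20, 30, 50, 80, 100};
        System.out.println(percentileRank(prices, 15)); // 37.5
        System.out.println(percentileRank(prices, 30)); // 62.5
    }
}
```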

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "aggs": {
      "result": {
        "percentile_ranks": {
          "field": "price",
          "values": [15, 30]
        }
      }
    }
  }
}

(2) Filtered documents

POST product_list_info/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": { "match": { "threelevel": "手机" } }
    }
  },
  "aggs": {
    "result": {
      "percentile_ranks": {
        "field": "price",
        "values": [15, 30]
      }
    }
  }
}

Open API request parameters:

{
  "indexName": "product_list_info",
  "map": {
    "size": 0,
    "query": {
      "constant_score": {
        "filter": { "match": { "threelevel": "手机" } }
      }
    },
    "aggs": {
      "result": {
        "percentile_ranks": {
          "field": "price",
          "values": [15, 30]
        }
      }
    }
  }
}

IV. Java API Implementation

Call the metricAgg method, passing a CommonEntity as the argument.

// Imports assume the Elasticsearch 7.x high-level REST client package layout.
// CommonEntity, SearchTools, ResponseThreadLocal and the client field belong to the project and are not shown here.
import java.util.HashMap;
import java.util.Map;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.xcontent.XContentParser;
import org.elasticsearch.search.aggregations.Aggregation;
import org.elasticsearch.search.aggregations.metrics.*;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.util.CollectionUtils;

/**
 * @Description: Metric aggregation (Open API)
 * @Method: metricAgg
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: java.util.Map<java.lang.Object,java.lang.Object>
 **/
public Map<Object, Object> metricAgg(CommonEntity commonEntity) throws Exception {
    // Shared query call; the request parameters are templated
    SearchResponse response = getSearchResponse(commonEntity);
    // Holds the data to return
    Map<Object, Object> map = new HashMap<Object, Object>();
    // Returning ParsedAggregation directly would avoid the instanceof checks, but the result
    // would carry many more fields and the keys would have to be hard-coded on get;
    // iterating over the map below lets us obtain the key dynamically
    Map<String, Aggregation> aggregationMap = response.getAggregations().asMap();
    // Put the response into a thread-local variable
    SearchTools.setResponseThreadLocal(response);
    // Loops once; the purpose is to pick up the aggregation name ("result") sent by the client dynamically
    for (Map.Entry<String, Aggregation> m : aggregationMap.entrySet()) {
        // Handle the metric aggregation
        metricResultConverter(map, m);
    }
    // Shared data processing
    mbCommonConverter(map);
    return map;
}

/**
 * @Description: Shared query call with templated parameters
 * @Method: getSearchResponse
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: org.elasticsearch.action.search.SearchResponse
 **/
private SearchResponse getSearchResponse(CommonEntity commonEntity) throws Exception {
    // Define the search request
    SearchRequest searchRequest = new SearchRequest();
    // Specify which index to search
    searchRequest.indices(commonEntity.getIndexName());
    // Source builder, mainly used to assemble the query conditions
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // Convert the DSL sent by the front end into an XContentParser
    XContentParser parser = SearchTools.getXContentParser(commonEntity);
    // Parse the parser content into the query API
    sourceBuilder.parseXContent(parser);
    // Attach the sourceBuilder to the searchRequest
    searchRequest.source(sourceBuilder);
    // Execute the search
    SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
    return response;
}

/**
 * @Description: Result converter for metric aggregations
 * @Method: metricResultConverter
 * @Param: [map, m]
 * @Update:
 * @since: 1.0.0
 * @Return: void
 **/
private void metricResultConverter(Map<Object, Object> map, Map.Entry<String, Aggregation> m) {
    Aggregation agg = m.getValue();
    if (agg instanceof ParsedAvg) {
        // Average
        map.put("value", ((ParsedAvg) agg).getValue());
    } else if (agg instanceof ParsedMax) {
        // Maximum
        map.put("value", ((ParsedMax) agg).getValue());
    } else if (agg instanceof ParsedMin) {
        // Minimum
        map.put("value", ((ParsedMin) agg).getValue());
    } else if (agg instanceof ParsedSum) {
        // Sum
        map.put("value", ((ParsedSum) agg).getValue());
    } else if (agg instanceof ParsedCardinality) {
        // Distinct count
        map.put("value", ((ParsedCardinality) agg).getValue());
    } else if (agg instanceof ParsedExtendedStats) {
        // Extended stats; checked before ParsedStats because ParsedExtendedStats is a subclass of it
        ParsedExtendedStats stats = (ParsedExtendedStats) agg;
        map.put("count", stats.getCount());
        map.put("min", stats.getMin());
        map.put("max", stats.getMax());
        map.put("avg", stats.getAvg());
        map.put("sum", stats.getSum());
        map.put("sum_of_squares", stats.getSumOfSquares());
        map.put("variance", stats.getVariance());
        map.put("std_deviation", stats.getStdDeviation());
        map.put("lower", stats.getStdDeviationBound(ExtendedStats.Bounds.LOWER));
        map.put("upper", stats.getStdDeviationBound(ExtendedStats.Bounds.UPPER));
    } else if (agg instanceof ParsedStats) {
        // Stats
        ParsedStats stats = (ParsedStats) agg;
        map.put("count", stats.getCount());
        map.put("min", stats.getMin());
        map.put("max", stats.getMax());
        map.put("avg", stats.getAvg());
        map.put("sum", stats.getSum());
    } else if (agg instanceof ParsedTDigestPercentileRanks) {
        // Percentile ranks: key is the value, entry is its percentage
        for (Percentile p : (ParsedTDigestPercentileRanks) agg) {
            map.put(p.getValue(), p.getPercent());
        }
    } else if (agg instanceof ParsedTDigestPercentiles) {
        // Percentiles: key is the percentage, entry is the value at that percentile
        for (Percentile p : (ParsedTDigestPercentiles) agg) {
            map.put(p.getPercent(), p.getValue());
        }
    }
}

/**
 * @Description: Shared data processing (metric aggregations, bucket aggregations)
 * @Method: mbCommonConverter
 * @Param: [map]
 * @Update:
 * @since: 1.0.0
 * @Return: void
 **/
private void mbCommonConverter(Map<Object, Object> map) {
    if (!CollectionUtils.isEmpty(ResponseThreadLocal.get())) {
        // Fetch the data from the thread-local variable
        map.put("list", ResponseThreadLocal.get());
        // Clear the thread-local variable to prevent memory leaks
        ResponseThreadLocal.clear();
    }
}
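The code above clears the thread-local data to avoid leaks. One common way to make that cleanup unconditional is a try/finally block, sketched below with only JDK classes. ThreadLocalCleanupSketch, withHits and currentHits are hypothetical names; the article's own ResponseThreadLocal and SearchTools helpers are not shown in the source and may work differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Illustrative only: bind data to the current thread for the duration of one call,
// and always remove it afterwards, even if the call throws.
class ThreadLocalCleanupSketch {

    private static final ThreadLocal<List<String>> HITS = new ThreadLocal<>();

    /** Runs body with hits bound to the current thread, then removes the binding. */
    static <T> T withHits(List<String> hits, Supplier<T> body) {
        HITS.set(hits);
        try {
            return body.get();
        } finally {
            HITS.remove(); // runs on success and on exception alike
        }
    }

    /** Returns the hits bound to the current thread, or null if none are bound. */
    static List<String> currentHits() {
        return HITS.get();
    }

    public static void main(String[] args) {
        int size = withHits(Arrays.asList("doc-1", "doc-2"), () -> currentHits().size());
        System.out.println(size);           // 2
        System.out.println(currentHits());  // null
    }
}
```

This matters most on pooled threads, such as those of a servlet container, where a value left behind by one request would otherwise still be visible to the next request handled by the same thread.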