当前位置: 首页 > news >正文

Elasticsearch 去重后求和

标题的要求可以用如下 SQL 表示

select sum(column2) from (select distinct(column1),column2 from table)t

要如何用 DSL 实现呢,先准备下索引和数据

PUT test_index
{"mappings": {"properties": {"column1": {"type": "keyword"},"column2": {"type": "long"}}}
}
PUT test_index/_doc/1
{"column1": "1","column2": 2
}PUT test_index/_doc/2
{"column1": "1","column2": 2
}PUT test_index/_doc/3
{"column1": "2","column2": 1
}PUT test_index/_doc/4
{"column1": "2","column2": 1
}

我首先想到的是 collapse 搭配 cardinality,再sum,那效果如何呢

GET test_index/_search
{"collapse": {"field": "column1"},"aggs": {"distinct_column": {"cardinality": {"field": "column1"}},"distinct_sum":{"sum": {"field": "column2"}},"all_sum":{"sum": {"field": "column2"}}}
}

结果,count是去重了的,但sum没有

{"took" : 5,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 4,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "test_index","_type" : "_doc","_id" : "1","_score" : 1.0,"_source" : {"column1" : "1","column2" : 2},"fields" : {"column1" : ["1"]}},{"_index" : "test_index","_type" : "_doc","_id" : "3","_score" : 1.0,"_source" : {"column1" : "2","column2" : 1},"fields" : {"column1" : ["2"]}}]},"aggregations" : {"distinct_sum" : {"value" : 6.0},"all_sum" : {"value" : 6.0},"distinct_column" : {"value" : 2}}
}

尝试多次未果后,找到了这个

Sum aggregation on Unique Data in ElasticSearch - Stack Overflow

那试下呗

GET test_index/_search
{"size": 0, "aggs": {"column1_count": {"terms": {"field": "column1","size": 100},"aggs": {"column2_avg": {"avg": {"field": "column2"}}}},"unique_count": {"cardinality": {"field": "column1"}},"unique_sum_column2":{"sum_bucket": {"buckets_path": "column1_count>column2_avg"}},"sum_column2":{"sum": {"field": "column2"}}}
}

非常好,达到期望,这个是先求平均值,再求和,为这思路叹服

{"took" : 5,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 4,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"sum_column2" : {"value" : 6.0},"column1_count" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "1","doc_count" : 2,"column2_avg" : {"value" : 2.0}},{"key" : "2","doc_count" : 2,"column2_avg" : {"value" : 1.0}}]},"unique_count" : {"value" : 2},"unique_sum_column2" : {"value" : 3.0}}
}
http://www.lryc.cn/news/305939.html

相关文章:

  • 考研数学——高数:函数与极限(3)
  • LeetCode49 字母异位词分组
  • 【Python】Windows本地映射远程Linux服务器上的端口(解决jupyter notebook无法启动问题)
  • C++面试:用户态和内核态的基本概念、区别
  • Vue计算属性computed()
  • JWT学习笔记
  • WSL里的Ubuntu 登录密码忘了怎么更改
  • 【软件测试面试】要你介绍项目-如何说?完美面试攻略...
  • 【Crypto | CTF】RSA打法 集合
  • 在springboot中调用openai Api并实现流式响应
  • C++构造函数重难点解析
  • QT day3 作业2.22
  • AR汽车行业解决方案系列之2-远程汽修
  • 每日五道java面试题之spring篇(五)
  • 挑战杯 基于YOLO实现的口罩佩戴检测 - python opemcv 深度学习
  • 12. Springboot集成Dubbo3(三)Dubbo-Admin
  • c语言的数据结构:找环状链表入口处
  • LabVIEW声速测定实验数据处理
  • 深入剖析C语言中的段错误:从内存模型到实战调试全方位解析
  • 1.操作Python入门Python安装和使用教程
  • STM32G030C8T6:定时器1ms中断(以64MHz外部晶振为例)
  • 人工智能聊天机器人如何帮助您实现工作与生活的平衡
  • 3分钟看懂设计模式01:策略模式
  • 数据结构与算法:算法详解
  • AOSP10 替换系统launcher
  • 视频互动游戏如何暴打海王和舔狗
  • 大学生多媒体课程学习网站thinkphp+vue
  • 信息系统项目管理师论文分享(质量管理)
  • Redis实现滑动窗口限流
  • SQL Server查询计划(Query Plan)——XML查询计划