Elasticsearch Aggregation Analysis (Aggregation) _ Java英雄之旅's blog - 程序员ITS203

Tags: Elasticsearch

Aggregation syntax

Metric - single-value & multi-value output

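An aggregation is part of a search request. In general, set the request's size to 0 so that the response contains only the aggregation results and no search hits. Salary statistics serve as the example: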

Salary statistics

First, create the index and load the salary data:

DELETE /employees
PUT /employees/
{
  "mappings" : {
    "properties" : {
      "age" : {
        "type" : "integer"
      },
      "gender" : {
        "type" : "keyword"
      },
      "job" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 50
          }
        }
      },
      "name" : {
        "type" : "keyword"
      },
      "salary" : {
        "type" : "integer"
      }
    }
  }
}

# _bulk takes newline-delimited JSON: every action line and every document line must stay on a single line
PUT /employees/_bulk
{ "index" : { "_id" : "1" } }
{ "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
{ "index" : { "_id" : "2" } }
{ "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
{ "index" : { "_id" : "3" } }
{ "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
{ "index" : { "_id" : "4" } }
{ "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
{ "index" : { "_id" : "5" } }
{ "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
{ "index" : { "_id" : "6" } }
{ "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
{ "index" : { "_id" : "7" } }
{ "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
{ "index" : { "_id" : "8" } }
{ "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
{ "index" : { "_id" : "9" } }
{ "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
{ "index" : { "_id" : "10" } }
{ "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
{ "index" : { "_id" : "11" } }
{ "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
{ "index" : { "_id" : "12" } }
{ "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
{ "index" : { "_id" : "13" } }
{ "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
{ "index" : { "_id" : "14" } }
{ "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
{ "index" : { "_id" : "15" } }
{ "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
{ "index" : { "_id" : "16" } }
{ "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : { "_id" : "17" } }
{ "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : { "_id" : "18" } }
{ "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
{ "index" : { "_id" : "19" } }
{ "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
{ "index" : { "_id" : "20" } }
{ "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}

# Multiple metric aggregations: find the minimum, maximum, and average salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "max_salary": {
      "max": {
        "field": "salary"
      }
    },
    "min_salary": {
      "min": {
        "field": "salary"
      }
    },
    "avg_salary": {
      "avg": {
        "field": "salary"
      }
    }
  }
}
# Metric aggregation: find the lowest salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "min_salary": {
      "min": {
        "field": "salary"
      }
    }
  }
}

# Metric aggregation: find the highest salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "max_salary": {
      "max": {
        "field": "salary"
      }
    }
  }
}

# One aggregation, multiple output values: stats returns count, min, max, avg and sum
POST employees/_search
{
  "size": 0,
  "aggs": {
    "stats_salary": {
      "stats": {
        "field": "salary"
      }
    }
  }
}
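
To sanity-check the responses above, here is a small local Python sketch (not Elasticsearch code) that computes the same numbers over the 20 sample salaries. The `stats` helper is hypothetical; Elasticsearch itself returns these metric values as floating-point numbers (e.g. `9000.0`).

```python
# Salaries of the 20 sample employees, in _id order 1..20 (from the _bulk request above).
SALARIES = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]


def stats(values):
    """Hypothetical local stand-in for the `stats` aggregation."""
    total = sum(values)
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": total / len(values), "sum": total}


print(stats(SALARIES))
# {'count': 20, 'min': 9000, 'max': 50000, 'avg': 24700.0, 'sum': 494000}
```

The `min`, `max` and `avg` requests each return one of these values; `stats` returns all five in a single aggregation.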

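Bucket - Terms & numeric ranges

Bucket

A bucket aggregation assigns documents to different buckets according to a rule, which groups (classifies) them. Some common bucket aggregations provided by ES:

  • terms
  • Numeric and date ranges: Range / Date Range, Histogram / Date Histogram
  • Nesting: buckets can be split again into sub-buckets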

Terms aggregation

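A terms aggregation needs per-document field values, from either doc_values or fielddata. keyword fields have doc_values enabled by default and can be aggregated directly. text fields do not support doc_values, so fielddata must be enabled in the mapping before a terms aggregation can run on them; note that fielddata is held in heap memory and can consume a lot of it.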

# terms aggregation on a keyword field
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword"
      }
    }
  }
}


# terms aggregation on the text field: this fails, because fielddata is disabled on text fields by default
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job"
      }
    }
  }
}

# Enable fielddata on the text field so that it supports terms aggregations
PUT employees/_mapping
{
  "properties" : {
    "job": {
      "type": "text",
      "fielddata": true
    }
  }
}

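# terms aggregation on the text field now works; the buckets are the analyzed terms (e.g. "java", "programmer"), not whole job titles
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job"
      }
    }
  }
}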

# For comparison: the terms aggregation on job.keyword returns whole job titles
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword"
      }
    }
  }
}
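
The difference between the two requests can be reproduced locally. The Python sketch below is an illustration, not Elasticsearch's implementation: `analyze` approximates the standard analyzer by lowercasing and splitting on whitespace (adequate only because these titles contain just letters and spaces), and `terms_agg` assumes the default ordering of doc count descending with ties broken by term ascending.

```python
from collections import Counter

# `job` values of the 20 sample documents, in _id order.
JOBS = ["Product Manager", "Dev Manager", "Web Designer", "Web Designer", "QA", "QA", "QA",
        "Java Programmer", "Java Programmer", "Java Programmer", "Java Programmer",
        "Java Programmer", "Java Programmer", "Javascript Programmer", "Java Programmer",
        "Javascript Programmer", "Javascript Programmer", "Javascript Programmer",
        "DBA", "DBA"]


def analyze(text):
    # Rough stand-in for the standard analyzer; enough for these simple titles.
    return text.lower().split()


def terms_agg(values, tokenize=None, size=10):
    counts = Counter()
    for value in values:
        # Each document is counted once per distinct term it contains.
        counts.update(set(tokenize(value)) if tokenize else {value})
    # Assumed ordering: doc_count descending, ties broken by term ascending.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]


print(terms_agg(JOBS))                    # buckets for job.keyword
print(terms_agg(JOBS, tokenize=analyze))  # buckets for job (fielddata on analyzed terms)
```

With the keyword field, "Java Programmer" is one bucket of 7 documents; with the analyzed text field, the same documents contribute to both "java" and "programmer", so "programmer" collects 11 documents.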


Cardinality, similar to COUNT(DISTINCT ...) in SQL

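# The terms aggregations on job.keyword and job return different numbers of buckets; cardinality counts the distinct values of job directly
POST employees/_search
{
  "size": 0,
  "aggs": {
    "job_cardinality": {
      "cardinality": {
        "field": "job"
      }
    }
  }
}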
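
As a rough local check (a Python sketch, not Elasticsearch code), the exact distinct counts for the sample data are shown below, again approximating the standard analyzer with lowercase-and-split. Keep in mind that the cardinality aggregation is approximate (HyperLogLog++-based); at such a small number of distinct values its result is expected to be close to exact.

```python
# `job` values of the 20 sample documents (same data as the _bulk request above).
JOBS = ["Product Manager", "Dev Manager", "Web Designer", "Web Designer", "QA", "QA", "QA",
        "Java Programmer", "Java Programmer", "Java Programmer", "Java Programmer",
        "Java Programmer", "Java Programmer", "Javascript Programmer", "Java Programmer",
        "Javascript Programmer", "Javascript Programmer", "Javascript Programmer",
        "DBA", "DBA"]

# Distinct whole values, as stored in job.keyword.
keyword_cardinality = len(set(JOBS))
# Distinct analyzed terms, approximating what fielddata on the text field `job` holds.
text_cardinality = len({term for job in JOBS for term in job.lower().split()})

print(keyword_cardinality, text_cardinality)  # 7 10
```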

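Bucket Size & Top Hits Demo

  • Use case: after bucketing, get the list of top-matching documents within each bucket
  • Size: bucket by age and return only the specified number of buckets
  • Top Hits: view the 3 oldest employees in each job type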

# terms aggregation on gender (a keyword field)
POST employees/_search
{
  "size": 0,
  "aggs": {
    "gender": {
      "terms": {
        "field": "gender"
      }
    }
  }
}


# Set the bucket size: return only the 3 age buckets with the most documents
POST employees/_search
{
  "size": 0,
  "aggs": {
    "ages_3": {
      "terms": {
        "field": "age",
        "size": 3
      }
    }
  }
}
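
A local Python sketch of what `"size": 3` does to the age buckets. It assumes the default order (doc count descending, ties broken by key ascending); the helper name `top_terms` is hypothetical. Documents that fall into buckets beyond `size` are not lost: the terms response reports their total as `sum_other_doc_count`.

```python
from collections import Counter

# `age` values of the 20 sample documents, in _id order.
AGES = [32, 41, 25, 26, 25, 31, 27, 27, 32, 20, 36, 31, 30, 32, 33, 21, 25, 29, 30, 29]


def top_terms(values, size):
    counts = Counter(values)
    # Assumed ordering: doc_count descending, ties broken by key ascending.
    buckets = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    # Documents in buckets that did not make the cut are only reported as a total.
    sum_other_doc_count = len(values) - sum(count for _, count in buckets)
    return buckets, sum_other_doc_count


print(top_terms(AGES, size=3))  # ([(25, 3), (32, 3), (27, 2)], 12)
```

Ages 27, 29, 30 and 31 all have two documents each, so which of them makes the third bucket depends on the tie-break rule assumed here.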



# terms on job plus a top_hits sub-aggregation with size 3: details of the 3 oldest employees in each job type
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "old_employee": {
          "top_hits": {
            "size": 3,
            "sort": [
              {
                "age": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
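
The same grouping can be mimicked locally to see which employees each bucket should return. This Python sketch (the helper `oldest_per_job` is hypothetical) groups rows by job and keeps the first 3 after sorting by age descending, which is what the `top_hits` sub-aggregation with `"size": 3` and the `age` sort asks for.

```python
from collections import defaultdict

# (name, age, job) of the 20 sample documents, in _id order.
EMPLOYEES = [
    ("Emma", 32, "Product Manager"), ("Underwood", 41, "Dev Manager"),
    ("Tran", 25, "Web Designer"), ("Rivera", 26, "Web Designer"),
    ("Rose", 25, "QA"), ("Lucy", 31, "QA"), ("Byrd", 27, "QA"),
    ("Foster", 27, "Java Programmer"), ("Gregory", 32, "Java Programmer"),
    ("Bryant", 20, "Java Programmer"), ("Jenny", 36, "Java Programmer"),
    ("Mcdonald", 31, "Java Programmer"), ("Jonthna", 30, "Java Programmer"),
    ("Marshall", 32, "Javascript Programmer"), ("King", 33, "Java Programmer"),
    ("Mccarthy", 21, "Javascript Programmer"), ("Goodwin", 25, "Javascript Programmer"),
    ("Catherine", 29, "Javascript Programmer"), ("Boone", 30, "DBA"), ("Kathy", 29, "DBA"),
]


def oldest_per_job(rows, size=3):
    by_job = defaultdict(list)
    for name, age, job in rows:
        by_job[job].append((age, name))
    # Sort each bucket by age descending and keep the first `size` hits.
    return {job: [name for _, name in sorted(members, key=lambda m: m[0], reverse=True)[:size]]
            for job, members in by_job.items()}


for job, names in oldest_per_job(EMPLOYEES).items():
    print(job, names)
```

Buckets with fewer than 3 documents (e.g. DBA) simply return all of them.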

Range & Histogram aggregations

# Salary range buckets; a custom key can be defined for each range (from is inclusive, to is exclusive)
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_range": {
      "range": {
        "field": "salary",
        "ranges": [
          {
            "to": 10000
          },
          {
            "from": 10000,
            "to": 20000
          },
          {
            "key": ">20000",
            "from": 20000
          }
        ]
      }
    }
  }
}
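
A local Python sketch of the range bucketing, useful for predicting the counts. The labels in `RANGES` are illustrative: only `>20000` is a user-defined key in the request, and Elasticsearch generates its own keys for the other two ranges.

```python
# Salaries of the 20 sample employees, in _id order.
SALARIES = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]

# (label, from, to) mirroring the request above; None means unbounded.
RANGES = [("*-10000", None, 10000), ("10000-20000", 10000, 20000), (">20000", 20000, None)]


def range_agg(values, ranges):
    buckets = {}
    for key, lo, hi in ranges:
        # `from` is inclusive, `to` is exclusive.
        buckets[key] = sum(1 for v in values
                           if (lo is None or v >= lo) and (hi is None or v < hi))
    return buckets


print(range_agg(SALARIES, RANGES))  # {'*-10000': 1, '10000-20000': 4, '>20000': 15}
```

Because `from` is inclusive, the four salaries of exactly 20000 land in the bucket keyed `>20000`, so that key really means "20000 or more".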


# Salary histogram: salaries from 0 to 100,000, one bucket per 5,000; extended_bounds also returns the empty buckets in that range
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_histogram": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "extended_bounds": {
          "min": 0,
          "max": 100000
        }
      }
    }
  }
}
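
A local Python sketch of the histogram request, assuming the default `offset` of 0: each value goes to the bucket keyed `floor(value / interval) * interval`, and `extended_bounds` pre-creates buckets from 0 to 100000 even when they are empty. The `histogram` helper is hypothetical and assumes `min_bound` is a multiple of `interval`.

```python
# Salaries of the 20 sample employees, in _id order.
SALARIES = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]


def histogram(values, interval, min_bound, max_bound):
    # extended_bounds: pre-create (possibly empty) buckets from min_bound to max_bound.
    counts = {key: 0 for key in range(min_bound, max_bound + 1, interval)}
    for v in values:
        # Bucket key = floor(value / interval) * interval (no offset).
        key = (v // interval) * interval
        counts[key] = counts.get(key, 0) + 1
    return counts


buckets = histogram(SALARIES, 5000, 0, 100000)
print(len(buckets))  # 21
print({k: c for k, c in buckets.items() if c > 0})
# {5000: 1, 15000: 4, 20000: 6, 25000: 3, 30000: 3, 35000: 2, 50000: 1}
```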

Multi-level nesting

# Nested aggregation 1: bucket by job type and compute salary statistics in each bucket
POST employees/_search
{
  "size": 0,
  "aggs": {
    "Job_salary_stats": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "salary": {
          "stats": {
            "field": "salary"
          }
        }
      }
    }
  }
}

# Multi-level nesting: bucket by job type, then by gender, and compute salary statistics
POST employees/_search
{
  "size": 0,
  "aggs": {
    "Job_gender_stats": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "gender_stats": {
          "terms": {
            "field": "gender"
          },
          "aggs": {
            "salary_stats": {
              "stats": {
                "field": "salary"
              }
            }
          }
        }
      }
    }
  }
}
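
The nesting maps naturally onto nested dictionaries. In this local Python sketch (helper names are hypothetical), the outer key plays the role of the terms bucket on `job.keyword`, the inner key the terms bucket on `gender`, and the leaf the `stats` metric on `salary`.

```python
from collections import defaultdict

# (job, gender, salary) of the 20 sample documents, in _id order.
ROWS = [
    ("Product Manager", "female", 35000), ("Dev Manager", "male", 50000),
    ("Web Designer", "male", 18000), ("Web Designer", "female", 22000),
    ("QA", "female", 18000), ("QA", "female", 25000), ("QA", "male", 20000),
    ("Java Programmer", "male", 20000), ("Java Programmer", "male", 22000),
    ("Java Programmer", "male", 9000), ("Java Programmer", "female", 38000),
    ("Java Programmer", "male", 32000), ("Java Programmer", "female", 30000),
    ("Javascript Programmer", "male", 25000), ("Java Programmer", "male", 28000),
    ("Javascript Programmer", "male", 16000), ("Javascript Programmer", "male", 16000),
    ("Javascript Programmer", "female", 20000), ("DBA", "male", 30000), ("DBA", "female", 20000),
]


def stats(values):
    total = sum(values)
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": total / len(values), "sum": total}


def job_gender_stats(rows):
    # Level 1: terms on job; level 2: terms on gender; leaf: stats on salary.
    groups = defaultdict(lambda: defaultdict(list))
    for job, gender, salary in rows:
        groups[job][gender].append(salary)
    return {job: {gender: stats(salaries) for gender, salaries in genders.items()}
            for job, genders in groups.items()}


result = job_gender_stats(ROWS)
print(result["Java Programmer"]["male"])
# {'count': 5, 'min': 9000, 'max': 32000, 'avg': 22200.0, 'sum': 111000}
```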

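Pipeline aggregations

Sibling pipeline: min_bucket (the result sits at the same level as the aggregation it reads from)

Parent pipeline: Derivative (the result is embedded in each bucket of the parent aggregation)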

# Job type with the lowest average salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "min_salary_by_job": {
      "min_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}
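
A local Python sketch of the two steps behind this request: first the per-job average (the `terms` + `avg` part), then the sibling `min_bucket` step that scans every bucket's value reached via `buckets_path` `jobs>avg_salary`. Helper names are hypothetical; the result mirrors the `value` / `keys` shape of a `min_bucket` response, where all tied keys are listed.

```python
from collections import defaultdict

# (job, salary) of the 20 sample documents, in _id order.
ROWS = [
    ("Product Manager", 35000), ("Dev Manager", 50000), ("Web Designer", 18000),
    ("Web Designer", 22000), ("QA", 18000), ("QA", 25000), ("QA", 20000),
    ("Java Programmer", 20000), ("Java Programmer", 22000), ("Java Programmer", 9000),
    ("Java Programmer", 38000), ("Java Programmer", 32000), ("Java Programmer", 30000),
    ("Javascript Programmer", 25000), ("Java Programmer", 28000),
    ("Javascript Programmer", 16000), ("Javascript Programmer", 16000),
    ("Javascript Programmer", 20000), ("DBA", 30000), ("DBA", 20000),
]


def avg_salary_by_job(rows):
    # Equivalent of the terms(job.keyword) > avg(salary) part of the request.
    salaries = defaultdict(list)
    for job, salary in rows:
        salaries[job].append(salary)
    return {job: sum(values) / len(values) for job, values in salaries.items()}


def min_bucket(bucket_values):
    smallest = min(bucket_values.values())
    # Report every bucket key that shares the minimum value.
    return {"value": smallest, "keys": [k for k, v in bucket_values.items() if v == smallest]}


print(min_bucket(avg_salary_by_job(ROWS)))
# {'value': 19250.0, 'keys': ['Javascript Programmer']}
```

Swapping `min` for `max` gives the `max_bucket` result of the next request (Dev Manager, 50000.0).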


# Job type with the highest average salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "max_salary_by_job": {
      "max_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}


# Average of the per-job average salaries
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "avg_salary_by_job": {
      "avg_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}
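
One point worth checking locally: `avg_bucket` averages the bucket values, so every job type weighs the same regardless of head count, and the result is not the overall average salary. The Python sketch below (hypothetical helper names) compares the two on the sample data.

```python
from collections import defaultdict

# (job, salary) of the 20 sample documents, in _id order.
ROWS = [
    ("Product Manager", 35000), ("Dev Manager", 50000), ("Web Designer", 18000),
    ("Web Designer", 22000), ("QA", 18000), ("QA", 25000), ("QA", 20000),
    ("Java Programmer", 20000), ("Java Programmer", 22000), ("Java Programmer", 9000),
    ("Java Programmer", 38000), ("Java Programmer", 32000), ("Java Programmer", 30000),
    ("Javascript Programmer", 25000), ("Java Programmer", 28000),
    ("Javascript Programmer", 16000), ("Javascript Programmer", 16000),
    ("Javascript Programmer", 20000), ("DBA", 30000), ("DBA", 20000),
]


def avg_salary_by_job(rows):
    salaries = defaultdict(list)
    for job, salary in rows:
        salaries[job].append(salary)
    return {job: sum(values) / len(values) for job, values in salaries.items()}


per_job = avg_salary_by_job(ROWS)
# avg_bucket: plain mean of the 7 bucket values.
avg_bucket = sum(per_job.values()) / len(per_job)
# For contrast: mean over all 20 documents.
overall_avg = sum(salary for _, salary in ROWS) / len(ROWS)

print(round(avg_bucket, 2), overall_avg)  # 27974.49 24700.0
```

The single-person Dev Manager and Product Manager buckets pull the average of averages up.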


# Statistics of the per-job average salaries
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "stats_salary_by_job": {
      "stats_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}


# Percentiles of the per-job average salaries
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "percentiles_salary_by_job": {
      "percentiles_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}



# Derivative of the average salary across age buckets
POST employees/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "histogram": {
        "field": "age",
        "min_doc_count": 1,
        "interval": 1
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "derivative_avg_salary": {
          "derivative": {
            "buckets_path": "avg_salary"
          }
        }
      }
    }
  }
}
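
A local Python sketch of this parent pipeline, under two assumptions: with `min_doc_count: 1` only ages that occur get a bucket, and the derivative is the plain difference from the previous returned bucket (no `unit` normalization). The first bucket has no predecessor, so it gets no derivative. `avg_salary_by_age` and `derivative` are hypothetical helpers.

```python
from collections import defaultdict

# (age, salary) of the 20 sample documents, in _id order.
AGE_SALARY = [(32, 35000), (41, 50000), (25, 18000), (26, 22000), (25, 18000), (31, 25000),
              (27, 20000), (27, 20000), (32, 22000), (20, 9000), (36, 38000), (31, 32000),
              (30, 30000), (32, 25000), (33, 28000), (21, 16000), (25, 16000), (29, 20000),
              (30, 30000), (29, 20000)]


def avg_salary_by_age(rows):
    # histogram on age, interval 1, min_doc_count 1: one bucket per age that occurs.
    salaries = defaultdict(list)
    for age, salary in rows:
        salaries[age].append(salary)
    return {age: sum(v) / len(v) for age, v in sorted(salaries.items())}


def derivative(values):
    # First-order derivative: difference from the previous bucket; the first bucket has none.
    return [None] + [cur - prev for prev, cur in zip(values, values[1:])]


avgs = avg_salary_by_age(AGE_SALARY)
derivs = derivative(list(avgs.values()))
for age, d in zip(avgs, derivs):
    print(age, d)
```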


# Cumulative sum of the average salary across age buckets
POST employees/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "histogram": {
        "field": "age",
        "min_doc_count": 1,
        "interval": 1
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "cumulative_salary": {
          "cumulative_sum": {
            "buckets_path": "avg_salary"
          }
        }
      }
    }
  }
}
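
A local Python sketch of `cumulative_sum`: each age bucket receives the running total of `avg_salary` over itself and all earlier buckets. It uses the standard-library `itertools.accumulate`; `avg_salary_by_age` is a hypothetical helper that mirrors the age histogram with `min_doc_count: 1`.

```python
from collections import defaultdict
from itertools import accumulate

# (age, salary) of the 20 sample documents, in _id order.
AGE_SALARY = [(32, 35000), (41, 50000), (25, 18000), (26, 22000), (25, 18000), (31, 25000),
              (27, 20000), (27, 20000), (32, 22000), (20, 9000), (36, 38000), (31, 32000),
              (30, 30000), (32, 25000), (33, 28000), (21, 16000), (25, 16000), (29, 20000),
              (30, 30000), (29, 20000)]


def avg_salary_by_age(rows):
    # One bucket per age that occurs, in ascending key order.
    salaries = defaultdict(list)
    for age, salary in rows:
        salaries[age].append(salary)
    return {age: sum(v) / len(v) for age, v in sorted(salaries.items())}


avgs = avg_salary_by_age(AGE_SALARY)
# Running total of the per-bucket averages.
sums = list(accumulate(avgs.values()))
for age, s in zip(avgs, sums):
    print(age, round(s, 2))
```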

# Moving function: minimum of avg_salary over a window of up to 10 preceding age buckets
POST employees/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "histogram": {
        "field": "age",
        "min_doc_count": 1,
        "interval": 1
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "moving_min_salary": {
          "moving_fn": {
            "buckets_path": "avg_salary",
            "window": 10,
            "script": "MovingFunctions.min(values)"
          }
        }
      }
    }
  }
}
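
A local Python sketch of the `moving_fn` request, assuming the default `shift` of 0, under which the window passed to the script holds up to `window` buckets *before* the current one and excludes the current bucket. The first bucket therefore has an empty window; `MovingFunctions.min` yields NaN there and the response shows `null`, represented as `None` below. `moving_min` is a hypothetical helper.

```python
from collections import defaultdict

# (age, salary) of the 20 sample documents, in _id order.
AGE_SALARY = [(32, 35000), (41, 50000), (25, 18000), (26, 22000), (25, 18000), (31, 25000),
              (27, 20000), (27, 20000), (32, 22000), (20, 9000), (36, 38000), (31, 32000),
              (30, 30000), (32, 25000), (33, 28000), (21, 16000), (25, 16000), (29, 20000),
              (30, 30000), (29, 20000)]


def avg_salary_by_age(rows):
    salaries = defaultdict(list)
    for age, salary in rows:
        salaries[age].append(salary)
    return {age: sum(v) / len(v) for age, v in sorted(salaries.items())}


def moving_min(values, window):
    out = []
    for i in range(len(values)):
        # shift = 0: up to `window` buckets before the current one, current bucket excluded.
        prior = values[max(0, i - window):i]
        # An empty window has no minimum (null in the Elasticsearch response).
        out.append(min(prior) if prior else None)
    return out


avgs = avg_salary_by_age(AGE_SALARY)
result = moving_min(list(avgs.values()), window=10)
print(result)
```

Only in the last bucket (age 41) has the 9000 average of age 20 slid out of the 10-bucket window, so the minimum rises to 16000.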

Related reading

Metric Aggregations
Bucket Aggregations

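Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/shuiCSDN/article/details/104094608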
