Common Elasticsearch Query and Filter APIs, and Pitfalls Worth Noting

Introduction

This article introduces the common Elasticsearch query and filter APIs (through the Java high-level REST client) and some pitfalls worth noting.

In Elasticsearch everything is expressed as a query; a filter context only exists inside a bool query (and can also appear inside aggregations).

A filter computes no relevance score and its results can be cached, so prefer filters whenever scoring is not needed.
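For orientation, the Java examples later in this article generate request bodies of this shape; a term clause placed under bool/filter is matched without computing a score:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": 5 } }
      ]
    }
  }
}
```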

Concretely, put the clause into the filter of a bool query; this is shown in detail below.

For the corresponding REST API, see the earlier post Elasticsearch查询过滤解惑.

Preparing test data with bulk

First, index some test data with a bulk request:

```java
public class BulkTest {

    private static final String[] homes = {"河北省", "山西省", "辽宁省", "吉林省", "江苏省", "浙江省", "安徽省", "福建省", "江西省", "山东省", "河南省", "湖北省", "湖南省", "广东省", "海南省", "四川省", "贵州省", "云南省", "陕西省", "甘肃省", "青海省", "黑龙江省", "台湾省", "北京市", "天津市", "上海市", "重庆市", "广西壮族自治区", "西藏自治区", "宁夏回族自治区", "新疆维吾尔自治区", "内蒙古自治区", "香港特别行政区", "澳门特别行政区"};

    private RestHighLevelClient client;

    @Before
    public void setUp() {
        HttpHost host = new HttpHost("localhost", 9200, "http");
        client = new RestHighLevelClient(RestClient.builder(host));
    }

    @Test
    public void add() throws IOException {
        BulkRequest request = new BulkRequest();
        IndexRequest indexRequest;
        List<UserInfo> userInfos = UserInfo.getUserInfo(10000);
        String indexName = "user";
        for (UserInfo userInfo : userInfos) {
            indexRequest = new IndexRequest(indexName).id(userInfo.id)
                    .source(JSON.toJSONString(userInfo), XContentType.JSON);
            request.add(indexRequest);
        }
        client.bulk(request, RequestOptions.DEFAULT);
    }

    private static class UserInfo {
        private String id;
        private Long createTime;
        private String createTimeStr;
        private short status;
        private String home;
        private String option;

        public static List<UserInfo> getUserInfo(int size) {
            LinkedList<UserInfo> userInfos = new LinkedList<>();
            LocalDateTime now = LocalDateTime.now();
            Random random = new Random();
            int count = 1;
            for (int i = 0; i < size; i++) {
                UserInfo userInfo = new UserInfo();
                LocalDateTime localDateTime = now.plusDays(random.nextInt(1000));
                // DatetimeUtil is a project utility holding DateTimeFormatter constants
                userInfo.setId(String.format("%s%05d", localDateTime.format(DatetimeUtil.YYYYMMDDHHMMSS_FORMATTER), count++));
                userInfo.setCreateTimeStr(localDateTime.format(DatetimeUtil.DATE_TIME_FORMATTER));
                userInfo.setCreateTime(DatetimeUtil.getLocalDateTimeMill(localDateTime));
                userInfo.setHome(homes[random.nextInt(homes.length)]);
                userInfo.setOption(homes[random.nextInt(homes.length)]);
                userInfo.setStatus((short) random.nextInt(10));
                userInfos.add(userInfo);
            }
            return userInfos;
        }

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public Long getCreateTime() { return createTime; }
        public void setCreateTime(Long createTime) { this.createTime = createTime; }
        public String getCreateTimeStr() { return createTimeStr; }
        public void setCreateTimeStr(String createTimeStr) { this.createTimeStr = createTimeStr; }
        public short getStatus() { return status; }
        public void setStatus(short status) { this.status = status; }
        public String getHome() { return home; }
        public void setHome(String home) { this.home = home; }
        public String getOption() { return option; }
        public void setOption(String option) { this.option = option; }
    }
}
```

bool query

The Java REST client provides a QueryBuilders factory class for creating the various query builders.

The most important clause of a bool query for our purposes is filter, which runs in filter context.

You can of course also use the usual must (every clause must match), should (at least one clause should match), and must_not (no clause may match).

The code:

```java
@Test
public void boolQueryBuilder() throws IOException {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.filter(QueryBuilders.termQuery("status", "5")); // filter context inside the bool query
    searchSourceBuilder.query(boolQueryBuilder);
    searchSourceBuilder.from(0);  // start at offset 0
    searchSourceBuilder.size(20); // default is 10
    searchSourceBuilder.sort("createTime", SortOrder.DESC); // sort
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME); // INDEX_NAME = "user"
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
```

The code above filters for documents whose status field is 5, and sets the starting offset and the number of hits to fetch.

The defaults are from = 0 and size = 10. Do not use from/size for deep pagination (by default from + size may not exceed 10,000, controlled by the index.max_result_window setting); for deep pagination see scroll and search after below.

A sort field can also be specified via sort.

term and terms queries

A term query is an exact match. terms is essentially the same, but accepts multiple values; a document matches if any one of the values matches exactly.

A term query can be used on its own, but it is still recommended to put it inside the filter of a bool query.

```java
@Test
public void term() throws IOException {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // exact match must target the keyword sub-field for dynamically mapped strings
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("home.keyword", "四川省");
    // TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("home", "四川省"); // matches nothing, see the note below
    searchSourceBuilder.query(termQueryBuilder);
    System.out.println(searchSourceBuilder.toString()); // print the generated query DSL
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
```

```java
@Test
public void terms() throws IOException {
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    TermsQueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("status", 5, 6);
    boolQueryBuilder.filter(termsQueryBuilder);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(boolQueryBuilder);
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
```

Note:

If you clearly have matching data but a term query finds nothing, it is time to check the mapping of that field.

Many people do not define a mapping, or rely on dynamic mapping. When a string field is indexed dynamically, Elasticsearch maps it as text and adds a keyword sub-field under fields with ignore_above: 256 (only values up to 256 characters are indexed as keyword).

In that case you must query field-name.keyword rather than field-name for exact matches. Inaccurate results when querying date-time strings usually have the same cause.
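For reference, dynamic mapping turns a JSON string field such as home into the following mapping, which is why exact matches must target home.keyword:

```json
{
  "home": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
```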

range query

Range queries are very simple and very commonly used:

```java
@Test
public void range() throws IOException {
    RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("status").gte(4).lte(7);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(rangeQueryBuilder);
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
```

ids

Fetch documents by a set of ids:

```java
@Test
public void ids() throws IOException {
    IdsQueryBuilder idsQueryBuilder = QueryBuilders.idsQuery().addIds("20210313173331", "20211101173331");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(idsQueryBuilder);
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
```

exists

Find documents that contain a given field:

```java
@Test
public void exists() throws IOException {
    // match documents in which the given field exists
    // ExistsQueryBuilder existsQueryBuilder = QueryBuilders.existsQuery("status");
    ExistsQueryBuilder existsQueryBuilder = QueryBuilders.existsQuery("hello");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(existsQueryBuilder);
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
```

match

A match query is a full-text query: the input string is analyzed first, and a document matches if it contains any of the resulting terms.

```java
@Test
public void match() throws IOException {
    MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("home", "四川");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(matchQueryBuilder);
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
```

multi_match

multi_match is similar to match, but searches across several fields.

```java
@Test
public void multiMatch() throws IOException {
    MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery("四川", "home", "option");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(multiMatchQueryBuilder);
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
```

scroll

Sometimes, for example when computing statistics, you need to walk the entire result set. With a large data volume this means deep pagination, and a plain from/size search no longer works; this is what scroll is for.

Scroll is efficient because it does not sort globally across pages: in the query phase each node only searches its own shards and returns the set of matching doc ids.

Scroll then keeps a search context for this id set alive for a period of time, which makes it possible to page through the full data set.
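At the REST level the same flow is two calls (a sketch; the query mirrors the Java code below). The first search opens the scroll context, and every subsequent call passes the _scroll_id returned by the previous response:

```
POST /user/_search?scroll=1m
{
  "query": { "bool": { "filter": { "range": { "createTime": { "gte": 1597298833674 } } } } }
}

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}
```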

```java
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.ClearScrollResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.Scroll;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.Before;
import org.junit.Test;

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ScrollTest {

    private RestHighLevelClient client;

    @Before
    public void setUp() {
        HttpHost host = new HttpHost("127.0.0.1", 9200, "http");
        client = new RestHighLevelClient(RestClient.builder(host));
    }

    @Test
    public void scroll() throws IOException {
        long start = System.currentTimeMillis();
        FileOutputStream fileOutputStream = new FileOutputStream("F:\\tmp\\long_scroll9.txt");
        BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
        // keep the scroll context alive for 1 minute between calls
        final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
        SearchRequest searchRequest = new SearchRequest("user");
        searchRequest.scroll(scroll);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // fetch only the listed fields from _source
        String[] fields = {"id", "createTime"};
        FetchSourceContext sourceContext = new FetchSourceContext(true, fields, null);
        searchSourceBuilder.fetchSource(sourceContext);
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        // RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTimeStr");
        // lastlogintime.gte("2019-08-11 00:00:00");
        RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTime");
        lastlogintime.gte(1597298833674L);
        boolQueryBuilder.filter(lastlogintime);
        // boolQueryBuilder.must(lastlogintime);
        searchSourceBuilder.query(boolQueryBuilder);
        // searchSourceBuilder.sort("_doc"); // fastest order when no sorting is needed
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        String scrollId = searchResponse.getScrollId();
        int count = 0;
        SearchHit[] searchHits = searchResponse.getHits().getHits();
        count += searchHits.length;
        // print(searchHits);
        print(bufferedOutputStream, searchHits);
        while (searchHits != null && searchHits.length > 0) {
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            scrollRequest.scroll(scroll);
            searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
            scrollId = searchResponse.getScrollId();
            searchHits = searchResponse.getHits().getHits();
            count += searchHits.length;
            // print(searchHits);
            print(bufferedOutputStream, searchHits);
        }
        bufferedOutputStream.close();
        System.out.println(count);
        // release the scroll context as soon as you are done with it
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
        clearScrollRequest.addScrollId(scrollId);
        ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
        boolean succeeded = clearScrollResponse.isSucceeded();
        System.out.println(succeeded);
        System.out.println(System.currentTimeMillis() - start);
    }

    private static void print(BufferedOutputStream bufferedOutputStream, SearchHit[] searchHits) throws IOException {
        for (SearchHit hit : searchHits) {
            bufferedOutputStream.write(hit.getSourceAsString().getBytes());
            bufferedOutputStream.write("\n".getBytes());
        }
    }

    private static void print(SearchHit[] searchHits) {
        for (SearchHit searchHit : searchHits) {
            System.out.println(searchHit.getSourceAsString());
        }
    }
}
```

search after

Scroll has its own limitations: when the query phase matches a very large set of ids, the whole process can become very slow.

In that case consider search after. Instead of keeping a server-side context the way scroll does, search_after passes the sort values of the last hit of one page as the starting point of the next page, so it is live (each page reflects the current state of the index) rather than a snapshot.
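At the REST level every page repeats the same query with a fixed sort, feeding the previous page's last sort values back in (a sketch; the search_after value shown is a made-up illustration):

```json
{
  "size": 1000,
  "query": { "bool": { "filter": { "range": { "createTimeStr.keyword": { "gte": "2020-08-11 00:00:00" } } } } },
  "sort": [ { "_id": "desc" } ],
  "search_after": ["2020101217333100042"]
}
```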

```java
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.junit.Before;
import org.junit.Test;

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class SearchAfterTest {

    private static RestHighLevelClient client;

    @Before
    public void setUp() {
        HttpHost host = new HttpHost("127.0.0.1", 9200, "http");
        client = new RestHighLevelClient(RestClient.builder(host));
    }

    @Test
    public void search() throws IOException {
        long start = System.currentTimeMillis();
        Object[] objects = null;
        FileOutputStream fileOutputStream = new FileOutputStream("F:\\tmp\\safter_long_filter6.txt");
        BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
        boolean type = true;
        while (type) {
            SearchHit[] hits = searchAfter(client, objects);
            if (hits.length == 0) {
                break;
            }
            // the sort values of the last hit become the cursor for the next page
            objects = hits[hits.length - 1].getSortValues();
            if (hits.length < 1000) {
                type = false;
            }
            writeData(bufferedOutputStream, hits);
        }
        bufferedOutputStream.close();
        System.out.println(System.currentTimeMillis() - start);
        client.close();
    }

    private static void writeData(BufferedOutputStream bufferedOutputStream, SearchHit[] searchHits) throws IOException {
        for (SearchHit hit : searchHits) {
            bufferedOutputStream.write(hit.getSourceAsString().getBytes());
            bufferedOutputStream.write("\n".getBytes());
        }
    }

    public static SearchHit[] searchAfter(RestHighLevelClient client, Object[] objects) throws IOException {
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        // RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTimeStr");
        RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTimeStr.keyword");
        lastlogintime.gte("2020-08-11 00:00:00");
        // RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTime");
        // lastlogintime.gte(1597298833674L);
        boolQueryBuilder.filter(lastlogintime);
        sourceBuilder.query(boolQueryBuilder);
        sourceBuilder.size(1000);
        // search_after requires a deterministic sort; without one, getSortValues() is empty
        sourceBuilder.sort("_id", SortOrder.DESC);
        if (objects != null) {
            sourceBuilder.searchAfter(objects);
        }
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("user");
        searchRequest.source(sourceBuilder);
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        return response.getHits().getHits();
    }
}
```

© trayvon. All rights reserved. Originally published on OSCHINA.