Common Elasticsearch Query and Filter APIs, and Pitfalls Worth Noting
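Introduction
This article introduces some of the Elasticsearch (ES) query and filter APIs, along with a few pitfalls worth noting.
Most clauses in ES run in query context. Filter context appears mainly inside a bool query (its filter and must_not clauses), and it can also appear in aggregations, for example the filter aggregation.
Filters do not compute relevance scores and can be cached, so prefer them whenever scoring is not needed.
In practice this means putting conditions into the filter clause of a bool query, which is described in detail below.
For an introduction to the query and filter REST API, see "Elasticsearch查询过滤解惑" (Demystifying Elasticsearch Queries and Filters).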
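To make the filter context concrete, the sketch below builds the REST request body that a bool query with a single term filter corresponds to. The class and method names (FilterBodySketch, termFilterBody) are made up for illustration, and the string building is deliberately naive (no escaping); real code would use the client's query builders.

```java
final class FilterBodySketch {
    // Builds {"query":{"bool":{"filter":[{"term":{field:value}}]}}} for a numeric value.
    // Hypothetical helper: field names are not escaped, so only pass plain identifiers.
    static String termFilterBody(String field, long value) {
        return "{\"query\":{\"bool\":{\"filter\":[{\"term\":{\""
                + field + "\":" + value + "}}]}}}";
    }

    public static void main(String[] args) {
        // Print the body for "status equals 5" as a filter.
        System.out.println(termFilterBody("status", 5));
    }
}
```

Because the term clause sits under filter, it only decides whether a document matches; it contributes nothing to the score.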
Preparing data with bulk
First, add some test data with the bulk API:
public class BulkTest {
private static final String[] homes = {"河北省", "山西省", "辽宁省", "吉林省", "江苏省", "浙江省", "安徽省", "福建省", "江西省", "山东省", "河南省", "湖北省", "湖南省", "广东省", "海南省", "四川省", "贵州省", "云南省", "陕西省", "甘肃省", "青海省", "黑龙江省", "台湾省", "北京市", "天津市", "上海市", "重庆市", "广西壮族自治区", "西藏自治区", "宁夏回族自治区", "新疆维吾尔自治区", "内蒙古自治区", "香港特别行政区", "澳门特别行政区"};
private RestHighLevelClient client;
@Before
public void setUp() {
HttpHost host = new HttpHost("localhost", 9200, "http");
client = new RestHighLevelClient(RestClient.builder(host));
}
@Test
public void add() throws IOException {
BulkRequest request = new BulkRequest();
IndexRequest indexRequest;
List<UserInfo> userInfos = UserInfo.getUserInfo(10000);
String indexName = "user";
for(UserInfo userInfo : userInfos){
indexRequest = new IndexRequest(indexName).id(userInfo.id).source(JSON.toJSONString(userInfo), XContentType.JSON);
request.add(indexRequest);
}
client.bulk(request, RequestOptions.DEFAULT);
}
private static class UserInfo{
private String id;
private Long createTime;
private String createTimeStr;
private short status;
private String home;
private String option;
public static List<UserInfo> getUserInfo(int size){
LinkedList<UserInfo> userInfos = new LinkedList<>();
LocalDateTime now = LocalDateTime.now();
Random random = new Random();
int count = 1;
for(int i=0;i<size;i++){
UserInfo userInfo = new UserInfo();
LocalDateTime localDateTime = now.plusDays(random.nextInt(1000));
userInfo.setId(String.format("%s%05d",localDateTime.format(DatetimeUtil.YYYYMMDDHHMMSS_FORMATTER),count++));
userInfo.setCreateTimeStr(localDateTime.format(DatetimeUtil.DATE_TIME_FORMATTER));
userInfo.setCreateTime(DatetimeUtil.getLocalDateTimeMill(localDateTime));
userInfo.setHome(homes[random.nextInt(homes.length)]);
userInfo.setOption(homes[random.nextInt(homes.length)]);
userInfo.setStatus((short) random.nextInt(10));
userInfos.add(userInfo);
}
return userInfos;
}
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public Long getCreateTime() {
return createTime;
}
public void setCreateTime(Long createTime) {
this.createTime = createTime;
}
public String getCreateTimeStr() {
return createTimeStr;
}
public void setCreateTimeStr(String createTimeStr) {
this.createTimeStr = createTimeStr;
}
public short getStatus() {
return status;
}
public void setStatus(short status) {
this.status = status;
}
public String getHome() {
return home;
}
public void setHome(String home) {
this.home = home;
}
public String getOption() {
return option;
}
public void setOption(String option) {
this.option = option;
}
}
}
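The bulk example calls a DatetimeUtil helper that the article does not show. The following is a minimal stand-in so the example can compile, not the author's implementation: the two patterns are assumptions inferred from the id prefix and from the "2020-08-11 00:00:00" value used later, and the overload with an explicit ZoneId is an addition for reproducibility.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-in for the article's DatetimeUtil helper (the original is not shown).
final class DatetimeUtil {
    // Assumed pattern for the timestamp prefix of the document id.
    static final DateTimeFormatter YYYYMMDDHHMMSS_FORMATTER =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    // Assumed pattern for createTimeStr.
    static final DateTimeFormatter DATE_TIME_FORMATTER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Epoch milliseconds of a local date-time, interpreted in the JVM's default zone.
    static long getLocalDateTimeMill(LocalDateTime time) {
        return getLocalDateTimeMill(time, ZoneId.systemDefault());
    }

    // Same conversion with an explicit zone, which keeps the result reproducible.
    static long getLocalDateTimeMill(LocalDateTime time, ZoneId zone) {
        return time.atZone(zone).toInstant().toEpochMilli();
    }
}
```

Storing the time both as epoch milliseconds (createTime) and as a formatted string (createTimeStr) is what lets the later examples run range queries against either representation.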
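bool query
The Java REST API provides the QueryBuilders factory class, which creates the individual query builders.
The most important clause of a bool query is filter, which restricts the result set without scoring.
The familiar must (must match), should (at least one should match) and must_not (must not match) clauses are available as well.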
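How the four clause types combine can be modelled with plain predicates. This is a simplified sketch that ignores scoring and assumes the default minimum_should_match behaviour (a should clause is only required when there is no must or filter clause); BoolClauseSketch and matches are illustrative names, not part of the Elasticsearch client.

```java
import java.util.List;
import java.util.function.Predicate;

final class BoolClauseSketch {
    // Returns true when doc satisfies the clause lists the way a bool query combines them.
    static <T> boolean matches(T doc,
                               List<Predicate<T>> must,
                               List<Predicate<T>> filter,
                               List<Predicate<T>> should,
                               List<Predicate<T>> mustNot) {
        // Every must and filter clause has to match.
        for (Predicate<T> p : must) if (!p.test(doc)) return false;
        for (Predicate<T> p : filter) if (!p.test(doc)) return false;
        // No must_not clause may match.
        for (Predicate<T> p : mustNot) if (p.test(doc)) return false;
        // Without must/filter clauses, at least one should clause has to match.
        if (must.isEmpty() && filter.isEmpty() && !should.isEmpty()) {
            return should.stream().anyMatch(p -> p.test(doc));
        }
        return true;
    }
}
```

In this model must and filter behave identically, which is the point: the only difference in Elasticsearch is that must contributes to the score while filter does not.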
Here is the code:
@Test
public void boolQueryBuilder() throws IOException {
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.filter(QueryBuilders.termQuery("status", "5")); // filter context inside the bool query
searchSourceBuilder.query(boolQueryBuilder);
searchSourceBuilder.from(0); // start offset, 0 by default
searchSourceBuilder.size(20); // number of hits to return, 10 by default
searchSourceBuilder.sort("createTime", SortOrder.DESC); // sort by createTime, newest first
// INDEX_NAME is assumed to be a constant of the test class holding the index name ("user" in the bulk example)
SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
searchRequest.source(searchSourceBuilder);
SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits hits = search.getHits();
for (SearchHit hit : hits) {
System.out.println(hit.getSourceAsString());
}
}
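The code above uses a filter to find documents whose status field is 5, and sets the starting offset and the number of hits to return.
By default from is 0 and size is 10. Avoid deep pagination with from/size; if you need it, see scroll and search after below.
You can also specify a sort field with sort.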
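To see why deep from/size pagination is discouraged, it helps to count the work involved. The sketch below assumes the default index.max_result_window of 10,000 and the usual model in which every shard returns its top from + size hits to the coordinating node; the class and method names are illustrative.

```java
final class DeepPagingSketch {
    // Default value of the index.max_result_window setting.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // from/size paging is rejected once from + size exceeds the window.
    static boolean allowed(int from, int size) {
        return (long) from + size <= DEFAULT_MAX_RESULT_WINDOW;
    }

    // Upper bound on the entries the coordinating node has to merge:
    // each shard contributes up to from + size candidates.
    static long mergedEntries(int shards, int from, int size) {
        return (long) shards * ((long) from + size);
    }
}
```

With 5 shards, the page at from = 9990 and size = 10 makes the coordinating node merge up to 50,000 entries just to return 10 hits, and any page beyond the window is rejected outright.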
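term and terms queries
A term query is an exact match. A terms query is essentially the same, except that it accepts several values and matches as soon as any one of them matches exactly.
A term query can be used on its own, but it is still better to put it inside the filter clause of a bool query whenever possible.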
@Test
public void term() throws IOException {
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// With dynamic mapping, query the keyword sub-field for an exact match.
TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("home.keyword", "四川省");
// Querying the analyzed text field directly returns nothing:
// TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("home", "四川省");
searchSourceBuilder.query(termQueryBuilder);
System.out.println(searchSourceBuilder.toString());
SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
searchRequest.source(searchSourceBuilder);
SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits hits = search.getHits();
for (SearchHit hit : hits) {
System.out.println(hit.getSourceAsString());
}
}
@Test
public void terms() throws IOException {
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
TermsQueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("status", 5,6);
boolQueryBuilder.filter(termsQueryBuilder);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(boolQueryBuilder);
SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
searchRequest.source(searchSourceBuilder);
SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits hits = search.getHits();
for (SearchHit hit : hits) {
System.out.println(hit.getSourceAsString());
}
}
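Note:
If data clearly exists but a term query returns nothing, check the mapping of the field you are querying.
Many people skip defining a mapping or rely on dynamic mapping. When a string field is added dynamically, ES maps it as text and adds a keyword sub-field under fields with ignore_above set to 256, so values longer than 256 characters are not indexed in the keyword sub-field (they are not truncated).
In that case, query field-name.keyword rather than field-name. This is also the usual reason why queries on time strings return inaccurate results.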
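The effect can be reproduced with a small in-memory model. It is a simplification under stated assumptions: the standard analyzer is modelled as emitting one term per character (which is how it treats Chinese text), and ignore_above is applied to the Java string length. All names in the block are illustrative.

```java
import java.util.HashSet;
import java.util.Set;

final class TermLookupSketch {
    // Default ignore_above of the keyword sub-field created by dynamic mapping.
    static final int IGNORE_ABOVE = 256;

    // Simplified model of analyzing Chinese text: one indexed term per character.
    static Set<String> textTerms(String value) {
        Set<String> terms = new HashSet<>();
        value.codePoints().forEach(cp -> terms.add(new String(Character.toChars(cp))));
        return terms;
    }

    // The keyword sub-field indexes the whole string, or nothing if it is too long.
    static Set<String> keywordTerms(String value) {
        Set<String> terms = new HashSet<>();
        if (value.length() <= IGNORE_ABOVE) {
            terms.add(value);
        }
        return terms;
    }

    // A term query is an exact lookup of a single term among the indexed terms.
    static boolean termMatches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }
}
```

In this model a term lookup for the full province name fails against the text field, because only single characters were indexed, and succeeds against the keyword sub-field, which mirrors the home versus home.keyword behaviour described above.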
Range query
Range queries are simple and very common:
@Test
public void range() throws IOException {
RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("status").gte(4).lte(7);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(rangeQueryBuilder);
SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
searchRequest.source(searchSourceBuilder);
SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits hits = search.getHits();
for (SearchHit hit : hits) {
System.out.println(hit.getSourceAsString());
}
}
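The range above compares numbers. On a keyword field, such as the createTimeStr.keyword field used in the search after example later in this article, a range compares strings in lexicographic order, which agrees with chronological order only when every value uses the same zero-padded pattern. The sketch below models that comparison with String.compareTo; the class and method names are illustrative.

```java
final class StringRangeSketch {
    // Models rangeQuery(field).gte(lower) on a keyword field as a plain string comparison.
    static boolean gte(String value, String lower) {
        return value.compareTo(lower) >= 0;
    }
}
```

With zero-padded values, "2020-10-01 00:00:00" is correctly at or after "2020-09-01 00:00:00". If the lower bound is written as "2020-9-01 00:00:00", the same October value is rejected, because the character '1' sorts before '9'.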
ids
Query documents by a set of ids:
@Test
public void ids() throws IOException {
IdsQueryBuilder idsQueryBuilder = QueryBuilders.idsQuery().addIds("20210313173331","20211101173331");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(idsQueryBuilder);
SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
searchRequest.source(searchSourceBuilder);
SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits hits = search.getHits();
for (SearchHit hit : hits) {
System.out.println(hit.getSourceAsString());
}
}
exists
Find documents that contain the specified field:
@Test
public void exists() throws IOException {
// check whether the field exists
// ExistsQueryBuilder existsQueryBuilder = QueryBuilders.existsQuery("status");
ExistsQueryBuilder existsQueryBuilder = QueryBuilders.existsQuery("hello");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(existsQueryBuilder);
SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
searchRequest.source(searchSourceBuilder);
SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits hits = search.getHits();
for (SearchHit hit : hits) {
System.out.println(hit.getSourceAsString());
}
}
match
@Test
public void match() throws IOException {
MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("home", "四川");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(matchQueryBuilder);
SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
searchRequest.source(searchSourceBuilder);
SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits hits = search.getHits();
for (SearchHit hit : hits) {
System.out.println(hit.getSourceAsString());
}
}
multi_match
multi_match is much like match, but it can search several fields at once.
@Test
public void multiMatch() throws IOException {
MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery("四川", "home", "option");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(multiMatchQueryBuilder);
SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
searchRequest.source(searchSourceBuilder);
SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits hits = search.getHits();
for (SearchHit hit : hits) {
System.out.println(hit.getSourceAsString());
}
}
scroll
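Sometimes, for example when computing statistics, you need to retrieve all matching data. When the data set is very large this amounts to deep pagination, where a simple from/size query no longer works, and scroll is needed.
Scroll copes with this because the search context it opens lets every shard remember where the previous batch ended, so a later batch does not have to collect and discard all earlier hits again. If the order does not matter, sorting by _doc also skips scoring and global sorting, so in the query phase each shard only scans its own data and returns the ids of the matching documents.
Scroll keeps this context alive for a period of time (the keep-alive sent with each request), which makes it possible to read the full data set.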
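The idea of a kept-alive context can be modelled in memory. This is only a toy model under stated assumptions: the "context" is a frozen copy of the data plus a position, whereas Elasticsearch keeps per-shard state and expires it after the keep-alive. ScrollSketch is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

final class ScrollSketch<T> {
    private final List<T> snapshot; // frozen view taken when the scroll starts
    private int position = 0;       // where the previous batch ended

    ScrollSketch(List<T> liveData) {
        // Later changes to liveData are not visible through this scroll.
        this.snapshot = new ArrayList<>(liveData);
    }

    // Returns the next batch; an empty list means the scroll is exhausted.
    List<T> next(int size) {
        int end = Math.min(position + size, snapshot.size());
        List<T> batch = new ArrayList<>(snapshot.subList(position, end));
        position = end;
        return batch;
    }
}
```

Each call continues from the stored position instead of recomputing earlier pages, and the loop ends on the first empty batch, which is the same termination condition the scroll test below uses.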
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.ClearScrollResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.Scroll;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.Before;
import org.junit.Test;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class ScrollTest {
private RestHighLevelClient client;
@Before
public void setUp() {
HttpHost host = new HttpHost("127.0.0.1", 9200, "http");
client = new RestHighLevelClient(RestClient.builder(host));
}
@Test
public void scroll() throws IOException {
long start = System.currentTimeMillis();
FileOutputStream fileOutputStream = new FileOutputStream("F:\\tmp\\long_scroll9.txt");
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
SearchRequest searchRequest = new SearchRequest("user");
searchRequest.scroll(scroll);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// fetch only the listed fields
String[] fields = {"id","createTime"};
FetchSourceContext sourceContext = new FetchSourceContext(true,fields,null);
searchSourceBuilder.fetchSource(sourceContext);
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
// RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTimeStr");
// lastlogintime.gte("2019-08-11 00:00:00");
RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTime");
lastlogintime.gte(1597298833674L);
boolQueryBuilder.filter(lastlogintime);
// boolQueryBuilder.must(lastlogintime);
searchSourceBuilder.query(boolQueryBuilder);
// searchSourceBuilder.sort("_doc");
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
String scrollId = searchResponse.getScrollId();
int count = 0;
SearchHit[] searchHits = searchResponse.getHits().getHits();
count+=searchHits.length;
// print(searchHits);
print(bufferedOutputStream,searchHits);
while (searchHits != null && searchHits.length > 0) {
SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
scrollRequest.scroll(scroll);
searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
scrollId = searchResponse.getScrollId();
searchHits = searchResponse.getHits().getHits();
count+=searchHits.length;
// print(searchHits);
print(bufferedOutputStream,searchHits);
}
bufferedOutputStream.close();
System.out.println(count);
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
boolean succeeded = clearScrollResponse.isSucceeded();
System.out.println(succeeded);
System.out.println(System.currentTimeMillis() - start);
}
private static void print(BufferedOutputStream bufferedOutputStream, SearchHit[] searchHits) throws IOException {
for (SearchHit hit : searchHits) {
bufferedOutputStream.write(hit.getSourceAsString().getBytes());
bufferedOutputStream.write("\n".getBytes());
}
}
private static void print(SearchHit[] searchHits){
for(SearchHit searchHit : searchHits){
System.out.println(searchHit.getSourceAsString());
}
}
}
search after
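Scroll has its own limits: for example, when a very large number of ids match in the query phase, the whole process becomes very slow.
In that case consider search after. It works on much the same principle as scroll, but it is stateless and real-time: each request passes the sort values of the last hit of the previous page and sees the current state of the index.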
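The cursor idea behind search after can be modelled in memory. The sketch assumes the sort key is unique per document, which is why a real request needs a sort with a tiebreaker field; SearchAfterSketch and page are illustrative names, and a real shard would seek to the cursor rather than scan.

```java
import java.util.ArrayList;
import java.util.List;

final class SearchAfterSketch {
    // Returns up to size keys strictly greater than after (null means the first page).
    // sortedKeys must be ascending and unique, like a sort with a tiebreaker field.
    static List<Long> page(List<Long> sortedKeys, Long after, int size) {
        List<Long> hits = new ArrayList<>();
        for (Long key : sortedKeys) {
            if (after != null && key <= after) {
                continue; // skip everything up to and including the cursor
            }
            hits.add(key);
            if (hits.size() == size) {
                break;
            }
        }
        return hits;
    }
}
```

The caller keeps no server-side state: it feeds the last key of one page into the next call and stops when a page comes back empty or shorter than size, as the test below does with getSortValues().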
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Before;
import org.junit.Test;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class SearchAfterTest {
private static RestHighLevelClient client;
@Before
public void setUp() {
HttpHost host = new HttpHost("127.0.0.1", 9200, "http");
client = new RestHighLevelClient(RestClient.builder(host));
}
@Test
public void search() throws IOException {
long start = System.currentTimeMillis();
Object[] objects = null;
FileOutputStream fileOutputStream = new FileOutputStream("F:\\tmp\\safter_long_filter6.txt");
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
boolean type = true;
while (type) {
SearchHit[] hits = searchAfter(client, objects);
if(hits.length == 0){
break;
}
objects = hits[hits.length-1].getSortValues();
if (hits.length < 1000) {
type = false;
}
writeData(bufferedOutputStream,hits);
}
bufferedOutputStream.close();
System.out.println(System.currentTimeMillis() - start);
client.close();
}
private static void writeData(BufferedOutputStream bufferedOutputStream, SearchHit[] searchHits) throws IOException {
for (SearchHit hit : searchHits) {
bufferedOutputStream.write(hit.getSourceAsString().getBytes());
bufferedOutputStream.write("\n".getBytes());
}
}
public static SearchHit[] searchAfter(RestHighLevelClient client, Object[] objects) throws IOException {
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
// RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTimeStr");
RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTimeStr.keyword");
lastlogintime.gte("2020-08-11 00:00:00");
// RangeQueryBuilder lastlogintime = QueryBuilders.rangeQuery("createTime");
// lastlogintime.gte(1597298833674L);
// lastlogintime.gte(1597298833674L).lte(1597298833674L);
boolQueryBuilder.filter(lastlogintime);
sourceBuilder.query(boolQueryBuilder);
sourceBuilder.size(1000);
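// search_after needs an explicit sort whose values are unique per document; without one,
// getSortValues() is empty and the next page cannot be requested.
// _id follows the originally commented-out line; a doc-values field is usually cheaper to sort on.
sourceBuilder.sort("_id", org.elasticsearch.search.sort.SortOrder.DESC);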
if(objects != null) {
sourceBuilder.searchAfter(objects);
}
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("user");
searchRequest.source(sourceBuilder);
SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHit[] hits = response.getHits().getHits();
return hits;
}
}
Author: trayvon
Source: utcz.com/z/535167.html