Getting Started with Druid 0.17 (4): Data Query Methods
This article walks through the ways to query data in Druid; before starting, make sure the sample data has been loaded successfully.
Druid queries go over HTTP, and the web console provides a query view that formats the results.
Druid offers several ways to query: SQL, native JSON queries, plain HTTP via curl, and client libraries.
1. SQL Queries
We will use the wikipedia sample data as an example.
Query the 10 pages with the most edits:
SELECT page, COUNT(*) AS Edits
FROM wikipedia
WHERE TIMESTAMP '2015-09-12 00:00:00' <= "__time" AND "__time" < TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 10
Run the statement in the web console's Query view, which shows auto-complete hints as you type.
Selecting "Smart query limit" automatically limits the number of rows returned.
Druid also ships with a command-line SQL client; run bin/dsql to start it:
Welcome to dsql, the command-line client for Druid SQL.
Type "h" for help.
dsql>
Submit the SQL statement:
dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌──────────────────────────────────────────────────────────┬───────┐
│ page │ Edits │
├──────────────────────────────────────────────────────────┼───────┤
│ Wikipedia:Vandalismusmeldung │ 33 │
│ User:Cyde/List of candidates for speedy deletion/Subpage │ 28 │
│ Jeremy Corbyn │ 27 │
│ Wikipedia:Administrators' noticeboard/Incidents │ 21 │
│ Flavia Pennetta │ 20 │
│ Total Drama Presents: The Ridonculous Race │ 18 │
│ User talk:Dudeperson176123 │ 18 │
│ Wikipédia:Le Bistro/12 septembre 2015 │ 18 │
│ Wikipedia:In the news/Candidates │ 17 │
│ Wikipedia:Requests for page protection │ 17 │
└──────────────────────────────────────────────────────────┴───────┘
Retrieved 10 rows in 0.06s.
You can also send SQL over HTTP:
curl -X "POST" -H "Content-Type:application/json" -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8888/druid/v2/sql
which returns the following result:
[ {
"page": "Wikipedia:Vandalismusmeldung",
"Edits": 33
},
{
"page": "User:Cyde/List of candidates for speedy deletion/Subpage",
"Edits": 28
},
{
"page": "Jeremy Corbyn",
"Edits": 27
},
{
"page": "Wikipedia:Administrators" noticeboard/Incidents",
"Edits": 21
},
{
"page": "Flavia Pennetta",
"Edits": 20
},
{
"page": "Total Drama Presents: The Ridonculous Race",
"Edits": 18
},
{
"page": "User talk:Dudeperson176123",
"Edits": 18
},
{
"page": "Wikipédia:Le Bistro/12 septembre 2015",
"Edits": 18
},
{
"page": "Wikipedia:In the news/Candidates",
"Edits": 17
},
{
"page": "Wikipedia:Requests for page protection",
"Edits": 17
}
]
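For reference, the file passed with -d above is the request body for Druid's SQL endpoint (/druid/v2/sql), which expects a JSON object with a query field. A rough sketch of what quickstart/tutorial/wikipedia-top-pages-sql.json should contain (the exact quickstart file may differ slightly):
{
  "query" : "SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE \"__time\" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10"
}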
More SQL examples
Aggregating by time (hourly buckets):
SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted
FROM wikipedia
WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY 1
Grouping by multiple dimensions:
SELECT channel, page, SUM(added)
FROM wikipedia
WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY channel, page
ORDER BY SUM(added) DESC
Querying raw rows:
SELECT user, page
FROM wikipedia
WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00'
LIMIT 5
Query plan (EXPLAIN)
EXPLAIN PLAN FOR shows how a SQL statement will be translated into a native Druid query; it can be run from dsql as well:
dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
│ DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}]) │
Retrieved 1 row in 0.03s.
2. Native JSON Queries
Druid also supports native JSON queries. For example, this topN query returns the 10 most-edited pages:
{ "queryType" : "topN",
"dataSource" : "wikipedia",
"intervals" : ["2015-09-12/2015-09-13"],
"granularity" : "all",
"dimension" : "page",
"metric" : "count",
"threshold" : 10,
"aggregations" : [
{
"type" : "count",
"name" : "count"
}
]
}
Paste the JSON into the console's JSON query mode window to run it.
JSON queries are submitted by POSTing them to the Router or Broker:
curl -X POST "<queryable_host>:<port>/druid/v2/?pretty" -H "Content-Type:application/json" -H "Accept:application/json" -d @<query_json_file>
Druid provides a rich set of native query types.
Aggregation queries
Timeseries query
{ "queryType": "timeseries",
"dataSource": "sample_datasource",
"granularity": "day",
"descending": "true",
"filter": {
"type": "and",
"fields": [
{ "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" },
{ "type": "or",
"fields": [
{ "type": "selector", "dimension": "sample_dimension2", "value": "sample_value2" },
{ "type": "selector", "dimension": "sample_dimension3", "value": "sample_value3" }
]
}
]
},
"aggregations": [
{ "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" },
{ "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" }
],
"postAggregations": [
{ "type": "arithmetic",
"name": "sample_divide",
"fn": "/",
"fields": [
{ "type": "fieldAccess", "name": "postAgg__sample_name1", "fieldName": "sample_name1" },
{ "type": "fieldAccess", "name": "postAgg__sample_name2", "fieldName": "sample_name2" }
]
}
],
"intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ]
}
TopN query
{ "queryType": "topN",
"dataSource": "sample_data",
"dimension": "sample_dim",
"threshold": 5,
"metric": "count",
"granularity": "all",
"filter": {
"type": "and",
"fields": [
{
"type": "selector",
"dimension": "dim1",
"value": "some_value"
},
{
"type": "selector",
"dimension": "dim2",
"value": "some_other_val"
}
]
},
"aggregations": [
{
"type": "longSum",
"name": "count",
"fieldName": "count"
},
{
"type": "doubleSum",
"name": "some_metric",
"fieldName": "some_metric"
}
],
"postAggregations": [
{
"type": "arithmetic",
"name": "average",
"fn": "/",
"fields": [
{
"type": "fieldAccess",
"name": "some_metric",
"fieldName": "some_metric"
},
{
"type": "fieldAccess",
"name": "count",
"fieldName": "count"
}
]
}
],
"intervals": [
"2013-08-31T00:00:00.000/2013-09-03T00:00:00.000"
]
}
GroupBy query
{ "queryType": "groupBy",
"dataSource": "sample_datasource",
"granularity": "day",
"dimensions": ["country", "device"],
"limitSpec": { "type": "default", "limit": 5000, "columns": ["country", "data_transfer"] },
"filter": {
"type": "and",
"fields": [
{ "type": "selector", "dimension": "carrier", "value": "AT&T" },
{ "type": "or",
"fields": [
{ "type": "selector", "dimension": "make", "value": "Apple" },
{ "type": "selector", "dimension": "make", "value": "Samsung" }
]
}
]
},
"aggregations": [
{ "type": "longSum", "name": "total_usage", "fieldName": "user_count" },
{ "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
],
"postAggregations": [
{ "type": "arithmetic",
"name": "avg_usage",
"fn": "/",
"fields": [
{ "type": "fieldAccess", "fieldName": "data_transfer" },
{ "type": "fieldAccess", "fieldName": "total_usage" }
]
}
],
"intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
"having": {
"type": "greaterThan",
"aggregation": "total_usage",
"value": 100
}
}
Metadata queries
TimeBoundary query
{ "queryType" : "timeBoundary",
"dataSource": "sample_datasource",
"bound" : < "maxTime" | "minTime" > # optional, defaults to returning both timestamps if not set
"filter" : { "type": "and", "fields": [<filter>, <filter>, ...] } # optional
}
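The snippet above is a template (the bound line uses placeholder syntax rather than real JSON). A minimal concrete sketch, assuming the wikipedia datasource from earlier and asking only for the newest ingested timestamp:
{
  "queryType" : "timeBoundary",
  "dataSource" : "wikipedia",
  "bound" : "maxTime"
}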
SegmentMetadata query
{ "queryType":"segmentMetadata",
"dataSource":"sample_datasource",
"intervals":["2013-01-01/2014-01-01"]
}
DatasourceMetadata query
{ "queryType" : "dataSourceMetadata",
"dataSource": "sample_datasource"
}
Search query
{ "queryType": "search",
"dataSource": "sample_datasource",
"granularity": "day",
"searchDimensions": [
"dim1",
"dim2"
],
"query": {
"type": "insensitive_contains",
"value": "Ke"
},
"sort" : {
"type": "lexicographic"
},
"intervals": [
"2013-01-01T00:00:00.000/2013-01-03T00:00:00.000"
]
}
Query recommendations
Use Timeseries and TopN instead of GroupBy wherever they can express the query; see the sketch below.
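As an illustration (a sketch of ours, not from the original tutorial): a single-dimension GroupBy that only needs the top rows maps naturally onto the topN query shown earlier, and a GroupBy with no dimensions maps onto a timeseries query. Assuming the wikipedia datasource and an hourly row count (the output name edits is arbitrary):
{
  "queryType" : "timeseries",
  "dataSource" : "wikipedia",
  "granularity" : "hour",
  "intervals" : ["2015-09-12/2015-09-13"],
  "aggregations" : [
    { "type" : "count", "name" : "edits" }
  ]
}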
Cancelling a query
A running query can be cancelled over HTTP by its query ID:
DELETE /druid/v2/{queryId}
curl -X DELETE "http://host:port/druid/v2/abc123"
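Cancellation requires knowing the query ID. For native JSON queries you can choose the ID yourself via the query context; a minimal sketch that reuses the abc123 ID from the curl example above:
{
  "queryType" : "timeseries",
  "dataSource" : "wikipedia",
  "granularity" : "all",
  "intervals" : ["2015-09-12/2015-09-13"],
  "aggregations" : [ { "type" : "count", "name" : "count" } ],
  "context" : { "queryId" : "abc123" }
}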
Query failures
A failed query returns an error object of the following form:
{ "error" : "Query timeout",
"errorMessage" : "Timeout waiting for task.",
"errorClass" : "java.util.concurrent.TimeoutException",
"host" : "druid1.example.com:8083"
}
3. Querying with curl
Any query can be posted directly over HTTP. For example, the quickstart file wikipedia-top-pages.json holds a native topN query much like the one in section 2:
curl -X "POST" -H "Content-Type:application/json" -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8888/druid/v2?pretty
4. Client Library Queries
Community client libraries are built on top of the JSON API.
See https://druid.apache.org/libraries.html for the full list.
For example, pydruid for Python:
from pydruid.client import *
from pylab import plt
query = PyDruid(druid_url_goes_here, "druid/v2")
ts = query.timeseries(
datasource="twitterstream",
granularity="day",
intervals="2014-02-02/p4w",
aggregations={"length": doublesum("tweet_length"), "count": doublesum("count")},
post_aggregations={"avg_tweet_length": (Field("length") / Field("count"))},
filter=Dimension("first_hashtag") == "sochi2014"
)
df = query.export_pandas()
df["timestamp"] = df["timestamp"].map(lambda x: x.split("T")[0])
df.plot(x="timestamp", y="avg_tweet_length", ylim=(80, 140), rot=20,
title="Sochi 2014")
plt.ylabel("avg tweet length (chars)")
plt.show()
The "实时流式计算" account has put together this Druid getting-started series, which is continuously being updated.
For more posts and news on real-time data analytics, follow "实时流式计算".
To get the e-book "Druid实时大数据分析", reply "Druid" to the official account.