Getting Started with Druid 0.17 (Part 4): A Guide to Querying Data


This article covers the ways to query data in Druid. Before starting, make sure your data has been loaded successfully.

Druid queries run over HTTP, and the web console provides a query view that formats the results.

Druid provides three query interfaces: SQL, native JSON, and raw HTTP via curl. Client libraries that wrap them are covered at the end.

1. SQL Queries

We'll use the Wikipedia sample data as an example.

Query the 10 pages with the most edits:

SELECT page, COUNT(*) AS Edits
FROM wikipedia
WHERE TIMESTAMP '2015-09-12 00:00:00' <= "__time" AND "__time" < TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 10

Run the query in the console's Query view.

The editor shows hints as you type.

Selecting "Smart query limit" automatically caps the number of rows returned.

Druid also ships with a command-line SQL client; run bin/dsql to start it:

Welcome to dsql, the command-line client for Druid SQL.

Type "h" for help.

dsql>

Submit the SQL:

dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;

┌──────────────────────────────────────────────────────────┬───────┐
│ page                                                     │ Edits │
├──────────────────────────────────────────────────────────┼───────┤
│ Wikipedia:Vandalismusmeldung                             │    33 │
│ User:Cyde/List of candidates for speedy deletion/Subpage │    28 │
│ Jeremy Corbyn                                            │    27 │
│ Wikipedia:Administrators' noticeboard/Incidents          │    21 │
│ Flavia Pennetta                                          │    20 │
│ Total Drama Presents: The Ridonculous Race               │    18 │
│ User talk:Dudeperson176123                               │    18 │
│ Wikipédia:Le Bistro/12 septembre 2015                    │    18 │
│ Wikipedia:In the news/Candidates                         │    17 │
│ Wikipedia:Requests for page protection                   │    17 │
└──────────────────────────────────────────────────────────┴───────┘
Retrieved 10 rows in 0.06s.

SQL can also be sent over HTTP:

curl -X "POST" -H "Content-Type:application/json" -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8888/druid/v2/sql

which returns a result like:

[
  {
    "page": "Wikipedia:Vandalismusmeldung",
    "Edits": 33
  },
  {
    "page": "User:Cyde/List of candidates for speedy deletion/Subpage",
    "Edits": 28
  },
  {
    "page": "Jeremy Corbyn",
    "Edits": 27
  },
  {
    "page": "Wikipedia:Administrators' noticeboard/Incidents",
    "Edits": 21
  },
  {
    "page": "Flavia Pennetta",
    "Edits": 20
  },
  {
    "page": "Total Drama Presents: The Ridonculous Race",
    "Edits": 18
  },
  {
    "page": "User talk:Dudeperson176123",
    "Edits": 18
  },
  {
    "page": "Wikipédia:Le Bistro/12 septembre 2015",
    "Edits": 18
  },
  {
    "page": "Wikipedia:In the news/Candidates",
    "Edits": 17
  },
  {
    "page": "Wikipedia:Requests for page protection",
    "Edits": 17
  }
]
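For programmatic access, the same SQL endpoint can be called from any HTTP client. Below is a minimal standard-library Python sketch; the helper name build_sql_request is ours, not a Druid API, and the request body is simply a JSON object with a "query" field:

```python
import json
from urllib import request

# Hypothetical helper: wrap a SQL string in the JSON body that Druid's
# SQL endpoint (POST /druid/v2/sql) expects.
def build_sql_request(sql, url="http://localhost:8888/druid/v2/sql"):
    body = json.dumps({"query": sql}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_sql_request(
    "SELECT page, COUNT(*) AS Edits FROM wikipedia "
    "WHERE \"__time\" BETWEEN TIMESTAMP '2015-09-12 00:00:00' "
    "AND TIMESTAMP '2015-09-13 00:00:00' "
    "GROUP BY page ORDER BY Edits DESC LIMIT 10"
)
# With a running cluster: rows = json.loads(request.urlopen(req).read())
```

The request is only built here, not sent, so the sketch works without a cluster.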

More SQL examples

Aggregating over time:

SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY 1

Grouping:

SELECT channel, page, SUM(added)
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY channel, page
ORDER BY SUM(added) DESC

Selecting raw rows:

SELECT user, page
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00'
LIMIT 5
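All of these queries filter on "__time" over a time range. When generating such filters or native-query intervals programmatically, a tiny helper can format a [start, end) pair as Druid's ISO-8601 interval string (druid_interval is a hypothetical convenience, not part of Druid):

```python
from datetime import datetime

# Hypothetical helper (not part of Druid): format a [start, end) pair
# as the ISO-8601 interval string used by native queries.
def druid_interval(start: datetime, end: datetime) -> str:
    return f"{start.isoformat()}/{end.isoformat()}"

interval = druid_interval(datetime(2015, 9, 12), datetime(2015, 9, 13))
# → "2015-09-12T00:00:00/2015-09-13T00:00:00"
```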

Explaining the query plan

EXPLAIN PLAN FOR shows how a SQL query translates into a native query; it can also be run in dsql:

dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;

DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}])

Retrieved 1 row in 0.03s.

2. Native JSON Queries

Druid also supports native queries written in JSON. The topN from the SQL example looks like this:

{
  "queryType" : "topN",
  "dataSource" : "wikipedia",
  "intervals" : ["2015-09-12/2015-09-13"],
  "granularity" : "all",
  "dimension" : "page",
  "metric" : "count",
  "threshold" : 10,
  "aggregations" : [
    {
      "type" : "count",
      "name" : "count"
    }
  ]
}

To run it, paste the JSON into the console's native (JSON) query view.

JSON queries are sent as HTTP POSTs to the Router or a Broker:

curl -X POST "<queryable_host>:<port>/druid/v2/?pretty" -H "Content-Type:application/json" -H "Accept:application/json" -d @<query_json_file>
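The same POST can be built in Python with only the standard library. This sketch reuses the wikipedia topN above and assumes the quickstart Router on localhost:8888 (any Broker also accepts /druid/v2):

```python
import json
from urllib import request

# The wikipedia topN query from the section above.
top_pages = {
    "queryType": "topN",
    "dataSource": "wikipedia",
    "intervals": ["2015-09-12/2015-09-13"],
    "granularity": "all",
    "dimension": "page",
    "metric": "count",
    "threshold": 10,
    "aggregations": [{"type": "count", "name": "count"}],
}

# Build the POST against the Router's native query endpoint.
req = request.Request(
    "http://localhost:8888/druid/v2/?pretty",
    data=json.dumps(top_pages).encode("utf-8"),
    headers={"Content-Type": "application/json", "Accept": "application/json"},
    method="POST",
)
# With a running cluster: result = json.loads(request.urlopen(req).read())
```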

Druid provides a rich set of native query types.

Aggregation queries

Timeseries query

{
  "queryType": "timeseries",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "descending": "true",
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" },
      { "type": "or",
        "fields": [
          { "type": "selector", "dimension": "sample_dimension2", "value": "sample_value2" },
          { "type": "selector", "dimension": "sample_dimension3", "value": "sample_value3" }
        ]
      }
    ]
  },
  "aggregations": [
    { "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" },
    { "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" }
  ],
  "postAggregations": [
    { "type": "arithmetic",
      "name": "sample_divide",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "name": "postAgg__sample_name1", "fieldName": "sample_name1" },
        { "type": "fieldAccess", "name": "postAgg__sample_name2", "fieldName": "sample_name2" }
      ]
    }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ]
}
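The "arithmetic" post-aggregator with "fn": "/" divides one aggregated value by another for each result row, after the aggregations have run. Conceptually it behaves like this client-side sketch (the field names are illustrative, mirroring the JSON above):

```python
# Conceptual sketch (not Druid code): what the arithmetic post-aggregator
# computes per result row — here sample_name1 / sample_name2.
def apply_divide(row, numerator, denominator, name):
    out = dict(row)  # post-aggregators add a new field, keeping the rest
    out[name] = row[numerator] / row[denominator]
    return out

row = {"sample_name1": 10.0, "sample_name2": 4.0}
result = apply_divide(row, "sample_name1", "sample_name2", "sample_divide")
# result["sample_divide"] == 2.5
```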

TopN query

{
  "queryType": "topN",
  "dataSource": "sample_data",
  "dimension": "sample_dim",
  "threshold": 5,
  "metric": "count",
  "granularity": "all",
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "selector",
        "dimension": "dim1",
        "value": "some_value"
      },
      {
        "type": "selector",
        "dimension": "dim2",
        "value": "some_other_val"
      }
    ]
  },
  "aggregations": [
    {
      "type": "longSum",
      "name": "count",
      "fieldName": "count"
    },
    {
      "type": "doubleSum",
      "name": "some_metric",
      "fieldName": "some_metric"
    }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "average",
      "fn": "/",
      "fields": [
        {
          "type": "fieldAccess",
          "name": "some_metric",
          "fieldName": "some_metric"
        },
        {
          "type": "fieldAccess",
          "name": "count",
          "fieldName": "count"
        }
      ]
    }
  ],
  "intervals": [
    "2013-08-31T00:00:00.000/2013-09-03T00:00:00.000"
  ]
}

GroupBy query

{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "dimensions": ["country", "device"],
  "limitSpec": { "type": "default", "limit": 5000, "columns": ["country", "data_transfer"] },
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "carrier", "value": "AT&T" },
      { "type": "or",
        "fields": [
          { "type": "selector", "dimension": "make", "value": "Apple" },
          { "type": "selector", "dimension": "make", "value": "Samsung" }
        ]
      }
    ]
  },
  "aggregations": [
    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" },
    { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
  ],
  "postAggregations": [
    { "type": "arithmetic",
      "name": "avg_usage",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "fieldName": "data_transfer" },
        { "type": "fieldAccess", "fieldName": "total_usage" }
      ]
    }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "having": {
    "type": "greaterThan",
    "aggregation": "total_usage",
    "value": 100
  }
}

Metadata queries

TimeBoundary query

{
  "queryType" : "timeBoundary",
  "dataSource": "sample_datasource",
  "bound"     : < "maxTime" | "minTime" > # optional, defaults to returning both timestamps if not set
  "filter"    : { "type": "and", "fields": [<filter>, <filter>, ...] } # optional
}

SegmentMetadata query

{
  "queryType": "segmentMetadata",
  "dataSource": "sample_datasource",
  "intervals": ["2013-01-01/2014-01-01"]
}

DatasourceMetadata query

{
  "queryType" : "dataSourceMetadata",
  "dataSource": "sample_datasource"
}

Search query

{
  "queryType": "search",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "searchDimensions": [
    "dim1",
    "dim2"
  ],
  "query": {
    "type": "insensitive_contains",
    "value": "Ke"
  },
  "sort" : {
    "type": "lexicographic"
  },
  "intervals": [
    "2013-01-01T00:00:00.000/2013-01-03T00:00:00.000"
  ]
}

Query advice

Prefer Timeseries and TopN over GroupBy when the query shape allows it; the first two are specifically optimized, while GroupBy is the most flexible but generally the slowest of the three.
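To illustrate the tip: a groupBy over a single dimension, ordered by one metric with a row limit, can usually be rewritten as a topN. A hypothetical helper sketching the mechanical rewrite (it only handles this simple single-dimension case; groupby_to_topn is ours, not a Druid API):

```python
# Hypothetical rewrite: single-dimension groupBy → topN (assumption: the
# groupBy is ordered by one metric and limited, which is what topN models).
def groupby_to_topn(group_by: dict, metric: str, threshold: int) -> dict:
    dims = group_by["dimensions"]
    assert len(dims) == 1, "topN only supports a single dimension"
    return {
        "queryType": "topN",
        "dataSource": group_by["dataSource"],
        "dimension": dims[0],
        "metric": metric,
        "threshold": threshold,
        "granularity": group_by["granularity"],
        "aggregations": group_by["aggregations"],
        "intervals": group_by["intervals"],
    }

gb = {
    "queryType": "groupBy",
    "dataSource": "wikipedia",
    "granularity": "all",
    "dimensions": ["page"],
    "aggregations": [{"type": "count", "name": "count"}],
    "intervals": ["2015-09-12/2015-09-13"],
}
top = groupby_to_topn(gb, metric="count", threshold=10)
# top["queryType"] == "topN"
```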

Canceling a query

A running query can be canceled explicitly by the queryId set in its context:

DELETE /druid/v2/{queryId}

curl -X DELETE "http://host:port/druid/v2/abc123"
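To make a query cancelable you set its queryId yourself in the query context, then issue the DELETE against the same host the query was sent to. A standard-library sketch (assuming the quickstart Router on localhost:8888):

```python
from urllib import request

# A query made cancelable by assigning our own queryId in its context.
query = {
    "queryType": "timeBoundary",
    "dataSource": "wikipedia",
    "context": {"queryId": "abc123"},
}

# DELETE /druid/v2/{queryId} cancels it.
cancel = request.Request(
    "http://localhost:8888/druid/v2/" + query["context"]["queryId"],
    method="DELETE",
)
# With a running cluster: request.urlopen(cancel)
```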

Query failures

A failed query returns an error response such as:

{
  "error" : "Query timeout",
  "errorMessage" : "Timeout waiting for task.",
  "errorClass" : "java.util.concurrent.TimeoutException",
  "host" : "druid1.example.com:8083"
}
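A client can turn that error body into an exception instead of silently consuming it. A hypothetical sketch (DruidQueryError and raise_for_druid_error are ours, not part of any Druid client library):

```python
# Hypothetical client-side error handling for Druid's error response body.
class DruidQueryError(Exception):
    pass

def raise_for_druid_error(body: dict) -> None:
    # Druid failure responses carry an "error" field plus details.
    if "error" in body:
        raise DruidQueryError(
            f"{body['error']} on {body.get('host')}: {body.get('errorMessage')}"
        )

try:
    raise_for_druid_error({
        "error": "Query timeout",
        "errorMessage": "Timeout waiting for task.",
        "errorClass": "java.util.concurrent.TimeoutException",
        "host": "druid1.example.com:8083",
    })
except DruidQueryError as e:
    print(e)  # Query timeout on druid1.example.com:8083: Timeout waiting for task.
```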

3. curl

Native query files can be POSTed directly over HTTP with curl:

curl -X "POST" -H "Content-Type:application/json" -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8888/druid/v2?pretty

4. Client Libraries

Client libraries are built on the JSON-over-HTTP API.

See https://druid.apache.org/libraries.html for the full list.

For example, querying from Python with pydruid:

from pydruid.client import *
from pylab import plt

query = PyDruid(druid_url_goes_here, "druid/v2")

ts = query.timeseries(
    datasource="twitterstream",
    granularity="day",
    intervals="2014-02-02/p4w",
    aggregations={"length": doublesum("tweet_length"), "count": doublesum("count")},
    post_aggregations={"avg_tweet_length": (Field("length") / Field("count"))},
    filter=Dimension("first_hashtag") == "sochi2014"
)

df = query.export_pandas()
df["timestamp"] = df["timestamp"].map(lambda x: x.split("T")[0])
df.plot(x="timestamp", y="avg_tweet_length", ylim=(80, 140), rot=20,
        title="Sochi 2014")
plt.ylabel("avg tweet length (chars)")
plt.show()

This post is part of the Druid getting-started series curated by Real-time Stream Computing (实时流式计算), which is continuously updated.

For more posts on real-time data analysis and tech news, follow the "实时流式计算" account; reply "Druid" there to get the e-book《Druid实时大数据分析》.
