Hive分区表(static、dynamic)
前言
简单理解分区表就是HDFS中文件夹分而治之,查询的时候可以进一步缩小数据搜索范围。
举个例子,订单表中承载了所有订单业务数据,由于单表查询数据压力很大,所以采用create_time
字段作为分区标识,把当日的数据存在2020.06.11中,次日的数据则存储在2020.06.12中。
查询数据时则只需要添加where字段则可进一步缩小数据的查询范围。
静态分区
建表语句
create table static_partition(id int,
name string,
age int,
sex string
)
comment '静态分区表'
partitioned by (day int)
row format delimited fields terminated by ',';
1、查询static_partition
表数据
hive (xinxing)> select * from static_partition;OK
static_partition.id static_partition.name static_partition.age static_partition.sex static_partition.day
- 由此可见,新增了一个字段
day
,也就是我们刚才的分区字段
2、查询static_partition
表结构
hive (xinxing)> desc formatted static_partition;
- 表结构中新增了分区明细
3、把数据导入至Hive中
1,xinxing,22,boy
2,laocao,24,boy
3,shiqin,19,girl
4,xiaoming,50,boy
5,xiaohong,49,girl
hive (xinxing)> load data local inpath '/home/hadoop/data/student.txt' into table static_partition partition(day = 11);hive (xinxing)> load data local inpath '/home/hadoop/data/student.txt' into table static_partition partition(day = 12);
4、在HDFS
web页查看static_partition
表
- 分区表第11日的路径:/user/hive/warehouse/xinxing.db/static_partition/day=11
- 分区表第12日的路径:/user/hive/warehouse/xinxing.db/static_partition/day=12
5、再次查询static_partition
表数据
hive (xinxing)> select * from static_partition;OK
static_partition.id static_partition.name static_partition.age static_partition.sex static_partition.day
1 xinxing 22 boy 11
2 laocao 24 boy 11
3 shiqin 19 girl 11
4 xiaoming 50 boy 11
5 xiaohong 49 girl 11
1 xinxing 22 boy 12
2 laocao 24 boy 12
3 shiqin 19 girl 12
4 xiaoming 50 boy 12
5 xiaohong 49 girl 12
Time taken: 0.555 seconds, Fetched: 10 row(s)
hive (xinxing)> select * from static_partition where day=12;
OK
static_partition.id static_partition.name static_partition.age static_partition.sex static_partition.day
1 xinxing 22 boy 12
2 laocao 24 boy 12
3 shiqin 19 girl 12
4 xiaoming 50 boy 12
5 xiaohong 49 girl 12
Time taken: 0.239 seconds, Fetched: 5 row(s)
hive (xinxing)>
动态分区
为什么要使用动态分区。分区是按天的,我不可能说365天每天都手动去分区。如果需求除了按天分区还要按小时分区呢,那我们什么都不要做了,一天到晚维护这张表就行。所有我们要使用动态分区。
建表语句
create table dynamic_partition(id int,
name string,
age int,
sex string
)
comment '动态分区表'
partitioned by (day int)
row format delimited fields terminated by ',';
1、查询dynamic_partition
表数据
hive (xinxing)> select * from dynamic_partition;OK
dynamic_partition.id dynamic_partition.name dynamic_partition.age dynamic_partition.sex dynamic_partition.day
Time taken: 0.049 seconds
- 好像跟静态分区没什么区别
2、把数据导入至Hive中
hive (xinxing)> load data inpath '/data/student.txt' into table static_partition partition(day);
- 细节来了,导入报错了
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
失败:SemanticException [错误10096]:动态分区严格模式至少需要一个静态分区列。要关闭此功能,请设置hive.exec.dynamic.partition.mode = nonstrict
查看默认分区模式
hive (xinxing)> set hive.exec.dynamic.partition.mode;hive.exec.dynamic.partition.mode=strict
更改分区模式从新执行
hive (xinxing)>set hive.exec.dynamic.partition.mode=nonstrict;hive (xinxing)> load data inpath '/data/student.txt' into table static_partition partition(day);
- 再次报错
hive (xinxing)> load data inpath '/data/student.txt' into table static_partition partition(day); FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Invalid partition key & values; keys [day, ], values [])
MetaException(消息:无效的分区键和值;键[day,],值[]) - 看来动态分区并不能通过load data的方式写入数据
3、查询表数据以及dynamic_partition
表的HDFS目录结构
- 先查询一次,好对比
hive (xinxing)> select id,name,age,sex,day from dynamic_partition;OK
id name age sex day
Time taken: 0.041 seconds
- 啥也没有,干净得很
4、使用insert来把查询到的数据插入到动态分区表中
hive (xinxing)> insert into table dynamic_partition partition(day) select id,name,age,sex,day from static_partition;
此处不一定非要查
static_partition
表,普通表也可以,我偷懒,直接拿过来用。此步骤执行是要跑Map的。
5、再次查询表数据以及dynamic_partition
表的HDFS目录结构
hive (xinxing)> select * from static_partition;OK
static_partition.id static_partition.name static_partition.age static_partition.sex static_partition.day
1 xinxing 22 boy 11
2 laocao 24 boy 11
3 shiqin 19 girl 11
4 xiaoming 50 boy 11
5 xiaohong 49 girl 11
1 xinxing 22 boy 12
2 laocao 24 boy 12
3 shiqin 19 girl 12
4 xiaoming 50 boy 12
5 xiaohong 49 girl 12
Time taken: 0.555 seconds, Fetched: 10 row(s)
- 是不是来了,数据都有了,搞定
多分区字段
create table dynamic_partition(id int,
name string,
age int,
sex string
)
comment '动态分区表'
partitioned by (day int, hour int)
row format delimited fields terminated by ',';
- 没什么区别,就是在天的基础上加个小时字段,我就不在这里演示。
删除分区
单分区
alter table static_partition drop partition(day=11);
多分区
删除11号当天24点这一个小时的数据
alter table static_partition drop partition(day=11,hour=24);
删除11号的所有数据
alter table static_partition drop partition(day=11);
删除每一天24点这一小时的数据
(一个月30天全部的24点数据都会被删除)
alter table static_partition drop partition(hour=24);
!!注意:不能直接把day这个分区字段直接删除
以上是 Hive分区表(static、dynamic) 的全部内容, 来源链接: utcz.com/a/23536.html