Hive分区表(static、dynamic)

前言

简单理解分区表就是HDFS中文件夹分而治之,查询的时候可以进一步缩小数据搜索范围。

举个例子,订单表中承载了所有订单业务数据,由于单表查询数据压力很大,所以采用create_time字段作为分区标识,把当日的数据存在2020.06.11中,次日的数据则存储在2020.06.12中。

查询数据时则只需要添加where字段则可进一步缩小数据的查询范围。

静态分区

建表语句

create table static_partition(

id int,

name string,

age int,

sex string

)

comment '静态分区表'

partitioned by (day int)

row format delimited fields terminated by ',';

1、查询static_partition表数据

hive (xinxing)> select * from static_partition;

OK

static_partition.id static_partition.name static_partition.age static_partition.sex static_partition.day

  • 由此可见,新增了一个字段day,也就是我们刚才的分区字段

2、查询static_partition表结构

hive (xinxing)> desc formatted static_partition;

  • 表结构中新增了分区明细

3、把数据导入至Hive中

1,xinxing,22,boy

2,laocao,24,boy

3,shiqin,19,girl

4,xiaoming,50,boy

5,xiaohong,49,girl

hive (xinxing)> load data local inpath '/home/hadoop/data/student.txt' into table static_partition partition(day = 11);

hive (xinxing)> load data local inpath '/home/hadoop/data/student.txt' into table static_partition partition(day = 12);

4、在HDFSweb页查看static_partition

  • 分区表第11日的路径:/user/hive/warehouse/xinxing.db/static_partition/day=11
  • 分区表第12日的路径:/user/hive/warehouse/xinxing.db/static_partition/day=12

5、再次查询static_partition表数据

hive (xinxing)> select * from static_partition;

OK

static_partition.id static_partition.name static_partition.age static_partition.sex static_partition.day

1 xinxing 22 boy 11

2 laocao 24 boy 11

3 shiqin 19 girl 11

4 xiaoming 50 boy 11

5 xiaohong 49 girl 11

1 xinxing 22 boy 12

2 laocao 24 boy 12

3 shiqin 19 girl 12

4 xiaoming 50 boy 12

5 xiaohong 49 girl 12

Time taken: 0.555 seconds, Fetched: 10 row(s)

hive (xinxing)> select * from static_partition where day=12;

OK

static_partition.id static_partition.name static_partition.age static_partition.sex static_partition.day

1 xinxing 22 boy 12

2 laocao 24 boy 12

3 shiqin 19 girl 12

4 xiaoming 50 boy 12

5 xiaohong 49 girl 12

Time taken: 0.239 seconds, Fetched: 5 row(s)

hive (xinxing)>

动态分区

为什么要使用动态分区。分区是按天的,我不可能说365天每天都手动去分区。如果需求除了按天分区还要按小时分区呢,那我们什么都不要做了,一天到晚维护这张表就行。所有我们要使用动态分区。

建表语句

create table dynamic_partition(

id int,

name string,

age int,

sex string

)

comment '动态分区表'

partitioned by (day int)

row format delimited fields terminated by ',';

1、查询dynamic_partition表数据

hive (xinxing)> select * from dynamic_partition;

OK

dynamic_partition.id dynamic_partition.name dynamic_partition.age dynamic_partition.sex dynamic_partition.day

Time taken: 0.049 seconds

  • 好像跟静态分区没什么区别

2、把数据导入至Hive中

hive (xinxing)> load data  inpath '/data/student.txt' into table static_partition partition(day);

  • 细节来了,导入报错了FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
    失败:SemanticException [错误10096]:动态分区严格模式至少需要一个静态分区列。要关闭此功能,请设置hive.exec.dynamic.partition.mode = nonstrict

查看默认分区模式

hive (xinxing)> set hive.exec.dynamic.partition.mode;

hive.exec.dynamic.partition.mode=strict

更改分区模式从新执行

hive (xinxing)>set hive.exec.dynamic.partition.mode=nonstrict;

hive (xinxing)> load data inpath '/data/student.txt' into table static_partition partition(day);

  • 再次报错hive (xinxing)> load data inpath '/data/student.txt' into table static_partition partition(day); FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Invalid partition key & values; keys [day, ], values [])MetaException(消息:无效的分区键和值;键[day,],值[])
  • 看来动态分区并不能通过load data的方式写入数据

3、查询表数据以及dynamic_partition表的HDFS目录结构

  • 先查询一次,好对比

hive (xinxing)> select id,name,age,sex,day from dynamic_partition;

OK

id name age sex day

Time taken: 0.041 seconds

  • 啥也没有,干净得很

4、使用insert来把查询到的数据插入到动态分区表中

hive (xinxing)> insert into table dynamic_partition partition(day)  select id,name,age,sex,day from static_partition;

此处不一定非要查static_partition表,普通表也可以,我偷懒,直接拿过来用。

此步骤执行是要跑Map的。

5、再次查询表数据以及dynamic_partition表的HDFS目录结构

hive (xinxing)> select * from static_partition;

OK

static_partition.id static_partition.name static_partition.age static_partition.sex static_partition.day

1 xinxing 22 boy 11

2 laocao 24 boy 11

3 shiqin 19 girl 11

4 xiaoming 50 boy 11

5 xiaohong 49 girl 11

1 xinxing 22 boy 12

2 laocao 24 boy 12

3 shiqin 19 girl 12

4 xiaoming 50 boy 12

5 xiaohong 49 girl 12

Time taken: 0.555 seconds, Fetched: 10 row(s)

  • 是不是来了,数据都有了,搞定

多分区字段

create table dynamic_partition(

id int,

name string,

age int,

sex string

)

comment '动态分区表'

partitioned by (day int, hour int)

row format delimited fields terminated by ',';

  • 没什么区别,就是在天的基础上加个小时字段,我就不在这里演示。

删除分区

单分区

alter table static_partition drop partition(day=11);

多分区

删除11号当天24点这一个小时的数据

alter table static_partition drop partition(day=11,hour=24);


删除11号的所有数据

alter table static_partition drop partition(day=11);


删除每一天24点这一小时的数据

(一个月30天全部的24点数据都会被删除)

alter table static_partition drop partition(hour=24);


!!注意:不能直接把day这个分区字段直接删除

以上是 Hive分区表(static、dynamic) 的全部内容, 来源链接: utcz.com/a/23536.html

回到顶部