> 文章列表 > Hive msck 描述

Hive msck 描述

Hive msck 描述

MSCK SQL 语法如下:

MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS];

1. 背景

先创建3个分区,把分区文件删除。这时 metastore 有这个3个分区,文件上不存在。再在文件系统上创建其他两个分区,这两个分区在 metastore 不存在。

1.1 初始化SQL

drop table if exists t_drop_partitions;
create table  t_drop_partitions(c1 string)partitioned by (pt string,hr string) stored as textfile location '/tmp/t_drop_partitions';
alter table t_drop_partitions add partition(pt='20230101',hr='00') location '/tmp/t_drop_partitions/pt=20230101/hr=00';
alter table t_drop_partitions add partition(pt='20230101',hr='01') location '/tmp/t_drop_partitions/pt=20230101/hr=01';
alter table t_drop_partitions add partition(pt='20230101',hr='02') location '/tmp/t_drop_partitions/pt=20230101/hr=02';
! hadoop fs -rm -r /tmp/t_drop_partitions/pt=20230101/hr=00;
! hadoop fs -rm -r /tmp/t_drop_partitions/pt=20230101/hr=01;
! hadoop fs -rm -r /tmp/t_drop_partitions/pt=20230101/hr=02;
! hadoop fs -mkdir -p /tmp/t_drop_partitions/pt=20230101/hr=03;
! hadoop fs -mkdir -p /tmp/t_drop_partitions/pt=20230101/hr=04;
  • show partitions t_drop_partitions;
    显示有 3 个分区
hive>  show partitions t_drop_partitions;
OK
pt=20230101/hr=00
pt=20230101/hr=01
pt=20230101/hr=02
  • ls 文件系统显示2个目录
 ! hadoop fs -ls /tmp/t_drop_partitions/pt=20230101;Found 2 items
drwxr-xr-x   - hive hadoop          0 2023-04-23 14:52 /tmp/t_drop_partitions/pt=20230101/hr=03
drwxr-xr-x   - hive hadoop          0 2023-04-23 14:52 /tmp/t_drop_partitions/pt=20230101/hr=04

2. MSCK TABLE

2.1 MSCK TABLE 没有选项

执行初始化SQL。

hive> MSCK TABLE t_drop_partitions;
OK
Partitions not in metastore:	t_drop_partitions:pt=20230101/hr=03	t_drop_partitions:pt=20230101/hr=04
Partitions missing from filesystem:	t_drop_partitions:pt=20230101/hr=00	t_drop_partitions:pt=20230101/hr=01	t_drop_partitions:pt=20230101/hr=02
Time taken: 0.094 seconds, Fetched: 2 row(s)

MSCK TABLE 不做任何操作,仅仅显示 metastore 和 文件系统中不匹配的分区。
Partitions not in metastore: 这些分区存在于文件系统,但是不在 Metastore 。
Partitions missing from filesystem: 这些存在于Metastore,但是不在文件系统 。

  • show partitions 还是显示3个分区
hive> show partitions t_drop_partitions;
OK
pt=20230101/hr=00
pt=20230101/hr=01
pt=20230101/hr=02
Time taken: 0.105 s

2.2 MSCK REPAIR TABLE

MSCK [REPAIR] TABLE 对表的分区进行修复。后面可以根修复的选项,默认是 ADD。
ADD PARTITIONS: 把文件系统上存在,metastore 上不存在的分区添加到 metastore。
DROP PARTITIONS: 把 metastore 上存在,文件系统上不存在的分区从 metastore 删除。
SYNC PARTITIONS: 代表 ADD PARTITIONS 和 DROP PARTITIONS 同时选中。

2.2. 1ADD PARTITIONS

执行初始化SQL。
执行 MSCK REPAIR TABLE。

hive>  MSCK REPAIR TABLE t_drop_partitions;
OK
Partitions not in metastore:	t_drop_partitions:pt=20230101/hr=03	t_drop_partitions:pt=20230101/hr=04
Partitions missing from filesystem:	t_drop_partitions:pt=20230101/hr=00	t_drop_partitions:pt=20230101/hr=01	t_drop_partitions:pt=20230101/hr=02
Repair: Added partition to metastore t_drop_partitions:pt=20230101/hr=03
Repair: Added partition to metastore t_drop_partitions:pt=20230101/hr=04

Repair: Added partition to metastore: 部分显示了 metastore 添加了哪些分区。

hive> show partitions t_drop_partitions;
OK
pt=20230101/hr=00
pt=20230101/hr=01
pt=20230101/hr=02
pt=20230101/hr=03
pt=20230101/hr=04

现在有5个分区。

2.2. DROP PARTITIONS

执行初始化SQL。
执行 MSCK REPAIR TABLE xxx DROP PARTITIONS。

hive>  MSCK REPAIR TABLE t_drop_partitions DROP PARTITIONS;
OK
Partitions not in metastore:	t_drop_partitions:pt=20230101/hr=03	t_drop_partitions:pt=20230101/hr=04
Partitions missing from filesystem:	t_drop_partitions:pt=20230101/hr=00	t_drop_partitions:pt=20230101/hr=01	t_drop_partitions:pt=20230101/hr=02
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=02
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=01
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=00

Repair: Dropped partition: 部分显示了从metastore 删除了哪些分区。

hive> show partitions t_drop_partitions;
OK
Time taken: 0.066 seconds

现在有0个分区。

2.3. SYNC PARTITIONS

执行初始化SQL。
执行 MSCK REPAIR TABLE xxx SYNC PARTITIONS。

Partitions not in metastore:	t_drop_partitions:pt=20230101/hr=03	t_drop_partitions:pt=20230101/hr=04
Partitions missing from filesystem:	t_drop_partitions:pt=20230101/hr=00	t_drop_partitions:pt=20230101/hr=01	t_drop_partitions:pt=20230101/hr=02
Repair: Added partition to metastore t_drop_partitions:pt=20230101/hr=03
Repair: Added partition to metastore t_drop_partitions:pt=20230101/hr=04
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=00
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=02
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=01
Time taken: 0.225 seconds, Fetched: 7 row(s)

Repair: Added partition to metastore: 显示了在 metastore 添加了哪些分区。
Repair: Dropped partition from metastore : 显示了从metastore 删除了哪些分区。

hive> show partitions t_drop_partitions;
OK
Time taken: 0.066 seconds

现在有0个分区。