> 文章列表 > CollapsingMergeTree

CollapsingMergeTree

CollapsingMergeTree

CollapsingMergeTree就是⼀种通过以增代删的思路,⽀持⾏级数据修改和删除的表引擎。它通过定义⼀个sign标记字段,记录数据⾏ 的状态

如果sign标记为1,则表示这是⼀⾏有效的数据;

如果sign标记为-1,则表示这⾏数据需要被删除。

当CollapsingMergeTree分区合并时,同⼀数据分区内排序相同,sign标记为1和-1的⼀组数据会被抵消删除。 这种1和-1相互抵消的操作,犹如将⼀张瓦楞纸折叠了⼀般。这种直观 的⽐喻,想必也正是折叠合并树(CollapsingMergeTree)名称的由来。

  • 多⾏的排序相同的状态为1的数据会折叠成⼀⾏ , 保留最后⼀⾏;
  • 两⾏排序相同的数据, 状态为 1 和 -1 删除这两⾏数据

1、正常删除

drop table if exists tb_cps_merge_tree1;
CREATE TABLE tb_cps_merge_tree1
(user_id UInt64,name    String,age     UInt8,sign    Int8
)ENGINE = CollapsingMergeTree(sign)ORDER BY user_id;
-- 插⼊数据
insert into tb_cps_merge_tree1 values(1,'xiaoluo',23,1),(2,'xiaoyu',24,1),(3,'xiaofeng',25,1) ;
insert into tb_cps_merge_tree1 values(1,'xiaoluo_',23,-1),(2,'xiaoyu_',24,-1),(3,'xiaofeng2',25,1) ;
-- 合并优化
optimize table tb_cps_merge_tree1 ;
-- 实现了数据的删除和已经存在数据的更新
SELECT *
FROM tb_cps_merge_tree1

CollapsingMergeTree虽然解决了主键相同的数据即时删除的问题,但是状态持续变化且多线程并⾏写⼊情况下,状态⾏与取消⾏位置可能乱序,导致⽆法正常折叠。只有保证⽼的状态⾏在在取消⾏的上⾯, 新的状态⾏在取消⾏的下⾯! 但是多线程⽆法保证写的顺序!


2、未正常删除 

drop table if exists tb_cps_merge_tree2;
CREATE TABLE tb_cps_merge_tree2
(user_id UInt64,name    String,age     UInt8,sign    Int8
)ENGINE = CollapsingMergeTree(sign)ORDER BY user_id;insert into tb_cps_merge_tree2 values(1,'xiaoluo_',23,-1),(2,'xiaoyu_',24,-1),(3,'xiaofeng2',25,1) ;
insert into tb_cps_merge_tree2 values(1,'xiaoluo',23,1),(2,'xiaoyu',24,1),(3,'xiaofeng',25,1) ;
-- 合并优化
optimize table tb_cps_merge_tree2 ;select * from tb_cps_merge_tree2;


3、数据更新

假如有相同的排序数据,并且状态都是1,可以实现数据的更新 ,

drop table if exists tb_cps_merge_tree2;
CREATE TABLE tb_cps_merge_tree2
(user_id UInt64,name    String,age     UInt8,sign    Int8
)ENGINE = CollapsingMergeTree(sign)ORDER BY user_id;insert into tb_cps_merge_tree2 values(1,'xiaoluo_',23,1),(2,'xiaoyu_',24,1),(3,'xiaofeng2',25,1) ;
insert into tb_cps_merge_tree2 values(1,'xiaoluo',23,1),(2,'xiaoyu',24,1),(3,'xiaofeng',25,1) ;
-- 合并优化
optimize table tb_cps_merge_tree2 ;select * from tb_cps_merge_tree2;


4、利用查询实现强制删除效果

如果我们不能保证折叠的⾏在状态⾏的下⾯,数据⽆法保证可以正常删除 

----查询正确的数据
select tb_cps_merge_tree2.*
from tb_cps_merge_tree2join(select user_id,sum(sign) as sum_signfrom tb_cps_merge_tree2group by user_idhaving sum_sign = 1) ton tb_cps_merge_tree2.user_id = t.user_id;