CollapsingMergeTree
CollapsingMergeTree就是⼀种通过以增代删的思路,⽀持⾏级数据修改和删除的表引擎。它通过定义⼀个sign标记位字段,记录数据⾏ 的状态。
如果sign标记为1,则表示这是⼀⾏有效的数据;
如果sign标记为-1,则表示这⾏数据需要被删除。
当CollapsingMergeTree分区合并时,同⼀数据分区内排序相同,sign标记为1和-1的⼀组数据会被抵消删除。 这种1和-1相互抵消的操作,犹如将⼀张瓦楞纸折叠了⼀般。这种直观 的⽐喻,想必也正是折叠合并树(CollapsingMergeTree)名称的由来。
- 多⾏的排序相同的状态为1的数据会折叠成⼀⾏ , 保留最后⼀⾏;
- 两⾏排序相同的数据, 状态为 1 和 -1 删除这两⾏数据;
1、正常删除
drop table if exists tb_cps_merge_tree1;
CREATE TABLE tb_cps_merge_tree1
(user_id UInt64,name String,age UInt8,sign Int8
)ENGINE = CollapsingMergeTree(sign)ORDER BY user_id;
-- 插⼊数据
insert into tb_cps_merge_tree1 values(1,'xiaoluo',23,1),(2,'xiaoyu',24,1),(3,'xiaofeng',25,1) ;
insert into tb_cps_merge_tree1 values(1,'xiaoluo_',23,-1),(2,'xiaoyu_',24,-1),(3,'xiaofeng2',25,1) ;
-- 合并优化
optimize table tb_cps_merge_tree1 ;
-- 实现了数据的删除和已经存在数据的更新
SELECT *
FROM tb_cps_merge_tree1
CollapsingMergeTree虽然解决了主键相同的数据即时删除的问题,但是状态持续变化且多线程并⾏写⼊情况下,状态⾏与取消⾏位置可能乱序,导致⽆法正常折叠。只有保证⽼的状态⾏在在取消⾏的上⾯, 新的状态⾏在取消⾏的下⾯! 但是多线程⽆法保证写的顺序!
2、未正常删除
drop table if exists tb_cps_merge_tree2;
CREATE TABLE tb_cps_merge_tree2
(user_id UInt64,name String,age UInt8,sign Int8
)ENGINE = CollapsingMergeTree(sign)ORDER BY user_id;insert into tb_cps_merge_tree2 values(1,'xiaoluo_',23,-1),(2,'xiaoyu_',24,-1),(3,'xiaofeng2',25,1) ;
insert into tb_cps_merge_tree2 values(1,'xiaoluo',23,1),(2,'xiaoyu',24,1),(3,'xiaofeng',25,1) ;
-- 合并优化
optimize table tb_cps_merge_tree2 ;select * from tb_cps_merge_tree2;
3、数据更新
假如有相同的排序数据,并且状态都是1,可以实现数据的更新 ,
drop table if exists tb_cps_merge_tree2;
CREATE TABLE tb_cps_merge_tree2
(user_id UInt64,name String,age UInt8,sign Int8
)ENGINE = CollapsingMergeTree(sign)ORDER BY user_id;insert into tb_cps_merge_tree2 values(1,'xiaoluo_',23,1),(2,'xiaoyu_',24,1),(3,'xiaofeng2',25,1) ;
insert into tb_cps_merge_tree2 values(1,'xiaoluo',23,1),(2,'xiaoyu',24,1),(3,'xiaofeng',25,1) ;
-- 合并优化
optimize table tb_cps_merge_tree2 ;select * from tb_cps_merge_tree2;
4、利用查询实现强制删除效果
如果我们不能保证折叠的⾏在状态⾏的下⾯,数据⽆法保证可以正常删除
----查询正确的数据
select tb_cps_merge_tree2.*
from tb_cps_merge_tree2join(select user_id,sum(sign) as sum_signfrom tb_cps_merge_tree2group by user_idhaving sum_sign = 1) ton tb_cps_merge_tree2.user_id = t.user_id;