2023-04-11 monetdb-BAT及投影限制处理-分析
摘要:
monetdb-BAT及投影限制处理-分析
BAT说明
官方说明:
Design Overview | MonetDB Docs
@item @strong{BAT Alignment:}
Due to the mapping of multi-ary datamodels onto the BAT model, we
expect many correspondences among BATs, e.g.
@emph{bat(oid,attr1),.. bat(oid,attrN)} vertical
decompositions. Frequent activities will be to jump from one
attribute to the other (`bunhopping'). If the head columns are
equal lists in two BATs, merge or even array lookups can be used
instead of hash lookups. The alignment interface makes these
relations explicitly manageable.
数据结构定义:
typedef struct BAT {/* static bat properties */bat batCacheid; /* index into BBP */oid hseqbase; /* head seq base *//* dynamic bat properties */MT_Id creator_tid; /* which thread created it */boolbatCopiedtodisk:1, /* once written */batDirtyflushed:1, /* was dirty before commit started? (not used) */batDirtydesc:1, /* bat descriptor dirty marker (not used) */batTransient:1; /* should the BAT persist on disk? */uint8_t /* adjacent bit fields are packed together (if they fit) */batRestricted:2; /* access privileges */role_t batRole; /* role of the bat */uint16_t unused; /* value=0 for now (sneakily used by mat.c) */int batSharecnt; /* incoming view count *//* delta status administration */BUN batInserted; /* start of inserted elements */BUN batCount; /* tuple count */BUN batCapacity; /* tuple capacity *//* dynamic column properties */COLrec T; /* column info */MT_Lock theaplock; /* lock protecting heap reference changes */MT_RWLock thashlock; /* lock specifically for hash management */MT_Lock batIdxLock; /* lock to manipulate other indexes/properties */
} BAT;
typedef struct {str id; /* label for column */uint16_t width; /* byte-width of the atom array */int8_t type; /* type id. */uint8_t shift; /* log2 of bun width */bool varsized:1, /* varsized/void (true) or fixedsized (false) */key:1, /* no duplicate values present */nonil:1, /* there are no nils in the column */nil:1, /* there is a nil in the column */sorted:1, /* column is sorted in ascending order */revsorted:1; /* column is sorted in descending order */BUN nokey[2]; /* positions that prove key==FALSE */BUN nosorted; /* position that proves sorted==FALSE */BUN norevsorted; /* position that proves revsorted==FALSE */oid seq; /* start of dense sequence */Heap *heap; /* space for the column. */BUN baseoff; /* offset in heap->base (in whole items) */Heap *vheap; /* space for the varsized data. */Hash *hash; /* hash table */Imprints *imprints; /* column imprints index */Heap *orderidx; /* order oid index */PROPrec *props; /* list of dynamic properties stored in the bat descriptor */
} COLrec;
个人理解:
- 官方写的比较抽象, 二进制关系表,需要结合具体的例子在实际中进行理解
- 我个人理解上,可以将BAT当作附加了限制和属性的列
- 每个BAT,其实都可以理解为是特定列的一部分, 具体结合以下例子来理解
DML
表结构和数据
create table b(b1 int, b2 varchar(2), primary key(b1)) ;create table a(a1 int, a2 varchar(2), foreign key(a1) references b(b1)) ;insert into a values(1, 'a1');insert into a values(null, 'a2');insert into a values(3, 'a3');
查询SQL
select d2 from d where d1 > 2;
执行日志:
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker8 monetdb5/mal/mal_interpreter.c:644 runMALsequence calling sql.tid
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker24 gdk/gdk_bat.c:296 COLnew_intern -> tmp_773#0@0[str]TESRN2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker24 monetdb5/mal/mal_interpreter.c:644 runMALsequence calling sql.bind
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker8 gdk/gdk_bat.c:296 COLnew_intern -> tmp_743#0@0[void]TSRK
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker8 gdk/gdk_bat.c:328 BATdense 0,0,5-> tmp_743#5@0[void]TDSKN
2023-04-11 20:36:34 M_DEBUG BAT_ DFLOWworker24 gdk/gdk_bbp.c:2903 getBBPdescriptor set to loading BAT 207
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_bbp.c:2911 getBBPdescriptor load tmp_317
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:631 DESCload DESCload: 03/317
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:530 GDKload GDKload: name=03/317.tail, ext=, mode 1
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:552 GDKload read(dst 0x7f9fa8000e60, n_expected 20, fd 9) = 20
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker24 gdk/gdk_align.c:154 VIEWcreate tmp_317#5@0[int]P -> tmp_760#5@0[int]V
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker24 gdk/gdk_batop.c:1903 BATslice b=tmp_317#5@0[int]P,lo=0,hi=5 -> tmp_760#5@0[int]V2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker24 monetdb5/mal/mal_interpreter.c:644 runMALsequence calling sql.bind
2023-04-11 20:36:34 M_DEBUG BAT_ DFLOWworker24 gdk/gdk_bbp.c:2903 getBBPdescriptor set to loading BAT 208
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_bbp.c:2911 getBBPdescriptor load tmp_320
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:631 DESCload DESCload: 03/320
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:530 GDKload GDKload: name=03/320.tail1, ext=, mode 1
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:552 GDKload read(dst 0x7f9fa8001450, n_expected 5, fd 9) = 52023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker27 monetdb5/mal/mal_interpreter.c:672 runMALsequence calling algebra.thetaselect
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:530 GDKload GDKload: name=03/320, ext=theap, mode 1
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker27 gdk/gdk_batop.c:1996 BATordered Fixed norevsorted(1) for tmp_760#5@0[int]V!r
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker27 gdk/gdk_batop.c:1996 BATordered Fixed nosorted(2) for tmp_760#5@0[int]V!s!r (78 usec)
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:552 GDKload read(dst 0x7f9fa800ce50, n_expected 8267, fd 9) = 8267
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker27 gdk/gdk_bat.c:296 COLnew_intern -> tmp_717#0@0[oid]TSRN
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker24 gdk/gdk_align.c:154 VIEWcreate tmp_320#5@0[str]PESKH -> tmp_744#5@0[str]VESK
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker24 gdk/gdk_batop.c:1903 BATslice b=tmp_320#5@0[str]PESKH,lo=0,hi=5 -> tmp_744#5@0[str]VESK
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker27 gdk/gdk_select.c:55 virtualize tmp_717#1@0[oid]TDSRKN,seq=4
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker27 gdk/gdk_select.c:1955 BATselect b=tmp_760#5@0[int]V!s!r,s=tmp_743#5@0[void]TDSKN,anti=false -> tmp_717#1@0[void]TDSRKN select: densescan v >= vl (canditer_next_dense) (465 usec)2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker31 monetdb5/mal/mal_interpreter.c:672 runMALsequence calling algebra.projection
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker31 gdk/gdk_align.c:154 VIEWcreate tmp_744#5@0[str]VESK -> tmp_743#5@4[str]VESK
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker31 gdk/gdk_batop.c:1903 BATslice b=tmp_744#5@0[str]VESK,lo=4,hi=5 -> tmp_743#1@4[str]VESRK
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker31 gdk/gdk_project.c:818 BATproject2 l=tmp_717#1@0[void]TDSRKN r1=tmp_744#5@0[str]VESK r2=00 -> tmp_743#1@0[str]VESRK sharing string heap (slice) 69us2023-04-11 20:36:34 M_DEBUG ALGO client1 monetdb5/mal/mal_interpreter.c:644 runMALsequence calling sql.resultSet
TRACE分析
+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| usec | statement |
+======+======================================================================================================================================================================+
| 58 | X_0=0@0:void := querylog.define("trace select * from a where a1>=1;":str, "default_pipe":str, 20:int); |
| 268 | X_1=[2]:bat[:int] := bat.pack(0:int, 0:int); |
| 304 | X_2=0:int := sql.mvc(); |
| 154 | C_3=[1]:bat[:oid] := sql.tid(X_2=0:int, "sys":str, "a":str); |
| 767 | X_4=[2]:bat[:int] := bat.pack(32:int, 2:int); |
| 751 | X_5=[2]:bat[:str] := bat.pack("a1":str, "a2":str); |
| 821 | X_6=[2]:bat[:str] := bat.pack("sys.a":str, "sys.a":str); |
| 730 | X_7=[1]:bat[:int] := sql.bind(X_2=0:int, "sys":str, "a":str, "a1":str, 0:int); |
| 372 | X_8=[1]:bat[:str] := sql.bind(X_2=0:int, "sys":str, "a":str, "a2":str, 0:int); |
| 276 | C_9=[1]:bat[:oid] := algebra.thetaselect(X_7=[1]:bat[:int], C_3=[1]:bat[:oid], 1:int, ">=":str); # select: sorted |
| 2190 | X_10=[2]:bat[:str] := bat.pack("int":str, "varchar":str); |
| 270 | X_11=[1]:bat[:int] := algebra.projection(C_9=[1]:bat[:oid], X_7=[1]:bat[:int]); |
| 259 | X_12=[1]:bat[:str] := algebra.projection(C_9=[1]:bat[:oid], X_8=[1]:bat[:str]); |
| 30 | X_13=0@0:void := language.pass(X_7=[1]:bat[:int]); |
| 23 | X_14=0@0:void := language.pass(C_9=[1]:bat[:oid]); |
| 3426 | barrier X_15=false:bit := language.dataflow(); |
| 191 | X_16=4:int := sql.resultSet(X_6=[2]:bat[:str], X_5=[2]:bat[:str], X_10=[2]:bat[:str], X_4=[2]:bat[:int], X_1=[2]:bat[:int], X_11=[1]:bat[:int], X_12=[1]:bat[:str]); |
+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
执行分析:
一. 预处理, 列数据和列属性绑定
相关函数
-
sql.tid
-
sql.bind
绑定表a的属性a1
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker24 monetdb5/mal/mal_interpreter.c:644 runMALsequence calling sql.bind
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker8 gdk/gdk_bat.c:296 COLnew_intern -> tmp_743#0@0[void]TSRK
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker8 gdk/gdk_bat.c:328 BATdense 0,0,5-> tmp_743#5@0[void]TDSKN
2023-04-11 20:36:34 M_DEBUG BAT_ DFLOWworker24 gdk/gdk_bbp.c:2903 getBBPdescriptor set to loading BAT 207
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_bbp.c:2911 getBBPdescriptor load tmp_317
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:631 DESCload DESCload: 03/317
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:530 GDKload GDKload: name=03/317.tail, ext=, mode 1
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:552 GDKload read(dst 0x7f9fa8000e60, n_expected 20, fd 9) = 20
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker24 gdk/gdk_align.c:154 VIEWcreate tmp_317#5@0[int]P -> tmp_760#5@0[int]V
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker24 gdk/gdk_batop.c:1903 BATslice b=tmp_317#5@0[int]P,lo=0,hi=5 -> tmp_760#5@0[int]V
- a1的属性类型为int, 定长, 直接存储在heap的tail文件里
- 列的文件为 03/317.tail
绑定表a的属性a2
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker24 monetdb5/mal/mal_interpreter.c:644 runMALsequence calling sql.bind
2023-04-11 20:36:34 M_DEBUG BAT_ DFLOWworker24 gdk/gdk_bbp.c:2903 getBBPdescriptor set to loading BAT 208
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_bbp.c:2911 getBBPdescriptor load tmp_320
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:631 DESCload DESCload: 03/320
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:530 GDKload GDKload: name=03/320.tail1, ext=, mode 1
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:552 GDKload read(dst 0x7f9fa8001450, n_expected 5, fd 9) = 5
- a2的属性类型为varchar, 存储为变长, heap的tail文件存放指向数据的地址
- 列的文件为 03/320.tail1
二. 执行查询限制处理
查询执行过程
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker27 monetdb5/mal/mal_interpreter.c:672 runMALsequence calling algebra.thetaselect
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:530 GDKload GDKload: name=03/320, ext=theap, mode 1
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker27 gdk/gdk_batop.c:1996 BATordered Fixed norevsorted(1) for tmp_760#5@0[int]V!r
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker27 gdk/gdk_batop.c:1996 BATordered Fixed nosorted(2) for tmp_760#5@0[int]V!s!r (78 usec)
2023-04-11 20:36:34 M_DEBUG IO_ DFLOWworker24 gdk/gdk_storage.c:552 GDKload read(dst 0x7f9fa800ce50, n_expected 8267, fd 9) = 8267
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker27 gdk/gdk_bat.c:296 COLnew_intern -> tmp_717#0@0[oid]TSRN
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker24 gdk/gdk_align.c:154 VIEWcreate tmp_320#5@0[str]PESKH -> tmp_744#5@0[str]VESK
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker24 gdk/gdk_batop.c:1903 BATslice b=tmp_320#5@0[str]PESKH,lo=0,hi=5 -> tmp_744#5@0[str]VESK
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker27 gdk/gdk_select.c:55 virtualize tmp_717#1@0[oid]TDSRKN,seq=4
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker27 gdk/gdk_select.c:1955 BATselect b=tmp_760#5@0[int]V!s!r,s=tmp_743#5@0[void]TDSKN,anti=false -> tmp_717#1@0[void]TDSRKN select: densescan v >= vl (canditer_next_dense) (465 usec)
查询执行分析
对列属性a1创建view并且使用low和high进行slice进行向量化访问
VIEWcreate tmp_320#5@0[str]PESKH -> tmp_744#5@0[str]VESK
BATslice b=tmp_320#5@0[str]PESKH,lo=0,hi=5 -> tmp_744#5@0[str]VESK
查询访问后将结果生成新的BAT
COLnew_intern -> tmp_717#0@0[oid]TSRN
virtualize tmp_717#1@0[oid]TDSRKN,seq=4
BATselect b=tmp_760#5@0[int]V!s!r,s=tmp_743#5@0[void]TDSKN,anti=false -> tmp_717#1@0[void]TDSRKN select: densescan v >= vl (canditer_next_dense) (465 usec)
- 注意通过low和high来划定列的区间, 一次处理一个区间
- 列的数据在内存中的组织形式为heap
- 列的数据加载到内存, 有读文件后malloc内存, 以及mmap内存映射不同的方式
三. 投影, 计算返回的属性值
投影处理过程
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker31 monetdb5/mal/mal_interpreter.c:672 runMALsequence calling algebra.projection
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker31 gdk/gdk_align.c:154 VIEWcreate tmp_744#5@0[str]VESK -> tmp_743#5@4[str]VESK
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker31 gdk/gdk_batop.c:1903 BATslice b=tmp_744#5@0[str]VESK,lo=4,hi=5 -> tmp_743#1@4[str]VESRK
2023-04-11 20:36:34 M_DEBUG ALGO DFLOWworker31 gdk/gdk_project.c:818 BATproject2 l=tmp_717#1@0[void]TDSRKN r1=tmp_744#5@0[str]VESK r2=00 -> tmp_743#1@0[str]VESRK sharing string heap (slice) 69us
投影处理分析
- 经过查询处理后的结果tmp_717#1@0[void]TDSRKN,包含了选定行的迭代器
- 注意此处限制与投影处理时, 对于投影属性的处理 tmp_744#5@0[str]VESK