> 文章列表 > Greenplum数据库执行器——PartitionSelector执行节点

Greenplum数据库执行器——PartitionSelector执行节点

Greenplum数据库执行器——PartitionSelector执行节点

为了能够对分区表有优异的处理能力,对于查询优化系统来说一个最基本的能力就是做分区裁剪partition pruning,将query中并不涉及的分区提前排除掉。如下执行计划所示,由于单表谓词在parititon key上,在优化期间即可确定哪些可以分区可以避免访问,即静态pruning。

explain select * from test where collecttime > '2023-02-22 00:00:00' and collecttime < '2023-02-28 00:00:00';QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Gather Motion 12:1 (slice1; segments: 12) (cost=0.00..431.18 rows=39 width=1626)-> Sequence (cost=0.00..431.02 rows=4 width=1626)-> Partition Selector for test (dynamic scan id: 1) (cost=10.00...100.00 rows=9 width=4)Partition selected: 6 (out of 33)-> Dynamic Seq Scan on test (dynamic scan id: 1) (cost=0.00...431.02 rows=4 width=1626)Filter: ((collecttime > '2023-02-22 00:00:00'::timestamp without time zone) AND (collecttime < '2023-02-28 00:00:00'::timestamp without time zone))   
Optimizer: Pivotal Optimizer (GPORCA)  explain select * from test where collecttime > '2023-02-22 00:00:00' and collecttime < '2023-02-28 00:00:00'; QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Gather Motion 12:1 (slice1; segments: 12) (cost=0.00..431.18 rows=6 width=3748)-> Append (cost=0.00...0.00 rows=1 width=3748)-> Seq Scan on test_1_prt_p20230222_1 (cost=0.00..0.00 rows=1 width=3748)Filter: ((collecttime > '2023-02-22 00:00:00'::timestamp without time zone) AND (collecttime < '2023-02-28 00:00:00'::timestamp without time zone))...-> Seq Scan on test_1_prt_p20230227_1 (cost=0.00..0.00 rows=1 width=3748)Filter: ((collecttime > '2023-02-22 00:00:00'::timestamp without time zone) AND (collecttime < '2023-02-28 00:00:00'::timestamp without time zone))  
Optimizer:  Postgres query optimizer                                                           

如下语句date_id的值将需要根据date_dim表的动态输出决定,因此对orders表的partition pruning只能在执行期完成,类似的例子还有动态绑定变量,称为动态pruning。

SELECT avg(amount) FROM ordersWHERE date_id IN (SELECT date_id FROM date_dim WHERE year = 2013 AND month BETWEEN 10 AND 12);

引入了3个新的算子来做pruning(PartitionSelector、DynamicScan、Sequence):Sequence用于描述PartitionSelector->DynamicScan的生产消费关系,确定谁先执行;PartitionSelector根据谓词生成相关partition ids,主要实现了做partition选择的功能,并将筛选后的ids传递给DynamicScan;DynamicSeqScan物理scan算子,基于PartitionSelector传递的ids,在做table scan时跳过不需要的partition。上一节已经描述过DynamicScan的执行流程了,该节主要介绍Partition Selector。

Partition Selector

Partition Selector用于计算和传播(Compute and propagate)使用Dynamic table scan的分区表oid。执行PartitionSelector有两种途径:Constant partition elimination和Join partition elimination,如下图:
Greenplum数据库执行器——PartitionSelector执行节点
(a) 表示做full table scan,这时是没有filter的,PartitionSelector直接生成全量partition T1 -> T100给DynamicScan。
(b/c) 表示在partition key(PK)上有单表条件(PK=5/PK<30),这时可以只选出目标分区给DynamicScan。
(d) 表示了两表 R join T on R.A = T.PK,由于T表是partition table,其scan算子是DynamicScan。这里有所不同的是,对T的过滤需要基于R.A列的值,因此PartitionSelector要加在R的scan上方,用来获取R.A列的value用于过滤分区,并传递给右侧T的DynamicScan。另外可以看到这里不再有Sequence算子了,由于这里Join算子已经保障了R和T执行的先后顺序(先R后T),也就保证了PartitionSelector->DynamicScan的顺序,Sequence不再必要。

Constant partition elimination

PartitionSelector评估常量分区约束(constant partition constraints)来计算和传播(Compute and propagate)分区表oid。It only need to be called once.

 		1. Constant partition eliminationPlan structure:Sequence  # 是一个同步的概念,用于描述PartitionSelector->DynamicScan的生产消费关系,确定谁先执行|--PartitionSelector # PartitionSelector根据谓词生成相关partition ids,主要实现了做partition选择的功能,并将筛选后的ids传递给DynamicScan|--DynamicSeqScan # 物理scan算子,基于PartitionSelector传递的ids,在做table scan时跳过不需要的partition

Join partition elimination

PartitionSelector与DynamicSeqScan、DynamicIndexScan或DynamicBitmapHeapScan位于同一切片slice中。它是为来自其子节点的每个元组执行的。它使用输入元组评估分区约束,并传播匹配的分区表Oids。

 		2. Join partition eliminationPlan structure:...:|--DynamicSeqScan|--...|--PartitionSelector|--...

可以有其他节点使用PartSelected qual来过滤行,而不是动态表扫描,根据选择的分区来过滤行。目前,ORCA使用动态表扫描,而非ORCA计划器生成的计划可以使用具有PartSelected quals的门控Result节点来排除不需要的分区。Instead of a Dynamic Table Scan, there can be other nodes that use a PartSelected qual to filter rows, based on which partitions are selected. Currently, ORCA uses Dynamic Table Scans, while plans produced by the non-ORCA planner use gating Result nodes with PartSelected quals, to exclude unwanted partitions.

src/backend/gpopt/translate/CTranslatorDXLToPlStmt.cpp文件中的Plan *CTranslatorDXLToPlStmt::TranslateDXLPartSelector(const CDXLNode *partition_selector_dxlnode, CDXLTranslateContext *output_context, CDXLTranslationContextArray *ctxt_translation_prev_siblings)函数将DXL PartitionSelector转化为GPDB PartitionSelector节点。
src/backend/optimizer/plan/planpartition.c文件中的Plan *create_partition_selector_plan(PlannerInfo *root, PartitionSelectorPath *best_path)函数从PartitionSelectorPath结构体创建一个PartitionSelector plan。

ExecInitPartitionSelector

首先介绍PartitionSelector node,其通过给定的root表OID找到叶子子分区表的集合和可选选择谓词。它隐藏了分区选择和传播的逻辑,而不是用它来污染计划,使计划看起来一致且易于理解。很容易找到分区选择在计划中发生的位置。(It hides the logic of partition selection and propagation instead of polluting the plan with it to make a plan look consistent and easy to understand. It will be easy to locate where partition selection happens in a plan.)
PartitionSelection可以以如下三种方式运行,A PartitionSelection can work in three different ways:

  1. Dynamic selection, based on tuples that pass through it.
  2. Dynamic selection, with a projected tuple.
  3. Static selection, performed at beginning of execution.

PartitionSelector结构体除了包含DynamicScan算子唯一编号和PartitionSelector算子唯一编号、root表oid和该分区表的层级外。还包含了如上三种运行方式的变量。比如对于Dynamic selection, based on tuples that pass through it来说,levelExpressions就是保存了每个分区层级使用的谓词表达式、levelEqExpressions就是保存了每个分区层级使用的等值表达式、最后应用的剩余谓词residualPredicate、传播表达式propagationExpression。对于Dynamic selection, with a projected tuple,partTabTargetlist是用于动态选择的字段,通过将输入元组投影到具有与分区表中相同位置的分区键值的元组。对于Static selection, performed at beginning of execution,staticPartOids是静态选择的分区子表oid和staticScanIds用于传播静态选择的分区子表oid的扫描id。

typedef struct PartitionSelector{Plan		plan;Oid 		relid;  				/* OID of target relation */int 		nLevels;				/* number of partition levels */int32 		scanId; 				/* id of the corresponding dynamic scan */ // 每个DynamicScan算子有其唯一编号int32		selectorId;				/* id of this partition selector */  // 每个PartitionSelector算子有其唯一编号/* Fields for dynamic selection */List		*levelEqExpressions;	/* list of list of equality expressions used for individual levels */List		*levelExpressions;  	/* predicates used for individual levels */Node		*residualPredicate; 	/* residual predicate (to be applied at the end) */Node		*propagationExpression; /* propagation expression */Node		*printablePredicate;	/* printable predicate (for explain purposes) *//* Fields for dynamic selection, by projecting input tuples to a tuple that has the partitioning key values in the same positions as in the partitioned table. */List	    *partTabTargetlist;/* Fields for static selection */bool		staticSelection;    	/* static selection performed? */List		*staticPartOids;    	/* list of statically selected parts */List		*staticScanIds;     	/* scan ids used to propagate statically selected part oids */
} PartitionSelector;

ExecInitPartitionSelector函数为Orca生成的PartitionSelector节点创建运行时状态信息,并初始化外部子节点(如果存在)。Create the run-time state information for PartitionSelector node produced by Orca and initializes outer child if exists. 其主要执行的步骤如下:

  1. 调用initPartitonSelection创建PartitionSelectorState,其levelPartRules中的每个元素都是每层子分区的PartitionRule信息(我们知道分区表除了root表之外的inheritor和leaf分区在pg_partition_rule中都有一个条目,其信息就是PartitionRule);为levelEqExpressions、levelExpressions、residualPredicate、propagationExpression初始化表达式执行ExprState列表。
  2. 对于ParitionSelector来说,其没有Inner plan也就是没有右子树,这里仅仅需要调用ExecInitNode初始化psstate->lefttree即可,也就是Join partition elimination这种情况。
  3. 如果PartitionSeletor.partTabTargetlist不为NIL,partTabTargetlist是用于动态选择的字段,通过将输入元组投影到具有与分区表中相同位置的分区键值的元组。初始化投影,以生成一个元组,该元组的分区键列位于与分区表中相同的位置。
PartitionSelectorState *ExecInitPartitionSelector(PartitionSelector *node, EState *estate, int eflags){PartitionSelectorState *psstate = initPartitionSelection(node, estate);	ExecInitResultTupleSlot(estate, &psstate->ps); /* tuple table initialization */ExecAssignResultTypeFromTL(&psstate->ps);ExecAssignProjectionInfo(&psstate->ps, NULL);/* initialize child nodes */if (NULL != outerPlan(node)){ outerPlanState(psstate) = ExecInitNode(outerPlan(node), estate, eflags); }/* Initialize projection, to produce a tuple that has the partitioning key columns at the same positions as in the partitioned table. */if (node->partTabTargetlist){ // partition key target listList	   *exprStates = (List *) ExecInitExpr((Expr *) node->partTabTargetlist, (PlanState *) psstate);psstate->partTabDesc = ExecTypeFromTL(node->partTabTargetlist, false);psstate->partTabSlot = MakeSingleTupleTableSlot(psstate->partTabDesc);psstate->partTabProj = ExecBuildProjectionInfo(exprStates, psstate->ps.ps_ExprContext, psstate->partTabSlot,  ExecGetResultType(&psstate->ps));}return psstate;
}

Greenplum数据库执行器——PartitionSelector执行节点

ExecPartitionSelector

从root表获取PartitionNode和access method。首先看一下DynamicTableScanInfo的List *partsMetadata,其存放了所有相关分区表的分区元数据。getPartitionNodeAndAccessMethod从root表获取PartitionNode和accessMethod,其实就是调用findPartitionMetadataEntry函数,该函数从partsMetadata列表中查找到相应partition oid的PartitionMetadata对象(输入参数为partsMetadata: list of PartitionMetadata、partOid: Part Oid;输出参数为partsAndRules: output parameter for matched PartitionNode;accessMethods: output parameter for PartitionAccessMethods)。PartitionMetadata结构体其实就包含了PartitionNode *partsAndRulesPartitionAccessMethods *accessMethods两个成员

if (NULL == node->rootPartitionNode){getPartitionNodeAndAccessMethod(ps->relid,estate->dynamicTableScanInfo->partsMetadata,estate->es_query_cxt,&node->rootPartitionNode,&node->accessMethods);
}
/* ----------------------------------------------------------------*		getPartitionNodeAndAccessMethod  Retrieve PartitionNode and access method from root table* ---------------------------------------------------------------- */
void getPartitionNodeAndAccessMethod(Oid rootOid, List *partsMetadata, MemoryContext memoryContext, PartitionNode **partsAndRules, PartitionAccessMethods **accessMethods){findPartitionMetadataEntry(partsMetadata, rootOid, partsAndRules, accessMethods);(*accessMethods)->part_cxt = memoryContext;
}
void findPartitionMetadataEntry(List *partsMetadata, Oid partOid, PartitionNode **partsAndRules, PartitionAccessMethods **accessMethods){ListCell   *lc = NULL;foreach(lc, partsMetadata){PartitionMetadata *metadata = (PartitionMetadata *) lfirst(lc);*partsAndRules = findPartitionNodeEntry(metadata->partsAndRules, partOid);if (NULL != *partsAndRules){/* accessMethods define the lookup access methods for partitions, one for each level */*accessMethods = metadata->accessMethods;return;}}
}

描述findPartitionNodeEntry函数之前,需要了解一下pg_partition和pg_partition_rule系统表,每个分区父表(root table和intermediate table)在其中都有一个条目,且该条目有标识自己的oid,也就是Partition结构体中的partid;而parrelid字段就是该表的oid;parkind指明分区类型list或range;level是分区树的层级。PartitionRule结构体则代表pg_partition_rule系统表中的条目,除了root table,intermediate table和leaf table都应该有对应的条目,同样每个条目都有标识自己的oid(parruleid),paroid则代表该分区表的oid,通过parparentrule可以关联父表的rule条目,从而组成自下而上的父子链。

struct PartitionNode{NodeTag type;Partition *part;	List *rules; /* rules for this level */ struct PartitionRule *default_part;
};
typedef struct Partition{NodeTag type;Oid partid;			/* OID of row in pg_partition. */Oid parrelid;		/* OID in pg_class of top-level partitioned relation */char parkind;		/* 'r', 'l', or (unsupported) 'h' */int16 parlevel;		/* depth below parent partitioned table */bool paristemplate;	/* just a template, or really a part? */int16 parnatts;		/* number of partitioning attributes */AttrNumber *paratts;/* attribute number vector */ Oid *parclass;		/* operator class vector */
} Partition;
typedef struct PartitionRule{NodeTag		 type;Oid			 parruleid; Oid			 paroid; Oid			 parchildrelid; Oid			 parparentrule;bool		 parisdefault; char		*parname; Node		*parrangestart; bool		 parrangestartincl;Node		*parrangeend; bool		 parrangeendincl; Node		*parrangeevery; List		*parlistvalues;int16		 parruleord; List		*parreloptions; Oid			 partemplatespaceId; 	/* the tablespace id for the template (or InvalidOid for  non-template rules */struct PartitionNode *children; /* sub partition */
} PartitionRule;

findPartitionNodeEntry函数用于通过给定分区表oid查找到PartitonNode,如果partitionNode代表的是给定partOid,则返回该partitonNode;如果不是,有可能partOid是PartitionNode的子分区,通过递归子分区,获取到以partOid对应的PartitionNode并返回。

static PartitionNode *findPartitionNodeEntry(PartitionNode *partitionNode, Oid partOid){if (NULL == partitionNode){ return NULL; }if (partitionNode->part->parrelid == partOid){ return partitionNode; }	ListCell   *lcChild = NULL; /* check recursively in child parts in case we have the oid of an intermediate node */foreach(lcChild, partitionNode->rules){PartitionRule *childRule = (PartitionRule *) lfirst(lcChild);PartitionNode *childNode = findPartitionNodeEntry(childRule->children, partOid);if (NULL != childNode){ return childNode; }}	if (NULL != partitionNode->default_part){  /* check recursively in the default part, if any */childNode = findPartitionNodeEntry(partitionNode->default_part->children, partOid);}return childNode;
}

Static selection

如果PartitionSelector.staticSelection为true,需要执行Static selection, performed at beginning of execution。staticPartOids是静态选择的分区子表oid和staticScanIds用于传播静态选择的分区子表oid的扫描id。

TupleTableSlot *ExecPartitionSelector(PartitionSelectorState *node){PartitionSelector *ps = (PartitionSelector *) node->ps.plan; EState	   *estate = node->ps.state;if (ps->staticSelection){/* propagate the part oids obtained via static partition selection */partition_propagation(estate, ps->staticPartOids, ps->staticScanIds, ps->selectorId);return NULL;}

首先学习一下InsertPidIntoDynamicTableScanInfo函数用于将分区oid插入dynamicTableScanInfo的pidIndexes[scanId],如果partOid分区oid为非法oid,该函数将不会插入该oid,但是会确保dynamicTableScanInfo的pidIndexes[scanId]的HTAB(该HTAB的key为partition table oid,entry为partition table oid和selectorId列表的组合)存在。从DynamicSeqScan的逻辑可知,需要扫描的分区oid就存在于dynamicTableScanInfo的pidIndexes[scanId]中。partition_propagation函数就是将该selector为不同DynamicScan裁剪的分区oid放入dynamicTableScanInfo->pidIndexes缓冲中,供DynamicScan扫描。

static void partition_propagation(EState *estate, List *partOids, List *scanIds, int32 selectorId){ListCell *lcOid = NULL; ListCell *lcScanId = NULL;forboth (lcOid, partOids, lcScanId, scanIds) {Oid partOid = lfirst_oid(lcOid); int scanId = lfirst_int(lcScanId);InsertPidIntoDynamicTableScanInfo(estate, scanId, partOid, selectorId);}
}
void InsertPidIntoDynamicTableScanInfo(EState *estate, int32 index, Oid partOid, int32 selectorId){DynamicTableScanInfo *dynamicTableScanInfo = estate->dynamicTableScanInfo;MemoryContext oldCxt = MemoryContextSwitchTo(estate->es_query_cxt);if (index > dynamicTableScanInfo->numScans){ increaseScanArraySize(estate, index); } // 对array扩容if ((dynamicTableScanInfo->pidIndexes)[index - 1] == NULL){ dynamicTableScanInfo->pidIndexes[index - 1] = createPidIndex(estate, index); }	// 为该index创建HTABif (partOid != InvalidOid){bool found = false;PartOidEntry *hashEntry = hash_search(dynamicTableScanInfo->pidIndexes[index - 1], &partOid, HASH_ENTER, &found);if (found){hashEntry->selectorList = list_append_unique_int(hashEntry->selectorList, selectorId);}else{hashEntry->partOid = partOid; hashEntry->selectorList = list_make1_int(selectorId);}}MemoryContextSwitchTo(oldCxt);
}

Dynamic selection

如果NULL != outerPlanState(node),则需要处理Join partition elimination。调用ExecProcNode(outerPlan)执行partitionselector的左子树的执行计划。如果inputSlot为null,其实就是说明outerPlan没有匹配的tuple了,也就是说明不需要继续判定哪些分区需要扫描了。

		PlanState *outerPlan = outerPlanState(node);TupleTableSlot *inputSlot = ExecProcNode(outerPlan);if (TupIsNull(inputSlot)){ /* no more tuples from outerPlan *//* Make sure we have an entry for this scan id in dynamicTableScanInfo. Normally, this would've been done the first time a partition is selected, but we must ensure that there is an entry even if no partitions were selected. (The traditional Postgres planner uses this method.) */if (ps->partTabTargetlist) InsertPidIntoDynamicTableScanInfo(estate, ps->scanId, InvalidOid, ps->selectorId);else LogPartitionSelection(estate, ps->selectorId);return NULL;}

如果有partitioning投影Dynamic selection, with a projected tuple),将输入tuple投影成看上去来自partitioned table的tuple,然后使用selectPartitionMulti函数选择分区。(Postgres planner使用这种方式)

		ListCell   *lc;TupleTableSlot *slot = ExecProject(node->partTabProj, NULL); slot_getallattrs(slot);List	   *oids = selectPartitionMulti(node->rootPartitionNode,slot_get_values(slot),slot_get_isnull(slot),slot->tts_tupleDescriptor,node->accessMethods);foreach (lc, oids){InsertPidIntoDynamicTableScanInfo(estate, ps->scanId, lfirst_oid(lc), ps->selectorId);}list_free(oids);

selectPartitionMulti函数通过提供的partition node、tuple values nullness和partition state,找到匹配的叶子分区。该函数和selectPartition相似,但是对于nulls的处理不同。该函数的处理方式是,如果有partitioning key的值为null,则所有子分区都被认为是匹配的;如果没有partitioning key的值为null,调用selectPartition1获取对应partitioning key值下的子分区。The input values/isnull should match the layout of tuples in the partitioned table.

List *selectPartitionMulti(PartitionNode *partnode, Datum *values, bool *isnull, TupleDesc tupdesc, PartitionAccessMethods *accessMethods){List	   *leafPartitionOids = NIL;List	   *inputList = list_make1(partnode);while (list_length(inputList) > 0){ // 按层处理,这里的InputList是每层分区表的PartitionNode列表List	   *levelOutput = NIL;				ListCell   *lc = NULL; foreach(lc, inputList){ // 遍历所有PartitionNodePartitionNode *candidatePartNode = (PartitionNode *) lfirst(lc);bool		foundNull = false;for (int i = 0; i < candidatePartNode->part->parnatts; i++){ // 遍历该partition node的所有分区键AttrNumber	attno = candidatePartNode->part->paratts[i];				if (isnull[attno - 1]){ /* If corresponding value is null, then we should pick all of its children (or itself if it is a leaf partition) */foundNull = true;if (IsLeafPartitionNode(candidatePartNode)){ /* Extract out Oids of all children */leafPartitionOids = list_concat(leafPartitionOids, all_partition_relids(candidatePartNode));}else {levelOutput = list_concat(levelOutput, PartitionChildren(candidatePartNode));}}}/* If null was not found on the attribute, and if this is a leaf partition, then there will be an exact match. If it is not a leaf partition, then we have to find the right child to investigate. */if (!foundNull) {if (IsLeafPartitionNode(candidatePartNode)){Oid			matchOid = selectPartition1(candidatePartNode, values, isnull, tupdesc, accessMethods, NULL, NULL);if (matchOid != InvalidOid){ leafPartitionOids = lappend_oid(leafPartitionOids, matchOid); }}else{PartitionNode *childPartitionNode = NULL;selectPartition1(candidatePartNode, values, isnull, tupdesc, accessMethods, NULL, &childPartitionNode);if (childPartitionNode){ levelOutput = lappend(levelOutput, childPartitionNode); }}}}list_free(inputList); inputList = levelOutput; /* Start new level */}return leafPartitionOids;
}

如果没有partitioning投影,则利用levelEqExpressions和levelExpressions来选择分区。(ORCA使用这种方式)

typedef struct SelectedParts{ /*  SelectedParts node: This is the result of partition selection. It has a list of leaf part oids and the corresponding ScanIds to which they should be propagated */List *partOids;List *scanIds;
} SelectedParts;SelectedParts *selparts = processLevel(node, 0 /* level */, inputSlot);		if (NULL != ps->propagationExpression){ /* partition propagation */partition_propagation(estate, selparts->partOids, selparts->scanIds, ps->selectorId);}list_free(selparts->partOids); list_free(selparts->scanIds); pfree(selparts);

processLevel函数返回符合predicates处于指定分区层级的分区表oid。如果处理intermediate层级,存储符合要求的分区oid,继续下一个层级分区;如果处于叶子层级,获取满足要求的分区oid。其执行流程如下所示:

  1. 首先获取指定层级的谓词表达式equalityPredicate和generalPredicate,定位到root PartitionNode,如果level为0,则parentNode为node->rootPartitionNode;否则为node->levelPartRules[level - 1]->children。
  2. 调用partition_rules_for_equality_predicate和partition_rules_for_general_predicate获取匹配equalityPredicate和generalPredicate谓词的PartitionRule列表。如果没有上述谓词,则需要把下一个层级的PartitionRule全部返回。
  3. 遍历satisfied PartitionRules,如果当前层级为叶子层级,调用eval_part_qual函数处理residual predicate,调用eval_propagation_expression函数propagationExpression获取scanId,将partoid和scanId设置到selparts;否则需要递归调用processLevel(node, level + 1, inputTuple)到next level’s partition elimination
SelectedParts *processLevel(PartitionSelectorState *node, int level, TupleTableSlot *inputTuple){SelectedParts *selparts = makeNode(SelectedParts); selparts->partOids = NIL; selparts->scanIds = NIL;PartitionSelector *ps = (PartitionSelector *) node->ps.plan;/* get equality and general predicate for the current level */ // 获取指定层级的谓词表达式List	   *equalityPredicate = (List *) lfirst(list_nth_cell(ps->levelEqExpressions, level));Expr	   *generalPredicate = (Expr *) lfirst(list_nth_cell(ps->levelExpressions, level));/* get parent PartitionNode if in level 0, it's the root PartitionNode */PartitionNode *parentNode = node->rootPartitionNode;if (0 != level){parentNode = node->levelPartRules[level - 1]->children;}/* list of PartitionRule that satisfied the predicates */List	   *satisfiedRules = NIL;	if (NULL != equalityPredicate){ /* If equalityPredicate exists */List *chosenRules = partition_rules_for_equality_predicate(node, level, inputTuple, parentNode);satisfiedRules = list_concat(satisfiedRules, chosenRules);}else if (NULL != generalPredicate) /* If generalPredicate exists */{List	   *chosenRules = partition_rules_for_general_predicate(node, level, inputTuple, parentNode);satisfiedRules = list_concat(satisfiedRules, chosenRules);}else { /* None of the predicate exists *//** Neither equality predicate nor general predicate exists. Return all the next level PartitionRule.* WARNING: Do NOT use list_concat with satisfiedRules and parentNode->rules. list_concat will destructively modify satisfiedRules to point to parentNode->rules, which will then be freed when we free satisfiedRules. This does not apply when we execute partition_rules_for_general_predicate as it creates its own list. */ListCell   *lc = NULL; foreach(lc, parentNode->rules){PartitionRule *rule = (PartitionRule *) lfirst(lc); satisfiedRules = lappend(satisfiedRules, rule);}if (NULL != parentNode->default_part){satisfiedRules = lappend(satisfiedRules, parentNode->default_part);}}/* Based on the satisfied PartitionRules, go to next level or propagate PartOids if we are in the leaf level */ListCell   *lc = NULL;foreach(lc, satisfiedRules){PartitionRule *rule = (PartitionRule *) lfirst(lc);node->levelPartRules[level] = rule;		if (level == ps->nLevels - 1){ /* If we already in the leaf level */bool		shouldPropagate = true;			if (NULL != ps->residualPredicate){ /* if residual predicate exists */				shouldPropagate = eval_part_qual(node->ps.ps_ExprContext, inputTuple, node->residualPredicateExprStateList); /* evaluate residualPredicate */}if (shouldPropagate){if (NULL != ps->propagationExpression){if (!list_member_oid(selparts->partOids, rule->parchildrelid)){selparts->partOids = lappend_oid(selparts->partOids, rule->parchildrelid);int			scanId = eval_propagation_expression(node, rule->parchildrelid);selparts->scanIds = lappend_int(selparts->scanIds, scanId);}}}}else{ /* Recursively call this function for next level's partition elimination */SelectedParts *selpartsChild = processLevel(node, level + 1, inputTuple);selparts->partOids = list_concat(selparts->partOids, selpartsChild->partOids);selparts->scanIds = list_concat(selparts->scanIds, selpartsChild->scanIds);pfree(selpartsChild);}}list_free(satisfiedRules);	node->levelPartRules[level] = NULL; /* After finish iteration, reset this level's PartitionRule */return selparts;
}

Greenplum数据库执行器——PartitionSelector执行节点

https://zhuanlan.zhihu.com/p/367068030