PostgreSQL под капотом. Часть 5. Простой SELECT запрос19.05.2023 12:16

Приветствую!

Сегодня мы рассмотрим путь выполнения сложного SELECT запроса. Выполнение простого запроса было рассмотрено в предыдущей статье. На этот раз запрос выглядит следующим образом:

SELECT u.name 
FROM users AS u
WHERE u.age > 24
GROUP BY u.name
HAVING COUNT(*) = 1
ORDER BY u.name
LIMIT 50

Различия, при сравнении с выполнением простого запроса, имеются в 3 местах: парсинг, создание плана и выполнение. Поэтому рассмотрим только эти случаи.

Парсинг

Различия начинаются на этапе переписывания.

Напомню, что входная точка переписывания запроса — transformStmt (src/backend/parser/analyze.c). Эта функция различает основные типы запросов и запускает соответствующий обработчик. Для SELECT это снова transformSelectStmt (src/backend/parser/analyze.c).

FROM users AS u

Для обработки выражения FROM вызывается transformFromClause (src/backend/parser/parse_clause.c). Обработка будет вестись точно так же как и в простом запросе.

SELECT u.name

В простом запросе мы получали все столбцы, используя астериск (*). Для этого использовалась отдельная функция ExpandColumnRefStar (src/backend/parser/parse_target.c).

В случае когда в запросе нет астериска, вызывается общая функция transformTargetEntry (src/backend/parser/parse_target.c) — не важно столбец, выражение, вызов функции или константа.

/*
 * transformTargetEntry()
 *	Transform any ordinary "expression-type" node into a targetlist entry.
 *	This is exported so that parse_clause.c can generate targetlist entries
 *	for ORDER/GROUP BY items that are not already in the targetlist.
 *
 * node		the (untransformed) parse tree for the value expression.
 * expr		the transformed expression, or NULL if caller didn't do it yet.
 * exprKind expression kind (EXPR_KIND_SELECT_TARGET, etc)
 * colname	the column name to be assigned, or NULL if none yet set.
 * resjunk	true if the target should be marked resjunk, ie, it is not
 *			wanted in the final projected tuple.
 */
TargetEntry *
transformTargetEntry(ParseState *pstate, Node *node, Node *expr, ParseExprKind exprKind, char *colname, bool resjunk)

Внутри она состоит из 2 частей (вызовов функций):

transformExpr создает объект ParseState и передает ее transformExprRecurse — настоящему парсеру.

Внутри он запускает обработчик на основании тэга узла. Для u.name тэг — T_ColumnRef (выражение — ссылка на столбец). Для него запустится обработчик transformColumnRef

На вход он получает созданную структуру состояния парсинга ParseState и структуру столбца ColumnRef

/*
 * ColumnRef - specifies a reference to a column, or possibly a whole tuple
 *
 * The "fields" list must be nonempty.  It can contain String nodes
 * (representing names) and A_Star nodes (representing occurrence of a '*').
 * Currently, A_Star must appear only as the last list element --- the grammar
 * is responsible for enforcing this!
 *
 * Note: any container subscripting or selection of fields from composite columns
 * is represented by an A_Indirection node above the ColumnRef.  However,
 * for simplicity in the normal case, initial field selection from a table
 * name is represented within ColumnRef and not by adding A_Indirection.
 */
typedef struct ColumnRef
{
	NodeTag		type;
	List	   *fields;			/* field names (String nodes) or A_Star */
	int			location;		/* token location, or -1 if unknown */
} ColumnRef;

/*
 * State information used during parse analysis
 *
 * parentParseState: NULL in a top-level ParseState.  When parsing a subquery,
 * links to current parse state of outer query.
 *
 * p_sourcetext: source string that generated the raw parsetree being
 * analyzed, or NULL if not available.  (The string is used only to
 * generate cursor positions in error messages: we need it to convert
 * byte-wise locations in parse structures to character-wise cursor
 * positions.)
 *
 * p_rtable: list of RTEs that will become the rangetable of the query.
 * Note that neither relname nor refname of these entries are necessarily
 * unique; searching the rtable by name is a bad idea.
 *
 * p_joinexprs: list of JoinExpr nodes associated with p_rtable entries.
 * This is one-for-one with p_rtable, but contains NULLs for non-join
 * RTEs, and may be shorter than p_rtable if the last RTE(s) aren't joins.
 *
 * p_joinlist: list of join items (RangeTblRef and JoinExpr nodes) that
 * will become the fromlist of the query's top-level FromExpr node.
 *
 * p_namespace: list of ParseNamespaceItems that represents the current
 * namespace for table and column lookup.  (The RTEs listed here may be just
 * a subset of the whole rtable.  See ParseNamespaceItem comments below.)
 *
 * p_lateral_active: true if we are currently parsing a LATERAL subexpression
 * of this parse level.  This makes p_lateral_only namespace items visible,
 * whereas they are not visible when p_lateral_active is FALSE.
 *
 * p_ctenamespace: list of CommonTableExprs (WITH items) that are visible
 * at the moment.  This is entirely different from p_namespace because a CTE
 * is not an RTE, rather "visibility" means you could make an RTE from it.
 *
 * p_future_ctes: list of CommonTableExprs (WITH items) that are not yet
 * visible due to scope rules.  This is used to help improve error messages.
 *
 * p_parent_cte: CommonTableExpr that immediately contains the current query,
 * if any.
 *
 * p_target_relation: target relation, if query is INSERT/UPDATE/DELETE/MERGE
 *
 * p_target_nsitem: target relation's ParseNamespaceItem.
 *
 * p_is_insert: true to process assignment expressions like INSERT, false
 * to process them like UPDATE.  (Note this can change intra-statement, for
 * cases like INSERT ON CONFLICT UPDATE.)
 *
 * p_windowdefs: list of WindowDefs representing WINDOW and OVER clauses.
 * We collect these while transforming expressions and then transform them
 * afterwards (so that any resjunk tlist items needed for the sort/group
 * clauses end up at the end of the query tlist).  A WindowDef's location in
 * this list, counting from 1, is the winref number to use to reference it.
 *
 * p_expr_kind: kind of expression we're currently parsing, as per enum above;
 * EXPR_KIND_NONE when not in an expression.
 *
 * p_next_resno: next TargetEntry.resno to assign, starting from 1.
 *
 * p_multiassign_exprs: partially-processed MultiAssignRef source expressions.
 *
 * p_locking_clause: query's FOR UPDATE/FOR SHARE clause, if any.
 *
 * p_locked_from_parent: true if parent query level applies FOR UPDATE/SHARE
 * to this subquery as a whole.
 *
 * p_resolve_unknowns: resolve unknown-type SELECT output columns as type TEXT
 * (this is true by default).
 *
 * p_hasAggs, p_hasWindowFuncs, etc: true if we've found any of the indicated
 * constructs in the query.
 *
 * p_last_srf: the set-returning FuncExpr or OpExpr most recently found in
 * the query, or NULL if none.
 *
 * p_pre_columnref_hook, etc: optional parser hook functions for modifying the
 * interpretation of ColumnRefs and ParamRefs.
 *
 * p_ref_hook_state: passthrough state for the parser hook functions.
 */
struct ParseState
{
	ParseState *parentParseState;	/* stack link */
	const char *p_sourcetext;	/* source text, or NULL if not available */
	List	   *p_rtable;		/* range table so far */
	List	   *p_joinexprs;	/* JoinExprs for RTE_JOIN p_rtable entries */
	List	   *p_joinlist;		/* join items so far (will become FromExpr
								 * node's fromlist) */
	List	   *p_namespace;	/* currently-referenceable RTEs (List of
								 * ParseNamespaceItem) */
	bool		p_lateral_active;	/* p_lateral_only items visible? */
	List	   *p_ctenamespace; /* current namespace for common table exprs */
	List	   *p_future_ctes;	/* common table exprs not yet in namespace */
	CommonTableExpr *p_parent_cte;	/* this query's containing CTE */
	Relation	p_target_relation;	/* INSERT/UPDATE/DELETE/MERGE target rel */
	ParseNamespaceItem *p_target_nsitem;	/* target rel's NSItem, or NULL */
	bool		p_is_insert;	/* process assignment like INSERT not UPDATE */
	List	   *p_windowdefs;	/* raw representations of window clauses */
	ParseExprKind p_expr_kind;	/* what kind of expression we're parsing */
	int			p_next_resno;	/* next targetlist resno to assign */
	List	   *p_multiassign_exprs;	/* junk tlist entries for multiassign */
	List	   *p_locking_clause;	/* raw FOR UPDATE/FOR SHARE info */
	bool		p_locked_from_parent;	/* parent has marked this subquery
										 * with FOR UPDATE/FOR SHARE */
	bool		p_resolve_unknowns; /* resolve unknown-type SELECT outputs as
									 * type text */

	QueryEnvironment *p_queryEnv;	/* curr env, incl refs to enclosing env */

	/* Flags telling about things found in the query: */
	bool		p_hasAggs;
	bool		p_hasWindowFuncs;
	bool		p_hasTargetSRFs;
	bool		p_hasSubLinks;
	bool		p_hasModifyingCTE;

	Node	   *p_last_srf;		/* most recent set-returning func/op found */

	/*
	 * Optional hook functions for parser callbacks.  These are null unless
	 * set up by the caller of make_parsestate.
	 */
	PreParseColumnRefHook p_pre_columnref_hook;
	PostParseColumnRefHook p_post_columnref_hook;
	ParseParamRefHook p_paramref_hook;
	CoerceParamHook p_coerce_param_hook;
	void	   *p_ref_hook_state;	/* common passthrough link for above */
};

Внутри проверяется допустимость использования ссылки на столбец. Проверка выполняется с помощью проверки типа выражения узла. Тип выражения определяется перечислением ParseExprKind(src/include/parser/parse_node.h)

/*
 * Expression kinds distinguished by transformExpr().  Many of these are not
 * semantically distinct so far as expression transformation goes; rather,
 * we distinguish them so that context-specific error messages can be printed.
 *
 * Note: EXPR_KIND_OTHER is not used in the core code, but is left for use
 * by extension code that might need to call transformExpr().  The core code
 * will not enforce any context-driven restrictions on EXPR_KIND_OTHER
 * expressions, so the caller would have to check for sub-selects, aggregates,
 * window functions, SRFs, etc if those need to be disallowed.
 */
typedef enum ParseExprKind
{
	EXPR_KIND_NONE = 0,			/* "not in an expression" */
	EXPR_KIND_OTHER,			/* reserved for extensions */
	EXPR_KIND_JOIN_ON,			/* JOIN ON */
	EXPR_KIND_JOIN_USING,		/* JOIN USING */
	EXPR_KIND_FROM_SUBSELECT,	/* sub-SELECT in FROM clause */
	EXPR_KIND_FROM_FUNCTION,	/* function in FROM clause */
	EXPR_KIND_WHERE,			/* WHERE */
	EXPR_KIND_HAVING,			/* HAVING */
	EXPR_KIND_FILTER,			/* FILTER */
	EXPR_KIND_WINDOW_PARTITION, /* window definition PARTITION BY */
	EXPR_KIND_WINDOW_ORDER,		/* window definition ORDER BY */
	EXPR_KIND_WINDOW_FRAME_RANGE,	/* window frame clause with RANGE */
	EXPR_KIND_WINDOW_FRAME_ROWS,	/* window frame clause with ROWS */
	EXPR_KIND_WINDOW_FRAME_GROUPS,	/* window frame clause with GROUPS */
	EXPR_KIND_SELECT_TARGET,	/* SELECT target list item */
	EXPR_KIND_INSERT_TARGET,	/* INSERT target list item */
	EXPR_KIND_UPDATE_SOURCE,	/* UPDATE assignment source item */
	EXPR_KIND_UPDATE_TARGET,	/* UPDATE assignment target item */
	EXPR_KIND_MERGE_WHEN,		/* MERGE WHEN [NOT] MATCHED condition */
	EXPR_KIND_GROUP_BY,			/* GROUP BY */
	EXPR_KIND_ORDER_BY,			/* ORDER BY */
	EXPR_KIND_DISTINCT_ON,		/* DISTINCT ON */
	EXPR_KIND_LIMIT,			/* LIMIT */
	EXPR_KIND_OFFSET,			/* OFFSET */
	EXPR_KIND_RETURNING,		/* RETURNING */
	EXPR_KIND_VALUES,			/* VALUES */
	EXPR_KIND_VALUES_SINGLE,	/* single-row VALUES (in INSERT only) */
	EXPR_KIND_CHECK_CONSTRAINT, /* CHECK constraint for a table */
	EXPR_KIND_DOMAIN_CHECK,		/* CHECK constraint for a domain */
	EXPR_KIND_COLUMN_DEFAULT,	/* default value for a table column */
	EXPR_KIND_FUNCTION_DEFAULT, /* default parameter value for function */
	EXPR_KIND_INDEX_EXPRESSION, /* index expression */
	EXPR_KIND_INDEX_PREDICATE,	/* index predicate */
	EXPR_KIND_STATS_EXPRESSION, /* extended statistics expression */
	EXPR_KIND_ALTER_COL_TRANSFORM,	/* transform expr in ALTER COLUMN TYPE */
	EXPR_KIND_EXECUTE_PARAMETER,	/* parameter value in EXECUTE */
	EXPR_KIND_TRIGGER_WHEN,		/* WHEN condition in CREATE TRIGGER */
	EXPR_KIND_POLICY,			/* USING or WITH CHECK expr in policy */
	EXPR_KIND_PARTITION_BOUND,	/* partition bound expression */
	EXPR_KIND_PARTITION_EXPRESSION, /* PARTITION BY expression */
	EXPR_KIND_CALL_ARGUMENT,	/* procedure argument in CALL */
	EXPR_KIND_COPY_WHERE,		/* WHERE condition in COPY FROM */
	EXPR_KIND_GENERATED_COLUMN, /* generation expression for a column */
	EXPR_KIND_CYCLE_MARK,		/* cycle mark value */
} ParseExprKind;

Единственные недопустимые значения — EXPR_KIND_COLUMN_DEFAULT и EXPR_KIND_PARTITION_BOUND.

switch (pstate->p_expr_kind)
{
    // ...
    case EXPR_KIND_COLUMN_DEFAULT:
        err = _("cannot use column reference in DEFAULT expression");
        break;
    case EXPR_KIND_PARTITION_BOUND:
        err = _("cannot use column reference in partition bound expression");
        break;
    // ...
}

Стоит обратить внимание на алгоритм парсинга ссылок на столбцы.

Для обращений к столбцам таблиц может быть использовано несколько вариантов. Например, A.B. C или A.*

Всего может быть 7 вариантов действий. Каждый описан в комментарии функции

/*----------
 * The allowed syntaxes are:
 *
 * A		First try to resolve as unqualified column name;
 *			if no luck, try to resolve as unqualified table name (A.*).
 * A.B		A is an unqualified table name; B is either a
 *			column or function name (trying column name first).
 * A.B.C	schema A, table B, col or func name C.
 * A.B.C.D	catalog A, schema B, table C, col or func D.
 * A.*		A is an unqualified table name; means whole-row value.
 * A.B.*	whole-row value of table B in schema A.
 * A.B.C.*	whole-row value of table C in schema B in catalog A.
 *
 * We do not need to cope with bare "*"; that will only be accepted by
 * the grammar at the top level of a SELECT list, and transformTargetList
 * will take care of it before it ever gets here.  Also, "A.*" etc will
 * be expanded by transformTargetList if they appear at SELECT top level,
 * so here we are only going to see them as function or operator inputs.
 *
 * Currently, if a catalog name is given then it must equal the current
 * database name; we check it here and then discard it.
 *----------
 */

Для определения используемого варианта, используется поле fields структуры ColumnRef. Это список содержащий названия A, B, C, D (из комментария).

В зависимости от длины списка определяется и метод обработки. Например, если длина списка 2, то проверяются варианты A.B и A.*

switch (list_length(cref->fields))
{
  // ...
  case 2:
  {
      Node	   *field1 = (Node *) linitial(cref->fields);
      Node	   *field2 = (Node *) lsecond(cref->fields);

      Assert(IsA(field1, String));
      relname = strVal(field1);

      /* Locate the referenced nsitem */
      nsitem = refnameNamespaceItem(pstate, nspname, relname,
                                    cref->location,
                                    &levels_up);
      if (nsitem == NULL)
      {
          crerr = CRERR_NO_RTE;
          break;
      }

      /* Whole-row reference? */
      if (IsA(field2, A_Star))
      {
          node = transformWholeRowRef(pstate, nsitem, levels_up,
                                      cref->location);
          break;
      }

      Assert(IsA(field2, String));
      colname = strVal(field2);

      /* Try to identify as a column of the nsitem */
      node = scanNSItemForColumn(pstate, nsitem, levels_up, colname,
                                 cref->location);
      if (node == NULL)
      {
          /* Try it as a function call on the whole row */
          node = transformWholeRowRef(pstate, nsitem, levels_up,
                                      cref->location);
          node = ParseFuncOrColumn(pstate,
                                   list_make1(makeString(colname)),
                                   list_make1(node),
                                   pstate->p_last_srf,
                                   NULL,
                                   false,
                                   cref->location);
      }
      break;
  }
  // ...
}

Также внутри функции определено вспомогательное перечисление для корректной обработки ошибок. Так в описанном выше случае, если таблица не найдена, то переменной crerr присваивается значение CRERR_NO_RTE

enum
{
    CRERR_NO_COLUMN,
    CRERR_NO_RTE,
    CRERR_WRONG_DB,
    CRERR_TOO_MANY
} crerr = CRERR_NO_COLUMN;

// ...

if (nsitem == NULL)
{
    crerr = CRERR_NO_RTE;
    break;
}

После успешного парсинга столбца нужно определить его отображаемое название. Если мы указали псевдоним (»AS …»), то название уже задано, иначе пытаемся вывести. Для этого используется функция FigureColname (src/backend/parser/parse_target.c)

Название создается на основании типа узла. В случае столбца выбирается самое последнее название из списка (за исключением *)

switch (nodeTag(node))
{
    case T_ColumnRef:
        {
            char	   *fname = NULL;
            ListCell   *l;
  
            /* find last field name, if any, ignoring "*" */
            foreach(l, ((ColumnRef *) node)->fields)
            {
                Node	   *i = lfirst(l);
  
                if (IsA(i, String))
                    fname = strVal(i);
            }
            if (fname)
            {
                *name = fname;
                return 2;
            }
        }
        break;
    // ...
}

Сейчас мы имеем только названия столбцов, но куда они указывают не понятно. Последняя стадия — добавление OID таблицы, на которую столбец указывает. За это отвечает функция markTargetListOrigin (src/backend/parser/parse_target.c)

WHERE u.age > 24

Дальше идет парсинг предложения WHERE. За это отвечает transformWhereClause

В начале происходит парсинг самого выражения. В условии может быть любое выражение, поэтому для обработки вызывается transformExpr (общая функция парсинга выражений)

Выражение это такой же узел и для него определены свои тэги. Для выражений с бинарными/унарными операторами или SQL предложениями (LIKE, BETWEEN) тег узла равен T_A_Expr. А чтобы выявить подтип, используется дополнительное поле kind типа A_Expr_Kind (src/include/nodes/parsenodes.h)

/*
 * A_Expr - infix, prefix, and postfix expressions
 */
typedef enum A_Expr_Kind
{
	AEXPR_OP,					/* normal operator */
	AEXPR_OP_ANY,				/* scalar op ANY (array) */
	AEXPR_OP_ALL,				/* scalar op ALL (array) */
	AEXPR_DISTINCT,				/* IS DISTINCT FROM - name must be "=" */
	AEXPR_NOT_DISTINCT,			/* IS NOT DISTINCT FROM - name must be "=" */
	AEXPR_NULLIF,				/* NULLIF - name must be "=" */
	AEXPR_IN,					/* [NOT] IN - name must be "=" or "<>" */
	AEXPR_LIKE,					/* [NOT] LIKE - name must be "~~" or "!~~" */
	AEXPR_ILIKE,				/* [NOT] ILIKE - name must be "~~*" or "!~~*" */
	AEXPR_SIMILAR,				/* [NOT] SIMILAR - name must be "~" or "!~" */
	AEXPR_BETWEEN,				/* name must be "BETWEEN" */
	AEXPR_NOT_BETWEEN,			/* name must be "NOT BETWEEN" */
	AEXPR_BETWEEN_SYM,			/* name must be "BETWEEN SYMMETRIC" */
	AEXPR_NOT_BETWEEN_SYM		/* name must be "NOT BETWEEN SYMMETRIC" */
} A_Expr_Kind;

Для бинарной операции (сравнение) это поле равно AEXPR_OP. За его обработку отвечает функция transformAExprOp (src/backend/parser/parse_expr.c). В итоге получается такой порядок

switch (nodeTag(expr))
{
    case T_A_Expr:
	{
		A_Expr	   *a = (A_Expr *) expr;
		switch (a->kind)
		{
            case AEXPR_OP:
				result = transformAExprOp(pstate, a);
    			break;
            // ..
        }
        break;
    // ...
    }
}

СамtransformAExprOp просматривает несколько краевых случаев:

»= NULL»

Самый первый обрабатываемый случай — это трансформация »= NULL» в »IS NULL». В комментарии поясняется, что это сделано для совместимости с другими языками, нарушающими стандарты SQL

/*
 * Special-case "foo = NULL" and "NULL = foo" for compatibility with
 * standards-broken products (like Microsoft's).  Turn these into IS NULL
 * exprs. (If either side is a CaseTestExpr, then the expression was
 * generated internally from a CASE-WHEN expression, and
 * transform_null_equals does not apply.)
 */
if (Transform_null_equals &&
    list_length(a->name) == 1 &&
    strcmp(strVal(linitial(a->name)), "=") == 0 &&
    (exprIsNullConstant(lexpr) || exprIsNullConstant(rexpr)) &&
    (!IsA(lexpr, CaseTestExpr) && !IsA(rexpr, CaseTestExpr)))
{ 
  // ...
}

ROW op SUBSELECT

Другой особый случай — когда один из операндов подзапрос. Этом случае тип выражения меняется с EXPR_SUBLINK на ROWCOMPARE_SUBLINK. Сам тип представляется перечислением SubLinkType

/*
 * SubLink
 *
 * A SubLink represents a subselect appearing in an expression, and in some
 * cases also the combining operator(s) just above it.  The subLinkType
 * indicates the form of the expression represented:
 *	EXISTS_SUBLINK		EXISTS(SELECT ...)
 *	ALL_SUBLINK			(lefthand) op ALL (SELECT ...)
 *	ANY_SUBLINK			(lefthand) op ANY (SELECT ...)
 *	ROWCOMPARE_SUBLINK	(lefthand) op (SELECT ...)
 *	EXPR_SUBLINK		(SELECT with single targetlist item ...)
 *	MULTIEXPR_SUBLINK	(SELECT with multiple targetlist items ...)
 *	ARRAY_SUBLINK		ARRAY(SELECT with single targetlist item ...)
 *	CTE_SUBLINK			WITH query (never actually part of an expression)
 * For ALL, ANY, and ROWCOMPARE, the lefthand is a list of expressions of the
 * same length as the subselect's targetlist.  ROWCOMPARE will *always* have
 * a list with more than one entry; if the subselect has just one target
 * then the parser will create an EXPR_SUBLINK instead (and any operator
 * above the subselect will be represented separately).
 * ROWCOMPARE, EXPR, and MULTIEXPR require the subselect to deliver at most
 * one row (if it returns no rows, the result is NULL).
 * ALL, ANY, and ROWCOMPARE require the combining operators to deliver boolean
 * results.  ALL and ANY combine the per-row results using AND and OR
 * semantics respectively.
 * ARRAY requires just one target column, and creates an array of the target
 * column's type using any number of rows resulting from the subselect.
 *
 * SubLink is classed as an Expr node, but it is not actually executable;
 * it must be replaced in the expression tree by a SubPlan node during
 * planning.
 *
 * NOTE: in the raw output of gram.y, testexpr contains just the raw form
 * of the lefthand expression (if any), and operName is the String name of
 * the combining operator.  Also, subselect is a raw parsetree.  During parse
 * analysis, the parser transforms testexpr into a complete boolean expression
 * that compares the lefthand value(s) to PARAM_SUBLINK nodes representing the
 * output columns of the subselect.  And subselect is transformed to a Query.
 * This is the representation seen in saved rules and in the rewriter.
 *
 * In EXISTS, EXPR, MULTIEXPR, and ARRAY SubLinks, testexpr and operName
 * are unused and are always null.
 *
 * subLinkId is currently used only for MULTIEXPR SubLinks, and is zero in
 * other SubLinks.  This number identifies different multiple-assignment
 * subqueries within an UPDATE statement's SET list.  It is unique only
 * within a particular targetlist.  The output column(s) of the MULTIEXPR
 * are referenced by PARAM_MULTIEXPR Params appearing elsewhere in the tlist.
 *
 * The CTE_SUBLINK case never occurs in actual SubLink nodes, but it is used
 * in SubPlans generated for WITH subqueries.
 */
typedef enum SubLinkType
{
	EXISTS_SUBLINK,
	ALL_SUBLINK,
	ANY_SUBLINK,
	ROWCOMPARE_SUBLINK,
	EXPR_SUBLINK,
	MULTIEXPR_SUBLINK,
	ARRAY_SUBLINK,
	CTE_SUBLINK					/* for SubPlans only */
} SubLinkType;

Когда обе части — это ROW выражения тоже отдельный случай (ROW op ROW).

Обычный оператор

Точнее сказать, это все остальные случаи. Сюда мы попадаем, если другие условия не сработали.

При исполнении нашего запроса, мы попадаем в последнюю ветку. Внутри каждая часть выражения рекурсивно разбирается (вызов transformExprRecurse):

Левая часть u.name попадает под случай ColumnRef — ссылка на столбец. Она обрабатывается так же как и выше (в SELECT)
Правая часть 24 попадает под случай T_Const: узел — константа. Для разбора констант узлов используется функция make_const. Разбор константы — тот же самый switch/case. Причем парсинг числа — самый простой (всего 4 строки)

/*
 * make_const
 *
 *	Convert an A_Const node (as returned by the grammar) to a Const node
 *	of the "natural" type for the constant.  Note that this routine is
 *	only used when there is no explicit cast for the constant, so we
 *	have to guess what type is wanted.
 *
 *	For string literals we produce a constant of type UNKNOWN ---- whose
 *	representation is the same as cstring, but it indicates to later type
 *	resolution that we're not sure yet what type it should be considered.
 *	Explicit "NULL" constants are also typed as UNKNOWN.
 *
 *	For integers and floats we produce int4, int8, or numeric depending
 *	on the value of the number.  XXX We should produce int2 as well,
 *	but additional cleanup is needed before we can do that; there are
 *	too many examples that fail if we try.
 */
Const *
make_const(ParseState *pstate, A_Const *aconst)
{
  // ...
  switch (nodeTag(&aconst->val))
  {
      case T_Integer:
          val = Int32GetDatum(intVal(&aconst->val));
          typeid = INT4OID;
          typelen = sizeof(int32);
          typebyval = true;
          break;
      // ...
  }
  // ..
}

После вызывается функция make_op (src/backend/parser/parse_oper.c) для создания узла операции.

HAVING COUNT (*) = 1

Следующий этап — разбор предложения HAVING. Для разбора HAVING используется та же функция, что и при парсинге WHERE — transformWhereClause

Различие начинается в функции transformExprRecurse: левый оператор теперь не ссылка на столбец, а агрегирующая функция. Поэтому тэг не T_ColumnRef, а T_FuncCall — вызов функции. За ее обработку отвечает transformFuncCall(src/backend/parser/parse_expr.c)

Сама логика парсинга заключается в функции ParseFuncOrColumn

tab.col = col (tab)

В Postgres обращение к столбцу в таблице может быть записано 2 способами: через точку (table.column) и как вызов функции (column(table)). Это легаси, с которым нужно мириться.

Поэтому название функции имеет такое название.

В комментарии дано более подробное описание

/*
 *	Parse a function call
 *
 *	For historical reasons, Postgres tries to treat the notations tab.col
 *	and col(tab) as equivalent: if a single-argument function call has an
 *	argument of complex type and the (unqualified) function name matches
 *	any attribute of the type, we can interpret it as a column projection.
 *	Conversely a function of a single complex-type argument can be written
 *	like a column reference, allowing functions to act like computed columns.
 *
 *	If both interpretations are possible, we prefer the one matching the
 *	syntactic form, but otherwise the form does not matter.
 *
 *	Hence, both cases come through here.  If fn is null, we're dealing with
 *	column syntax not function syntax.  In the function-syntax case,
 *	the FuncCall struct is needed to carry various decoration that applies
 *	to aggregate and window functions.
 *
 *	Also, when fn is null, we return NULL on failure rather than
 *	reporting a no-such-function error.
 * ...
 */

Как происходит поиск функции

Информация о функциях (и других. Для получения информации о функции используется func_get_detail(src/backend/parser/parse_func.c).

Для поиска используются кандидаты. Поиск производится по названию (разбирается на неймспейс и имя) и аргументам (количеству и именам).

Поиск кандидатов реализуется в FuncnameGetCandidates(src/backend/catalog/namespace.c). Вначале из системного кэша получаются все вхождения функций (список) с совпадающим названием.

После происходит итерирование по всем элементам этого списка в поиске кандидатов. Функция может стать кандидатом только если:

Неймспейс совпадает (поиск происходил только по названию)
Произошло совпадение по OUT параметрам
Именованные аргументы совпали
Совпадения по типам переданных параметров

Когда список кандидатов готов, он подвергается дополнительным испытаниям.

Происходит поиск лучшего кандидата: сравниваются типы всех аргументов. Если они все совпадают, то функция найдена

/*
 * Quickly check if there is an exact match to the input datatypes (there
 * can be only one)
 */
for (best_candidate = raw_candidates;
     best_candidate != NULL;
     best_candidate = best_candidate->next)
{
    /* if nargs==0, argtypes can be null; don't pass that to memcmp */
    if (nargs == 0 ||
        memcmp(argtypes, best_candidate->args, nargs * sizeof(Oid)) == 0)
        break;
}

Если такой функции не нашлось и при этом у функции только 1 аргумент — делается предположение, что функция — каст к типу. Для этого имя функции интерпретируется как название типа и происходит попытка привести его к касту (coercion).

if (best_candidate == NULL)
{
    if (nargs == 1 && fargs != NIL && fargnames == NIL)
    {
        Oid	targetType = FuncNameAsType(funcname);

        if (OidIsValid(targetType))
        {
           // ...
        }
   }
}

Потом происходит поиск по совпадению аргументов с возможным приведением типов — func_match_argtypes(src/backend/parser/parse_func.c).

Если нашлась одна подходящая функция, то возвращается она.
Если под условие подошло сразу несколько то происходит последний рывок вызов func_select_candidate (src/backend/parser/parse_func.c), которая пытается решить конфликт.

Если дошли до этого момента — то функцию не удалось найти.

Системный кэш, syscache

Во время работы нам может понадобиться множество объектов. Например, информация о существующих функциях как в примере выше. Для ускорения работы, подобные «горячие» элементы хранятся в памяти. Доступ к ним производится через системный кэш.

За работу с ним отвечает модуль syscache (src/include/utils/syscache.h).

В этом заголовочном файле объявлены необходимые для работы перечисление SysCacheIdentifier (src/include/utils/syscache.h), сигнатуры функций для доступа к кэшу и несколько полезных макросов.

Перечисление SysCacheIdentifier определяет область данных, которые мы ищем.

/*
 *		SysCache identifiers.
 *
 *		The order of these identifiers must match the order
 *		of the entries in the array cacheinfo[] in syscache.c.
 *		Keep them in alphabetical order (renumbering only costs a
 *		backend rebuild).
 */

enum SysCacheIdentifier
{
	AGGFNOID = 0,
	AMNAME,
	AMOID,
	AMOPOPID,
	AMOPSTRATEGY,
	AMPROCNUM,
	ATTNAME,
	ATTNUM,
	AUTHMEMMEMROLE,
	AUTHMEMROLEMEM,
	AUTHNAME,
	AUTHOID,
	CASTSOURCETARGET,
	CLAAMNAMENSP,
	CLAOID,
	COLLNAMEENCNSP,
	COLLOID,
	CONDEFAULT,
	CONNAMENSP,
	CONSTROID,
	CONVOID,
	DATABASEOID,
	DEFACLROLENSPOBJ,
	ENUMOID,
	ENUMTYPOIDNAME,
	EVENTTRIGGERNAME,
	EVENTTRIGGEROID,
	FOREIGNDATAWRAPPERNAME,
	FOREIGNDATAWRAPPEROID,
	FOREIGNSERVERNAME,
	FOREIGNSERVEROID,
	FOREIGNTABLEREL,
	INDEXRELID,
	LANGNAME,
	LANGOID,
	NAMESPACENAME,
	NAMESPACEOID,
	OPERNAMENSP,
	OPEROID,
	OPFAMILYAMNAMENSP,
	OPFAMILYOID,
	PARAMETERACLNAME,
	PARAMETERACLOID,
	PARTRELID,
	PROCNAMEARGSNSP,
	PROCOID,
	PUBLICATIONNAME,
	PUBLICATIONNAMESPACE,
	PUBLICATIONNAMESPACEMAP,
	PUBLICATIONOID,
	PUBLICATIONREL,
	PUBLICATIONRELMAP,
	RANGEMULTIRANGE,
	RANGETYPE,
	RELNAMENSP,
	RELOID,
	REPLORIGIDENT,
	REPLORIGNAME,
	RULERELNAME,
	SEQRELID,
	STATEXTDATASTXOID,
	STATEXTNAMENSP,
	STATEXTOID,
	STATRELATTINH,
	SUBSCRIPTIONNAME,
	SUBSCRIPTIONOID,
	SUBSCRIPTIONRELMAP,
	TABLESPACEOID,
	TRFOID,
	TRFTYPELANG,
	TSCONFIGMAP,
	TSCONFIGNAMENSP,
	TSCONFIGOID,
	TSDICTNAMENSP,
	TSDICTOID,
	TSPARSERNAMENSP,
	TSPARSEROID,
	TSTEMPLATENAMENSP,
	TSTEMPLATEOID,
	TYPENAMENSP,
	TYPEOID,
	USERMAPPINGOID,
	USERMAPPINGUSERSERVER

#define SysCacheSize (USERMAPPINGUSERSERVER + 1)
};

Как понятно из сигнатуры, функции InitCatalogCache и InitCatalogCachePhase2 инициализируют кэш. Обе эти функции вызываются в InitPostgres на моменте старта базы данных. Используется 2 функции, так как инициализация разбита на 2 фазы: выделение памяти и общая инициализация (не загрузка кэша), соответственно.

На примере поиска необходимой функции, работа с кэшем выглядит следующим образом:

Вызываем макрос SearchSysCacheList* для списка значений по ключу. Под * имеется ввиду количество ключей по которым идет поиск. В случае поиска функции, использовалось значение 1

/* Search syscache by name only */
catlist = SearchSysCacheList1(PROCNAMEARGSNSP, CStringGetDatum(funcname));

Нам возвращается структура CatCList. Она представляет собой список из элементов кэша. Элементами этого списка являются CatCTup

/*
 * A CatCList describes the result of a partial search, ie, a search using
 * only the first K key columns of an N-key cache.  We store the keys used
 * into the keys attribute to represent the stored key set.  The CatCList
 * object contains links to cache entries for all the table rows satisfying
 * the partial key.  (Note: none of these will be negative cache entries.)
 *
 * A CatCList is only a member of a per-cache list; we do not currently
 * divide them into hash buckets.
 *
 * A list marked "dead" must not be returned by subsequent searches.
 * However, it won't be physically deleted from the cache until its
 * refcount goes to zero.  (A list should be marked dead if any of its
 * member entries are dead.)
 *
 * If "ordered" is true then the member tuples appear in the order of the
 * cache's underlying index.  This will be true in normal operation, but
 * might not be true during bootstrap or recovery operations. (namespace.c
 * is able to save some cycles when it is true.)
 */
typedef struct catclist
{
	int			cl_magic;		/* for identifying CatCList entries */
#define CL_MAGIC   0x52765103

	uint32		hash_value;		/* hash value for lookup keys */

	dlist_node	cache_elem;		/* list member of per-catcache list */

	/*
	 * Lookup keys for the entry, with the first nkeys elements being valid.
	 * All by-reference are separately allocated.
	 */
	Datum		keys[CATCACHE_MAXKEYS];

	int			refcount;		/* number of active references */
	bool		dead;			/* dead but not yet removed? */
	bool		ordered;		/* members listed in index order? */
	short		nkeys;			/* number of lookup keys specified */
	int			n_members;		/* number of member tuples */
	CatCache   *my_cache;		/* link to owning catcache */
	CatCTup    *members[FLEXIBLE_ARRAY_MEMBER]; /* members */
} CatCList;

typedef struct catctup
{
	int			ct_magic;		/* for identifying CatCTup entries */
#define CT_MAGIC   0x57261502

	uint32		hash_value;		/* hash value for this tuple's keys */

	/*
	 * Lookup keys for the entry. By-reference datums point into the tuple for
	 * positive cache entries, and are separately allocated for negative ones.
	 */
	Datum		keys[CATCACHE_MAXKEYS];

	/*
	 * Each tuple in a cache is a member of a dlist that stores the elements
	 * of its hash bucket.  We keep each dlist in LRU order to speed repeated
	 * lookups.
	 */
	dlist_node	cache_elem;		/* list member of per-bucket list */

	/*
	 * A tuple marked "dead" must not be returned by subsequent searches.
	 * However, it won't be physically deleted from the cache until its
	 * refcount goes to zero.  (If it's a member of a CatCList, the list's
	 * refcount must go to zero, too; also, remember to mark the list dead at
	 * the same time the tuple is marked.)
	 *
	 * A negative cache entry is an assertion that there is no tuple matching
	 * a particular key.  This is just as useful as a normal entry so far as
	 * avoiding catalog searches is concerned.  Management of positive and
	 * negative entries is identical.
	 */
	int			refcount;		/* number of active references */
	bool		dead;			/* dead but not yet removed? */
	bool		negative;		/* negative cache entry? */
	HeapTupleData tuple;		/* tuple management header */

	/*
	 * The tuple may also be a member of at most one CatCList.  (If a single
	 * catcache is list-searched with varying numbers of keys, we may have to
	 * make multiple entries for the same tuple because of this restriction.
	 * Currently, that's not expected to be common, so we accept the potential
	 * inefficiency.)
	 */
	struct catclist *c_list;	/* containing CatCList, or NULL if none */

	CatCache   *my_cache;		/* link to owning catcache */
	/* properly aligned tuple data follows, unless a negative entry */
} CatCTup;

Итерируемся по всем элементам members, получаем хранимые в нем данные и работаем с ними

for (i = 0; i < catlist->n_members; i++)
{
	HeapTuple	proctup = &catlist->members[i]->tuple;
    // ...
}

Возвращаем список обратно в кэш — ReleaseSysCacheList

ReleaseSysCacheList(catlist);

Осталось только выяснить как этот кэш заполняется и хранится.

Хранение

Сам кэш хранится в глобальной переменной SysCache. Это массив типа CatCache

typedef struct catcache
{
	int			id;				/* cache identifier --- see syscache.h */
	int			cc_nbuckets;	/* # of hash buckets in this cache */
	TupleDesc	cc_tupdesc;		/* tuple descriptor (copied from reldesc) */
	dlist_head *cc_bucket;		/* hash buckets */
	CCHashFN	cc_hashfunc[CATCACHE_MAXKEYS];	/* hash function for each key */
	CCFastEqualFN cc_fastequal[CATCACHE_MAXKEYS];	/* fast equal function for
													 * each key */
	int			cc_keyno[CATCACHE_MAXKEYS]; /* AttrNumber of each key */
	dlist_head	cc_lists;		/* list of CatCList structs */
	int			cc_ntup;		/* # of tuples currently in this cache */
	int			cc_nkeys;		/* # of keys (1..CATCACHE_MAXKEYS) */
	const char *cc_relname;		/* name of relation the tuples come from */
	Oid			cc_reloid;		/* OID of relation the tuples come from */
	Oid			cc_indexoid;	/* OID of index matching cache keys */
	bool		cc_relisshared; /* is relation shared across databases? */
	slist_node	cc_next;		/* list link */
	ScanKeyData cc_skey[CATCACHE_MAXKEYS];	/* precomputed key info for heap
											 * scans */

	/*
	 * Keep these at the end, so that compiling catcache.c with CATCACHE_STATS
	 * doesn't break ABI for other modules
	 */
#ifdef CATCACHE_STATS
	long		cc_searches;	/* total # searches against this cache */
	long		cc_hits;		/* # of matches against existing entry */
	long		cc_neg_hits;	/* # of matches against negative entry */
	long		cc_newloads;	/* # of successful loads of new entry */

	/*
	 * cc_searches - (cc_hits + cc_neg_hits + cc_newloads) is number of failed
	 * searches, each of which will result in loading a negative entry
	 */
	long		cc_invals;		/* # of entries invalidated from cache */
	long		cc_lsearches;	/* total # list-searches */
	long		cc_lhits;		/* # of matches against existing lists */
#endif
} CatCache;


static CatCache *SysCache[SysCacheSize];

Заполнение

Заполнение — лениво: во время вызова SearchCatCacheList мы итерируемся по имеющимся в памяти данным, но если в результате мы ничего не нашли, то входим в область заполнения кэша. Эта область окаймлена PG_TRY/PG_CATCH так как во время работы с таблицей может возникнуть исключение и тогда нужно удалить все добавленные элементы в хэш таблицу.

CatCList *
SearchCatCacheList(CatCache *cache, int nkeys, Datum v1, Datum v2, Datum v3)
{
    dlist_foreach(iter, &cache->cc_lists)
    {
        // Поиск в памяти
    }
    
    PG_TRY();
    {
        // Сканирование системной таблицы для поиска необходимых данных
    }
    PG_CATCH();
	{
		// Удаление добавленных данных из кэша

		PG_RE_THROW();
	}
	PG_END_TRY();
  
    foreach(ctlist_item, ctlist)
    {
        // Поиск в новых элементах таблицы
    }
}

ORDER BY u.name

После идет обработка ORDER BY u.name. За это отвечает transformSortClause(src/backend/parser/parse_clause.c)

Парсинг заключается в последовательной обработке выражений из списка сортировки (выражений, используемых для сортировки) и добавлению их в результирующий список. За их обработку отвечают 2 функции:

Сам разбор состоит из простого итерирования по списку и вызова transformTargetEntry для каждого элемента

foreach(olitem, orderlist)
{
    SortBy	   *sortby = (SortBy *) lfirst(olitem);
    TargetEntry *tle;

    if (useSQL99)
        tle = findTargetlistEntrySQL99(pstate, sortby->node,
                                       targetlist, exprKind);
    else
        tle = findTargetlistEntrySQL92(pstate, sortby->node,
                                       targetlist, exprKind);

    sortlist = addTargetToSortList(pstate, tle,
                                   sortlist, *targetlist, sortby);
}

Разница между SQL 92 и SQL 99

Одно из различий в стандартах SQL92 и SQL99 заключается в логике обработки списка выражений в ORDER BY и GROUP BY.

В стандарте SQL92 этот список состоит из названий столбцов или номера столбца (числа). Например, SELECT name, age FROM users ORDER BY 2 — будет сортировать по возрасту (age)

В стандарте SQL99 каждый элемент списка — это выражение, использующее столбцы таблиц. Например,

SELECT length(name), name from users ORDER BY age * 2

в ORDER BY содержится выражение, а не сырой столбец.

Важно отметить, что в логике парсинга сделана небольшая оптимизация: при парсинге стандартом SQL92, если проваливается попытка обнаружения ссылки на столбец из SELECT выражения, то запасным вариантом запускается парсинг SQL99 стандартом.

static TargetEntry *
findTargetlistEntrySQL92(ParseState *pstate, Node *node, List **tlist,
						 ParseExprKind exprKind)
{
    // ... Парсинг стандартом SQL92 ...

  	/*
	 * Otherwise, we have an expression, so process it per SQL99 rules.
	 */
	return findTargetlistEntrySQL99(pstate, node, tlist, exprKind);
}

Для определения какой метод разбора выбрать используется фича-флаг useSQL99, который передается в transformSortClause.

List *
transformSortClause(ParseState *pstate,
					List *orderlist,
					List **targetlist,
					ParseExprKind exprKind,
					bool useSQL99)
{
    // ...
    if (useSQL99)
        tle = findTargetlistEntrySQL99(pstate, sortby->node,
                                       targetlist, exprKind);
    else
        tle = findTargetlistEntrySQL92(pstate, sortby->node,
                                       targetlist, exprKind);
    // ...
}

GROUP BY u.name

После идет обработка GROUP BY. За это отвечает transformGroupClause (src/backend/parser/parse_clause.c).

В списке выражений группировки может быть 2 «граничных» случая, которые обрабатываются по разному:

Все выражения группировки хранятся в общем списке flat_grouplist. В процессе обработки каждого выражения группировки этот список обновляется.

За обработку выражения отвечает transformGroupClauseExpr. Разбор выражения ведется точно так же как и выражения сортировки — findTargetlistEntrySQL99 и findTargetlistEntrySQL92. Но дальше идет оптимизация: учет сортировки.

Для обработки таких случаев используется структура SortGroupClause (src/include/nodes/parsenodes.h).

/*
 * SortGroupClause -
 *		representation of ORDER BY, GROUP BY, PARTITION BY,
 *		DISTINCT, DISTINCT ON items
 * ...
 */
typedef struct SortGroupClause
{
	NodeTag		type;
	Index		tleSortGroupRef;	/* reference into targetlist */
	Oid			eqop;			/* the equality operator ('=' op) */
	Oid			sortop;			/* the ordering operator ('<' op), or 0 */
	bool		nulls_first;	/* do NULLs come before normal values? */
	bool		hashable;		/* can eqop be implemented by hashing? */
} SortGroupClause;

Внутри себя она хранит информацию необходимую для сортировки и группировки (операторы сравнения, флаг хэшируемости), а для оптимизации памяти вместо самого выражения — индекс на элемент в списке столбцов SELECT.

Работа transformGroupClauseExpr зависит от типа используемой группировки.

В случае GROUPING SET действия зависят от подтипа. Подтип определяется перечислением GroupingSetKind (src/include/nodes/parsenodes.h)

typedef enum GroupingSetKind
{
	GROUPING_SET_EMPTY,
	GROUPING_SET_SIMPLE,
	GROUPING_SET_ROLLUP,
	GROUPING_SET_CUBE,
	GROUPING_SET_SETS
} GroupingSetKind;

Остальные случаи используют transformGroupClauseExpr, для парсинга выражений.

Результатом работы этого этапа является список SortGroupClause — ссылок на элементы из результирующего списка выражений.

LIMIT 50

Последним обрабатывается выражение LIMIT. За его обработку отвечает transformLimitClause(src/backend/parser/parse_clause.c)

Обработка заключается в парсинге выражения. За это отвечает базовый transformExpr. Дополнительно проводится проверка на то, что выражение не ссылается на переменные в запросе. Для этого используется функция checkExprIsVarFree. Она обходит дерево запроса и проверяет существование переменных в любом из узлов.

Этот паттерн посетитель реализуется функцией query_tree_walker(src/backend/nodes/nodeFuncs.c)

bool
query_tree_walker(Query *query,
				  bool (*walker) (),
				  void *context,
				  int flags)

Сам посетитель — функция, которую передаем аргументом. query_tree_walker проходит по всем узлам запроса и применяет ее к каждому узлу.

Посетитель, проверяющий существование переменных, — contain_vars_of_level_walker(src/backend/optimizer/util/var.c).

Важно заметить, что transformLimitClause используется для парсинга как и LIMIT так и OFFSET. Разница заключается в дополнительной проверке для LIMIT: в FETCH FIRST … WITH TIES не разрешено использование NULL

/*
* Don't allow NULLs in FETCH FIRST .. WITH TIES.  This test is ugly and
* extremely simplistic, in that you can pass a NULL anyway by hiding it
* inside an expression -- but this protects ruleutils against emitting an
* unadorned NULL that's not accepted back by the grammar.
*/
if (exprKind == EXPR_KIND_LIMIT && limitOption == LIMIT_OPTION_WITH_TIES &&
    IsA(clause, A_Const) && castNode(A_Const, clause)->isnull)
  ereport(ERROR,
          (errcode(ERRCODE_INVALID_ROW_COUNT_IN_LIMIT_CLAUSE),
           errmsg("row count cannot be null in FETCH FIRST ... WITH TIES clause")));

Предобработка

Следующий важный этап — составление плана запроса. Но перед этим происходит предобработка некоторых выражений.

LIMIT

Первым производится обработка LIMIT/OFFSET выражения. За это отвечает функция preprocess_limit(src/backend/optimizer/plan/planner.c). Эта функция не просто парсит значение узла LIMIT, здесь происходит оценка возможного количества кортежей на выходе — tuple_fraction.

tuple_fraction

Одно из достоинств PostgreSQL — умный планировщик. Один из компонентов такой репутации — логика определения количества кортежей на выходе. В GUC присутствует параметр cursor_tuple_fraction, который представляет оценку количества записей, которые будут получены. По умолчанию, он равен 0,1 (10%)

Это значение используется для определения лучшего плана исполнения: если планируется получить мало строк, то выбирается план с самым быстрым стартом.

Но cursor_tuple_fraction — не конечное значение. Это значение регулируется — preprocess_limit возвращает новое значение tuple_fraction, которое и будет использоваться в процессе создания плана.

Код его расчета — дерево выбора

static double
preprocess_limit(PlannerInfo *root, double tuple_fraction,
				 int64 *offset_est, int64 *count_est)
{
    // ...

    if (*count_est != 0)
	{
		/*
		 * A LIMIT clause limits the absolute number of tuples returned.
		 * However, if it's not a constant LIMIT then we have to guess; for
		 * lack of a better idea, assume 10% of the plan's result is wanted.
		 */
		if (*count_est < 0 || *offset_est < 0)
		{
			/* LIMIT or OFFSET is an expression ... punt ... */
			limit_fraction = 0.10;
		}
		else
		{
			/* LIMIT (plus OFFSET, if any) is max number of tuples needed */
			limit_fraction = (double) *count_est + (double) *offset_est;
		}

		/*
		 * If we have absolute limits from both caller and LIMIT, use the
		 * smaller value; likewise if they are both fractional.  If one is
		 * fractional and the other absolute, we can't easily determine which
		 * is smaller, but we use the heuristic that the absolute will usually
		 * be smaller.
		 */
		if (tuple_fraction >= 1.0)
		{
			if (limit_fraction >= 1.0)
			{
				/* both absolute */
				tuple_fraction = Min(tuple_fraction, limit_fraction);
			}
			else
			{
				/* caller absolute, limit fractional; use caller's value */
			}
		}
		else if (tuple_fraction > 0.0)
		{
			if (limit_fraction >= 1.0)
			{
				/* caller fractional, limit absolute; use limit */
				tuple_fraction = limit_fraction;
			}
			else
			{
				/* both fractional */
				tuple_fraction = Min(tuple_fraction, limit_fraction);
			}
		}
		else
		{
			/* no info from caller, just use limit */
			tuple_fraction = limit_fraction;
		}
	}
	else if (*offset_est != 0 && tuple_fraction > 0.0)
	{
		/*
		 * We have an OFFSET but no LIMIT.  This acts entirely differently
		 * from the LIMIT case: here, we need to increase rather than decrease
		 * the caller's tuple_fraction, because the OFFSET acts to cause more
		 * tuples to be fetched instead of fewer.  This only matters if we got
		 * a tuple_fraction > 0, however.
		 *
		 * As above, use 10% if OFFSET is present but unestimatable.
		 */
		if (*offset_est < 0)
			limit_fraction = 0.10;
		else
			limit_fraction = (double) *offset_est;

		/*
		 * If we have absolute counts from both caller and OFFSET, add them
		 * together; likewise if they are both fractional.  If one is
		 * fractional and the other absolute, we want to take the larger, and
		 * we heuristically assume that's the fractional one.
		 */
		if (tuple_fraction >= 1.0)
		{
			if (limit_fraction >= 1.0)
			{
				/* both absolute, so add them together */
				tuple_fraction += limit_fraction;
			}
			else
			{
				/* caller absolute, limit fractional; use limit */
				tuple_fraction = limit_fraction;
			}
		}
		else
		{
			if (limit_fraction >= 1.0)
			{
				/* caller fractional, limit absolute; use caller's value */
			}
			else
			{
				/* both fractional, so add them together */
				tuple_fraction += limit_fraction;
				if (tuple_fraction >= 1.0)
					tuple_fraction = 0.0;	/* assume fetch all */
			}
		}
	}

	return tuple_fraction;

GROUP BY

Дальше происходит обработка выражения GROUP BY — вызов preprocess_groupclause (src/backend/optimizer/plan/planner.c).

Главная задача этой функции — синхронизировать GROUP BY и ORDER BY выражения. Если это получится сделать, то сортировку и группировку можно выполнить одной операцией.

SELECT…

За предобработку возвращаемого списка значений отвечает preprocess