Skip to main content

SQL Pipeline Syntax

Applies to: check marked yes Databricks Runtime 16.2 and later

Databricks supports SQL pipeline syntax which allows composing queries from combinations of chained operators.

  • Any query can have zero or more pipe operators as a suffix, delineated by the pipe character |>.
  • Each piped operation starts with one or more SQL keywords followed by its own grammar.
  • Operators can apply in any order, any number of times.
  • Typically FROM relation_name is used to start a pipeline, but any query can start a pipeline.

Syntax

{ FROM | TABLE } relation_name { |> piped_operation } [ ...]

Parameters

Example

This is query 13 from the TPC-H benchmark written in ANSI SQL:

SQL
> SELECT c_count, COUNT(*) AS custdist
FROM
(SELECT c_custkey, COUNT(o_orderkey) c_count
FROM customer
LEFT OUTER JOIN orders ON c_custkey = o_custkey
AND o_comment NOT LIKE '%unusual%packages%'
GROUP BY c_custkey
) AS c_orders
GROUP BY c_count
ORDER BY custdist DESC, c_count DESC;

To write the same logic using SQL pipe operators, you can express it like this:

SQL
> FROM customer
|> LEFT OUTER JOIN orders ON c_custkey = o_custkey
AND o_comment NOT LIKE '%unusual%packages%'
|> AGGREGATE COUNT(o_orderkey) c_count
GROUP BY c_custkey
|> AGGREGATE COUNT(*) AS custdist
GROUP BY c_count
|> ORDER BY custdist DESC, c_count DESC;