MISSING_AGGREGATION error class
The non-aggregating expression <expression>
is based on columns which are not participating in the GROUP BY clause.
Add the columns or the expression to the GROUP BY, aggregate the expression, or use <expressionAnyValue>
if you do not care which of the values within a group is returned.
Parameters
- expression: Non aggregating, non grouping expression in the
SELECT
list. - expressionAnyValue:
expression
wrapped in an any_value() aggregate function.
Explanation
Within the context of a query with a GROUP BY clause, the local column-references in the SELECT list must be:
- Consumed as an argument to an aggregate function, or
- Part of an expression which matches an expression on the
GROUP BY
clause.
A local column reference is a column that has been resolved to a table-reference in the query’s FROM clause.
In other words: Column-references must either be part of the grouping keys, or they must be part of the aggregation.
Databricks matches expressions on best effort:
For example it will recognize: SELECT c1 + 5 FROM T GROUP BY 5 + c1
as mathing expressions.
But SELECT c1 FROM T GROUP BY c1 + 5
is not a match.
Mitigation
The mitigation of the error depends on the cause:
-
Did you miss a grouping column?
Add
expression
, or the relevant subexpression ofexpression
to theGROUP BY
clause. -
Is the column reference part of a
GROUP BY
expression which differs fromepression
?Match the expression in the
SELECT
list or simplify theGROUP BY
expression. -
Are you missing the aggregation?
Wrap the column reference with an aggregate function. If you only want a representative value from the group, you can use any_value(epression).
Examples
-- Sample data
> CREATE OR REPLACE TEMPORARY VIEW tasks(name, firstname, task, cost) AS
VALUES ('Smith' , 'Sam' , 'UNPIVOT', 10),
('Smith' , 'Sam' , 'LATERAL', 5),
('Shuster', 'Sally' , 'DELETE' , 7),
('Shuster', 'Sally' , 'GRANT' , 8);
-- `name` and `firstname` are part of the group by coumns, but incomplete
> SELECT name, firstname, sum(cost) FROM tasks GROUP BY firstname || ' ' || name;
[MISSING_AGGREGATION] The expression "name" is neither present in the group by, nor is it an aggregate function.
-- Match the GROUP BY expression
> SELECT firstname || ' ' || name, sum(cost) FROM tasks GROUP BY firstname || ' ' || name;
Sam Smith 15
Sally Shuster 15
-- Break up the GROUP BY expression
> SELECT firstname, name, sum(cost) FROM tasks GROUP BY firstname, name;
Sam Smith 15
Sally Shuster 15
-- Missing grouping column
> SELECT name, firstname, sum(cost) FROM tasks GROUP BY name;
[MISSING_AGGREGATION] The expression "firstname" is neither present in the group by, nor is it an aggregate function.
-- Add the grouping column
> SELECT firstname, name, sum(cost) FROM tasks GROUP BY firstname, name;
Sam Smith 15
Sally Shuster 15
-- Missing aggregate
> SELECT firstname, name, sum(cost), task FROM tasks GROUP BY firstname, name;
[MISSING_AGGREGATION] The expression "task" is neither present in the group by, nor is it an aggregate function.
-- Add an aggregate
> SELECT firstname, name, sum(cost), array_agg(task) FROM tasks GROUP BY firstname, name;
Sam Smith 15 ["UNPIVOT","LATERAL"]
Sally Shuster 15 ["DELETE","GRANT"]
-- Return any task
> SELECT firstname, name, sum(cost), any_value(task) FROM tasks GROUP BY firstname, name;
Sam Smith 15 LATERAL
Sally Shuster 15 DELETE