Thursday, January 28, 2016

Aggregation without "GROUP BY" using OVER clause.

Do you think it is not possible to aggregate values without using clause "GROUP BY"?

I'll show you how to do it in SQL 2012/2014

At first, will create a table for ten years of sales and fill it with values:

USE TestDB;
GO
CREATE TABLE tbl_No_Croup_By (
    Period_Dt DATE,
    Sales_Amount Money
);
GO
CREATE CLUSTERED INDEX CLIX_No_Croup_By ON tbl_No_Croup_By(Period_Dt);
GO
INSERT INTO tbl_No_Croup_By(Period_Dt, Sales_Amount)
SELECT DATEADD(day,CAST(NewID() AS VARBINARY(64)) % 1827,'2010-01-01')
    , CAST(ABS(CAST(NewID() AS VARBINARY(64)) % 100000)/100. AS MONEY)
FROM sys.messages;
GO
SELECT * FROM tbl_No_Croup_By;
GO

Now we have following aggregation request:
Report all sales aggregated by a year with a percentage of each year within whole data set;

Not a big deal. That how it would look like in classic scenario:
;WITH AnnualSales AS (
    SELECT YEAR(Period_Dt) AS YearPeriod,
        SUM(Sales_Amount) AS AnnualSale
    FROM tbl_No_Croup_By
    GROUP BY YEAR(Period_Dt)
), TotalSales AS ( SELECT SUM(AnnualSale) AS TotalSale FROM AnnualSales)
SELECT a.YearPeriod, a.AnnualSale
    , t.TotalSale     , CAST(ROUND(CAST(a.AnnualSale * 100. / t.TotalSale AS SMALLMONEY),2) AS VARCHAR) + '%' AS Year_Total
FROM AnnualSales AS a
INNER JOIN TotalSales AS t ON 1 = 1
ORDER BY a.YearPeriod;
Here is an approximate result:




And now will do the Magic
SELECT DISTINCT YEAR(Period_Dt) AS YearPeriod
    , SUM(Sales_Amount) OVER(PARTITION BY YEAR(Period_Dt)) AS AnnualSale
    , SUM(Sales_Amount) OVER(PARTITION BY 1) AS TotalSale
    , CAST(ROUND(CAST(SUM(Sales_Amount) 
        OVER(PARTITION BY YEAR(Period_Dt)) * 100./ SUM(Sales_Amount) 
        OVER(PARTITION BY 1) AS SMALLMONEY),2) AS VARCHAR) + '%' AS Year_Total
FROM tbl_No_Croup_By
ORDER BY YEAR(Period_Dt);
GO

The result is exactly the same, but code does not have any single "GROUP BY" and not so obvious.

Is it good?
Turn On statistics in your session by running two following commands:
SET STATISTICS IO ON
SET STATISTICS TIME ON

And then run both scripts together.
If after execution you look at Messages tab you can see following picture:


Wow!!!
"GROUP BY" uses 5 times less CPU power and almost 1000 times less IO system!!!

As a conclusion: You CAN use "OVER"clause to do the aggregation in three following cases:
1. When data set is extremely small and fits in just one 8 Kb page;
2. When you want to hide your logic from any future developer or even yourself to make debugging and troubleshooting a nightmare;
3. When you really want to kill your SQL Server and its underlying disk system;


No comments:

Post a Comment