Friday, July 22, 2016

Dynamic PIVOT Query

Dynamic PIVOT Query

To make the above Static PIVOT query to dynamic, basically we have to remove the hardcoded PIVOT column names specified in the PIVOT operators PIVOT columns IN clause. Below query demonstrates this.
?
DECLARE @DynamicPivotQuery AS NVARCHAR(MAX)
DECLARE @ColumnName AS NVARCHAR(MAX)
 
--Get distinct values of the PIVOT Column
SELECT @ColumnName= ISNULL(@ColumnName + ',','')
       + QUOTENAME(Course)
FROM (SELECT DISTINCT Course FROM #CourseSales) AS Courses
 
--Prepare the PIVOT query using the dynamic
SET @DynamicPivotQuery =
  N'SELECT Year, ' + @ColumnName + '
    FROM #CourseSales
    PIVOT(SUM(Earning)
          FOR Course IN (' + @ColumnName + ')) AS PVTTable'
--Execute the Dynamic Pivot Query
EXEC sp_executesql @DynamicPivotQuery
[ALSO READ] How to replace NULL value by 0 in the Dynamic Pivot result – Sql Server
RESULT:

From the above result it is clear that this query is a True Dynamic PIVOT query as it reflected all the courses in the #CourseSales table without needing to write hardcoded course names in the PIVOT query.

Examples of PIVOT and Dynamic PIVOT

Below are the some of the examples of retrieving data in Sql Server using PIVOT and Dynamic PIVOT:

Dynamic PIVOT in Sql Server

Dynamic PIVOT in Sql Server


In the Previous Post PIVOT and UNPIVOT in Sql Server explained how PIVOT relational operator can be used to transform columns distinct values as Columns in the result set by mentioning all the distinct column values in the PIVOT operators PIVOT columns IN clause. This type of PIVOT query is called Static PIVOT query, because if the PIVOT column in the source table get’s extra unique values after the initial query then that will not reflect in the PIVOT query result unless it is mentioned in the PIVOT Columns IN clause. Static PIVOT queries are fine as long as we know that the PIVOT column values never change, for instance if PIVOT column values are MONTH or Day of the Week or hour of the day etc.
In this Article will present how we can write a Dynamic PIVOT query with an example, where we don’t need to mention the PIVOT columns each unique values and no need to worry if PIVOT column gets extra unique values after the initial query.
First Create a Temporary Table #CourseSales with sample records as depicted in the below image by using the following script:
Table to be Pivoted in Sql
?
--Create Temporary Table #CourseSales
CREATE TABLE #CourseSales
(Course VARCHAR(50),Year INT,Earning MONEY)
GO
--Populate Sample records
INSERT INTO #CourseSales VALUES('.NET',2012,10000)
INSERT INTO #CourseSales VALUES('Java',2012,20000)
INSERT INTO #CourseSales VALUES('.NET',2012,5000)
INSERT INTO #CourseSales VALUES('.NET',2013,48000)
INSERT INTO #CourseSales VALUES('Java',2013,30000)
GO

PIVOT #CourseSales Table data on the Course column Values

Let us first understand the Static PIVOT query and then see how we can modify this Static PIVOT query to Dynamic.

Static PIVOT query

Static PIVOT Query
Below Static PIVOT script pivots the #CourseSales Table data so that the Course columns distinct values are transformed as Columns in the result set as depicted in the above image.
?
SELECT *
FROM #CourseSales
PIVOT(SUM(Earning)
      FOR Course IN ([.NET], Java)) AS PVTTable
RESULT:
Static PIVOT Query In Sql Server
Let us insert one more row in the #CourseSales table for the new course SQL Server with below insert statement.
?
INSERT INTO #CourseSales VALUES('Sql Server',2013,15000)
Now rerun the above PIVOT query.
RESULT:
Static PIVOT Query In Sql Server
From the above result it is clear that the newly added course Sql Server sales data is not reflected in the result.

Dynamic PIVOT Query

To make the above Static PIVOT query to dynamic, basically we have to remove the hardcoded PIVOT column names specified in the PIVOT operators PIVOT columns IN clause. Below query demonstrates this.
?
DECLARE @DynamicPivotQuery AS NVARCHAR(MAX)
DECLARE @ColumnName AS NVARCHAR(MAX)
 
--Get distinct values of the PIVOT Column
SELECT @ColumnName= ISNULL(@ColumnName + ',','')
       + QUOTENAME(Course)
FROM (SELECT DISTINCT Course FROM #CourseSales) AS Courses
 
--Prepare the PIVOT query using the dynamic
SET @DynamicPivotQuery =
  N'SELECT Year, ' + @ColumnName + '
    FROM #CourseSales
    PIVOT(SUM(Earning)
          FOR Course IN (' + @ColumnName + ')) AS PVTTable'
--Execute the Dynamic Pivot Query
EXEC sp_executesql @DynamicPivotQuery
[ALSO READ] How to replace NULL value by 0 in the Dynamic Pivot result – Sql Server
RESULT:
Dynamic PIVOT Query in Sql Server
From the above result it is clear that this query is a True Dynamic PIVOT query as it reflected all the courses in the #CourseSales table without needing to write hardcoded course names in the PIVOT query.

Examples of PIVOT and Dynamic PIVOT

Below are the some of the examples of retrieving data in Sql Server using PIVOT and Dynamic PIVOT:

Wednesday, July 20, 2016

E. Using correlated subqueries

SELECT Examples (Transact-SQL)

 
THIS TOPIC APPLIES TO:yesSQL Server (starting with 2008)yesAzure SQL DatabasenoAzure SQL Data Warehouse noParallel Data Warehouse
This topic provides examples of using the SELECT statement.
The following example shows three code examples. This first code example returns all rows (no WHERE clause is specified) and all columns (using the *) from the Product table in the AdventureWorks2012 database.
USE AdventureWorks2012;
GO
SELECT *
FROM Production.Product
ORDER BY Name ASC;
-- Alternate way.
USE AdventureWorks2012;
GO
SELECT p.*
FROM Production.Product AS p
ORDER BY Name ASC;
GO
This example returns all rows (no WHERE clause is specified), and only a subset of the columns (NameProductNumberListPrice) from the Producttable in the AdventureWorks2012 database. Additionally, a column heading is added.
USE AdventureWorks2012;
GO
SELECT Name, ProductNumber, ListPrice AS Price
FROM Production.Product 
ORDER BY Name ASC;
GO
This example returns only the rows for Product that have a product line of R and that have days to manufacture that is less than 4.
USE AdventureWorks2012;
GO
SELECT Name, ProductNumber, ListPrice AS Price
FROM Production.Product 
WHERE ProductLine = 'R' 
AND DaysToManufacture < 4
ORDER BY Name ASC;
GO
The following examples return all rows from the Product table. The first example returns total sales and the discounts for each product. In the second example, the total revenue is calculated for each product.
USE AdventureWorks2012;
GO
SELECT p.Name AS ProductName, 
NonDiscountSales = (OrderQty * UnitPrice),
Discounts = ((OrderQty * UnitPrice) * UnitPriceDiscount)
FROM Production.Product AS p 
INNER JOIN Sales.SalesOrderDetail AS sod
ON p.ProductID = sod.ProductID 
ORDER BY ProductName DESC;
GO
This is the query that calculates the revenue for each product in each sales order.
USE AdventureWorks2012;
GO
SELECT 'Total income is', ((OrderQty * UnitPrice) * (1.0 - UnitPriceDiscount)), ' for ',
p.Name AS ProductName 
FROM Production.Product AS p 
INNER JOIN Sales.SalesOrderDetail AS sod
ON p.ProductID = sod.ProductID 
ORDER BY ProductName ASC;
GO
The following example uses DISTINCT to prevent the retrieval of duplicate titles.
USE AdventureWorks2012;
GO
SELECT DISTINCT JobTitle
FROM HumanResources.Employee
ORDER BY JobTitle;
GO
The following first example creates a temporary table named #Bicycles in tempdb.
USE tempdb;
GO
IF OBJECT_ID (N'#Bicycles',N'U') IS NOT NULL
DROP TABLE #Bicycles;
GO
SELECT * 
INTO #Bicycles
FROM AdventureWorks2012.Production.Product
WHERE ProductNumber LIKE 'BK%';
GO
This second example creates the permanent table NewProducts.
USE AdventureWorks2012;
GO
IF OBJECT_ID('dbo.NewProducts', 'U') IS NOT NULL
    DROP TABLE dbo.NewProducts;
GO
ALTER DATABASE AdventureWorks2012 SET RECOVERY BULK_LOGGED;
GO

SELECT * INTO dbo.NewProducts
FROM Production.Product
WHERE ListPrice > $25 
AND ListPrice < $100;
GO
ALTER DATABASE AdventureWorks2012 SET RECOVERY FULL;
GO
The following example shows queries that are semantically equivalent and illustrates the difference between using the EXISTS keyword and the INkeyword. Both are examples of a valid subquery that retrieves one instance of each product name for which the product model is a long sleeve logo jersey, and the ProductModelID numbers match between the Product and ProductModel tables.
USE AdventureWorks2012;
GO
SELECT DISTINCT Name
FROM Production.Product AS p 
WHERE EXISTS
    (SELECT *
     FROM Production.ProductModel AS pm 
     WHERE p.ProductModelID = pm.ProductModelID
           AND pm.Name LIKE 'Long-Sleeve Logo Jersey%');
GO

-- OR

USE AdventureWorks2012;
GO
SELECT DISTINCT Name
FROM Production.Product
WHERE ProductModelID IN
    (SELECT ProductModelID 
     FROM Production.ProductModel
     WHERE Name LIKE 'Long-Sleeve Logo Jersey%');
GO
The following example uses IN in a correlated, or repeating, subquery. This is a query that depends on the outer query for its values. The query is executed repeatedly, one time for each row that may be selected by the outer query. This query retrieves one instance of the first and last name of each employee for which the bonus in the SalesPerson table is 5000.00 and for which the employee identification numbers match in the Employee andSalesPerson tables.
USE AdventureWorks2012;
GO
SELECT DISTINCT p.LastName, p.FirstName 
FROM Person.Person AS p 
JOIN HumanResources.Employee AS e
    ON e.BusinessEntityID = p.BusinessEntityID WHERE 5000.00 IN
    (SELECT Bonus
     FROM Sales.SalesPerson AS sp
     WHERE e.BusinessEntityID = sp.BusinessEntityID);
GO
The previous subquery in this statement cannot be evaluated independently of the outer query. It requires a value for Employee.EmployeeID, but this value changes as the SQL Server Database Engine examines different rows in Employee.
A correlated subquery can also be used in the HAVING clause of an outer query. This example finds the product models for which the maximum list price is more than twice the average for the model.
USE AdventureWorks2012;
GO
SELECT p1.ProductModelID
FROM Production.Product AS p1
GROUP BY p1.ProductModelID
HAVING MAX(p1.ListPrice) >= ALL
    (SELECT AVG(p2.ListPrice)
     FROM Production.Product AS p2
     WHERE p1.ProductModelID = p2.ProductModelID);
GO
This example uses two correlated subqueries to find the names of employees who have sold a particular product.
USE AdventureWorks2012;
GO
SELECT DISTINCT pp.LastName, pp.FirstName 
FROM Person.Person pp JOIN HumanResources.Employee e
ON e.BusinessEntityID = pp.BusinessEntityID WHERE pp.BusinessEntityID IN 
(SELECT SalesPersonID 
FROM Sales.SalesOrderHeader
WHERE SalesOrderID IN 
(SELECT SalesOrderID 
FROM Sales.SalesOrderDetail
WHERE ProductID IN 
(SELECT ProductID 
FROM Production.Product p 
WHERE ProductNumber = 'BK-M68B-42')));
GO
The following example finds the total of each sales order in the database.
USE AdventureWorks2012;
GO
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
ORDER BY SalesOrderID;
GO
Because of the GROUP BY clause, only one row containing the sum of all sales is returned for each sales order.
The following example finds the average price and the sum of year-to-date sales, grouped by product ID and special offer ID.
USE AdventureWorks2012;
GO
SELECT ProductID, SpecialOfferID, AVG(UnitPrice) AS [Average Price], 
    SUM(LineTotal) AS SubTotal
FROM Sales.SalesOrderDetail
GROUP BY ProductID, SpecialOfferID
ORDER BY ProductID;
GO
The following example puts the results into groups after retrieving only the rows with list prices greater than $1000.
USE AdventureWorks2012;
GO
SELECT ProductModelID, AVG(ListPrice) AS [Average List Price]
FROM Production.Product
WHERE ListPrice > $1000
GROUP BY ProductModelID
ORDER BY ProductModelID;
GO
The following example groups by an expression. You can group by an expression if the expression does not include aggregate functions.
USE AdventureWorks2012;
GO
SELECT AVG(OrderQty) AS [Average Quantity], 
NonDiscountSales = (OrderQty * UnitPrice)
FROM Sales.SalesOrderDetail
GROUP BY (OrderQty * UnitPrice)
ORDER BY (OrderQty * UnitPrice) DESC;
GO
The following example finds the average price of each type of product and orders the results by average price.
USE AdventureWorks2012;
GO
SELECT ProductID, AVG(UnitPrice) AS [Average Price]
FROM Sales.SalesOrderDetail
WHERE OrderQty > 10
GROUP BY ProductID
ORDER BY AVG(UnitPrice);
GO
The first example that follows shows a HAVING clause with an aggregate function. It groups the rows in the SalesOrderDetail table by product ID and eliminates products whose average order quantities are five or less. The second example shows a HAVING clause without aggregate functions.
USE AdventureWorks2012;
GO
SELECT ProductID 
FROM Sales.SalesOrderDetail
GROUP BY ProductID
HAVING AVG(OrderQty) > 5
ORDER BY ProductID;
GO
This query uses the LIKE clause in the HAVING clause.
USE AdventureWorks2012 ;  
GO  
SELECT SalesOrderID, CarrierTrackingNumber   
FROM Sales.SalesOrderDetail  
GROUP BY SalesOrderID, CarrierTrackingNumber  
HAVING CarrierTrackingNumber LIKE '4BD%'  
ORDER BY SalesOrderID ;  
GO  

The following example shows using GROUP BYHAVINGWHERE, and ORDER BY clauses in one SELECT statement. It produces groups and summary values but does so after eliminating the products with prices over $25 and average order quantities under 5. It also organizes the results by ProductID.
USE AdventureWorks2012;
GO
SELECT ProductID 
FROM Sales.SalesOrderDetail
WHERE UnitPrice < 25.00
GROUP BY ProductID
HAVING AVG(OrderQty) > 5
ORDER BY ProductID;

What is a Correlated Subquery?

What is a Correlated Subquery?

A correlated subquery is a query that depends on another query. Another way to describe it is when a TSQL command is made up of more than one TSQL statement where the inner TSQL statement depends on the results of the outer TSQL statement. Because the inner query depends on the outer query the inner query is called a correlated subquery. When I say "depends on the outer query" I mean the inner query references a column or columns from the outer query. Because the inner query references columns from the outer query it also can't be execute as a stand-alone TSQL statement. The correlated subquery might be run many times. It is run once for every row returned from the outer query. To better understand what a correlated subquery is let me show you a couple of examples that use a correlated subquery in the WHERE and HAVING clause and how a correlated subquery can be used in an UPDATE statement.

Sample Database for Examples

Rather than build my own test data I'm going to use the AdventureWorks2012 database. If you want to run the examples in this article you can download this sample database from the following location:

Correlated Subquery in WHERE Clause

This first example will show you how a correlated subquery can be used within the WHERE constraint. Suppose I have a need to identify the customer that has the highest SubTotal amount per region, so I can reward them. To accomplish this customer identification I can run the following code:
USE AdventureWorks2012;
GO
SELECT SOH.CustomerID
, SOH.SalesOrderID
, TerritoryID
, TotalDue
FROM Sales.SalesOrderHeader AS SOH
WHERE SOH.SalesOrderNumber = (SELECT TOP 1 SalesOrderNumber
FROM Sales.SalesOrderHeader
WHERE TerritoryID = SOH.TerritoryId
ORDER BY TotalDue)
ORDER BY TerritoryID;
When I run this code I get the following output:
CustomerID SalesOrderID TerritoryID TotalDue
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
28094 53084 1 2.5305
30078 51782 2 1.5183
18315 60985 3 8.0444
11527 52371 4 2.5305
11533 57966 5 2.5305
11142 52682 6 2.5305
20142 52051 7 4.409
24561 52392 8 4.409
15640 51885 9 2.5305
29644 65214 10 3.0365
The part of the code that is the correlated subquery can be found on the right side of the WHERE constraint. Here is that subquery code:
SELECT TOP 1 SalesOrderNumber
FROM Sales.SalesOrderHeader
WHERE TerritoryID = SOH.TerritoryId
ORDER BY TotalDue
You can tell this is a correlated subquery because the SOH.TerritoryID column is referencing the SalesOrderHeader table from the outer query. Additionally if you run this code as is it will fail, whereas a normal subquery would be able to be run independently of the complete query.

Correlated Subquery in HAVING Clause

In this example I want to return all the CustomerID's for those customers that order more product in 2008 than they did in 2007. To do this I will use a correlated subquery in the HAVING clause. Here is the query that will identify those CustomerID's that bought more product in 2008:
SELECT CustomerID FROM Sales.SalesOrderHeader SOH
WHERE YEAR(OrderDate) = 2008
GROUP BY CustomerID
HAVING SUM(SubTotal) > (
SELECT SUM(SubTotal) FROM Sales.SalesOrderHeader
WHERE YEAR(OrderDate) = 2007
AND CustomerID = SOH.CustomerID
GROUP BY CustomerID)
Here you can see there are two different SELECT statements that make up this TSQL statement. This first SELECT statement is known as the outer query. The second SELECT statement, which is part of the HAVING clause, is known as the inner query. In the second SELECT statement I refer to the column SOH.CustomerID, which is referring to the CustomerID returned from the outer query that placed orders in 2008. Because the second SELECT statement referred to a column returned from the first SELECT statement is why this second SELECT statement is a correlated subquery. This correlated subquery returns the sum of the SubTotal column for 2007 orders for each CustomerID that was identified in the outer query, or another way to say it the correlated subquery runs for every customer that placed an order in 2008. This summed up SubTotal value of the 2007 orders for a specific customer's 2007 orders is then compare to the summed up SubTotal for the 2008 orders for the same customer. If the 2008 summed up SubTotal is more than the 2007 summed up SubTotal amount then that CustomerID is returned from this query.

Performing an Update Using a Correlated Subquery

You can even use a correlated subquery in an UPDATE statement. To demo this I will be using the following script to CREATE and populate my UpdateDemo table:
USE TempDB;
GO
-- Create demo table
CREATE TABLE UpdateDemo (
ID INT,
ParentID INT,
BirthDate DATE,
Sex Char(1),
NumOfChildren INT);
-- Populated table with Parent Record
INSERT INTO UpdateDemo VALUES(1,0,'10-18-1990','F',0);
-- Insert child Record
INSERT INTO UpdateDemo VALUES(2,1,'10-12-2013','M',0);
-- Display Results
SELECT * FROM UpdateDemo;
When I run this code my UpdateDemo table will be created and then populated with two records. Those two records look like this:
ID ParentID BirthDate Sex NumOfChildren
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --  -- -- -
1 0 1990-10-18 F 0
2 1 2013-10-12 M 0
Here you can see that I created two records in the UpdateDemo table. The design of this table is to track parents and the number of children they have. As you can see the NumOfChildren column is set to zero for both records in the UpdateDemo table. The zero value for the NumOfChildren on the row that has an ID value of 1 is not correctly populated because that parent record does have a child record. You can see this by looking at the ParentID column. Note that for the row with ID 2, the ParentID is set to 1. This is the child record for ID 1. In order to update the NumOfChildren column for ID 1 I will use the UPDATE statement in the following code, which contains a correlated subquery:
-- Update Table with Subquery
UPDATE UpdateDemo
SET NumOfChildren = (SELECT COUNT(*) FROM UpdateDemo Inner_UpdateDemo
WHERE Inner_UpdateDemo.ParentID = Outer_UpdateDemo.ID)
FROM UpdateDemo Outer_UpdateDemo;
-- Display Results
SELECT * FROM UpdateDemo;
When I run this code I get the following output returned from the SELECT statement:
ID ParentID BirthDate Sex NumOfChildren
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
1 0 1990-10-18 F 1
2 1 2013-10-12 M 0
By looking at the output of the above correlated subquery you can see that the NumOfChildren column is now set to 1. This was accomplished by the correlated UPDATE statement. If you look at the SET clause in the UPDATE statement above you can see that I counted the number of rows in the UpdateDemo table that meet the following WHERE constraint:
WHERE Inner_UpdateDemo.ParentID = Outer_UpdateDemo.ID
In this WHERE constraint you can see that on the right side of the equal sign I referenced a column with an alias name of Outer_UpdateDemo.ID. This column reference is what makes this SELECT statement in the SET clause a correlated subquery, because this column is referencing the table alias from the outer query. Also if you try to run this SELECT statement by itself it will fail with the following error:
Msg 4104, Level 16, State 1, Line 22
The multi-part identifier "Outer_UpdateDemo.ID" could not be bound.
This is error a dead giveaway that this is a correlated subquery.
The following code tests out the update statement again to make sure it still works after I add more row to my UpdateDemo table by running the following code:
-- INSERT Another Child arecord
INSERT INTO UpdateDemo VALUES(2,1,'7-1-2015','F',0);
-- Update Table Again
UPDATE UpdateDemo
SET NumOfChildren = (SELECT COUNT(*) FROM UpdateDemo
Inner_UpdateDemo
WHERE Inner_UpdateDemo.ParentID = Outer_UpdateDemo.ID)
FROM UpdateDemo Outer_UpdateDemo;
-- Display Results
SELECT * FROM UpdateDemo;
When I run this code I get the following output:
ID ParentID BirthDate Sex NumOfChildren
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
1 0 1990-10-18 F 2
2 1 2013-10-12 M 0
3 1 2015-07-01 F 0
Now you can see the NumOfChildren column for the row with an ID value of 1 is equal to 2.
This example only showed using a correlated subquery in an UPDATE statement. You can also us correlated subqueries in the INSERT, and DELETE statements as well.

Performance Issues with Correlated Subqueries

A correlated subquery is executed for each candidate row from the outer query. Because of this the correlated subquery will be executed over, and over again. If an outer query returns a large result set the correlated subquery might not scale well. When a correlated subquery runs slow a different solution will be necessary, such as an INNER JOIN, or constraining your outer query to return fewer rows.

Conclusion

As you can see correlated subqueries are a great way to relate two different sets of records to identify the final set. By using a correlated subquery you can filter out rows in the outer query based on the results of the inner query. Next time you have a need to restrict rows in one result set by records that can be related to another set, then consider whether using a correlated subquery will get you the desired final set you are looking for.