The other day I ran into an interesting problem with an SQL query, I needed to create the
count() of all items and the
count() of a specific set of items. Normally, I would use subqueries but there was so much data that the subqueries took a minute and that was not acceptable to our client (I actually wrote the query originally with a small amount of data and after 4 years the performance of it was horrible. :-)).
Read on to see how to fix this.
Let’s say we have a table (Sale_Lineitem) that tracks the items inside an individual Sale. The table has a column employeePurchase that’s a tinyint(1) that tracks if the transaction was an employee purchasing something for their own use.
We need to find how many total sales there were and how many total sales were employee purchases. As I’ve already said the subquery version of this runs into performance problems with large amounts of data (and a large number of joins inside the subquery). Below is the version that runs much faster.
select count(distinct SaleId) as TotalSales, count(distinct (case when employeePurchase > 0 then SaleId end)) as TotalEmployeeSales from Sale_Lineitem;
As an extra tip we’ve started some of our reports by first putting the data into a temporary tables when we were trying to combine a lot of different data or across multiple tables.