The other day I ran into an interesting problem with an SQL query, I needed to create the count() of all items and the count() of a specific set of items. Normally, I would use subqueries but there was so much data that the subqueries took a minute and that was not acceptable to our client (I wrote the query originally with a small amount of data and after 4 years the performance of it was horrible).

Read on to see how to fix this.

Let’s say we have a table (Sale_Lineitem) that tracks the items inside an individual Sale. The table has a column employeePurchase that’s a tinyint(1) that tracks if the transaction was an employee purchasing something for their use.

We need to find how many total sales there were and how many total sales were employee purchases. As I’ve already said the subquery version of this runs into performance problems with large amounts of data (and a large number of joins inside the subquery). Below is the version that runs much faster.

  count(distinct SaleId) as TotalSales,
  count(distinct (case when employeePurchase > 0 then SaleId end)) as TotalEmployeeSales

As an extra tip we’ve started some of our reports by first putting the data into a temporary tables when we were trying to combine a lot of different data or across multiple tables.