Wednesday, August 12, 2009

Limit Use of Select Distinct

I read a great article on the use of select distinct and I really feel the need to add to that in my own musings on SQL. I work with some really intelligent people, but I still see the rampant use of select distinct with little or no explanation as to why they used it. Here are three questions you should ask yourself before you use distinct in a select statement:
  1. Have you seen all the data from the tables that you are pulling from?
  2. Have you exercised all other options to remove duplicate rows?
  3. If someone asked why you used distinct can you justify its use?
If your answer to any of these is "No", then you should not use it. Even if you are getting the right data you are leaving questions unanswered in the raw data. Why are there duplicates? Should they be accounted for? Is there an error in the system that is causing duplicate rows to be entered?

If you have no choice then you need to be able to have an answer to the "Why" in question 3. And "I needed to get rid of the duplicates." is not an answer. Clearly identify what steps you have taken to minimize the duplicates and, if possible, research the source data to determine how they could be minimized further. The more questions you have answered the better the development process will go.