Ramblings of a SQL Hillbilly

Or, the mad ravings of a data warehouse forklift operator

Temporary Query Items: Indexing and Statistics

Note: This is part of a series on Temporary Query Items.

In my previous posts, I’ve talked about what Temporary Query Items are and how scope helps to define the similarities and differences among the different TQIs. Before we can launch into an in-depth discussion of our first Temporary Query Item, though, we need to take a little time to talk about indexing and statistics.

Index? What, like the card?

Many people have made many comparisons of indexes in SQL Server to more familiar real-life objects. The obvious analog is to an index in a book, where you look up certain terms and the index points you to the page numbers in the book where the term can be found – here’s Technet. Others have compared indexes to entries in the phone book, like Brent Ozar. Mark Solomon suggests that non-clustered indexes are like miniature tables (SQL Server tables, not dining room tables.)

Analogies are nice and all, but what’s the reality? In essence, indexes do two things: Holding (and potentially organizing) data, and holding pointers to the physical location of data. Pointers, as you’ll recall from your hours of C coding earlier today, contain an address to a location in memory. In SQL Server indexing, a pointer contains the address of a row in the table.

Before we move on with that idea though, we should talk about the two types of indexes in SQL Server: Clustered and non-clustered. They’re both types of indexes, of course, but they tend to be defined by their differences more than their similarities.

Clustered

A clustered index is the technical term for an index that defines the physical structure of a table. You can have up to one clustered index per table – though technically, a table without a clustered index isn’t really a table, it’s a heap. When you have a clustered index, then the data in the table will be ordered on disk according to the definition of the index, which is a large part of why there can only be one clustered index.

Non-clustered

A non-clustered index does not enforce any ordering on the data in the table. Instead, each row within the non-clustered index contains a pointer to the original row in the clustered index. Any columns contained within the non-clustered index will be stored along with the pointer (so, those columns are essentially duplicated.) There can be up to 999 non-clustered indexes on a table (according to Denny Cherry that’s not a challenge), and non-clustered indexes can be applied even when there is no clustered index. When it comes to heaps, the pointer to the non-clustered index is replaced by a pointer to a RowID in the heap.

Okay, so why do we have them again?

Both kinds of indexes in SQL Server are there to help support queries. Without indexing, finding data in our tables would require scanning the whole table every time. With indexes, we can know exactly where to look for what data, and so we can be much more efficient. If you want more on that, Brent Ozar has a fantastic course that walks through how this works called How To Think Like SQL Server, which for $29 is a steal.

Statistics. What is this, baseball?

I love baseball statistics. Not because I like baseball, but because you really start to see what can happen when you try very hard to identify new outliers time after time. We’re about to the point that “Most pitches thrown left-handed by a right-handed pitcher against a switch-hitting batter during a September game at 50 degrees when the catcher’s name begins with M” will be a thing that is said and celebrated by some announcer somewhere. That is RIDICULOUSLY specific.

Statistics in SQL Server aren’t nearly that bad, though. Where indexes contain information about the intersection of values in columns with rows, statistics contain information about the distribution of values in columns. Statistics contain some information about cardinality and ranges of values as well. I’m a big fan of Erin Stellato’s Statistics Starters presentation for more information on statistics.

What does this have to do with Temporary Query Thingies?

The SQL Server optimizer uses indexes and statistics to help it make the right decisions about turning your queries into execution plans. This means that understanding how TQIs interact with indexes and statistics can help you make better decisions about which ones are right for you and what you may be doing at any given moment.

Moving On

We’re much closer to talking about Temporary Tables, our first Temporary Query Item, in depth. However, there’s one more topic I really need to cover before that – TempDB and Memory.