Most often whenever Performance Engineer detects some issues with regard to response time, the first thing the application team thinks as a fix is to check and add some more indexes to the columns in the table.Adding indexes is low hanging fruit which every development team tries and should try before reviewing their code base for fix.I certainly agree and go with most development team on this.However these concepts are now misused and now people often try to add indexes as band aid than do real time fix for their insufficient design. I recently had one such case, and to everyone’s surprise it failed and they had rework on their queries.Don’t get me wrong, designs often get outdated with technologies.They don’t stay in pace with growth of technologies.Today I may use Model 2 MVC, but 3 years down the line, Model 2 MVC will be outdated.However these should not impact low level basics used in the design.If I am storing zip codes, then probably I should be using char or int with limited bytes range depending on zip code format.This according to me is low level basics and the same applies to the basics about indexes.
So this post is all about indexes and my thoughts on it.
Indexes helps in reducing IO when we use it efficiently.Indexes work in the similar manner the way indexes in the book works.Lets say if we have book with around 1000 pages in it and you need to find a definition of some single word and you know for sure that definition exists somewhere in the book but not sure which page,how are you going to find out ?.Just go to the index page of the book and then search of the word what you require and then go to that page number.No need to flip over thousand pages to get the information.That’s the power of index when correctly implemented.Lets assume what happens when there is no index on any column involving the query, MSSQL database engine has to read each row of the table for getting the information and this results in the full table scan.So whenever you have large amount of rows and full table scan happens, the first thing that gets impacted is the response time of the application and to some extent resource usage of the database server.Your Database box is going to get very busy even to fetch 1 row of record.
Indexes are categorized as clustered and non clustered Indexes.Clustered indexes are the one which often gives the better results and is default index type whenever we create the primary key constraint on the table in MSSQL Database.Clustered indexes are often created first than non clustered indexes since row locators of the non clustered indexes can be pointed toward the index keys of the clustered index.Clustered indexes also helps in sorting of the data and in cases where large range of rows needs to be retrieved.Non clustered index is useful when we are retrieving a very small number of rows.
Then there are also certain situations where in adding the indexes in fact degrade the performance of the application.Indexes often adds overhead costs in terms of more memory consumption and can also impact response time of the applications.The Insert/update/Delete often gets longer whenever we have indexes on the tables where these operations takes place.Lets take earlier example, If I am deleting 25 pages from 1000 pages book, then the order of the all the pages changes and this change can lead to serious response time issues depending on the frequency of the change.So this is reason as why we should be cautious in using indexes for use cases where in we do high number of these DML operation.
Some of the main factors to be considered while designing indexes are,
- Where clause and Joins to be used in queries:If the queries returns less rows, better the benefits we get due to index.We have to keep in mind the whole purpose of Where clause is to get exact required information from the table and if Where clause column is already indexed and the benefits due to the index almost gets doubled.
- Column Data types : Columns with data type char,varchar,nchar etc of String family gains less due to indexes.Less the number of bytes they occupy,more the gain we can have due to indexes.
- Columns which has duplicate data in large numbers are bad choice for indexes.Boolean columns where in we store the flag status with only 2 options Y and N.The reason these are bad is that it defeats the uniqueness of the Where clause if used.Column which has large number of unique values gains most due to the indexes.
Information about the indexes and their associated costs can also be found out using Data management View (DMV’s) of MS SQL Servers.The DMV sys.dm_db_index_operational stats can help us to get low-level activity, such as locks and I/O, on an index that is in use. The DMV sys.dm_db_index_usage_stats can give us the counts of the various index operations that have occurred to an index over time.
These are the key factors to be considered while designing the indexes on the table.Indexes often helps in fixing performance issues but if the design of the table or query is bad, then adding indexes to it may worsen the problem than resolving it.