
Posts Tagged ‘MS SQL 2005’

Short note on using Indexes - MS SQL 2005

August 18, 2012 by kiranbadi1991 | 1 Comment | Filed in Database, Development, Environment, Performance Engineering, Testing

Most often, whenever a performance engineer detects issues with response time, the first fix the application team thinks of is to check and add some more indexes to the columns in the table. Adding indexes is the low-hanging fruit which every development team tries, and should try, before reviewing their code base for a fix. I certainly agree and go along with most development teams on this. However, this concept is now misused, and people often try to add indexes as a band-aid rather than doing a real fix for their insufficient design. I recently had one such case, and to everyone's surprise it failed and they had to rework their queries. Don't get me wrong, designs often get outdated with technologies; they don't stay in pace with the growth of technologies. Today I may use Model 2 MVC, but 3 years down the line Model 2 MVC will be outdated. However, this should not impact the low-level basics used in the design. If I am storing zip codes, then I should probably be using char or int with a limited byte range, depending on the zip code format. This, according to me, is a low-level basic, and the same applies to the basics of indexes.

So this post is all about indexes and my thoughts on them.

Indexes help in reducing IO when we use them efficiently. Indexes work in a manner similar to the way the index in a book works. Let's say we have a book with around 1000 pages in it, and you need to find the definition of a single word. You know for sure that the definition exists somewhere in the book, but you are not sure on which page. How are you going to find it? Just go to the index page of the book, search for the word you require, and then go to that page number. There is no need to flip through a thousand pages to get the information. That's the power of an index when correctly implemented. Now let's see what happens when there is no index on any column involved in the query: the MSSQL database engine has to read each row of the table to get the information, and this results in a full table scan. So whenever you have a large number of rows and a full table scan happens, the first thing that gets impacted is the response time of the application, and to some extent the resource usage of the database server. Your database box is going to get very busy even to fetch one row of a record.
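
A quick way to see this effect for yourself is SET STATISTICS IO, which reports the logical reads a query performs. Below is a minimal sketch; the dbo.Orders table, its columns and the index name are made up for illustration:

SET STATISTICS IO ON;

-- With no index on CustomerId, this forces a full table scan
SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE CustomerId = 42;

-- Add an index on the filtered column
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId);

-- Re-run the same query: it now uses an index seek, and the
-- logical reads reported in the Messages tab drop sharply
SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE CustomerId = 42;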

Indexes are categorized as clustered and nonclustered indexes. Clustered indexes are the ones which often give better results, and a clustered index is the default index type whenever we create a primary key constraint on a table in a MSSQL database. Clustered indexes are often created before nonclustered indexes, since the row locators of the nonclustered indexes can then point to the index keys of the clustered index. Clustered indexes also help in sorting the data and in cases where a large range of rows needs to be retrieved. A nonclustered index is useful when we are retrieving a very small number of rows.
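
As a small sketch of both types, assuming a hypothetical dbo.Customers table:

-- The PRIMARY KEY constraint creates a clustered index by default
CREATE TABLE dbo.Customers
(
    CustomerId INT NOT NULL
        CONSTRAINT PK_Customers PRIMARY KEY CLUSTERED,
    LastName   VARCHAR(50) NOT NULL,
    ZipCode    CHAR(5)     NOT NULL
);

-- A nonclustered index; its row locators point to the clustered index key
CREATE NONCLUSTERED INDEX IX_Customers_LastName
    ON dbo.Customers (LastName);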

Then there are also certain situations where adding indexes in fact degrades the performance of the application. Indexes often add overhead costs in terms of more memory and storage consumption, and they can also impact the response time of the application. Insert/Update/Delete operations often take longer whenever we have indexes on the tables where these operations take place. Taking the earlier example, if I delete 25 pages from a 1000-page book, then the ordering of all the pages changes, and this change can lead to serious response time issues depending on the frequency of change. This is the reason why we should be cautious in using indexes for use cases where we perform a high number of these DML operations.
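
One common way to soften this cost during heavy bulk loads, sketched below with the hypothetical index from the earlier example, is to disable a nonclustered index before the load and rebuild it once afterwards, instead of maintaining it row by row:

-- Disable the nonclustered index before a large bulk load
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders DISABLE;

-- ... perform the heavy Insert/Update/Delete work here ...

-- Rebuild it once at the end; this also refreshes its statistics
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REBUILD;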

Some of the main factors to be considered while designing indexes are:

  • Where clause and joins to be used in queries: The fewer rows a query returns, the greater the benefit we get from an index. We have to keep in mind that the whole purpose of the Where clause is to get exactly the required information from the table, and if the Where clause column is already indexed, the benefit due to the index is almost doubled.
  • Column data types: Columns with string-family data types such as char, varchar, nchar etc. gain less from indexes. The fewer bytes a column occupies, the more gain we can get from an index.
  • Columns which have duplicate data in large numbers are a bad choice for indexes, for example Boolean columns where we store a flag status with only 2 options, Y and N. The reason these are bad is that they defeat the uniqueness the Where clause relies on. Columns which have a large number of unique values gain the most from indexes (see the small sketch after this list).
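
To illustrate that last point, here is a hypothetical sketch contrasting a low-selectivity flag column with a highly selective one, again on a made-up dbo.Orders table:

-- Poor candidate: only two distinct values (Y/N), so the index filters out little
CREATE NONCLUSTERED INDEX IX_Orders_ActiveFlag ON dbo.Orders (ActiveFlag);

-- Good candidate: many unique values, so a seek narrows the search to a few rows
CREATE NONCLUSTERED INDEX IX_Orders_OrderNumber ON dbo.Orders (OrderNumber);

-- The optimizer may ignore IX_Orders_ActiveFlag and scan the table instead,
-- because roughly half the rows match either flag value
SELECT OrderId FROM dbo.Orders WHERE ActiveFlag = 'Y';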

Information about indexes and their associated costs can also be found using the Dynamic Management Views (DMVs) of MS SQL Server. The DMV sys.dm_db_index_operational_stats can help us get low-level activity, such as locks and I/O, on an index that is in use. The DMV sys.dm_db_index_usage_stats can give us the counts of the various index operations that have occurred on an index over time.
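
A sketch of how these two DMVs might be queried for the current database (the column choices here are just a starting point):

-- How often each index has been used, and how often it had to be maintained
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID();

-- Low-level locking and I/O activity per index; NULL means "all" for that argument
SELECT OBJECT_NAME(o.object_id) AS table_name,
       o.index_id, o.range_scan_count,
       o.row_lock_count, o.row_lock_wait_in_ms,
       o.page_io_latch_wait_count
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS o;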

These are the key factors to be considered while designing indexes on a table. Indexes often help in fixing performance issues, but if the design of the table or the query is bad, then adding indexes may worsen the problem rather than resolve it.


Performance, Dirty Reads, Lost Updates and Locks

May 25, 2012 by kiranbadi1991 | 1 Comment | Filed in Database, Development, Environment, Performance Engineering, Testing

Recently I came across a very interesting issue which I feel I should write about. During load testing of a web based application, I came across an issue where the response times for a couple of the business processes were far exceeding the SLAs given to us. The test was almost clean, with less than a 0.1% error rate and no failed transactions in it. Server resources were also looking good, with no abnormal behavior in JVM/CPU/memory etc. Still, the transaction response time was very high for a couple of the business processes.

Let’s call those business processes, for the sake of this post, A, B and C. Business process A adds (inserts) a new record to the database, business process B edits a record in the same table (we call edit an Update in DB language), and process C downloads the latest information, which includes the records from both process A and process B. We ran these 3 business processes for about an hour with 1 user each. All the data sets used in load testing for these 3 business processes were unique. These business processes were fairly simple to script, and the web navigation was also fairly easy. Our job was to test these business processes with 1 user each so as to achieve close to 100 transactions for each process in an hour. A pretty easy task, and we ran many tests with these processes; in all of them the response time was almost the same, with very little variance. The results were like a puzzle. Thankfully we had HP Diagnostics installed in that environment, and we had some instrumentation done for this application. So I thought, let's see what the application threads are doing while I am running the test. HP Diagnostics has a very good feature of showing you the live thread count along with a thread trace of what each thread is doing at that point in time. So I took a thread dump of all the threads running in this JVM. Believe me, taking a thread dump with HP Diagnostics is as easy as clicking a link.

Wow, for the first 10 mins most of the threads were in the runnable state, and as users slowly ramped up, a few of the threads went into the waiting state and a few were oscillating between the runnable and waiting states. Another 10 mins into the test run, I could see a lot more threads in the waiting state. Now, after seeing threads in the waiting state, I could understand why I was seeing very high response times for the most simple business processes, which involve nothing but simple insert and update operations. Response time was high because threads were waiting for something, and that something was nothing but DB execute update calls.

Now the next puzzle for me was why on earth these threads were waiting at DB execute calls, and that too for a cool 10 secs, and with so little load (I had close to 17 scripts with one user each per script). Given the limited access we had to the database, and after discussing my findings with the application development team, we decided to engage a DBA to assist us in finding out the root cause of this wait. At this point I must appreciate the honesty of the application developer who agreed to this idea of engaging the DBA. There are very rare cases where developers agree with performance engineers, and that's the reason I must thank this guy.

So we engaged the DBA, we ran a couple of tests, and he collected the database stats during them. He did some analysis and identified a couple of stored procedures and dynamic queries which were in need of tuning. In addition to this, he came back and said that the response times for these 3 business processes were high because row level locking was getting escalated to a table level lock after a certain duration of time; again after a certain time, some transactions were getting dirty reads; and still after a couple of minutes, the table got locked and did not allow any insert or update. Fair enough analysis.
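
On the database side, one quick way (among several) to watch this kind of blocking while a test runs is the sys.dm_exec_requests DMV, sketched here:

-- Sessions that are currently blocked, and who is blocking them
SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;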

Hmm, I was happy that we finally knew why we were seeing high response times. Our next step was discussing our findings with the key stakeholders. A lot of things were discussed in the meeting, but the key question, which I liked most in that hour's meeting, was the DBA asking, "Can this application tolerate dirty reads?". There was a long pause in the meeting, and finally, after some time, the developer guy said that this application cannot tolerate dirty reads. Hmm, there was a pause again for about 30 secs; after another 20 secs, another voice said that these 3 transactions are executed less than 100 times in an hour and there exists very little chance that we would see this concurrency in production. There was again some silence, and later on everyone moved on to the next topic.

Now the reason why I feel this is an interesting case, and deserves some writing on this topic, is that dirty reads/lost updates/phantom reads can be found only by proper load testing, and there exists a very high probability that, if we ignore them, we are sending out incorrect information to the application users in production under load, and this incorrect information could at times be used by those users to sue the company. These are typical cases where data integrity issues take precedence over performance. So if you cannot redesign the queries, then I would suggest you sacrifice the performance. These types of issues should become high priority fixes, at least in cases where we are dealing with financial applications. I also feel that at times these are critical bugs, given that we have a table level lock, and this lock impacts all the users if one user locks the table for any reason.
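
For anyone who wants to reproduce a dirty read on their own box, here is a minimal sketch using two Management Studio sessions and a hypothetical dbo.Accounts table:

-- Session 1: change a row inside a transaction, but do not commit yet
BEGIN TRANSACTION;
UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountId = 1;

-- Session 2: READ UNCOMMITTED allows reading the uncommitted change (a dirty read)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT Balance FROM dbo.Accounts WHERE AccountId = 1;

-- Session 1: roll back; the balance Session 2 just read never really existed
ROLLBACK TRANSACTION;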

Maybe later on I will write another short post on dirty reads/lost updates. These are interesting cases where only proper load testing reproduces the issue, and strong foundational skills in performance engineering help to identify the root cause of the issue.



Know how shared your server box is and if your server runs with default settings

May 20, 2012 by kiranbadi1991 | Comments Off on Know how shared your server box is and if your server runs with default settings | Filed in Database, Development, Environment, Others, Performance Engineering, Testing

Working in shared or clustered environments is quite a challenge, and getting the best performance out of the servers requires that we understand how the servers have been designed and maintained in the infrastructure. If, during load testing, we are continuously seeing very high response times which exceed the SLAs given to us, then it's probably time we start focusing on the shared environment aspects of the application.

Some of the questions I often ask are as below:

  • Is the database the only application running on the server box? If it's sharing the server box with other enterprise applications like a web server etc., then it probably cannot perform optimally.
  • Do the database memory/CPU/IO counters look OK, and do they seem to align with the hardware configuration of the server box?
  • Is the hardware of the server box, like the BIOS, disk controller etc., up to date with the latest patches and latest software? (I know Microsoft Support always recommends this step in case of any performance issue, and I wholeheartedly agree with this approach.)
  • How often is the virus scanning software running on the server box? Quite often virus scanning software interrupts the smooth functioning of database servers by interfering with the database file system; both are IO-based applications. It makes sense to exclude the key database files from scanning to improve the performance of the database.
  • Does the database server have a regular maintenance window, and how often is database maintenance done? Maintenance activities could include applying patches, server reboots etc.

Having the answers to these questions often helps in eliminating various root causes of slow application performance. Once we have eliminated these causes and have confirmed that the environment or server box seems to be good, and we are still seeing slowness in the application, then it probably makes sense to drill down into the details of the individual database server configuration.

Quite often it happens that most servers are installed with default settings. We all love to click the Next button again and again while installing any enterprise software. This habit of installing enterprise software with every setting left at its factory default later starts giving nightmares to everyone once the application goes live. So at times it makes sense to drill in and find out those default settings of the servers.

So let's assume that we have MSSQL 2008 as the database server and we need to know the various default configuration settings that are in place for the MSSQL 2008 box, like the memory/connection/processor settings etc.

Assuming that you have SQL Server Management Studio installed, all you need to do is follow the steps below:

  • Connect to the Database server with Management Studio.
  • Using the Object Explorer of the Studio, right click on the server node and select Properties from the context menu.
  • The Server Properties window will be visible to you. This window will have all the information with regard to connections, memory and various other settings that are in place for that database server.

We should be able to see something like the screen below.

(Screenshot: the Server Properties window in SQL Server Management Studio.)

Another way of knowing the default settings is via the sys.configurations view or the system procedure sp_configure:

SELECT * FROM sys.configurations
ORDER BY name;
GO

or with

sp_configure;
GO

All this information will help us to correlate the high response times on the front end with the backend database, and at times it helps to scale application performance. When you run this procedure, it will show a lot of information, nearly 60+ settings that impact the database. So in case you feel any setting does not match your application's requirements, then you probably need to talk to the DBA and explain to him your reasoning as to why that setting is impacting the performance of the application.

For example, if the user connections value in the database is set to 10, and during your tests with 30 users you see that response times are high, then highlighting this difference to the DBA might help, since at some point in time it might happen that 10 connections are not sufficient to serve the peak hour demands of the application.
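
A small sketch of how to check that particular setting ('user connections' is the actual option name; a value of 0 means SQL Server manages the limit dynamically):

-- Configured vs. currently effective value for the maximum number of user connections
SELECT name, value, value_in_use, description
FROM sys.configurations
WHERE name = 'user connections';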
