
Archives: Database

How to Identify a Slow Running SQL Query in MySQL 5.5x

October 1, 2012 by kiranbadi1991 | Comments Off on How to Identify Slow Running SQL Query in MYSQL 5.5x | Filed in Database, Development, Environment, Performance Engineering, Testing

For the past couple of days I have been playing around with the MySQL 5.5x database, writing queries and creating tables, indexes and routines here and there for one of my projects. MySQL seems fairly easy to understand and provides almost all the features provided by MSSQL or Oracle. (Of course there are some differences in the way they are designed and in the SQL dialects they use.)

As soon as someone reports that the application is slow, or we find slowness during a test, the first thing we need to do is identify the cause of the slowness. (Most people don't do this step; they become defensive. At times even I have exhibited this behavior, it's a human reaction.) There can be many ways to identify the cause of slowness, and many possible reasons for it. However, for the sake of this post, let's assume we have identified the MySQL database as the cause of the slowness and have ruled out everything else.

To identify the slow running MySQL query, run the command below in MySQL Workbench or via the MySQL client and see what's going on in the MySQL box:

Type ‘help;’ or ‘\h’ for help. Type ‘\c’ to clear the current input statement.

mysql> show full processlist\G
*************************** 1. row ***************************
     Id: 1
   User: root
   Host: localhost:51606
     db: mydb
Command: Sleep
   Time: 372
  State:
   Info: NULL
*************************** 2. row ***************************
     Id: 2
   User: root
   Host: localhost:51607
     db: mydb
Command: Query
   Time: 58
  State: Query
   Info: SELECT * FROM MYTABLE WHERE auto_id = 46102


As you can see from the above, the SELECT statement by itself is taking around 58 seconds to execute. Beyond spotting slow statements, SHOW FULL PROCESSLIST also gives insight into which threads are running in the MySQL server, and it is quite often used to debug connection issues. The MySQL reference manual has more information about this command.
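If the slow statement is not running while you happen to be watching, the slow query log is another way to catch it. Here is a minimal sketch for MySQL 5.5; the 2 second threshold is an assumption, so adjust it to your own SLA:

-- Enable the slow query log at runtime (requires the SUPER privilege)
SET GLOBAL slow_query_log = 'ON';
-- Log any statement running longer than 2 seconds (assumed threshold)
SET GLOBAL long_query_time = 2;
-- Optionally also log statements that use no index at all
SET GLOBAL log_queries_not_using_indexes = 'ON';
-- Find out where the log file is being written
SHOW VARIABLES LIKE 'slow_query_log_file';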

Once we know which SQL statement is taking more time, the next task is to replicate the issue outside the application, using the same data and the same statement, but through the MySQL client. Only when we are able to replicate the issue outside the application can we say that the issue is with the SQL query and not with any other element of the environment or application. In almost all cases the issue replicates successfully. (However, do watch out for those smart, excellent-communicator DBAs who share their screen with the business to show that, in spite of querying more rows of data, the issue cannot be reproduced and the query executes in the blink of an eye. In such cases, make sure you use the same set of data the application was using at the time you saw the slowness, record the table's row count before and after, and keep all other conditions the same.)

Moving on, once you are able to replicate the issue, the next step is to look at the query plan generated for the query. In MySQL this can be done using the EXPLAIN statement:

mysql> EXPLAIN SELECT * FROM MYTABLE WHERE auto_id = 46102\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: MYTABLE
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 47890
        Extra: Using where

In the execution plan above, any query that does not use an index (signified by the NULL value in the key field) can be considered a poorly tuned SQL query. The rows field shows how many rows MySQL expects to read to evaluate the statement; it gives an indication of how much data is read and correlates directly with the amount of time required to execute the query. The type field with a value of ALL is also an indicator of a problem, since it means a full table scan.

Adding indexes to the table might help in these cases, but a lot also depends on the structure of the table. So before applying any fix, it makes more sense to understand the table structure and the amount of data the table holds.

The command below will give you information about the table structure:

SHOW CREATE TABLE `MYTABLE`;

The above statement returns the table definition along with all the column information. Once we understand the structure of the table, it becomes quite easy to apply and try out various fixes. The command below gives information about the data length and various other table details:

SHOW TABLE STATUS LIKE 'MYTABLE';

Both of the above commands give very interesting information, and this information can also help with sizing the database and with capacity planning.
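As a sketch of that sizing angle, the same numbers can be pulled for every table at once from information_schema; the schema name mydb is an assumption taken from the process list shown earlier:

-- Approximate per-table data and index size in MB (assumed schema name)
SELECT table_name,
       table_rows,
       ROUND(data_length / 1024 / 1024, 2)  AS data_mb,
       ROUND(index_length / 1024 / 1024, 2) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = 'mydb';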

Once we have all this information, we can start working on applying fixes. Maybe after I fix some of my tables, I can write some more about interesting things to do.
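As a minimal sketch of one such fix, assuming auto_id in MYTABLE is highly selective and not yet indexed, an index on it should turn the full table scan into an index lookup (the index name here is hypothetical):

-- Hypothetical fix: index the column used in the WHERE clause
CREATE INDEX idx_mytable_auto_id ON MYTABLE (auto_id);

-- Re-check the plan: type should change from ALL to ref,
-- and rows should drop from 47890 to roughly 1
EXPLAIN SELECT * FROM MYTABLE WHERE auto_id = 46102\G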


Short note on using Indexes - MS SQL 2005

August 18, 2012 by kiranbadi1991 | 1 Comment | Filed in Database, Development, Environment, Performance Engineering, Testing

Most often, whenever a performance engineer detects an issue with response time, the first fix the application team thinks of is to check and add some more indexes to columns in the table. Adding indexes is the low hanging fruit that every development team tries, and should try, before reviewing their code base for a fix. I certainly agree with most development teams on this. However, this concept is now misused: people often add indexes as a band-aid rather than really fixing an insufficient design. I recently had one such case, and to everyone's surprise it failed and they had to rework their queries. Don't get me wrong, designs often get outdated; they don't stay in pace with the growth of technologies. Today I may use Model 2 MVC, but three years down the line Model 2 MVC will be outdated. However, that should not impact the low level basics used in the design. If I am storing zip codes, then I should probably be using char or int with a limited byte range, depending on the zip code format. That, according to me, is a low level basic, and the same applies to the basics of indexes.

So this post is all about indexes and my thoughts on it.

Indexes help in reducing IO when we use them efficiently. They work in a similar manner to the index of a book. Let's say we have a book with around 1000 pages and you need to find the definition of a single word; you know for sure the definition exists somewhere in the book, but you're not sure on which page. How are you going to find it? Just go to the index page of the book, search for the word you require, and then go to that page number. No need to flip through a thousand pages to get the information. That's the power of an index when correctly implemented. Now consider what happens when there is no index on any column involved in a query: the MSSQL database engine has to read each row of the table to get the information, and this results in a full table scan. So whenever you have a large number of rows and a full table scan happens, the first thing that gets impacted is the response time of the application, and to some extent the resource usage of the database server. Your database box is going to get very busy even to fetch one row of data.
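A quick way to see this on SQL Server 2005 is to compare logical reads before and after adding an index. This is only a sketch; the Customers table, its columns, and the index name are all assumed:

-- Report logical reads for each statement in this session
SET STATISTICS IO ON;

-- With no index on ZipCode this is a full table scan:
-- every data page of the table gets read
SELECT CustomerId FROM Customers WHERE ZipCode = '60601';

-- Add an index and run the same query again
CREATE NONCLUSTERED INDEX IX_Customers_ZipCode ON Customers (ZipCode);

SELECT CustomerId FROM Customers WHERE ZipCode = '60601';
-- The logical reads reported for the second run should drop sharply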

Indexes are categorized as clustered and non-clustered indexes. Clustered indexes are the ones that often give better results, and a clustered index is the default index type whenever we create a primary key constraint on a table in an MSSQL database. Clustered indexes are often created before non-clustered indexes, since the row locators of the non-clustered indexes can then point to the index keys of the clustered index. Clustered indexes also help with sorting data and with cases where a large range of rows needs to be retrieved. A non-clustered index is useful when we are retrieving a very small number of rows.
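Here is a small sketch of the two types in SQL Server 2005 syntax; the table and index names are assumptions:

-- The primary key constraint creates a clustered index by default
CREATE TABLE Orders (
    OrderId    INT NOT NULL PRIMARY KEY,  -- clustered index key
    CustomerId INT NOT NULL,
    OrderDate  DATETIME NOT NULL
);

-- A non-clustered index: its row locators point at the clustered key (OrderId)
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId ON Orders (CustomerId);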

Then there are also certain situations where adding indexes in fact degrades the performance of the application. Indexes add overhead in terms of extra storage and maintenance work, and this can also impact the response time of the application. Insert/update/delete operations often get slower whenever we have indexes on the tables where these operations take place. Take the earlier example: if I delete 25 pages from a 1000 page book, the ordering of all the pages changes and the index has to be rewritten to match; depending on the frequency of such changes, this can lead to serious response time issues. That is the reason why we should be cautious about using indexes for use cases where we do a high number of these DML operations.

Some of the main factors to be considered while designing indexes are:

  • Where clauses and joins used in queries: the fewer rows a query returns, the more benefit we get from an index. We have to keep in mind that the whole purpose of a WHERE clause is to get exactly the required information from the table, and if the WHERE clause column is indexed, the benefit of the index is multiplied.
  • Column data types: columns of the string family (char, varchar, nchar, etc.) gain less from indexes. The fewer bytes a column occupies, the more gain we can have from indexing it.
  • Columns which hold duplicate data in large numbers are a bad choice for indexes, for example a boolean flag column that stores only the two values Y and N. The reason these are bad is that they defeat the selectivity of the WHERE clause. Columns with a large number of unique values gain the most from indexes (see the selectivity sketch just after this list).
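One rough way to check that last point is to measure a column's selectivity, i.e. the ratio of distinct values to total rows. This is a sketch with assumed table and column names; values close to 1.0 suggest a good index candidate:

-- Selectivity: 1.0 = every value unique, near 0 = mostly duplicates
SELECT COUNT(DISTINCT CustomerId) * 1.0 / COUNT(*) AS selectivity
FROM Orders;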

Information about indexes and their associated costs can also be found using the dynamic management views (DMVs) of MS SQL Server. The DMV sys.dm_db_index_operational_stats can show us low-level activity, such as locks and I/O, on an index that is in use. The DMV sys.dm_db_index_usage_stats can give us counts of the various index operations that have occurred against an index over time.
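For example, here is a sketch of a usage query against the current database (both DMVs exist in SQL Server 2005; the exact columns chosen here are just one reasonable selection):

-- Seeks/scans/lookups vs. updates per index in the current database:
-- an index with many updates but no seeks is a candidate for removal
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name                   AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID();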

These are the key factors to be considered while designing indexes on a table. Indexes often help in fixing performance issues, but if the design of the table or the query is bad, then adding indexes may worsen the problem rather than resolve it.


Performance, Dirty Reads, Lost Updates and Locks

May 25, 2012 by kiranbadi1991 | 1 Comment | Filed in Database, Development, Environment, Performance Engineering, Testing

Recently I came across a very interesting issue which I feel I should write about. During load testing of a web based application, the response time for a couple of the business processes was far exceeding the SLAs given to us. The test was almost clean, with less than 0.1% errors and no failed transactions. Server resources were also looking good: no abnormal behavior in the JVM, CPU, memory and so on. Still, the transaction response time was very high for that couple of business processes.

For the sake of this post, let's call those business processes A, B and C. Business process A adds (inserts) a new record into the database, business process B edits a record in the same table (we call an edit an update in DB language), and process C downloads the latest information, which includes the output of both process A and process B. We ran these 3 business processes for about an hour with 1 user each. All the data sets used in load testing these 3 business processes were unique. The processes were fairly simple to script, and the web navigation was also fairly easy. Our job was to test these business processes with 1 user each so as to achieve close to 100 transactions for each process in an hour. A pretty easy task, and we ran plenty of tests with these processes; in all of them the response time was almost the same, with very little variance. The results were like a puzzle. Thankfully we had HP Diagnostics installed in that environment, and we had some instrumentation done for this application. So I thought, let's see what the application threads are doing while I run the test. HP Diagnostics has a very good feature of showing you the live thread count along with a thread trace of what each thread is doing at that point in time. So I took a thread dump of all the threads running in this JVM. Believe me, taking a thread dump with HP Diagnostics is as easy as clicking a link.

Wow: for the first 10 minutes most of the threads were in the runnable state, and as users slowly ramped up, a few of the threads went into the waiting state, and a few oscillated between the runnable and waiting states. Another 10 minutes into the test run, I could see a lot of threads in the waiting state. After seeing the threads in the waiting state, I could understand why I was seeing such a high response time for the most simple business processes, which involve nothing but simple insert and update operations. Response time was high because threads were waiting for something, and that something was nothing but the DB execute update calls.

The next puzzle for me was why on earth these threads were waiting at the DB execute calls, and that too for a cool 10 seconds under so little load (I had close to 17 scripts with one user each per script). Given the limited access we had to the database, and after discussing my findings with the application development team, we decided to engage a DBA to assist us in finding the root cause of this wait. At this point I must appreciate the honesty of the application developer who agreed to this idea of engaging the DBA. There are very rare cases where developers agree with performance engineers, and that's the reason I must thank this guy.

So we engaged the DBA, ran a couple of tests, and he collected the database stats during the tests. He did some analysis and identified a couple of stored procedures and dynamic queries which were in need of tuning. In addition to this, he came back and said that the response time for these 3 business processes was high because row level locking was getting escalated to a table level lock after a certain duration of time; after some more time, some transactions were getting dirty reads, and after a couple of minutes the table would get locked and not allow any insert or update. Fair enough analysis.

Hmm, I was happy that we finally knew why we were seeing the high response times. Our next step was to discuss our findings with the key stakeholders. Lots of things were discussed in the meeting, but the key question I liked in that hour's meeting was the DBA asking, "Can this application tolerate dirty reads?" There was a long pause in the meeting, and finally, after some time, the developer said this application cannot tolerate dirty reads. There was a pause again for about 30 seconds; after another 20 seconds, another voice said that these 3 transactions are executed less than 100 times in an hour and there exists very little chance that we will see this concurrency in production. There was again some silence, and later on everyone moved on to the next topic.

Now, the reason I feel this is an interesting case that deserves some writing is that dirty reads, lost updates and phantom reads can be found only by proper load testing, and there exists a very high probability that if we ignore them, we are sending out incorrect information to the application users in production under load; that incorrect information could at times be used by those users to sue the company. These are typical cases where data integrity issues take precedence over performance. So if you cannot redesign the queries, then I would suggest you sacrifice the performance. These types of issues should become high priority fixes, at least in cases where we are dealing with financial applications. I also feel that at times these are critical bugs, given that a table level lock impacts all the users whenever one user locks the table for any reason.
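To make the dirty read concrete, here is a minimal two-session sketch in SQL Server syntax; the Accounts table and its values are assumptions, not the actual application's schema:

-- Session 1: update a balance but do not commit yet
BEGIN TRANSACTION;
UPDATE Accounts SET Balance = Balance - 500 WHERE AccountId = 1;
-- ... transaction deliberately left open ...

-- Session 2: READ UNCOMMITTED allows reading the uncommitted change
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT Balance FROM Accounts WHERE AccountId = 1;  -- dirty read

-- Session 1: roll back; session 2 has now acted on data that never existed
ROLLBACK TRANSACTION;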

Maybe later on I will write another short post on dirty reads and lost updates. These are interesting cases where only proper load testing reproduces the issue, and strong foundational skills in performance engineering help identify the root cause.

