
Archives: Load Balancer

Importance of IIS HTTP ERR logs in Performance Engineering

February 18, 2012 by kiranbadi1991 | Filed in Development, Load Balancer, LoadRunner, Performance Engineering, SilkPerformer

One of the many skills required to be a good performance engineer is knowing how to read, interpret and analyze the logs of the backend servers, and how to trace those events back to the load test executed during that known time period. Many times it happens that, in spite of having a perfect script and a perfect workload model, the response times of the tests do not meet the service level agreements, or are simply beyond the range expected by the business owners or application developers. There can be many reasons for this, but to pinpoint the exact cause and come up with conclusive data, it is extremely important that we understand the value of collecting and analyzing the backend server logs. So in this post I will focus my effort on explaining the importance of the HTTPERR logs, which are part of IIS logging.

Since IIS handles HTTP-based applications, errors originating from the HTTP API (HTTP.sys) of IIS are logged in the HTTPERR logs. The most common types of errors logged here are dropped network connections, connections dropped by the application pool, Service Unavailable errors, bad client requests, etc. The best part of this log is that it gives you precise information about which URL or application pool is responsible for the incorrect behavior, something which is very valuable when you have tens or hundreds of sites sitting on the same IIS box or sharing the same application pool. All of this information can easily be correlated with load test results to build a strong case that the application under test has issues which need attention from the concerned team and may well have a performance impact.

The HTTPERR logs come with the standard fields shown below. They contain all the information required for troubleshooting any fatal issue that occurs during performance testing. One thing I would like to mention here: if your IIS server sits behind load balancers, the IP you see in these logs might belong to the load balancer rather than the end client. Again, this depends on your infrastructure/environment design.

Date : The Date field follows the W3C format and is based on Coordinated Universal Time (UTC). It is always ten characters in the form YYYY-MM-DD. For example, May 1, 2003 is expressed as 2003-05-01.

Time : The Time field follows the W3C format and is based on UTC. It is always eight characters in the form HH:MM:SS. For example, 5:30 PM (UTC) is expressed as 17:30:00.

Client IP Address : The IP address of the affected client. The value can be either an IPv4 address or an IPv6 address. If it is an IPv6 address, the ScopeId field is also included in the address.

Client Port : The port number of the affected client.

Server IP Address : The IP address of the affected server. The value can be either an IPv4 address or an IPv6 address. If it is an IPv6 address, the ScopeId field is also included in the address.

Server Port : The port number of the affected server.

Protocol Version : The version of the protocol in use. If the connection has not been parsed sufficiently to determine the protocol version, a hyphen (0x002D) is used as a placeholder for the empty field. If either the major or the minor version number parsed is greater than or equal to 10, the version is logged as HTTP/?.

Verb : The verb from the last request that was parsed. Unknown verbs are included, but any verb longer than 255 bytes is truncated to that length. If a verb is not available, a hyphen (0x002D) is used as a placeholder for the empty field.

CookedURL + Query : The URL and any query associated with it are logged as one field, separated by a question mark (0x3F). This field is truncated at its length limit of 4096 bytes. If the URL has been parsed ("cooked"), it is logged with local code page conversion and treated as a Unicode field. If it has not been parsed at the time of logging, it is copied exactly, without any Unicode conversion. If the HTTP API cannot parse the URL, a hyphen (0x002D) is used as a placeholder for the empty field.

Protocol Status : The protocol status of the response to the request, if available; it cannot be greater than 999. If the protocol status is not available, a hyphen (0x002D) is used as a placeholder for the empty field.

Site ID : Not used in this version of the HTTP API. A placeholder hyphen (0x002D) always appears in this field.

Reason Phrase : A string identifying the type of error being logged. This field is never left empty.

Queue Name : This is the request queue name.
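Taken together, a raw HTTPERR entry is a single space-separated line in the field order above. A hypothetical example (addresses, URL and application are made up for illustration):

```
2012-02-18 17:30:00 10.0.0.5 2094 10.0.0.9 80 HTTP/1.1 GET /app/page.aspx 503 - ConnLimit -
```

Here 503 is the protocol status, ConnLimit is the reason phrase, and the hyphens are the placeholder Site ID and Queue Name fields.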

Another good thing about the HTTPERR logs is that they can be queried exactly the way we query the IIS access logs, by using Log Parser.

Below is one example query we can use against the HTTPERR logs:

LogParser -i:HTTPERR "SELECT TO_STRING(date,'yyyy-MM-dd'), TO_STRING(time,'hh:mm:ss'),
    c-ip, c-port, s-ip, s-port, cs-version, cs-method,
    cs-uri, sc-status, s-siteid, s-reason, s-queuename
    FROM \\YOUR PATH TO HTTP ERROR LOG"

Most often the HTTPERR logs are written to the %SystemRoot%\System32\LogFiles\HTTPERR folder of the IIS box; however, since this location is configurable, at times we might need to confirm the exact location with the server administrators. In some rare cases these logs might also have been disabled for various reasons, so we might need to check with the server administrators and get them enabled, though I feel no one really disables them for any reason.

Once you query these logs with Log Parser, you should be able to get details like those shown below.

[Image: sample Log Parser output from an HTTPERR log]

As you can see, the wealth of information lies in the s-reason codes, which help us identify issues with the application and aid a lot in troubleshooting some of the hardest performance bottlenecks.

Below are Microsoft's definitions of the reason codes that appear in the HTTPERR logs:

AppOffline: A service unavailable error occurred (an HTTP error 503). The service is not available because application errors caused the application to be taken offline.

AppPoolTimer: A service unavailable error occurred (an HTTP error 503). The service is not available because the application pool process is too busy to handle the request.

AppShutdown : A service unavailable error occurred (an HTTP error 503). The service is not available because the application shut down automatically in response to administrator policy.

Bad Request : A parse error occurred while processing a request.

Client_Reset : The connection between the client and the server was closed before the request could be assigned to a worker process. The most common cause of this behavior is that the client prematurely closes its connection to the server.

Connection_Abandoned_By_AppPool : A worker process from the application pool has quit unexpectedly or orphaned a pending request by closing its handle.

Connection_Abandoned_By_ReqQueue : A worker process from the application pool has quit unexpectedly or orphaned a pending request by closing its handle. Specific to Windows Vista and Windows Server 2008.

Connection_Dropped : The connection between the client and the server was closed before the server could send its final response packet. The most common cause of this behavior is that the client prematurely closes its connection to the server.

ConnLimit : A service unavailable error occurred (an HTTP error 503). The service is not available because the site level connection limit has been reached or exceeded.

Connections_Refused : The kernel NonPagedPool memory has dropped below 20MB and http.sys has stopped receiving new connections.

Disabled : A service unavailable error occurred (an HTTP error 503). The service is not available because an administrator has taken the application offline.

EntityTooLarge : An entity exceeded the maximum size that is permitted.

FieldLength : A field length limit was exceeded.

Forbidden : A forbidden element or sequence was encountered while parsing.

Header : A parse error occurred in a header.

Hostname : A parse error occurred while processing a Hostname.

Internal : An internal server error occurred (an HTTP error 500).

Invalid_CR/LF : An illegal carriage return or line feed occurred.

N/A : A service unavailable error occurred (an HTTP error 503). The service is not available because an internal error (such as a memory allocation failure) occurred.

N/I : A not-implemented error occurred (an HTTP error 501), or a service unavailable error occurred (an HTTP error 503) because of an unknown transfer encoding.

Number : A parse error occurred while processing a number.

Precondition : A required precondition was missing.

QueueFull : A service unavailable error occurred (an HTTP error 503). The service is not available because the application request queue is full.

RequestLength : A request length limit was exceeded.

Timer_AppPool : The connection expired because a request waited too long in an application pool queue for a server application to dequeue and process it. This timeout duration is ConnectionTimeout. By default, this value is set to two minutes.

Timer_ConnectionIdle : The connection expired and remained idle. The default ConnectionTimeout duration is two minutes.

Timer_EntityBody :The connection expired before the request entity body arrived. When it is clear that a request has an entity body, the HTTP API turns on the Timer_EntityBody timer. Initially, the limit of this timer is set to the ConnectionTimeout value (typically 2 minutes). Each time another data indication is received on this request, the HTTP API resets the timer to give the connection two more minutes (or whatever is specified in ConnectionTimeout).

Timer_HeaderWait :The connection expired because the header parsing for a request took more time than the default limit of two minutes.

Timer_MinBytesPerSecond :The connection expired because the client was not receiving a response at a reasonable speed. The response send rate was slower than the default of 240 bytes/sec. This can be controlled with the MinFileBytesPerSec metabase property.

Timer_ReqQueue : The connection expired because a request waited too long in an application pool queue for a server application to dequeue. This timeout duration is ConnectionTimeout. By default, this value is set to two minutes. Specific to Windows Vista and Windows Server 2008.

Timer_Response : Reserved. Not currently used.

URL : A parse error occurred while processing a URL.

URL_Length : A URL exceeded the maximum permitted size.

Verb : A parse error occurred while processing a verb.

Version_N/S : A version-not-supported error occurred (an HTTP error 505).

So the next time you are load testing a .NET web application running on an IIS web server, make sure you ask for access to the IIS HTTPERR logs in addition to the IIS access logs. It will help you identify issues in a fraction of the time compared to the methods used more often.


Performance Testing and Load Balancing

November 19, 2011 by kiranbadi1991 | 8 Comments | Filed in Environment, IP Spoofing, Load Balancer, Others, Performance Engineering, Testing

In my last post, I mentioned that a performance engineer needs a basic understanding of how load balancers work. It is a required skill for any performance engineer working in a big enterprise environment: without it, your performance test results can be misleading, since response time also depends on how traffic is routed to the backend servers. So in this post, I am going to try to explain how I make sure my tests also stress the load balancer in a meaningful way, as close to production behavior as possible.

In the majority of cases, load balancers identify clients either by source IP or by setting a persistence cookie on the client's traffic. The source IP is the IP of the client using the application. In corporate environments, most load balancers use the source IP to identify the client and route the requests, because the environment's infrastructure is known to all the concerned teams and revealing IP addresses is not considered a risk. The load balancer reads the client's IP and, based on its load balancing algorithm, forwards the request to the appropriate backend server. We can also have source IP stickiness to a particular server instance, also referred to as server affinity.

Depending on the algorithm and the requirements of the application, load balancers can also implement cookie-based persistence. In this case the load balancer sets a cookie on the client before routing it to the appropriate backend server, so each request going back and forth carries this cookie with it. As long as the cookie is present in the requests, the client's connection is maintained to the same backend server instance. Cookie-based persistence also helps maintain connection balance across the backend servers in case source-IP-based persistence fails for some reason.

I feel these are the two most commonly used ways of implementing load balancing, and almost all load balancing vendors support them.

Now the question is: how should we test load balancers during performance testing? How do we ensure that the load balancer is working the way it should?

If you do not know anything about load balancers, this might be tricky; but if you know the algorithm and how the load balancer implements it, it can be a cakewalk.

Below are some of the steps I recommend performing during performance testing whenever you have load balancers in your environment:

1. Identify how your load balancer implements load balancing and how it spreads requests across the backend farm of servers. Understand the algorithm and its implementation.

2. IP spoofing is required. Some time back, I wrote about IP spoofing here; feel free to spend some time reading it. Depending on the load balancer's algorithm, we need as many IPs as the number of users we are testing with. If the algorithm uses the source IP, it will route requests based on the source IP.

3. Once IP spoofing is enabled, your scripts should output the IP address of the Vuser. Most load testing tools provide an interface through which you can output the user's IP address. Below is a piece of code that can be used in LoadRunner:

char *ip;

ip = lr_get_vuser_ip();
if (ip)
    lr_output_message("The IP address is %s", ip);
else
    lr_output_message("IP spoofing disabled");

4. If load balancing is done via cookies, make sure you identify, and do not comment out, the cookies set by the load balancer. Some tools store persistent cookies in memory; these don't show up in your scripts because they are handled automatically by the tool. LoadRunner falls into this category. You need to use advanced logging to see those cookies going back and forth. If possible, log the cookies along with a timestamp; this will help you troubleshoot issues in complex, shared environments.

5. I often make sure I capture runtime stats about where each client request is routed in the backend farm. I do this by using this technique; feel free to spend some time reading that as well. All you have to do is correlate the header values that carry the server information and write them to the log with a timestamp. This will really help you with runtime debugging.

6. If your sole objective is to test the load balancer, make sure you run only a single script and not multiple scripts, as this rules out connection imbalance issues, normally called network bottlenecks. Not all connections exit gracefully across all layers of the TCP stack.

I have often seen folks use web server logs, retrieving the number of hits for the duration of the test run, as a measurable way of testing load balancers. In some cases this is correct, but it can be wrong if your application uses some kind of caching mechanism; we should always remember that the runtime settings of the scenario have a strong influence on hits per second. In most applications, static content such as images/js/css files is cached. So if you are using web server logs to analyze the load balancer's work, make sure you compare only POST requests and not GET requests. Here are the reasons why POST requests make sense.

Once you have all these stats, it becomes extremely easy to identify whether the load balancer is doing its job correctly.

Hope this helps.


Basics of Load Balancing for the Performance Engineer

October 22, 2011 by kiranbadi1991 | 1 Comment | Filed in Development, IP Spoofing, Load Balancer, Performance Engineering

One of the skillsets required to be a successful performance engineer is knowing and understanding how load balancing works, which algorithms applications most frequently use, and how these algorithms can cause or mitigate performance issues and bring about a good user experience.

We do load balancing to spread incoming requests across various backend servers, also called a server farm, so as to optimize resource usage and provide a good user experience. Often a single machine is not sufficient to service growing business requirements; we need many servers in the backend to handle the growing volume of requests coming from clients.

There are different ways load balancing can be done: hardware-based, network-based, and software-based. Hardware-based load balancing is mostly used by cloud service providers, where requests are routed to servers based on various network factors. For this discussion, I would like to focus on the software-based algorithms used by load balancers. Some of the commonly used approaches are:

  • Round Robin : In the round robin style of load balancing, requests are routed to the backend servers in rotation: the first request goes to server 1, the next to server 2, and so on until the last server in the farm is reached, after which the cycle starts over. Round robin normally gives a good connection balance across the servers and is one of the most commonly used algorithms.
  • Least Connection : In the least connection style, each request is routed to the server that has the fewest active connections at that point in time. If a server is overloaded, no new connections or requests are sent to it. Least connection is useful when we have resource contention and need to extract the full value of each box; normally the application needs to be tuned appropriately so the servers can be utilized to the maximum extent.
  • Weighted Connection : This is a hybrid approach that can combine the techniques of round robin and least connection. A certain number of points or a percentage is allocated to each individual server in the backend, and the load balancer directs traffic to the backend systems according to the weighting rule. Weighted connections are normally beneficial in shared environments where applications share resources.

In addition to the above three methods, there may also be vendor- or situation-specific algorithms. Most load balancer vendors provide support for custom implementations through user-written rules, so it makes sense for the performance engineer to clarify the algorithm rather than assume it. Server affinity/persistence/stickiness normally goes hand in hand with load balancing, but they are two different topics to discuss.

In order to select or suggest the correct algorithm, one also needs a complete understanding of the platform that is going to be hosted on the servers. Applications storing sessions in the server process might require persistence to the instance serving the request, so any of these three methods alone might not be sufficient; we might need something extra to maintain persistence. Applications storing sessions in a database (or entirely in cookies) can use any of these methods, but might see some performance impact, as round trips may be needed to retrieve and validate sessions.

Hope this helps.
