
Archives: SilkPerformer

Transaction Response Time – First Indicator of Application Performance

February 14, 2014 by kiranbadi1991 | Filed in Performance Center, Performance Engineering, Performance Test Tools, Quality, SilkPerformer, Small Businesses, Web Server

What do you mean by a Transaction?

What do you mean by Transaction Response Time?

Why is Transaction Response Time a key Performance Indicator for an application or system?

A lot of young people who want to become performance engineers ask me questions like these. Seriously, I believe a transaction is always associated with its context. For a database DBA, it could be something like committing the output of a SQL statement to the DB, or rolling back the entire output of the SQL statement and bringing the database back to its initial state after some failure. For a developer, it could be that one business requirement equals one transaction, and for a banker or domain specialist, it could be one entry in the ledger. From the perspective of a performance engineer, it could mean any of the three; it just depends on the objective of your tests and on what we are trying to measure in the system or application.

The way a performance engineer sees a transaction is a bit different from the way a developer sees it. Some examples of transactions as seen by a performance engineer are:

  • Time taken to log in to the application.
  • Time taken to generate and publish a report to the users.
  • Time taken to load data into the database, or time taken to process xx amount of data and load it into the database (batch jobs).

If you look closely at the examples above, you can see that a transaction is always associated with time. The time taken to do activity X is the prime focus of the performance engineer. The same might not be true for other folks like DBAs, domain experts or developers.

One of the reasons a performance engineer always associates time with a transaction is that most performance tools have taught us to see transactions this way: wrap the activity/event/business functionality between start and end transaction markers, calculate the difference between the start and end times, and call that the transaction response time for the activity. Most load testing tools work this way, and the logic holds for almost 99.9999% of applications. However, it does not gel well with technologies where a non-blocking UI is preferred over a blocking one. Comet/push are technologies where marker-based transactions do not work favorably unless you do some kind of hacking. So I believe marker-based transaction timing works only for technologies where the user waits for the response to come back.
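
To make the marker idea concrete, here is a minimal sketch of how it typically looks in a LoadRunner/VuGen script; the transaction name and URL are purely illustrative, not taken from any particular project:

    Action()
    {
        // Everything between the two markers is measured as one transaction.
        lr_start_transaction("01_Login");

        // Illustrative step only; in a real script this would be the actual login request.
        web_url("login",
            "URL=http://example.com/login",
            "Mode=HTML",
            LAST);

        // LR_AUTO lets the tool decide pass/fail; the elapsed time between the markers
        // is reported as the transaction response time for "01_Login".
        lr_end_transaction("01_Login", LR_AUTO);

        return 0;
    }

SilkPerformer and other tools have the equivalent concept with their own timer functions, so the same thinking carries over.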

Transaction response time is the most important characteristic of system performance from the user's point of view. If users are satisfied with the speed at which the system lets them do more work in a short time, then probably no effort is required to speed it up; otherwise, a lot of effort is required. Sometimes users are also satisfied with a somewhat higher response time, because they know the program/application has a lot of analytical calculation to do before it can bring the data back to them. However, as more and more users get into the system, they start experiencing slowness: the application takes more time to respond, and functionality that used to respond in 3 to 5 seconds now responds in 5 to 8 seconds. As the user load increases, response time keeps growing, and after a certain point the application fails to respond to user requests at all. Performance engineers call this the breaking point of the application. There can be many reasons for an application to stop responding, but from a performance engineering perspective it is worth doing a bit of analysis based on queuing theory: the rate at which the application receives requests is greater than the rate at which it can serve them, given its environmental constraints.
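
To put rough numbers on that: if, for illustration, the application can serve about 100 requests per second under its current constraints but is receiving 120 per second, roughly 20 requests per second have nowhere to go but a queue. As long as the arrival rate stays above the service rate the queue keeps growing, so response times climb steadily until requests start timing out, which is exactly the breaking point described above.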

Response time degradation can happen not only when the number of users or the volume of data exceeds some threshold, but also after the application has been up and running for a particular period. That period can be as short as hours or as long as days or weeks. This is often an indication that some system resources are not released after transactions are completed, leaving a scarcity of remaining resources to allocate to new transactions. This is called a resource leak and is usually due to programming/coding defects.

We say an application has performance issues after analyzing the transaction response time trend, and we can conclude the points below based on the data we have collected:

    • Transaction response time is getting longer as more users are actively working with the system.
    • Transaction response time is getting longer as more data is loaded into the application databases.
    • Transaction response time is getting longer over time even for the same number of users or data volume, primarily due to leaking resources such as memory, connections, etc.

Transaction response time matters a lot when your application is a web-based, public-facing site; the revenue of the company depends on how fast the application responds to users' requests. If your application is internal, fast response times increase the productivity of the team and the company overall.


Some Good, Not So Good and Bad Things about Logging and Its Impact on Application Performance

February 15, 2013 by kiranbadi1991 | Filed in Development, Environment, Performance Engineering, Quality, SilkPerformer

Over the years of working on projects and analyzing logs year after year, I have seen one trend repeated frequently by developers and development teams: they just don't log the things about the application that they need to log. Since they don't log information correctly, all downstream activities become painful, and subsequently even a smaller issue can lead to a not-so-memorable fix.

Now why am I giving so much importance to logging? Probably because I love logs, and probably because I know something about the importance of logging. So let me share why logging is important, what things we need to log, what logging levels are, how they help project teams, and when they become a bottleneck.

Log4j is one of the most frequently used logging libraries out there, and it's also one of the most robust. Log4j can be configured via a properties file, and this is a good approach for setting up logging. With Log4j we can log information at quite a few levels of granularity, namely debug, info, warn, error, fatal, trace and all. Below is my high-level understanding of these levels:

  • Debug level is used when we need to debug issues which are minor in nature and occur frequently.
  • Info level is used to log events which are significant in the application life cycle, such as initialization of a JNDI resource, a data source, etc. Logging all the significant events helps us get to the information without digging into the code, and we can also track the progress of the application at a granular level.
  • Warn level is used to log minor problems which are more warnings than errors, and which at times are external in nature. It is used to log potentially harmful events, for example input parameters which are not acceptable to the program.
  • Error level is used to log exception details or errors which break functionality of the application. Normally errors do not crash the application, but they may break the subset of functionality that generates the error.
  • Fatal level is used to log fatal errors. Fatal errors are generally related to a crash of the application or of some of its components; with fatal errors, the application normally aborts.
  • Trace level is used to log fine-grained informational events.
  • All level turns on full logging; every informational event is logged.

One can choose any of the above levels to log information, depending on the priority and severity of the issue. One can also customize and extend the Logger class to create a custom level if required, but I feel that would be extra work and is definitely not required, since the existing levels already provide the information we need to confirm an issue.
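
As a minimal sketch of the properties-file approach mentioned above (the appender name, file path and package name are only illustrative), the root logger can be kept at one level while a noisier package is raised to debug:

    # Root logger at INFO, writing to a daily-rolled file
    log4j.rootLogger=INFO, appfile

    log4j.appender.appfile=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.appfile.File=logs/application.log
    log4j.appender.appfile.DatePattern='.'yyyy-MM-dd
    log4j.appender.appfile.layout=org.apache.log4j.PatternLayout
    log4j.appender.appfile.layout.ConversionPattern=%d{ISO8601} %-5p [%c] %m%n

    # Raise verbosity for one package only while troubleshooting
    log4j.logger.com.example.orders=DEBUG

The daily-rolling appender also takes care of the log rotation point that comes up later in this post.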

Logging helps in many ways. Unless we log, we will never know what exactly our code is doing, or the path it takes to implement a piece of functionality. Logging also helps us understand and troubleshoot issues after the application goes live; in fact, many technical product support people who are not so strong on the technical side of the software rely on logs alone to provide solutions to the issues users encounter. Logs also help with user data mining: Google/Facebook/Amazon log information about their users extensively and then use it to understand user behavior and come up with functionality that delights them. In short, logging also helps grow the business.

However, there is certain information which should not get into the logs, such as:

  • User-sensitive information like passwords, user IDs, etc.
  • Any personal information that can be used to identify and trace a user and has the potential to harm the user if it falls into the wrong hands.
  • Any information which is financial in nature, like bank account or credit card details.
  • Logs should also not contain details about the infrastructure on which the application is running; they should only contain information that is technical in nature about the application itself.

Now let's talk about some good practices and the things that should get into the logs:

  • Logging should always include the source of the event, namely the class, file or package name that generated it.
  • Logging should always be categorized as info, error or debug, as per your application requirements.
  • Logs should always be rotated once they reach a defined file size, or preferably on a daily basis.
  • Logs should contain timestamps, preferably server timestamps.
  • Logs should be human readable and parseable, so that information can be extracted from them easily.
  • Logs should preferably be stored on local machines.
  • Logging should contain application start/stop details, significant events related to the application life cycle, etc.
  • We should also log details of events such as how much time it took to connect to the database, how much time it took to execute a query, how big the result set was, etc.

Each entry in the logs should always contain who (class), when (timestamp), where (which part of the code), what (what action the code performed), and finally the output generated for that action. Some folks I know who built a codebase that processed millions of requests an hour used logging extensively to fine-tune it; once the code base was tuned appropriately, they turned the logging off and deployed it to the test or live environment.
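
For illustration only, a single log entry that answers who, when, where and what might look something like this (the class name, timing and row count are made up):

    2013-02-15 10:42:07,113 INFO  [com.example.orders.OrderDao] findOrdersByUser: query executed in 212 ms, 148 rows returned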

There are also a few things about logging which are not so good, and which we need to understand:

  • Excessive logging leads to bad application performance, especially when logs are written to network drives; it chokes the network.
  • Using an incorrect logging level to get information is also bad practice.
  • I personally prefer to log information asynchronously, as this does not block my code execution. (Take this suggestion with a pinch of salt: I am still implementing logging for my own code, so this statement might be wrong, but I know we can log in async mode; it also depends on our code, which must allow multiple threads to run.)
  • Logging to the same local drive as the server often has I/O overheads. Think about what happens when hundreds or thousands of users are on the application and they are all generating log entries.

Many experts believe that logging takes no more than a couple of nanoseconds to a couple of milliseconds per user. Based on my experience, though, people often implement logging very poorly: they don't log the information that is required, they log excessively for various reasons, or they log in sync mode. Under a load test, in my experience, an application with info-level logging will show at least 4 to 10% higher response times than the same application logging only at error level. Of course, this statement holds only if the developer has some logging set up; otherwise, expect a long journey with steps something like: set up the logging infrastructure, reproduce the issue, verify the issue in the logs, check the code for the execution path, confirm the issue in the code, and then provide the fix.


Compression, Decompression, Mobile Performance and LoadRunner

February 4, 2013 by kiranbadi1991 | 2 Comments | Filed in Decompression, Development, LoadRunner, Performance Center, Performance Engineering, Scripting, SilkPerformer

Recently I inherited some LR scripts from one of my colleagues. They were all about building JSON calls to stress the backend Spring Security framework, which was the first layer of entry into the mobile infrastructure. The scripts were simple, built using web_custom_request with a JSON string as the body. One of the things that really surprised me during this effort was that web_custom_request itself was taking close to 100 ms to 300 ms to decompress the server response during load testing.

Okay, first let me give you some background. The servers were configured to send responses compressed in gzip format, with the Content-Encoding header set to gzip. The functionality in scope had an SLA of 1 second max, and quite a few functions also had SLAs of less than 500 ms. Quite challenging SLAs, I would say. But then again, this functionality was going to be accessed over mobile devices, so the lower the response time, the better for users.

Most of the responses from the server, for all functionality, were served as chunked bytes. What this means is that the server first sends some bytes of the response in compressed gzip format, LR decompresses those bytes in 5 to 10 ms, then the server sends the next range of bytes as a chunked gzip response, LR spends another 5 to 10 ms decompressing them, and the process continues until the final set of bytes arrives. All of this happens over a single connection, and the connection with the server never closes. If you also have some server response validation in place, expect it to add another 10 ms or so.

I measured all these times in a single iteration in VuGen; they increase considerably when we run the load test in the Controller or Performance Center, and this overhead of decoding the gzip content becomes quite an issue when the response time SLAs are in milliseconds.

Here is how the behavior looks in LR VuGen with decompression turned on in the script. You can see that it takes 5 ms to decode a 154-byte response. Now imagine a normal webpage of about 2 MB gzipped; you can see the impact of this decoding as the page size grows, especially when the response comes back as chunked bytes with no fixed Content-Length from the server.

[Screenshot: VuGen replay log showing 5 ms spent decoding a 154-byte response]

I think the HP LR team might also be aware of this behavior, and that is probably why they came up with a function to disable it. Use web_set_option with the decode-content flag turned off if you are running scripts which do not require validation and have response time SLAs in milliseconds. The drawback of disabling this feature is that all your correlations and other checks on the server response will fail, since the response will show up as binary content, like below.

[Screenshot: server response displayed as binary content once decoding is disabled]

I would suggest disabling this feature if you can, and doing the response validation using other techniques, such as verifying server logs. By disabling it you will gain close to a 15 to 20% reduction in the response time reported by LR.
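
As a hedged sketch of what that looks like in a script, web_set_option is the function referred to above; the exact value string for the decode-content flag should be checked against your LoadRunner version's documentation, and the request itself is just a placeholder JSON call of the kind described earlier:

    // Turn off decoding of the compressed response for the steps that follow.
    // Text-based checks such as web_reg_find / web_reg_save_param will no longer
    // match once this is off, as noted above.
    web_set_option("DecodeContent", "No", LAST);

    web_custom_request("submit_login",
        "URL=https://example.com/api/login",
        "Method=POST",
        "EncType=application/json",
        "Body={\"user\":\"demo\",\"password\":\"demo\"}",
        LAST);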

Is this expected behavior for LoadRunner? I think they have to do it: unless the response is decoded, functions like web_reg_save_param or web_reg_find will not work, and these are core LoadRunner functions. Probably the right approach would be for LR not to include this decompression time inside the transaction markers, since it really pollutes the results, especially for web applications; or perhaps they could speed up the decompression library they are using in LoadRunner.
