Wednesday, July 2, 2008

Measuring SQL Performance

One thing that often amazes me is that many SQL Server developers do not actually measure the performance of their queries. When you are working with a small site or a home project you might not see a big difference, but when implementing systems with large numbers of users and high levels of traffic you cannot just settle for the fact that your query returns the expected result. You must also make sure that your queries use as few resources as possible and execute as quickly as possible.

Sure, you can read articles and literature that describe how to write queries that perform well, but you still cannot be sure that they work in an optimal way for your specific situation. After all, different schema designs, amounts of data, hardware resources and so on all affect how a query performs. And one of the problems with SQL is that you can write the same query (i.e. one that returns the same results) in many different ways, and the performance of these different formulations will often differ as well.

When I started investigating why some developers did not compare the performance of their queries, it became clear to me that the main reason is that they do not know how to do this in an easy way. Many of them thought that you needed more or less complicated external tools to run against your server, and they did not have the time or inclination to learn and try these. This article describes a couple of much easier methods of measuring the performance of queries.

Time

The simplest way to measure performance is of course to measure the time it takes to execute a query. If you have not already noticed it, take a look at the status bar in the bottom-right corner of Query Analyzer. There you will find a timer that shows how many hours:minutes:seconds it took for a query (or rather the entire script) to execute. This is of course not a very exact measurement. Most queries you want to measure will probably not take more than a second to run, and in a high-traffic environment they should probably execute in milliseconds if they are correctly optimized. So you need a better instrument to measure the execution time of a query.

Another, and better, way to measure the amount of time it takes for a query to execute is to use the built-in function GETDATE(). Example 1 shows how you can do this. The example uses the command WAITFOR to make the query execution 'stand still' for as long as we specify with DELAY. By first storing the present date and time when the execution begins and then comparing this to what it is when the execution is finished, we get a more exact measurement, specified in milliseconds. Note however that a datetime value is only accurate to 1/300 of a second (i.e. 3.33 ms). So if a query is reported as taking 40 ms to execute, the real figure can be off by a few milliseconds.

-- Example 1
DECLARE @start datetime, @stop datetime
SET @start = GETDATE()

WAITFOR DELAY '00:00:00.080' -- do not do anything for 80 ms

SET @stop = GETDATE()

SELECT 'The execution took ' + CONVERT(varchar(10), DATEDIFF(ms, @start, @stop)) + ' ms to finish'
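Because a single GETDATE() measurement is so coarse, one way to get a steadier number is to run the query many times and average. The following is only a sketch: it assumes the Northwind sample database used later in this article, and the query, the variable names and the run count of 100 are arbitrary choices for illustration. The result is assigned to a variable so that no rows are sent to the client, which means this measures server-side work only.

```sql
USE Northwind
GO

DECLARE @start datetime, @runs int, @i int, @dummy int
SET @runs = 100   -- arbitrary number of repetitions
SET @i = 0
SET @start = GETDATE()

WHILE @i < @runs
BEGIN
    -- the query being measured; assigning to @dummy suppresses the result set
    SELECT @dummy = COUNT(*) FROM Orders WHERE ShipCountry = 'Sweden'
    SET @i = @i + 1
END

-- total elapsed milliseconds divided by the number of runs
SELECT DATEDIFF(ms, @start, GETDATE()) / (@runs * 1.0) AS avg_ms_per_run
```

The 3.33 ms granularity now applies to the total for all runs rather than to each individual run, so the average is less affected by it.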

STATISTICS TIME

The best way to measure time, however, is to use the configuration setting SET STATISTICS TIME. The syntax is shown below:

SET STATISTICS TIME {ON | OFF}

When this setting is ON, the results pane of Query Analyzer (QA) shows statistics for the time it took to execute a query. Note that if you are running QA in grid mode you will need to switch to the Messages tab to see this. Example 2 demonstrates this:

-- Example 2
USE Northwind
GO
 
SET STATISTICS TIME ON
 
SELECT * FROM orders

In my results pane I get the following text:

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

(830 row(s) affected)

SQL Server Execution Times:
   CPU time = 30 ms,  elapsed time = 500 ms.

At first glance this might seem complicated to understand, but more or less the only thing you need to do is to look for the row with SQL Server Execution Times that is printed right after the text that specifies the number of affected rows. Above this you can see the time it took to parse and compile the query, but that time is not what we are interested in here. Most of the time this will be 0 ms if you run the same query several times in a row, since the execution plan will already be cached. As said earlier, what we are looking for is the time it took to execute the query. In the example above it needed 30 ms of CPU time, but the total amount of time needed was 500 ms. (Try replacing the WAITFOR statement in example 1 with the SELECT statement in example 2 and see if GETDATE gives you the same measurement.) But if CPU time was only 30 ms, then where did the remaining 470 ms go? The answer is, for the most part, I/O.

STATISTICS IO

As you probably know, I/O is short for Input/Output. You could say that it means reading and writing resources, and normally you mean reading from and writing to disk or memory. Very simply described, SQL Server needs to have the data pages containing the data to return to the client stored in memory (RAM). If they are not already cached there, they must first be read from disk, where they are physically stored, and then placed in memory, from where they can be returned to the client. The data pages will then stay cached in memory for an unspecified time, which depending on several factors can range from almost no time at all to indefinitely. Therefore a query might need more time the first time you execute it, and because of this you should always execute a query a couple of times when measuring its performance.
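The effect of the cache can be seen by deliberately emptying it and then running the same query twice. This is a minimal sketch, assuming the Northwind sample database and a test server where you have sysadmin rights. Do not run it on a production server, since DBCC DROPCLEANBUFFERS empties the data cache for the whole instance.

```sql
USE Northwind
GO

-- TEST SERVERS ONLY: empties the data cache for the entire instance
CHECKPOINT              -- write dirty pages to disk so that they become clean
DBCC DROPCLEANBUFFERS   -- remove all clean pages from the buffer cache
GO

SET STATISTICS TIME ON
GO

SELECT * FROM Orders    -- first run: the pages must be read from disk
GO

SELECT * FROM Orders    -- second run: the pages are already in memory
GO

SET STATISTICS TIME OFF
GO
```

Compare the elapsed time reported for the two executions in the Messages output. On a table as small as Orders the difference may be modest, but the first run is normally the slower one.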

Execution time is not the only thing that is interesting when measuring performance. Equally important (and often even more so) is the amount of system resources needed to execute the query. Since I/O is normally the slowest part of a query, especially if physical disk access is needed, it is very important to know how much I/O a query requires. The way to measure this is to use another configuration setting called SET STATISTICS IO. The syntax is similar to that of SET STATISTICS TIME:

SET STATISTICS IO {ON | OFF}

The result however is different. Again, look in the text of the results pane in QA. I executed example 2 a couple of times and the result is shown below:

Table 'Orders'. Scan count 1, logical reads 22, physical reads 0, read-ahead reads 0.

First we have the table name. Then comes the number of times this table was scanned, or rather accessed, to fetch the result of the query. The next parts tell us how many pages (data and/or index) were read from the cache in memory to fetch the results and how many pages were read from disk. The final number, read-ahead reads, shows how many pages were placed into the cache for the query by SQL Server's read-ahead mechanism. The numbers you should normally look at are logical reads and physical reads plus scan count, and they should all of course be as low as possible. It might be better to have 100 logical reads than 10 physical reads, since it is faster to read from memory, but generally speaking they should both be as low as possible. If you execute a query a couple of times, physical reads will often be 0, since the data pages will already be cached after the first execution. Use these numbers to compare the resources needed when executing the same query formulated in different ways.
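As an illustration of such a comparison, the sketch below runs two formulations of the same question against the Northwind sample database with both settings switched on. The two queries are my own illustrative choice. They return the same rows because CustomerID is the primary key of Customers, so the join cannot produce duplicates.

```sql
USE Northwind
GO

SET STATISTICS IO ON
SET STATISTICS TIME ON
GO

-- Formulation A: subquery with IN
SELECT o.OrderID, o.OrderDate
FROM Orders o
WHERE o.CustomerID IN (SELECT c.CustomerID
                       FROM Customers c
                       WHERE c.Country = 'Sweden')
GO

-- Formulation B: inner join
SELECT o.OrderID, o.OrderDate
FROM Orders o
INNER JOIN Customers c ON c.CustomerID = o.CustomerID
WHERE c.Country = 'Sweden'
GO

SET STATISTICS IO OFF
SET STATISTICS TIME OFF
GO
```

Run the script a couple of times and compare the scan count and logical reads reported per table for each formulation. The optimizer may well choose the same plan for both, in which case the numbers will be identical, and that is useful to know too.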

Other tools

With the tools described above you have a good way of deciding which of several different versions of a query you should use to get the results you want in an optimal way. There are lots of other tools available as well, but I will not discuss them in this article. If you want to experiment with them yourself I would recommend that you take a look at the following:

  • Show execution plan: by pressing Ctrl-K you get an extra tab when executing queries in QA. This tab shows a graphical representation of the execution plan used by SQL Server to execute your query. There is lots of information in Books Online about how to use the information shown there, as well as in articles online.
  • SET STATISTICS PROFILE: This configuration option gives you a text-based variant of the execution plan.
  • SET SHOWPLAN_ALL and SET SHOWPLAN_TEXT: These options both present information regarding the resources and execution plan that would be used to execute the query, without actually executing it.
  • Profiler, Sysmon (Performance Monitor) and other external applications: Finally, there are several external applications that can be used to measure and show different events and measurements in SQL Server and the system it runs on. Profiler, a tool in the SQL Server client tools pack, connects to SQL Server and logs all kinds of different events that occur, and Sysmon can of course be used to log measurements for a huge number of performance counters, both for SQL Server and for the system as a whole.
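To give a quick taste of the text-based plan options in the list above, here is a minimal sketch using SET SHOWPLAN_TEXT against the Northwind sample database. The query itself is an arbitrary example. Note that SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GO separators.

```sql
USE Northwind
GO

SET SHOWPLAN_TEXT ON
GO

-- This query is NOT executed; only its estimated plan is returned
SELECT * FROM Orders WHERE CustomerID = 'ALFKI'
GO

SET SHOWPLAN_TEXT OFF
GO
```

As long as the setting is ON, no query in the session is actually executed, so remember to switch it OFF again before running anything else.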

 

Creating a PDF from a Stored Procedure

This article explains how to create a stored procedure that will in turn create a simple column-based report in PDF without using any external tools or libraries (and their associated licensing costs!).

SQL2PDF makes a PDF report from text inserted in the table psopdf, which has a single nvarchar(80) column. First, create the table psopdf.

CREATE TABLE psopdf (code NVARCHAR(80)) 

After that create the stored procedure SQL2PDF.

SQL2PDF.TXT

Then fill the table psopdf with your data, as shown in the examples below.
Finally, call the stored procedure with the file name only (without the extension).

EXEC sql2pdf 'fileName'

The result is in your C:\ directory.

EXAMPLE 1:

INSERT psopdf(code) SELECT SPACE(60) + 'COMPANY LTD'
INSERT psopdf(code) SELECT SPACE(60) + 'COMPANY ADDRESS'
INSERT psopdf(code) SELECT SPACE(60) + 'STREET NAME & No'
INSERT psopdf(code) SELECT ' '
INSERT psopdf(code) SELECT SPACE(34) + 'BILL OF SALE'
INSERT psopdf(code) SELECT ' '
INSERT psopdf(code) SELECT 'Product' + SPACE(10) + 'Quantity'
+ SPACE(10) + 'Price' + SPACE(10) + 'Total'
INSERT psopdf(code) SELECT REPLACE(SPACE(56), ' ', '_')
INSERT psopdf(code) SELECT 'Product1' + SPACE(9) + '10.00 '
+ SPACE(10) + '52.30' + SPACE(10) + '5230.0'
INSERT psopdf(code) SELECT 'Product2' + SPACE(9) + '2.00 '
+ SPACE(10) + '10.00' + SPACE(10) + ' 20.0'
INSERT psopdf(code) SELECT REPLACE(SPACE(56), ' ', '_')
INSERT psopdf(code) SELECT SPACE(50) + '5250.0'

After the INSERT statements, call the stored procedure with the file name demo2.

EXEC sql2pdf 'demo2'

The result is in your C:\ directory.

http://www.sqlservercentral.com/columnists/mivica/pdfdemo2.jpg
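Counting SPACE() widths by hand, as in Example 1, gets error-prone when the values vary in length. One alternative is to pad each value to a fixed column width with LEFT(value + SPACE(n), n). The sketch below is my own variation and not part of the original SQL2PDF package. The widths 17, 18 and 15 are taken from the header row of Example 1, the product row is made up, and the approach assumes that the generated PDF uses a fixed-width font.

```sql
-- Header row: each column padded to a fixed width (17, 18, 15), last column unpadded
INSERT psopdf(code)
SELECT LEFT('Product'  + SPACE(17), 17)
     + LEFT('Quantity' + SPACE(18), 18)
     + LEFT('Price'    + SPACE(15), 15)
     + 'Total'

-- Data row (hypothetical values) padded to the same widths, so the columns line up
INSERT psopdf(code)
SELECT LEFT('Product3' + SPACE(17), 17)
     + LEFT('4.00'     + SPACE(18), 18)
     + LEFT('25.00'    + SPACE(15), 15)
     + '100.0'
```

Each line stays well within the 80 characters that the code column allows.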

EXAMPLE 2:

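The second example uses the pubs database.

USE pubs
-- authors is joined with itself without a join condition (t1, t2),
-- so each author line is inserted once for every row in authors
INSERT psopdf(code)
SELECT t1.au_lname + ' ' + t1.au_fname + ' ' + t1.phone + ' ' + t1.address + ' ' + t1.city + ' ' + t1.state + ' ' + t1.zip
FROM authors t1, authors t2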

After the INSERT statement, call the stored procedure with the file name demo1.

EXEC sql2pdf 'demo1'

The result is in your C:\ directory.

http://www.sqlservercentral.com/columnists/mivica/pdfdemo1.jpg

 

The Myth of Bandwidth and Application Performance

Deduping Data in SQL Server 2005

MS Excel and MS Word shortcuts

XML Injection

XML injection is one more type of injection we need to be aware of, since we use XML for data transfer in most of our applications.

More information is available here:

http://www.owasp.org/index.php/Testing_for_XML_Injection
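As a small illustration in T-SQL (my own sketch, not taken from the linked page): if XML is built by concatenating strings, a value that contains markup can change the structure of the document. Letting SQL Server build the XML with FOR XML PATH (available from SQL Server 2005) escapes such characters instead. The variable @name and its contents are hypothetical.

```sql
DECLARE @name nvarchar(100)
-- hypothetical user input that tries to smuggle in an extra <role> element
SET @name = N'Bob</name><role>admin</role><name>x'

-- Unsafe: plain concatenation lets the input close <name> and add its own elements
SELECT N'<user><name>' + @name + N'</name></user>' AS concatenated_xml

-- Safer: FOR XML PATH entitizes < and > in the value, so it stays plain text inside <name>
SELECT @name AS name FOR XML PATH('user')
```

In the second result the injected tags appear as escaped text inside a single name element rather than as new elements.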

How to add sitemap.xml for BLOGSPOT.COM

Please follow the steps below to add a sitemap.xml for BLOGSPOT.COM. At this point I am assuming that you have already verified your blog by adding the meta tag provided by Google to the home page of your blog.

1. Go to the Google Webmaster Tools URL: www.google.com/webmasters/tools
2. Enter your Gmail username and password
3. Click on the link for the website to which you want to add a sitemap.xml file
4. Click on the Sitemap link
5. Click on Add a sitemap
6. Select the “Add General Web Sitemap” option from the drop-down menu
7. Enter the file name as “ATOM.XML”
8. Click on the “Add General Sitemap” button
9. That’s it :) you are done. Google will verify your Atom.xml file within a few hours.

Gmail Rolls Out 13 New Features

Gmail rocks and all of you knew it already, but even so, users always want to see more features from Google's mail service. Whether we're talking about folders or a different UI, people have always requested new functions. Because of that, Google has recently rolled out Gmail Labs, a special testing platform that contains test versions of upcoming Gmail features and can be used by all users of the mail service.

"People often ask how we decide what to build next. It's usually a mix of factors, like how many users are asking for it (think delete button, vacation responder, and IMAP, among others), how useful we think it will be (think chat, conversation view, etc.) or how much fun it will be to work on (this is actually really important). We have all sorts of debates about each option, we weigh the pros and cons, and then some of the time we probably make the wrong decision," said Keith Coleman, Product Manager, explaining the reasons for implementing the new platform.

In other words, Gmail Labs is a testing platform designed by Google based on your feedback. If you and lots of other users request the same feature, Google designs it and integrates it into Gmail Labs in order to see what other people think about it. If the feedback is positive, Google may promote the feature from Gmail Labs into the final version of Gmail.

"The idea behind Labs is that any engineer can go to lunch, come up with a cool idea, code it up, and ship it as a Labs feature," the Google official explains. "Labs is now out to all English users (US and UK), and administrators using Google Apps can choose to enable Labs by checking the `Turn on new features' box in Domain Settings."

At this point, no fewer than 13 functions are available under the Gmail Labs link included in all Gmail accounts. All you need to do is navigate to Settings and click on the last tab, called Labs. Here you can find all the available experimental features. For instance, you can now choose new chat emoticons, mouse gestures for Gmail, custom keyboard shortcuts, custom date formats and many others. Obviously, the main goal of this testing platform is to get feedback, so feel free to send Google your opinion about the new features.