Sunday, September 27, 2009

Query best practices

In this blog I'm stating several best practices when writing SQL queries in MS SQL Server. They are;
  • Choose the most appropriate data type
  • Write correctly formed queries
  • Return only the columns/rows needed
  • Fully qualified object names
  • Avoid long actions in triggers
  • Avoid expensive operators such as NOT LIKE
  • Minimize cursor use
  • Limits query and index hints
  • Avoid implicit or explicit functions in WHERE clause

Choose the most appropriate data type
Consider each of the following when you're choosing a data type:
Try to choose the smallest data type that works for each column.
  • Choose the most appropriate type because both implicit and explicit conversions may be costly in terms of the time it takes to do the conversion. 
  • Use the varchar type instead of text superior performance for columns that contains less than 8,000 characters.
  • The sql_variant data type allows a single column, parameter, or variable to store data values of different data types like int and nchar. However, each instance of a sql_variant column records the data value and additional metadata. The metadata includes the base data type, maximum size, scale, precision, and collation. While sql_variant provides flexibility, the use of sql_variant affects performance because of the additional data type conversion.
  • Unicode data type like nchar and nvarchar take twice as much storage space compared to ASCII data types like char and varchar. 
More about SQL Server data types

Write correctly formed queries
Check that all joins are correct, that all parts of the keys are included in the ON clause, and that there is a predicate for all queries. if you miss any keys in ON or WHERE clause that will result to generate a cross product or Cartesian product which perform badly.

Do not use DISTINCT clause unless you really need it. You should investigate queries which return duplicate records. Duplicate records may be a result of an incorrect join.

Return only the columns/rows needed
One of the most common performance and scalability problems are queries that return too many columns or too many rows. (Avoid using SELECT * FROM statements) Columns in the SELECT list are also considered by the optimizer when it identifies indexes for execution plans. This type of queries not only returns unnecessary data, but it also can force clustered index scans for the query plan, regardless of the WHERE clause restrictions.

Fully qualified object names
By fully qualifying all database objects with the owner, you minimize overhead for name resolution, and you avoid potential schema locks and execution plan recompiles.

Avoid long actions in triggers
A long running action in a trigger can cause locks to be held longer than intended, resulting in blocking of other queries. Keep your trigger code as small as efficient as possible. If you need to perform a long-running or resource-intensive task, use message queuing to accomplish the task asynchronously.

Avoid expensive operators such as NOT LIKE
The LIKE operator with a value enclosed in wildcards ("%value%") almost always causes a table scan. LIKE operators with only the closing wildcard can use an index because the index is part of a B+ tree, and the index is traversed by matching the string value from left to right.

Negative operations like NOT LIKE are also very difficult to resolve efficiently. Try to rewrite them in another way if you can. If you're only checking for existence, use the IF EXISTS or the IF NOT EXISTS construct instead.

Minimize cursor use
Use forward-only and read-only cursors unless you need to update tables. More locks may be used than are needed, and there is an impact on the tempdb database. The impact varies according to the type of cursor used.

The forward-only, read-only cursor is the fastest and least resource-intensive way to get data from the server.  Often cursors are used to perform a function row by row. If there is a primary key on a table, you can usally write a WHILE loop to do the same work without incurring the overhead of a cursor. Below code portion shows how you construct a WHILE loop without using  a cursor.

use AdventureWorks

declare @CustomerID int

select top 1 @CustomerID=CustomerID from Sales.Customer

while @CustomerID is not null

print @CustomerID 
--place your code here
select top 1 @CustomerID=CustomerID from Sales.Customer where CustomerID>@CustomerID 
if @@rowcount=0
set @CustomerID=null


Limits query and index hints
Query hints include the MERGE, HASH, LOOP, and FORCE ORDER hints that direct the optimizer to select a specific join algorithm. Index hints are table hints where a certain index is specified for the optimizer to use. Generally the Optimizer chooses the most efficient execution plan. Also, remember that SQL Server uses a cost based Optimizer; costs change over time as data changes.

If you find that the Optimizer is not choosing an optimal plan, try breaking your query into smaller pieces. Or try another approach to the query to obtain a better plan before you decide to use hard-coded query hints.

Avoid implicit or explicit functions in WHERE clause
The Optimizer cannot always select an index by using columns in a WHERE clause that are inside functions. Columns in a WHERE clause are seen as expression rather than a column. 

No comments:

Post a Comment

How to interpret Disk Latency

I was analyzing IO stats in one of our SQL Servers and noticed that IO latency (Read / Write) are very high. As a rule of thumb, we know tha...