Monday, August 29, 2016

Powershell supports Linux

Microsoft this week announced that PowerShell is now open sourced and available on Linux, the start of a development process that aims to enable users to manage any platform from anywhere, on any device.

R Integration in SQL Server 2016

R is a popular programming language that is used in data science to analyse data. R has capabilities for data analytics. Data analytics needs special data structures such as arrays, vectors, matrix. Traditional RDBMS products do not have capabilities for data analytics because they do not have such data structures stated above. 

The latest release of MS SQL Server 2016 has R integration so that data processing and data analytics can be done in the same data management platform. 

This blog post highlights R integration in SQL Server 2016 by using a simple example. 

Let's assume you have a requirement to generate sequence of numbers. There are various methods you can do in SQL Server using T-SQL.  The typical solution would be using a WHILE loop and few control variables which is mentioned below;

There is an interesting  msdn forum question which has many alternative ways to achieve this. It is worth to read the discussion. 

The same thing can be accomplished using R in a very simple way. R has a function named seq, and below code snippet shows R seq to generate a sequence of numbers from 1 to 10. You can use RStudio to use full R functionalities. 

> seq(1, 10)
 [1]  1  2  3  4  5  6  7  8  9 10

Now, lets see how we can integrate the above simple R function with SQL Server 2016. First you need to enable R integration in SQL Server 2016. 

I had to restart SQL Server service after executing the above code which should not be the case. Below is the version of the SQL Server that I'm using for this demo. 

Microsoft SQL Server 2016 (RTM) - 13.0.1601.5 (X64)

I had to restart SQL Server Launchpad service too which is a new service introduced in MS SQL Server 2016. 

Then you can use sp_execute_external_script programming interface to call an external script written in R or in other language which recognized by SQL Server

Below is the output of the above code. 

There are many complex data analytics work you can do in R and the same thing you can do inside SQL Server by combing the data stored in SQL Server. 


Thursday, August 25, 2016

SQL Server on Linux

It is certain that many are excited and welcome the Microsoft official announcement of SQL Server is going to support Linux platform. Specially the open source community must be thrilled with the news. 

However, still there is not much information available to public about how SQL Server on Linux will looks like. As per Microsoft the initial release will be available on mid of 2017. 

Announcing SQL Server on Linux


If we look at the history of SQL Server, the very early released versions were on UNIX based operating system known as OS/2 which is jointly developed by IBM and Microsoft. As per wikipedia the kernel type of OS/2 is mentioned as hybrid. That was 1988/89 period. 

The SQL Server versions 1.0, 1.1, and 4.2 were all based on OS/2 platform. Microsoft had separately developed the version 4.2 to support for their first version of Windows NT OS but with the same features as in the version that runs on OS/2.

Historically, the SQL Server has born in UNIX based platform and later it has been ported to support only on Windows after ending the agreement with Sybase.  

After 25 years later, Microsoft has decided to make SQL Server available on Linux platform which is a good move. 

source: Microsoft SQL Server - History


In my opinion, the main challenge would be to develop an abstraction layer to support Linux platform. The architecture of Windows and Linux are distinctly different. Both has preemptive scheduling but for the process model, Linux has very unique implementation of threads. SQL Server's process/threads model that runs in Windows is, single process multi-threading, meaning you can see only a single process for an SQL Server instance while internally it has thread pool to service for client requests. Consequently, thread is the unit of work at OS level to accomplish SQL Server requests in Windows. 

Conversely, in Linux has multi processes and there is no concept called threads as in Windows OS. Linux calls fork() system call to create a child process from the parent process and then it is finished or killed. In a nutshell, in Linux, everything is done by in-terms of processes. 

Bridging this architectural difference in SQL Server probably the most challenging part. I'm no expert in OS but if you think in software engineering point of view, this is the image I get about the abstraction layer. 

SQL Server has it's own user mode operating system known as SQLOS which is non-preemptive scheduler to interact with the Windows OS to service better for special needs of SQL Server. The SQLOS was introduced in SQL Server 2005 code name Yukon. 

To support Linux platform, the SQLOS component may need lot of additions or else there could be a separate component like SQLOS to support Linux platform. We do not know that information yet. 

SQLOS – unleashed

What features will be in SQL Server on Linux?

It is unclear what features will be available in SQL Server version that runs on Linux. To provide full features set as in SQL Server 2016 would be a real challenge. Another question is, whether it will include the BI tools? As per the announcement blog, it clearly states "We are bringing the core relational database capabilities to preview today" What are the core relational database capabilities? I presume the following capabilities should definitely need be addressed. They are; scheduling, memory management, I/O, exception handling, ACID support, etc. How about high availability options like clustering and AlwaysOn? I predict there will be some sort of high availability option in the first release on Linux. How about the Page size, is it 8K or different? By default Linux page size is 4K (4096 Bytes), however SQL Server in Windows supports non-configurable page size which is 8K(8192 Bytes). Consequently, SQL Server on Linux has to support is default page size of 4K. However unlike in Windows, Linux page size is configurable but it is not certain how feasible and the consequences of doing it. I just checked the page size in Mac OS and it is 4K too. See below figure. 

Will there be In-Memory OLTP engine with SQL Server on Linux? In-Memory OLTP, code name Hekaton was first introduced in SQL Server 2014 and it further developed in SQL Server 2016. In-Memory OLTP is an enormously important feature in future RDBMS products. However it is unclear yet, the first release of SQL Server on Linux will have this feature or not. 

What this means to DBAs?

Skill set required for future SQL Server DBAs are going to be expanded. They need to learn Linux OS commands as well as Python and Powershell for scripting tasks. In future companies will going to have a mix of Linux and Windows based SQL Server installations most probably. Licensing cost will decide which platform version of SQL Server is going to be the bigger portion. In a nutshell, future SQL Server DBAs will be cross-platform DBAs. It's challenging isn't it?

How about the Certifications?

Most likely, a new certification path will emerge to cater for the new trends/skills of SQL Server. To become a successful DBA, you might need to be qualified for SQL Server for both Linux and Windows platforms as well as cloud platforms. 

How this will impact to other RDBMS products?

My sense is, this will impact directly to MySQL. MS SQL Server is a proven enterprise level data platform. If anyone has a chance to get that into Linux OS, then it is highly unlikely someone will choose MySQL besides SQL Server providing that SQL Server Developer edition is free. 

I'm really excited and looking forward to get some hands on with SQL Server on Linux in very near future.


Thursday, August 18, 2016

TempDB Configuration

Recently we had some discussion around the best practices for tempdb configuration. The entire discussion was based on SQL Server 2012 EE. This blogpost highlights some salient points of tempdb configuration and include links to additional resources as well. 
  • How many files (data / log) we need to create? Is it based on logical cores or physical CPUs?
Any SQL Server database requires two files (1 data file and 1 log file) at minimum. SQL Server will allow you to add multiple log files but there is no benefit by doing so as SQL Server writes to log file in serial fashion. As a result, we can safely state, you do not need multiple log files for tempdb. 

What about the data files? SQL Server allows to add multiple data files too and there are benefits of doing this. Since our discussion is around tempdb, we can add multiple data files to address the very common phenomena known as tempdb contention. The whole objective of adding multiple data files is to improve the concurrency of tempdb activities in other words to improve the throughput of tempdb. Beginning from SQL Server 2005, tempdb tends to become the bottleneck as SQL Server continues to add new features which utilizes the tempdb heavily. As a result contention around tempdb becomes increased. However the history of tempdb contention goes back to SQL Server 7.0 because TF-1118 is also applies to SQL Server 7.0. The TF-1118 uses to reduce the tempdb contention of allocation pages (PFS, GAM and SGAM).

When deciding the no.of data files for tempdb, a general rule of thumb is to have no.of data files equal to the no.of logical cores up to 8 logical cores, meaning if the machine has 16 logical cores, you can add up to 8 data files. I've seen servers where you've 16 cores and tempdb has 16 data files too. This configuration is not appropriate according to the general rule of thumb. 

Modern CPUs have large number of cores. For example, Intel Xeon E7 processor family has a processor which has 24 cores. If you consider 8 CPU sockets server then you will get 192 (24*8) logical cores and when hyperthreading is enabled it will become 384 (192*2) logical cores. So in such a server, are we going to configure 384 data files for tempdb? That is an extremely bad idea because when there are such no.of data files the SQL Server has an additional overhead of threads management. 

Back to no.of files discussion, you can start with 8 data files (if no.of cores are 8) and then increase the no.of data files by multiple of 4 (up to the no.of logical processors of the machine). You need to do that, only if you still see contention around tempdb. 

  • How to decide the initial size of the tempdb?
There is no universal formula to do this and therefore it is non-trivial. You need to identify the features you're planning to use which potentially utilize the tempdb space. E.g: consider temp table and table variable usage, sorting, ordering, version store, etc. Then use your best judgement to decide the initial size and then monitor the tempdb usage by executing a workload in a test environment and decide whether you need to resize it or not. 
  • Do we need to enable autogrowth or not? If we do, then, is it for all the files or just one file?
Autogrowth setting should be used for emergency situations (for both data and log files) and should not be the process to grow the data file or log file. You always need to manually size the data files and log file and set the autogrowth so that in an emergency situation if tempdb needs additional space the SQL Server will grow the files without impacting the current server activities. If you've multiple data files situation then enable the autogrowth for all the files. 
  • The size of autogrowth?
Microsoft RAP tool has a rule which detects non standard database autogrowth sizes. The rule is as below;

"Database files have been identified that have the next Auto Growth increment of 1GB or above, and instant file Initialization is not enabled"

When the tool detects this, it classified as severity high risk. Basically it does not recommend to have autogrowth sizes 1GB or above. So that I would keep autogrowth size as 1023 MB (yes, it is not 1024 GB). This is especially important for the log file because depending on the autogrowth chunk size, the no.of VLFs vary. No.of VLF logic is as below (this applies to SQL Server 2016 as well);

chunks less than 64MB and up to 64MB  = 4 VLFs

chunks larger than 64MB and up to 1GB = 8 VLFs

chunks larger than 1GB = 16 VLFs

Note: The size of the t-log does not affect the no.of VLFs but the chunk size. 
  • Do we need to keep all data files and log file in the same drive?
It is better to keep them in separate drives if you've the luxury to have multiple drives (to increase the IO bandwidth). The limitation comes in clustered environment where you can have limited drive letters. (Please note that you can also place the tempdb in SSD).

If you've 4 nodes cluster with 3 SQL Server instances. We have already utilized 20 drive letters (including the C drive for OS). These drive letters are for system data like master, msdb, etc, data files, log files, local MSDTC, tempdb, backup and Quorum. 
  • Changes in SQL Server 2016
The TF (trace flag) 1117 is used to grow all the data files when a file in the filegroup reaches the autogrowth threshold. Beginning from SQL Server 2016 this behavior is controlled by the two options added to ALTER DATABASE statement. The two options are, 1. AUTOGROW_SINGLE_FILE, 2. AUTOGROW_ALL_FILES. However this is applies to user databases not the tempdb. 

Starting from SQL Server 2016, the behaviour of TF-1117 or TF-1118 will be automatically enabled by default for tempdb. So it makes DBAs life much more easier. 

Just to make it clear, see the below table; 

There are lot to talk and learn about tempdb. Tempdb is the only database per instance which keeps all the temporary work of that instance. So it may become a bottleneck and the entire system will become slow. So it is always need to pay a close attention to tempdb behaviour and its activities then take remedial actions to minimize them. Best practices are there for you get a good start but what is important is to identify the optimal configuration and parameters for your system rather than following best practices blindly. 


Monday, August 8, 2016

How to check the isolation level

SQL Server supports all four ANSI standards isolation levels. They are;
  1. Read Uncommitted (the lowest level, high concurrency, less blocking, more currency related issues) 
  2. Read Committed (default)
  3. Repeatable Read
  4. Serializable (the highest level, less concurrency, higher blocking, no concurrency related issues) 
Other than the above four, SQL Server introduced additional two optimistic concurrency control isolation levels based on row versioning technique;
  1. Read Committed Snapshot
  2. Snapshot isolation level
DBCC USEROPTIONS command can be used to see the current isolation level in SQL Server.

Wednesday, August 3, 2016

Differences between SQL Server and SQL Azure

I captured this screenshot from MS Azure Webinar and thought of sharing it since it gives a high level overview of supported features in SQL Server Azure. 

For more details:

How to interpret Disk Latency

I was analyzing IO stats in one of our SQL Servers and noticed that IO latency (Read / Write) are very high. As a rule of thumb, we know tha...