Friday, July 29, 2011

SQL Server Table Partitioning – A Round Robin Approach

Over the last few weeks I’ve been working on designing a large, partitioned table using SQL Server 2008 R2. This table needs to hold 30 days’ worth of data, and every day it receives around 15 million new rows. So the total rowcount will be around 30 x 15M = 450 million rows.

The key design requirement for this table is a 30 day sliding window: recycling old partitions needs to be rapid and must require no manual intervention. When a new day starts, the corresponding partition should be wiped as quickly as possible to remove the 30 day old data sitting in it.

Step One: An Elegant ‘Hash’ Function

CREATE FUNCTION [dbo].[fnGetPartition](@Day SMALLINT)
RETURNS TINYINT
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @RetVal TINYINT
    --The number of Partitions present
    DECLARE @PartitionCount TINYINT
    SET @PartitionCount = 30
    --Hash the supplied day into one of the @PartitionCount buckets
    SET @RetVal = ((@Day - 1) % @PartitionCount) + 1
    RETURN @RetVal
END
GO

Days are represented by monotonically increasing integers in this system. So the function above takes the day number and maps it onto the range 1 to 30 using modulo arithmetic. This is a really neat way to calculate a round-robin partition number for each day. The function maps input to output like this:

Input   Output
1       1
2       2
...     ...
29      29
30      30
31      1
32      2

So day 31 will overwrite day 1, and so forth.
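With the function in place, the mapping is easy to sanity-check directly; the expected values in the comments follow the table above.

```sql
--Sanity-check the round-robin mapping
SELECT  dbo.fnGetPartition(1)  AS Day1,     --expect 1
        dbo.fnGetPartition(30) AS Day30,    --expect 30
        dbo.fnGetPartition(31) AS Day31     --expect 1: wraps around
```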


Step Two: Store the Partition Number as a Persisted Computed Column on the Table

The next step is to persist the partition number as a column on the large table. This is a cut-down, simplified version of the table for illustrative purposes:

--Create a test table
CREATE TABLE [dbo].[BigTable]
(
    [Partition] AS dbo.fnGetPartition([Day]) PERSISTED,
    [Day]       SMALLINT    NOT NULL,
    [Symbol]    CHAR(5)     NOT NULL,
    [Ref]       CHAR(15)    NOT NULL,
    [Open]      BIT         NOT NULL,
    [Closed]    BIT         NOT NULL,
    [IsEmpty]   BIT         NOT NULL
)
ON [PRIMARY]
GO

This builds a table with an integer between 1 and 30 stored in the [Partition] column.
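A quick way to see the computed column in action (the sample values below are made up purely for illustration):

```sql
--Insert a row for day 31; [Partition] is computed automatically
INSERT dbo.BigTable ([Day], [Symbol], [Ref], [Open], [Closed], [IsEmpty])
VALUES (31, 'ABCDE', 'REF000000000001', 1, 0, 0)

--Per the mapping table above, day 31 lands in partition 1
SELECT [Partition], [Day] FROM dbo.BigTable
```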


Step Three: Create A Partition Function

The next step is to add a partition function. This function maps an input parameter to a partition number. In this scenario, the hard part of figuring out which partition to use has already been completed by the hash function and stored in the [Partition] column. I found this was an easier approach, as there are restrictions on what you can and can’t do inside a partition function.

CREATE PARTITION FUNCTION [pfBigTable](TINYINT)
AS RANGE LEFT FOR VALUES
(
    1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
    16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
)
GO
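One subtlety worth noting: RANGE LEFT with 30 boundary values actually creates 31 partitions, but the 31st stays empty because the hash function never returns a value above 30. The built-in $PARTITION function can be used to test the mapping without inserting any data:

```sql
--Evaluate the partition function directly
SELECT  $PARTITION.pfBigTable(1)  AS P1,    --partition 1
        $PARTITION.pfBigTable(30) AS P30,   --partition 30
        $PARTITION.pfBigTable(31) AS P31    --partition 31 (never used)
```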


Step Four: Create A Partition Scheme

The partition scheme maps the partitions specified in the partition function to their physical storage location. For this scenario, I mapped all 30 partitions to a single filegroup.

CREATE PARTITION SCHEME psBigTable
AS PARTITION pfBigTable
ALL TO (BigFileGroup)
GO 

Step Five: Configure the Table to Use the Partition Scheme

At this stage, the table can be recreated on the partition scheme rather than on the PRIMARY filegroup where it currently sits. The ON psBigTable([Partition]) clause tells SQL Server that the [Partition] column is what should be supplied to the partition function in order to decide where to store each row.

CREATE TABLE [dbo].[BigTable]
(
    [Partition] AS dbo.fnGetPartition([Day]) PERSISTED,
    [Day]       SMALLINT    NOT NULL,
    [Symbol]    CHAR(5)     NOT NULL,
    [Ref]       CHAR(15)    NOT NULL,
    [Open]      BIT         NOT NULL,
    [Closed]    BIT         NOT NULL,
    [IsEmpty]   BIT         NOT NULL
)
ON psBigTable([Partition])
GO
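To confirm that rows are landing where expected, the per-partition row counts can be read from the catalog views:

```sql
--Row count per partition for the partitioned table
SELECT p.partition_number, p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.BigTable')
  AND p.index_id IN (0, 1)   --heap or clustered index
ORDER BY p.partition_number
```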

Summary

That’s it for part one. The result is a partitioned table spread across 30 partitions, with a simple hash function in place to ensure that data is redirected to the appropriate partition based on the monotonically increasing [Day] column. This ensures that the buckets get ‘recycled’ every 30 days.

In the next post I’ll discuss partition aligned indexes, and an elegant way to do massive data deletes using partition switching.

Wednesday, July 27, 2011

Creating Persisted Computed Columns using Scalar Valued Functions

Computed columns are a really handy feature in SQL Server which allows you to calculate a column based on the value in one or more other columns. By default, the value of a computed column is calculated on the fly every time the row is accessed.

There is a computational overhead incurred by the calculation, and in some cases it is better to persist the computed value to increase performance. This will depend on a few factors, including the complexity of the calculation, and the ratio of data modification to selects being issued against the table.

Which brings me to the topic of this post: Using a Scalar Valued Function to build a persisted computed column.

The following script shows the basic steps required to create a table that includes a computed column


--Create a test table
CREATE TABLE T1
(
    ID      INT             NOT NULL,
    Price   DECIMAL(9,2)    NOT NULL,
    Qty     INT             NOT NULL,
    Total   AS (Price * Qty)
)

--Add Test Data
INSERT T1(ID, Price, Qty) VALUES (1,10.50,2) 

--Check Results
SELECT * FROM T1
 

This works well for basic calculations. What if we want to do something more complicated? The next step is to replace the calculation above with a call to a user defined scalar function. Here is a really simple function to demonstrate the point:


CREATE FUNCTION fnCalculateTotal( @price DECIMAL(9,2), @qty INT)
    RETURNS DECIMAL(9,2)    --return DECIMAL so fractional totals aren't truncated
AS
BEGIN
    RETURN @price * @qty
END

Next step: Modify the table to use the function instead of the hard-coded calculation.


--Create a test table
CREATE TABLE T1
(
    ID      INT             NOT NULL,
    Price   DECIMAL(9,2)    NOT NULL,
    Qty     INT             NOT NULL,
    Total   AS dbo.fnCalculateTotal(Price, Qty)
)
 

So far so good. But now I want to change the Total column to be persisted, so I don’t have to incur the overhead of calculating the value every time I query the table. To achieve this, add the PERSISTED keyword to the column definition:


--Create a test table
CREATE TABLE T1
(
    ID      INT             NOT NULL,
    Price   DECIMAL(9,2)    NOT NULL,
    Qty     INT             NOT NULL,
    Total   AS dbo.fnCalculateTotal(Price, Qty) PERSISTED
)

Msg 4936, Level 16, State 1, Line 2
Computed column 'Total' in table 'T1' cannot be persisted because the column is non-deterministic.


Nondeterminism. Hmmm.


A function is described as being deterministic if it always returns the same value when called with the same input parameters. The function above is deterministic, but SQL Server doesn’t know that yet. An example of a nondeterministic function is:


GETDATE()

GETDATE() is nondeterministic because it returns a different result every time it is called, even though the input parameters (none, in this case) are the same.

To let SQL Server verify that our function is deterministic, we need to add the WITH SCHEMABINDING option to the definition:


CREATE FUNCTION fnCalculateTotal( @price DECIMAL(9,2), @qty INT)
    RETURNS DECIMAL(9,2)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @price * @qty
END
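Whether SQL Server considers a function deterministic can be checked via OBJECTPROPERTY:

```sql
--Returns 1 once SCHEMABINDING is in place, 0 without it
SELECT OBJECTPROPERTY(OBJECT_ID('dbo.fnCalculateTotal'), 'IsDeterministic') AS IsDeterministic
```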

SQL Server will now allow the creation of the persisted computed column in T1, as it can verify that the function is deterministic. In my next post, I'll discuss using a persisted computed column in conjunction with table partitioning.

Thursday, July 7, 2011

Protecting My App From Synchronous Database Mirroring - A Step Too Far?

The application I am writing at the moment does massive amounts of inserts, and it was never designed to be used with synchronous database mirroring. A DBA recently set up synchronous mirroring instead of asynchronous, and everything ground to a halt.

So now, when this Windows service starts up, it checks what is going on back there in mirroring land:

DECLARE @MirrorLevel INT

SELECT @MirrorLevel = ISNULL(mirroring_safety_level, 0)
FROM sys.database_mirroring
WHERE database_id = DB_ID()

SELECT CASE @MirrorLevel
    WHEN 0 THEN 'None'
    WHEN 1 THEN 'Asynchronous'
    WHEN 2 THEN 'Synchronous'
END AS [MirroringConfiguration]
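Taking it one step further, the service could fail fast when it finds a configuration it was never designed for. A minimal sketch (the error text and severity here are just examples):

```sql
--Abort startup if synchronous (safety level FULL) mirroring is detected
IF EXISTS
(
    SELECT 1
    FROM sys.database_mirroring
    WHERE database_id = DB_ID()
      AND mirroring_safety_level = 2
)
    RAISERROR('Synchronous mirroring detected - aborting startup.', 16, 1)
```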

A step too far? Should my application be validating its infrastructure and throwing its toys if it doesn’t like it? Not sure what the correct answer is, but it was fun to implement anyway…

Saturday, July 2, 2011

WiX Toolset 3.5: Build that MSI like a Champ

The .NET 3.5 Windows service I have been working on lately just got a bit more complicated. I need to deploy multiple instances of the service via an MSI and let the user choose which ones to deploy. Trying something like this with the standard Visual Studio setup project is like punching yourself in the face.

The Windows Installer XML (WiX) toolset is an open source project that allows you to build MSIs based upon an XML instruction file. Sounds daunting, but I managed to build my multi-service-deploying MSI in about 3 hours.

I can’t recommend this highly enough to anyone who is looking at packaging options.

Click here for the WiX Toolset 3.5