DataDev: Ben Thurecht on SQL Server, C# and other bits

Friday, June 29, 2012

SQL 2005 Upgrade Exams Expire 31st July 2012

The SQL 2005 to 2008 upgrade exams expire on 31st July 2012. This includes:

Exam 70-453: Upgrade: Transition Your MCITP SQL Server 2005 DBA to MCITP SQL Server 2008 DBA
Exam 70-454: Upgrade: Transition Your MCITP SQL Server 2005 DBD to MCITP SQL Server 2008 DBD
Exam 70-455: Upgrade: Transition Your MCITP SQL Server 2005 BI Developer to MCITP SQL Server 2008 BI Developer

To be honest it’s probably worth starting again on the SQL Server 2012 exams if you haven’t upgraded to 2008 at this point.

More information can be found here:
http://www.microsoft.com/learning/en/us/certification/cert-sql-server-all.aspx#tab2

Wednesday, June 27, 2012

Creating Database Snapshots Via SMO

The SMO (Server Management Objects) API is the best way to programmatically interact with SQL Server. On a recent project I had a requirement to create database snapshots on a regular basis using this technique. I’ve created a static C# wrapper class to look after the implementation, as some of the sample code I found online for this was pretty average.

There’s a few things to validate before the snapshot can be created:

Database name and snapshot name must be different
Database must exist
Snapshot must not exist
The SQL Server Edition must be Enterprise or Developer. (Standard doesn’t support snapshots)

Once those conditions are satisfied, snap away! Here's my validation code:

private static void ValidateSnapshotParameters(Server server, string databaseToSnap, string snapshotName)
{
    //Server Edition must be Developer or Enterprise
    if(server.EngineEdition != Edition.EnterpriseOrDeveloper)
        throw new SnapshotException("Snapshots are not supported in this edition of SQL Server. Enterprise or Developer edition is required.");

    //Database must exist
    Database db = server.Databases[databaseToSnap];
    if(db == null)
        throw new SnapshotException(string.Format("Specified Database does not exist: {0}", databaseToSnap));

    //Snapshot must not exist
    Database snapshot = server.Databases[snapshotName];
    if (snapshot != null)
        throw new SnapshotException(string.Format("A snapshot or database named {0} already exists", snapshotName));

    //Database and snapshot cannot be the same name
    if(databaseToSnap == snapshotName)
        throw new SnapshotException("Snapshot name must be different from Database name");
}

Below is the code to perform the actual snapshot. I recommend putting the validation method and the TakeSnapshot() method in a static class. Note that the snapshot must have the same filegroups and data files as the database being snapped. The implementation below looks after this. I've used the .ss file extension here, but it's an arbitrary choice according to MSDN. The most important line of code is:

snapshot.DatabaseSnapshotBaseName = databaseToSnap;

If you don't specify the DatabaseSnapshotBaseName property, then really all you are doing is creating a blank database!

public static void TakeSnapshot(Server server, string databaseToSnap, string snapshotName)
{
    try
    {
        ValidateSnapshotParameters(server, databaseToSnap, snapshotName);

        Database sourceDatabase = server.Databases[databaseToSnap];
        Database snapshot = new Database(server, snapshotName);
        snapshot.DatabaseSnapshotBaseName = databaseToSnap;

        //Create the same filegroups on the snapshot
        foreach (FileGroup fg in sourceDatabase.FileGroups)
        {
            snapshot.FileGroups.Add(new FileGroup(snapshot, fg.Name));

            //Add all of the files to the new filegroup
            foreach (DataFile file in fg.Files)
            {
                snapshot.FileGroups[fg.Name].Files.Add(new DataFile(snapshot.FileGroups[fg.Name], file.Name, Path.Combine(sourceDatabase.PrimaryFilePath, string.Format("{0}_{1}.ss", snapshot.Name, file.Name))));
            }
        }

        //Create the snapshot
        snapshot.Create();
    }
    catch (FailedOperationException ex)
    {
        //TODO: Logging
        throw;
    }
}

The snapshot operation completes in O(1) time, that is to say the time to complete the operation is not related to the size of the database. Finally, my implementation of a SnapshotException, which is a simple derivation of ApplicationException.

public class SnapshotException : ApplicationException
{
    public SnapshotException(string message) : base(message){}
}

Friday, June 22, 2012

Using the ADO.NET SqlBulkCopy Class

Recently I had an interview where I was asked about the ADO.NET Bulk Copy class. Frustratingly, I had used it about 12 months prior and couldn’t remember much about it. So this post serves the dual purpose of providing a tutorial for the class and reinforcing my general knowledge.

Overview

The SqlBulkCopy class can be found in the System.Data.SqlClient namespace, and it was introduced in .NET 2.0. As the name suggests, it allows for the bulk insert of data into a SQL Server table. The benefit of this over a traditional insert is that the operation is minimally logged and therefore much faster at inserting a large number of rows.

Transactions

Transactions management is applicable to the SqlBulkCopy class, and consequently a SqlTransaction can be passed to the constructor if required. Rolling back will cause all rows that have been inserted by the SqlBulkCopy object to be reversed out. So the operation is atomic and reliable.

Testing SqlBulkCopy

I set up the following table to test the SqlBulkCopy class. My target platform for this series of tests was SQL Server 2012 Developer Edition 11.0.2100.60.

CREATE TABLE [Customer]
(
 ID   INT IDENTITY(1,1) NOT NULL,
 FirstName VARCHAR(50)   NOT NULL,
 LastName VARCHAR(100)  NOT NULL,
 DOB   DATETIME   NOT NULL,
 Phone  VARCHAR(20)   NULL,
 City  VARCHAR(50)   NULL,
 CONSTRAINT PK_Customer PRIMARY KEY(ID)
)

This should be sufficient to test a few things that I am curious about – non-nullable columns, an identity column and a basic primary key.

Here is the C# code I am using to run the test:

public static void BulkCopyExample(string connectionString, DataTable rowsToInsert, int batchSize)
{
    try
    {
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString))
        {
            //Batch Size
            bulkCopy.BatchSize = batchSize;

            //Destination Table Name
            bulkCopy.DestinationTableName = "Customer";

            //Write the rows to the Table
            bulkCopy.WriteToServer(rowsToInsert);

            //Close
            bulkCopy.Close();
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
        Console.ReadLine();
        throw;
    }
}

Column Mappings

The code above is pretty simple, but there’s one key thing missing. There are no column mappings, so the SqlBulkCopy compoment doesn’t know how to map the provided DataTable to the underlying SQL Table.

Non-Nullable Columns

When the mappings aren’t done properly (or at all), you may receive an exception similar to the following:

The given value of type String from the data source cannot be converted to type datetime of the specified target column.

This particular exception is related to the DOB column, which needs to have a valid DateTime provided.

Identity Column

Simply omit the identity column from the ColumnMappings collection on the SqlBulkCopy object, and SQL Server will allocate the next available identity value for that column.

Here is the revised code, showing all of the columns neatly mapped. Note that nullable columns in the underlying table do not need to be mapped. I’ve omitted the City mapping to prove this.

//Column Mappings
bulkCopy.ColumnMappings.Add("FirstName", "FirstName");
bulkCopy.ColumnMappings.Add("LastName", "LastName");
bulkCopy.ColumnMappings.Add("DOB", "DOB");
bulkCopy.ColumnMappings.Add("Phone", "Phone");

Results

The sample row goes into the table. Note the City column is null because the mapping was omitted.

Keeping Track of Progress

The NotifyAfter property of the SqlBulkCopy object determines how frequently the SqlRowsCopied event is fired. For example:

//Notify After
bulkCopy.NotifyAfter = 1000;

//Listen for the Event
bulkCopy.SqlRowsCopied += new SqlRowsCopiedEventHandler(bulkCopy_SqlRowsCopied);

This will call the SQLRowsCopiedEventHandler after every 1000 rows. Note that there is a performance overhead associated with this call. In the event handler, a SqlRowsCopiedEventArgs is received and this can be used to get the current rowcount.

Experimenting with Batch Size

The BatchSize property determines how many rows will be sent to SQL Server in each round trip. Smaller batch sizes will result in more trips to SQL Server, but larger batch sizes will take longer to commit. The optimal batch size will depend on a number of factors including the length of each row, and the total of number of rows being inserted.

To test the effect of different batch sizes on performance, I inserted 1,000,000 identical rows using varying batch sizes and measured the total time taken to complete the task. The table was truncated after each test to guarantee a level playing field.

The results are pretty interesting – a small batch number results in lousy performance, whereas a batch size of around 70,000 seems to be optimal for my table and the sample row I was working with. It would be interesting to re-run this test for rows of differing lengths.

Friday, May 25, 2012

ADO.NET Entity Data Model – Unexpected Number of Rows Affected

I use the Entity Data Model in almost all of my projects now. Major time saver usually, but today I received an error that I hadn’t seen before:

Store update, insert, or delete statement affected an unexpected number of rows (0). Entities may have been modified or deleted since entities were loaded. Refresh ObjectStateManager entries.

What gives? I am adding an object to a really simple collection using this code:

using (HCEntities context = new HCEntities())
{
 Alert a = new Alert()
 {
  Description = "test",
  AgentGUID = "aaaa",
  Timestamp = DateTime.Now
 };
 context.AddToAlerts(a);
 context.SaveChanges();
}

The Entity Framework is attempting to insert into this table:

CREATE TABLE [dbo].[Alerts]
(
 [ID]           INT IDENTITY(1,1)  NOT NULL,
 [AgentGUID]    VARCHAR (100)      NOT NULL,
 [Timestamp]    DATETIME          NOT NULL,
 [Description]  VARCHAR(500)      NULL
 CONSTRAINT [PK_Alerts] PRIMARY KEY 
 (
  [ID],
  [AgentGUID], 
  [Timestamp]
 )

 CONSTRAINT [FK_Alerts_Agent] FOREIGN KEY ([AgentGUID]) 
 REFERENCES [dbo].[Agent] ([AgentGUID]) 
 ON DELETE CASCADE ON UPDATE NO ACTION
)

Solution

It turns out that the Entity Framework doesn’t like IDENTITY columns participating in a composite primary key. Given that the identity column is unique by definition, a composite key makes no sense in this scenario anyway.

I updated the primary key definition and the problem was fixed. The exception thrown by the framework is fairly cryptic though.

Tuesday, May 22, 2012

Ancient SQL 2000 Bug

Recently I’ve been running an automated data collection across a couple of hundred SQL Server instances. I hit an issue today when I ran a particular query on some servers running 8.00.760 (SQL 2000 SP3)

select filename 
from sysaltfiles
where name = 'tempdev'

Not very exciting – it returns the path as expected:

E:\MSSQL\DATA\tempdb.mdf

However when my script ran this slight variation of the query above, I got a really strange result:

declare @tmp varchar(255)
select @tmp =filename
from sysaltfiles
where name = 'tempdev'

print @tmp

E:\MSSQL\DATA\tempdb.mdfft SQL Server\MSSQL\DATA\tempdb.mdf

It turns out that when you move tempdb on a server running this build, there was a bug where it didn’t properly clean out the value stored in sysaltfiles. The original value is overwritten, and a null marks the end of the new string. Looks ok in the results pane, but when you print it, it wreaks havoc on your output.

I doubt anyone is still seriously running this build, but there you go. Old school dirty data.

Friday, October 7, 2011

Window Functions

Last night I was lucky enough hear Itzik Ben-Gan speak at the SQL Server user group here in London. The topic for the evening was Window Functions, and more specifically the enhancements that have been added to the OVER Clause in Denali.

Window Functions are an incredibly powerful addition to T-SQL, which effectively replace the GROUP BY clause in such a way that different aggregations can be applied to different subsets of the original result set. This is particularly powerful when calculating moving averages, and building cumulative totals. Previously this was difficult and computationally expensive to do in T-SQL.

Here is an article that Itzik wrote for SQL Server Magazine on this topic last year. Certainly worth a read for anyone interested in Window Functions.

Friday, July 29, 2011

SQL Server Table Partitioning – A Round Robin Approach

Over the last few weeks I’ve been working on designing a large, partitioned table using SQL Server 2008 R2. This table needs to hold 30 days worth of data, and every day it receives around 15 million new rows. So the total rowcount will be around 30x15M = 450 million rows.

The key design requirement for this table is that it needs to implement a 30 day sliding window, and recycling old partitions needs to be completed rapidly, and without manual intervention. When a new day starts, the corresponding partition should be wiped as quickly as possible to remove the 30 day old data sitting in it.

Step One: An Elegant ‘Hash’ Function

CREATE FUNCTION [dbo].[fnGetPartition](@Day SMALLINT)
RETURNS TINYINT
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @RetVal TINYINT
    --The number of Partitions present
    DECLARE @PartitionCount TINYINT
    SET @PartitionCount = 30
    --Hash the supplied day in to one of the @PartitionCount buckets
    SET @RetVal = @Day % @PartitionCount + 1
    RETURN @RetVal
END
GO

Days are represented by monotonically increasing integers in this system. So the function above takes the current day, finds the modulus 30 of that number and returns an integer between 1 and 30. This is a really neat way to calculate a round-robin partition number for each day. The function maps input to output like this:

Input	Output
1	1
2	2
…	…
29	29
30	30
31	1
32	2

So day 31 will overwrite day 1, and so forth.

Step Two: Store the Partition Number as a Persisted Computed Column on the Table

The next step is to persist the partition number as a column on the large table. This is a cut-down, simplified version of the table for illustrative purposes:

--Create a test table
CREATE TABLE [dbo].[BigTable]
(
    [Partition] AS dbo.fnPartition(Day) PERSISTED,
    [Day]       SMALLINT        NOT NULL,
    [Symbol]    CHAR(5)     NOT NULL,
    [Ref]       CHAR(15)    NOT NULL,
    [Open]      BIT                     NOT NULL,
    [Closed]    BIT                     NOT NULL,
    [IsEmpty]   BIT                     NOT NULL,
)
ON PRIMARY
GO

This builds a table which has an integer between 1 and 30 stored in the Partition column

Step Three: Create A Partition Function

The next step is to add a partition function. This function maps an input parameter to a partition number. In this scenario, the hard part of figuring out which partition to use has already been completed by the hash function and stored in the [Partition] column. I found this was an easier approach, as there are restrictions on what you can and can’t do inside a partition function.

CREATE PARTITION FUNCTION [pfBigTable](TINYINT)
AS RANGE LEFT FOR VALUES
(
    1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
    16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
)
GO

Step Four: Create A Partition Scheme

The partition scheme maps the partitions specified in the partition function to their physical storage location. For this scenario, I mapped all 30 partitions to a single filegroup.

CREATE PARTITION SCHEME psBigTable
AS PARTITION pfBigTable
ALL TO (BigFileGroup)
GO

Step Five: Configure the Table to Use the Partition Scheme

At this stage, the table can be recreated on the partition scheme rather than on the PRIMARY filegroup where it is currently sitting. The ON psBigTable([Partition]) statement tells SQL Server that the [Partition] column is what should be supplied to the partition function in order to decide where to store each row.

CREATE TABLE [dbo].[BigTable]
(
    [Partition] AS dbo.fnPartition(Day) PERSISTED,
    [Day]       SMALLINT        NOT NULL,
    [Symbol]    CHAR(5)     NOT NULL,
    [Ref]       CHAR(15)    NOT NULL,
    [Open]      BIT                     NOT NULL,
    [Closed]    BIT                     NOT NULL,
    [IsEmpty]   BIT                     NOT NULL,
)
ON psBigTable([Partition])
GO

Summary

That’s it for part one. The result is a partitioned table spread across 30 partitions, with a simple hash function in place to ensure that data is redirected to the appropriate partition based on the monotonically increasing [Day] column. This ensures that the buckets get ‘recycled’ every 30 days.

In the next post I’ll discuss partition aligned indexes, and an elegant way to do massive data deletes using partition switching.