Ode to Sunil – Minimal Logging changes in SQL Server 2008

Exploring blog posts from the Database Engine Team

Refer to Sunil Agarwal’s series on minimal logging in SQL Server, posted on the Database Engine Team’s blog site.

What is minimal logging?

Minimally logged operations, sometimes incorrectly referred to as ‘non-logged’, are a way to speed up data manipulation (DML) or bulk loading by (1) avoiding a 1:1 write to the transaction log, and (2) reducing the length of the log records that do get written. Read up on SQL Server’s write-ahead logging to get up to speed for our discussion here. A fully logged write operation generally runs slower than a minimally logged one, and what we’re always looking for, of course, is faster-running code.

You don’t need to be on the product team, a MSFT employee, a SQL MVP, or a co-author of the SQL engine to see the differences and benefits of fully-logged vs. minimally-logged operations. All SQL noobs can participate, myself included!

Example from the blog

Before we start, make sure that your development database is set to either the SIMPLE or BULK_LOGGED recovery model.

ALTER DATABASE yourdatabasename
       SET RECOVERY SIMPLE 
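
To confirm the recovery model before testing, you can query sys.databases (substitute your own database name):

```sql
-- Verify the current recovery model; minimal logging requires SIMPLE or BULK_LOGGED
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'yourdatabasename';
```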

Run the following code in SQL Server. I’ve reformatted the author’s code so I can read it more easily; although it looks a bit different, it’s the same code. The first script below creates a source table, from which we’ll INSERT data, so we can see the effects of full vs. minimal logging when we move data from this source table to our destination table:

-- create the source table

CREATE TABLE t_source (c1 int, c2 int, c3 char (100), c4 char(1000))

GO

 

DECLARE @i int

SELECT @i = 1

 

WHILE (@i < 1000)

BEGIN

INSERT INTO t_source VALUES (@i, @i+10000, 'indexkey', 'hello')
SELECT @i = @i + 1
SELECT @i= @i + 1

END
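
If you prefer a set-based load over the WHILE loop, something like this (a sketch using a ROW_NUMBER trick to generate the numbers) populates the same 999 rows in one statement:

```sql
-- Set-based alternative to the WHILE loop: generate 999 rows at once
;WITH nums AS (
    SELECT TOP (999) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects a CROSS JOIN sys.all_objects b
)
INSERT INTO t_source (c1, c2, c3, c4)
SELECT n, n + 10000, 'indexkey', 'hello'
FROM nums;
```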

 

 

The source table is now populated with data. Now run this script to create the target table (destination table, where we’ll write records to):

 

-- create the target heap

CREATE TABLE t_heap (c1 int, c2 int, c3 char(100), c4 char(1000))

GO

 

His comment says ‘target heap’; a heap, remember, is a SQL Server table with no clustered index.

OK, here comes the fun. We’re going to use an INSERT…SELECT statement to move the rows that we created to our destination table.

-- this is minimally logged
-- LOCK: X lock on the table
-- this behavior is the same even when 't_heap' is not empty

 

Above, the X lock is an ‘exclusive’ lock on the destination table. Without going into depth here, it means the table is available only to the INSERT statement that is running and inserting rows.
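
If you want to see the exclusive lock for yourself, you can peek at sys.dm_tran_locks from a second session while the transaction below is still open (a sketch; run it before the rollback):

```sql
-- From a second session, while the INSERT ... WITH (TABLOCK) transaction is open
SELECT request_session_id,
       resource_type,
       request_mode,     -- expect 'X' (exclusive) at the OBJECT level
       request_status
FROM sys.dm_tran_locks
WHERE resource_associated_entity_id = OBJECT_ID('dbo.t_heap')
  AND resource_type = 'OBJECT';
```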

 

BEGIN TRAN

INSERT INTO t_heap WITH (TABLOCK) SELECT * FROM t_source

-- here are the top 10 log records. You can see that it is minimally logged

SELECT TOP 10 operation
              , context
              , [log record fixed length]
              , [log record length]
              , AllocUnitId
              , AllocUnitName
FROM fn_dblog(NULL, NULL)
WHERE AllocUnitName = 'dbo.t_heap'
ORDER BY [log record length] DESC

 

Notice the WITH (TABLOCK) part of the insert statement. This is how we can get minimal logging for this statement.
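
Rather than eyeballing the top-10 records, you can also total the log generated for the table, which makes the two runs easy to compare (a sketch, run while the transaction is still open):

```sql
-- Sum the log generated for the target table in the current (open) transaction
SELECT COUNT(*)                 AS log_record_count,
       SUM([log record length]) AS total_log_bytes
FROM fn_dblog(NULL, NULL)
WHERE AllocUnitName = 'dbo.t_heap';
```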

His results from the above SELECT statement show that, although the target heap table has two int columns, a char(100) column, and a char(1000) column, the log record length is only 92 bytes.

Make sure and rollback the above transaction:

ROLLBACK

 

Now he demonstrates that, without the table hint WITH (TABLOCK), we get full logging:

 

 

-- this is fully logged

INSERT INTO t_heap

       SELECT * FROM t_source

 

-- here are the top 10 log records. You can see that it is fully logged

 

Now run the previous script again to see the logging that we got without the hint:

 

SELECT TOP 10 operation
              , context
              , [log record fixed length]
              , [log record length]
              , AllocUnitId
              , AllocUnitName
FROM fn_dblog(NULL, NULL)
WHERE AllocUnitName = 'dbo.t_heap'
ORDER BY [log record length] DESC

 

Here he shows us a few changes. One thing that sticks out is that the log record length column now shows 1208 for the rows we queried; in the previous run it was 92. In the first run (92 bytes) SQL Server is not logging the full row, but in the second run (1208 bytes) the record is close to the full row length: two int columns at 4 bytes each, plus 100 bytes, plus 1000 bytes is 1108 bytes, and the rest is log record overhead.

Performance Benefits

Let’s throw some real data at this exercise. A few rows is fine for a proof of concept, but I want to see how much this can really help me.

I have a table with 170 million rows for testing. Search the web for a script that creates a large table of test data; there are plenty of them out there.
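
For the timings below I’m relying on SET STATISTICS TIME. A sketch of the harness (t_bigsource stands in for my large test table; the name is hypothetical):

```sql
-- Hypothetical timing harness: time a 1-million-row minimally logged insert
SET STATISTICS TIME ON;

INSERT INTO t_heap WITH (TABLOCK)
SELECT TOP (1000000) c1, c2, c3, c4
FROM t_bigsource;

SET STATISTICS TIME OFF;
```

Dropping the WITH (TABLOCK) hint from the same statement gives the fully logged timing for comparison.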

I’ll insert 1 million rows without the hint:

TEST 1:

SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 3 ms.

SQL Server Execution Times:
   CPU time = 3657 ms,  elapsed time = 3726 ms.

(1000000 row(s) affected)

Here are 1 million rows inserted using the table hint WITH (TABLOCK):

SQL Server Execution Times:
   CPU time = 1889 ms,  elapsed time = 1184 ms.

(1000000 row(s) affected)

TEST 2:

Now with 12 million rows, first with no table hint and then with WITH (TABLOCK):

SQL Server Execution Times:
   CPU time = 40297 ms,  elapsed time = 81775 ms.

(12000000 row(s) affected)

SQL Server Execution Times:
   CPU time = 22798 ms,  elapsed time = 16073 ms.

(12000000 row(s) affected)

TEST 3:

Here’s something interesting: I want to test 5 million rows using TABLOCK in a user database:

SQL Server Execution Times
  CPU time = 10280 ms,  elapsed time = 7289 ms.

Next, I want to run the same test using a temporary, or ‘temp’, table:

SQL Server Execution Times:
   CPU time = 8001 ms,  elapsed time = 5062 ms.

(5000000 row(s) affected)

The temp table was faster. Now I want to try it with a regular SQL table, but created in tempdb instead. This is a table just like one you would create in any user database, except it lives in the tempdb system database. Note that any object in tempdb goes away when the service is restarted, so you normally would not create permanent objects there:
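
Creating the ‘regular’ table directly in tempdb looks like this (a sketch; the three-part name is the only difference from a user-database table):

```sql
-- A permanent-style table created in tempdb (lost on service restart)
CREATE TABLE tempdb.dbo.t_heap_tempdb (c1 int, c2 int, c3 char(100), c4 char(1000));

INSERT INTO tempdb.dbo.t_heap_tempdb WITH (TABLOCK)
SELECT * FROM t_source;
```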

SQL Server Execution Times:
   CPU time = 5657 ms,  elapsed time = 3288 ms.

(5000000 row(s) affected)

The late Ken Henderson, a SQL Server expert, author of all of those Guru’s Guide to SQL Server books, a Microsoft employee, and a class act if you had the fortune to meet him, shared this trick with me some 13 years ago.

More on this one later.

What did we learn?

Sunil Agarwal has given us some great insight into SQL Server internals. According to his profile on the Microsoft site, he is a Principal Program Manager in the SQL Server Storage Engine group, and he has blogged for years, sharing some of the coolest tips and tricks.

This blog category is an appreciation thread to his great posts, and I look forward to studying what he has written and hopefully expanding on some of his topics for fun.

In this post, we learned about minimal logging and using WITH (TABLOCK) on the destination table. We took the examples that were given and expanded on them with larger datasets. I did not run DBCC DROPCLEANBUFFERS, DBCC FREEPROCCACHE, etc.; we’re getting approximate times here, and there’s no need to go beyond that for now.

Finally, I tried a trick where I created the table in tempdb. Later we can explore why that consistently runs even faster, and look at a little application I built that helps us further in testing full logging vs. minimal logging.

Try experimenting with this in your own code; test it, for example, in a report where you manipulate data in several steps before finally joining it together. The intermediate steps might speed up significantly with this technique. Or if you load tables in, say, a batch job with several steps, you might cut your load times a great deal. Nice!
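
As a sketch of that idea, an intermediate staging step in a report or batch load might look like this (the table and column names are hypothetical):

```sql
-- Hypothetical intermediate step: stage filtered rows with minimal logging
CREATE TABLE stage_orders (order_id int, customer_id int, amount money);

INSERT INTO stage_orders WITH (TABLOCK)
SELECT order_id, customer_id, amount
FROM orders
WHERE order_date >= '2008-01-01';
```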

Thank you
