I had the privilege to go on the SQL Cruise this summer and I never learned that much in one week ever! One of the things I knew about but wasn’t on my mind to check yet was the page verification option of the databases.
Argenis Fernandez gave a session on the SQL Cruise about the internals of SQL Server where this option was discussed. Back in the office I made a little script to check all my servers and boom! Almost all our databases, except the ones that were made after the installation of SQL Server 2005, had either the page verification set to NONE or TORN_PAGE_DETECTION.
This shocked me because the CHECKSUM option is available since SQL Server 2005 and every database was already running on SQL Server 2005 and higher. It turns out the former administrators didn’t think it was needed or didn’t know anything about it and didn’t want to change anything because everything worked as it did.
Because CHECKSUM makes sure that SQL Server can check if the read from disk is the actual data written to disk from before this is a big win when it comes to preventing large amounts of data corruption. You can read more about the checksums from Paul Randal’s blog.
When you change the page verification option to CHECKSUM, SQL Server will not do anything with the present data until it has to write the data to disk. Than, and only than, will it create a checksum for the data written. It will not do anything with the data that isn’t changed from the moment of changing the page verification option.
When you want to make sure that the data present from that moment on is being verified with a checksum you have to force SQL Server to read all the data and write it back to disk.
You can do this by various methods but the ones that came to my mind were:
- Create a new table and copy all the data from the old to the new. Delete the old table and rename the new to the old table.
- Rebuild the clustered index if present. If no clustered index is present, create one and drop it afterwards.
Method one sounds easy but is really time consuming besides the fact that you’ll need twice the amount of disk space for the tables to copy. When you have small tables this is fairly easy but in my case some of the tables were hundreds of gigabytes in size. So this wasn’t an option.
Method two is a lot easier. You rebuild all the clustered indexes and the data will have a checksum after the action is completed. Unfortunately I was dealing with a database where 95% of the tables were heaps (oh what do I hate heaps). To make this work I would have to create a clustered index for every heap and drop is afterwards.
The first thing I want to check is which tables are heaps.
IF OBJECT_ID('tempdb..#HEAPS') IS NOT NULL DROP TABLE #HEAPS CREATE TABLE #HEAPS ( object_id INT , table_schema VARCHAR(200) COLLATE SQL_Latin1_General_CP1_CI_AS , table_name VARCHAR(200) COLLATE SQL_Latin1_General_CP1_CI_AS ) -- List all heap tables INSERT INTO #HEAPS SELECT t.object_id , s.name AS [table_schema] , t.name AS [table_name] FROM sys.tables AS t INNER JOIN sys.schemas AS s ON t.schema_id = s.schema_id INNER JOIN sys.indexes AS i ON t.object_id = i.object_id AND i.type = 0 -- = Heap
The result by selecting the data from the table #HEAPS:
This will collect all the heaps that are present in the current database. That’s a start but how do I efficiently create all the clustered indexes where I don’t use a lot of disk space and it doesn’t take too long to create.
The next script will use the details from the temporary heaps table and check all the columns and get the column with the smallest length. Based on that a T-SQL statement will be created which will create the clustered index on the tables.
SELECT ('CREATE CLUSTERED INDEX ' + QUOTENAME('tempci_' + t.table_name) + ' ON ' + QUOTENAME(t.table_schema) + '.' + QUOTENAME(t.table_name) + '(' + QUOTENAME(t.column_name) + ' ASC' + ')') COLLATE SQL_Latin1_General_CP1_CI_AS FROM ( SELECT h.table_schema , h.table_name , c.object_id , c.name AS column_name , c.column_id , t.name AS type_name , c.max_length , ROW_NUMBER() OVER ( PARTITION BY c.object_id ORDER BY c.max_length ASC ) AS rownr FROM sys.columns c INNER JOIN #HEAPS h On h.object_id = c.object_id INNER JOIN sys.types t ON t.user_type_id = c.user_type_id ) t WHERE t.rownr = 1
I imagined that nobody with their right mind would create a clustered index named “tempci_[table name]” so that’s the name I used to create the clustered indexes.
Of course you could automatically execute all the statements but I like to have control of that so I let the script create all the dynamic statements for me and I would execute them separately.
Now that all the clustered indexes are created we have to cleanup the temporary clustered indexes. The following script will do that for you:
SELECT 'DROP INDEX ' + QUOTENAME(i.name) + ' ON ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) + ' WITH ( ONLINE = OFF )' FROM sys.indexes i INNER JOIN sys.tables t ON i.object_id = t.object_id INNER JOIN sys.schemas s ON s.schema_id = t.schema_id WHERE i.name LIKE 'tempci_%' GO
The script creates the necessary statements which you can copy and execute in a query window.
If hope this script is useful for you and that it will make your life easier. Any comment is appreciated.