Analyze table usage for present data

Standard

dataThe time I’ve spend analyzing tables for effective database modeling is just too much. I don’t like to do things twice and I’d rather spend a little more time to develop something smart than doing things over and over again. This was also the case where I had to analyze table usage for a set of tables in a databases.

The question asked was to see the minimum and the maximum length for all the columns in a table. Doing this is not that hard but making it effective to execute it against more than one column is a bit harder. Still not rocket science but needs a little more actions are needed to get things done.

The script below will take a few arguments

  • table_name
  • sample

The table_name argument is just the table name you want to analyze. You could change the query so it will execute against all the tables but that’s a little too far for the purpose of this script.

The sample argument allows you to take a subset of data instead of the entire table. The reason I did that was because I’m working with tables with more than a 100 million rows and that could take a long time before everything would’ve been read.

The script will take the amount of rows and put it in a temporary table. Than get all the columns from the specified table. After this it will create the queries needed to get all the lengths needed and make up a result.

At the end the result is combined with the information from the columns.

DECLARE @table_name NVARCHAR(250)
DECLARE @query NVARCHAR(MAX)
DECLARE @sample NVARCHAR(100)

IF OBJECT_ID('tempdb..##TEMP') IS NOT NULL DROP TABLE ##TEMP
IF OBJECT_ID('tempdb..#COLUMNS') IS NOT NULL DROP TABLE #COLUMNS
IF OBJECT_ID('tempdb..#RESULT') IS NOT NULL DROP TABLE #RESULT

-- Replace the values in these variables
SET @table_name = '<table name>'
SET @sample = '<amount of records to test>'

-- Create the table for the results
CREATE TABLE #RESULT
(
	table_name VARCHAR(250)
	, column_name VARCHAR(250)
	, min_value INT
	, max_value INT
)

-- Get the sample from the table
EXECUTE ('SELECT TOP ' + @sample + ' * INTO ##TEMP FROM ' + @table_name +';')

-- Get the information about the columns
SELECT 
	c.name AS [column_name]
	, c.max_length AS [column_max_length]
	, t.name AS [column_type_name]
	, t.max_length AS [type_max_length]
INTO #COLUMNS FROM sys.columns c
INNER JOIN sys.types t ON t.user_type_id = c.user_type_id
WHERE OBJECT_NAME(object_id) = @table_name
ORDER BY column_id

-- Reset the query variable
SELECT @query = ''

-- Make up the queries for getting the lengths of the columns
SELECT @query += '
		INSERT INTO #RESULT 
		SELECT ''' + @table_name 
			+ ''', ''' + column_name + '''
			,ISNULL(MIN(LEN(' + column_name + ')), 0)
			, ISNULL(MAX(LEN(' + column_name + ')), 0) 
		FROM ##TEMP;'
FROM
	#COLUMNS

-- Execute the queries
EXECUTE sp_executesql @query

-- Get the result
SELECT 
	r.table_name
	, r.column_name
	, c.column_type_name
	, c.column_max_length
	, r.min_value
	, r.max_value
	
	, c.type_max_length
FROM 
	#RESULT r
INNER JOIN #COLUMNS c
	ON r.column_name = c.column_name

This will create the following result:

data_usage_result

In the example above you can see that several columns have a CHAR data type of length 40 but are unused and that costs data. By changing the column data type from CHAR to VARCHAR it was possible to save a lot of data especially if you’re dealing with millions of rows.

Hope this helps you out to analyze your tables a little faster.

Procedure statistics over period of time

Standard

procedureRecently I enjoyed a PASS session with Paul Randal where I saw a nice script which took two snapshots of the wait statistics and calculated the delta of the two. If you’re interested you can find the blog post here.

Based on that idea I wanted a script that did the same thing but than for stored procedures.

The script will take a snapshot of the sys.dm_exec_procedure_stats, wait for a set amount of time and take another snapshot. Based on the values of these two snapshots the script will calculate several values.

/*============================================================================
  File:     ProcedureStatsPeriod.sql
 
  Summary:  Take snapshots of the procedure stats
 
  SQL Server Versions: 2005 onwards
------------------------------------------------------------------------------
  Written by Sander Stad, SQLStad.nl
 
  (c) 2015, SQLStad.nl. All rights reserved.
 
  For more scripts and sample code, check out http://www.SQLStad.nl
 
  You may alter this code for your own *non-commercial* purposes (e.g. in a
  for-sale commercial tool). Use in your own environment is encouraged.
  You may republish altered code as long as you include this copyright and
  give due credit, but you must obtain prior permission before blogging
  this code.
 
  THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF
  ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED
  TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
  PARTICULAR PURPOSE.
============================================================================*/


-- Drop the temporary tables in case they exist
IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] = N'##ProcStats1')
    DROP TABLE [##ProcStats1];
 
IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] = N'##ProcStats2')
    DROP TABLE [##ProcStats2];
GO

SELECT
	d.object_id
	, d.database_id
	, DB_NAME(d.database_id) AS [database_name]
	, OBJECT_NAME(object_id, database_id) [procedure_name]
	, d.cached_time
	, d.last_execution_time
	, d.total_elapsed_time
	, d.total_elapsed_time/d.execution_count AS [avg_elapsed_time]
	, d.last_elapsed_time
	, d.total_worker_time
	, d.last_worker_time
	, d.total_worker_time/d.execution_count AS [avg_worker_time]
	, d.execution_count
INTO
	##ProcStats1
FROM 
	sys.dm_exec_procedure_stats AS d
GO

-- Lets wait
WAITFOR DELAY '00:05:00'
GO

-- Collect the data for the second snapshot
SELECT
	d.object_id
	, d.database_id
	, DB_NAME(d.database_id) AS [database_name]
	, OBJECT_NAME(object_id, database_id) [procedure_name]
	, d.cached_time
	, d.last_execution_time
	, d.total_elapsed_time
	, d.total_elapsed_time/d.execution_count AS [avg_elapsed_time]
	, d.last_elapsed_time
	, d.last_worker_time
	, d.total_worker_time
	, d.total_worker_time/d.execution_count AS [avg_worker_time]
	, d.execution_count
INTO
	##ProcStats2
FROM 
	sys.dm_exec_procedure_stats AS d
GO


WITH [DiffProcStats] AS
(
	SELECT
	-- Proc stats that weren't in the first snapshot
		[ps2].[object_id]
		,[ps2].[database_name]
		,[ps2].[procedure_name]
		,[ps2].[cached_time]
		,[ps2].[last_execution_time]
		,[ps2].[total_elapsed_time]
		,[ps2].[last_elapsed_time]
		, [ps2].[total_worker_time]
		,[ps2].[last_worker_time]
		,[ps2].[execution_count] 
    FROM [##ProcStats2] AS [ps2]
    LEFT OUTER JOIN [##ProcStats1] AS [ps1]
		ON [ps2].[object_id] = [ps1].[object_id]
		AND [ps2].[database_id] = [ps1].[database_id]
		AND [ps2].[procedure_name] = [ps1].[procedure_name]
    WHERE [ps1].[database_name] IS NULL
    --AND [ps2].[total_elapsed_time] > 0
UNION
SELECT
-- Diff of elapsed time in both snapshots
        [ps2].[object_id]
		, [ps2].[database_name]
		, [ps2].[procedure_name]
		, [ps2].[cached_time]
		, CAST([ps2].[last_execution_time] AS DECIMAL (16, 4)) AS [last_execution_time]
		, CAST([ps2].[total_elapsed_time] - [ps1].[total_elapsed_time] AS DECIMAL (16, 4)) AS [elapsed_time_ms]
		, CAST([ps2].[last_elapsed_time] - [ps1].[last_elapsed_time] AS DECIMAL (16, 4)) AS [last_elapsed_time_ms]
		, CAST([ps2].[total_worker_time] - [ps1].[total_worker_time] AS DECIMAL (16, 4)) AS [total_worker_time_ms]
		, CAST([ps2].[last_worker_time] - [ps1].[last_worker_time] AS DECIMAL (16, 4)) AS [last_worker_time_ms]
		, CAST([ps2].[execution_count]  - [ps1].[execution_count] AS DECIMAL (16, 4)) AS [execution_count]
    FROM 
		[##ProcStats2] AS [ps2]
    LEFT OUTER JOIN [##ProcStats1] AS [ps1]
        ON [ps2].[object_id] = [ps1].[object_id]
	WHERE 
		[ps1].[database_name] IS NOT NULL
)
,[ProcStats] AS (
	SELECT
		[object_id]
		, [database_name]
		, [procedure_name] AS [procedure_name]
        , MAX([cached_time]) AS [cached_time]
		, MAX([last_execution_time]) AS [last_execution_time]
		, [total_elapsed_time] AS [total_elapsed_time]
		, [last_elapsed_time] AS [elapsed_time]
		, [last_worker_time] AS [last_worker_time]
		, [total_worker_time] AS [total_worker_time]
		, [execution_count] AS [execution_count]	
		, 100.0 * [total_elapsed_time] / SUM ([total_elapsed_time]) OVER() AS [percentage]
		, ROW_NUMBER() OVER(ORDER BY [object_id] DESC) AS [rownum]
    FROM 
		[DiffProcStats]
	WHERE
		database_name IS NOT NULL
	GROUP BY
		[object_id]
		, [database_name]
		, [procedure_name]
		, [total_elapsed_time]
		, [last_elapsed_time]
		, [last_worker_time]
		, [total_worker_time]
		, [execution_count]
)


SELECT
	[p1].[object_id]
    , [p1].[database_name] 
	, [p1].[procedure_name]
	, [p1].[cached_time]
	, [p1].[last_execution_time]
	, [p1].[total_elapsed_time] /1000 AS [total_elapsed_time_s]
	, [p1].[elapsed_time] /1000 AS [last_elapsed_time_s]
	, [p1].[last_worker_time] /1000 AS [last_worker_time_s]
	, [p1].[total_worker_time] /1000 AS [total_worker_time_s]
	, [p1].[percentage] AS [percentage]
	, [p1].[execution_count]
	, CAST([p1].[elapsed_time] / [p1].[execution_count] / 1000 AS DECIMAL (16, 4)) AS [avg_elapsed_time_s]
	, CAST([p1].[total_worker_time]/ [p1].[execution_count] /1000 AS DECIMAL (16, 4)) AS [avg_worker_time_s]
FROM [ProcStats] AS [p1]
INNER JOIN [ProcStats] AS [p2]
    ON [p2].[RowNum] <= [p1].[RowNum]
WHERE
	[p1].execution_count > 0
GROUP BY 
	[p1].[object_id]
	, [p1].[database_name] 
	, [p1].[procedure_name]
	, [p1].[cached_time]
	, [p1].[last_execution_time]
	, [p1].[total_elapsed_time]
	, [p1].[elapsed_time]
	, [p1].[last_worker_time]
	, [p1].[total_worker_time]
	, [p1].[execution_count]
	, [p1].[percentage]
HAVING 
	SUM ([p2].[Percentage]) - [p1].[Percentage] < 95 -- percentage threshold
ORDER BY
	[percentage] DESC;
GO

-- Cleanup
IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] = N'##ProcStats1')
    DROP TABLE [##ProcStats1];
 
IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] = N'##ProcStats2')
    DROP TABLE [##ProcStats2];

Thank again Paul for the idea.