Analyze table usage for present data


The time I’ve spent analyzing tables for effective database modeling is just too much. I don’t like to do things twice, and I’d rather spend a little more time developing something smart than doing things over and over again. This was also the case when I had to analyze table usage for a set of tables in a database.

The question asked was to see the minimum and the maximum length for all the columns in a table. Doing this for a single column is not that hard, but making it efficient against more than one column is a bit harder. Still not rocket science, but a few more steps are needed to get things done.
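For a single column the check is a one-liner; a minimal sketch, assuming a hypothetical table Customer with a column LastName:

SELECT MIN(LEN(LastName)) AS min_value
	, MAX(LEN(LastName)) AS max_value
FROM Customer;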

The script below will take a few arguments

  • table_name
  • sample

The table_name argument is just the name of the table you want to analyze. You could change the query so it executes against all the tables, but that goes a little too far for the purpose of this script.

The sample argument allows you to take a subset of the data instead of the entire table. The reason I added it is that I’m working with tables with more than 100 million rows, and reading everything could take a long time.
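For example, to analyze the first 100,000 rows of a hypothetical table named Customer, you would set the variables in the script below like this:

SET @table_name = 'Customer'
SET @sample = '100000'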

The script will take the given amount of rows and put them in a temporary table. Then it gets all the columns from the specified table. After this it creates the queries needed to get all the lengths and builds up a result.

At the end the result is combined with the information from the columns.
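To illustrate, the dynamic SQL generated for each column looks like this (a sketch for a hypothetical column LastName in a table named Customer):

INSERT INTO #RESULT
SELECT 'Customer', 'LastName'
	, ISNULL(MIN(LEN(LastName)), 0)
	, ISNULL(MAX(LEN(LastName)), 0)
FROM ##TEMP;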

DECLARE @table_name NVARCHAR(250)
DECLARE @query NVARCHAR(MAX)
DECLARE @sample NVARCHAR(100)

IF OBJECT_ID('tempdb..##TEMP') IS NOT NULL DROP TABLE ##TEMP
IF OBJECT_ID('tempdb..#COLUMNS') IS NOT NULL DROP TABLE #COLUMNS
IF OBJECT_ID('tempdb..#RESULT') IS NOT NULL DROP TABLE #RESULT

-- Replace the values in these variables
SET @table_name = '<table name>'
SET @sample = '<amount of records to test>'

-- Create the table for the results
CREATE TABLE #RESULT
(
	table_name VARCHAR(250)
	, column_name VARCHAR(250)
	, min_value INT
	, max_value INT
)

-- Get the sample from the table
EXECUTE ('SELECT TOP ' + @sample + ' * INTO ##TEMP FROM ' + @table_name +';')

-- Get the information about the columns
SELECT 
	c.name AS [column_name]
	, c.max_length AS [column_max_length]
	, t.name AS [column_type_name]
	, t.max_length AS [type_max_length]
INTO #COLUMNS FROM sys.columns c
INNER JOIN sys.types t ON t.user_type_id = c.user_type_id
WHERE OBJECT_NAME(object_id) = @table_name
ORDER BY column_id

-- Reset the query variable
SELECT @query = ''

-- Make up the queries for getting the lengths of the columns
SELECT @query += '
		INSERT INTO #RESULT 
		SELECT ''' + @table_name 
			+ ''', ''' + column_name + '''
			, ISNULL(MIN(LEN(' + column_name + ')), 0)
			, ISNULL(MAX(LEN(' + column_name + ')), 0)
		FROM ##TEMP;'
FROM
	#COLUMNS

-- Execute the queries
EXECUTE sp_executesql @query

-- Get the result
SELECT 
	r.table_name
	, r.column_name
	, c.column_type_name
	, c.column_max_length
	, r.min_value
	, r.max_value
	, c.type_max_length
FROM 
	#RESULT r
INNER JOIN #COLUMNS c
	ON r.column_name = c.column_name

This will create the following result:

[Image: data_usage_result]

In the example above you can see that several columns have a CHAR data type with a length of 40 but hold little or no data; because CHAR is fixed-length, every row still takes up the full 40 bytes. By changing the column data type from CHAR to VARCHAR it was possible to save a lot of space, especially when you’re dealing with millions of rows.
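A minimal sketch of such a change, assuming a hypothetical table Customer with a CHAR(40) column LastName (check for dependencies and constraints before altering a production table):

-- Convert the fixed-length CHAR(40) column to variable-length VARCHAR(40)
-- (hypothetical table and column names; keep the original nullability)
ALTER TABLE Customer ALTER COLUMN LastName VARCHAR(40) NULL;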

I hope this helps you analyze your tables a little faster.
