The time I’ve spent analyzing tables for effective database modeling is just too much. I don’t like to do things twice, and I’d rather spend a little more time developing something smart than doing the same work over and over again. That was also the case when I had to analyze table usage for a set of tables in a database.
The question was to find the minimum and the maximum length of all the columns in a table. Doing this for one column is not hard, but doing it efficiently against every column in the table is a bit harder. Still not rocket science, but it takes a few more steps to get things done.
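For a single column the query itself is simple. As a minimal sketch, with a made-up table and column name, it would look something like this:

-- Minimal sketch for one column (dbo.Customer and first_name are hypothetical names)
SELECT MIN(LEN(first_name)) AS min_length
     , MAX(LEN(first_name)) AS max_length
FROM dbo.Customer;

Repeating that by hand for dozens of columns is exactly the tedious part the script below automates.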
The script below takes two arguments.
The table_name argument is simply the name of the table you want to analyze. You could change the query so it executes against all tables, but that goes a little beyond the purpose of this script.
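If you did want to go that far, the table names could be read from sys.tables and looped over, for example with a cursor. This is just a sketch of where those names would come from, not part of the script itself:

-- Sketch only: the user tables you could iterate over instead of a single @table_name
SELECT s.name + '.' + t.name AS table_name
FROM sys.tables t
INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
ORDER BY table_name;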
The sample argument allows you to take a subset of the data instead of the entire table. I added it because I’m working with tables of more than 100 million rows, and reading everything could take a long time.
The script takes that number of rows and puts them in a temporary table. Then it gets all the columns of the specified table. After that it builds the queries needed to collect all the lengths and produces a result.
At the end the result is combined with the information from the columns.
DECLARE @table_name NVARCHAR(250)
DECLARE @query NVARCHAR(MAX)
DECLARE @sample NVARCHAR(100)
IF OBJECT_ID('tempdb..##TEMP') IS NOT NULL DROP TABLE ##TEMP
IF OBJECT_ID('tempdb..#COLUMNS') IS NOT NULL DROP TABLE #COLUMNS
IF OBJECT_ID('tempdb..#RESULT') IS NOT NULL DROP TABLE #RESULT
-- Replace the values in these variables
SET @table_name = '<table name>'
SET @sample = '<amount of records to test>'
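-- Example (hypothetical values):
-- SET @table_name = 'Customer'
-- SET @sample = '1000000'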
-- Create the table for the results
CREATE TABLE #RESULT
(
    table_name VARCHAR(250)
    , column_name VARCHAR(250)
    , min_value INT
    , max_value INT
)
-- Get the sample from the table
EXECUTE ('SELECT TOP ' + @sample + ' * INTO ##TEMP FROM ' + @table_name +';')
-- Get the information about the columns
SELECT
    c.name AS [column_name]
, c.max_length AS [column_max_length]
, t.name AS [column_type_name]
, t.max_length AS [type_max_length]
INTO #COLUMNS FROM sys.columns c
INNER JOIN sys.types t ON t.user_type_id = c.user_type_id
WHERE OBJECT_NAME(object_id) = @table_name
ORDER BY column_id
-- Reset the query variable
SELECT @query = ''
-- Make up the queries for getting the lengths of the columns
SELECT @query += '
INSERT INTO #RESULT
SELECT ''' + @table_name + '''
    , ''' + column_name + '''
    , ISNULL(MIN(LEN(' + column_name + ')), 0)
    , ISNULL(MAX(LEN(' + column_name + ')), 0)
FROM ##TEMP;'
FROM #COLUMNS
-- Execute the queries
EXECUTE sp_executesql @query
-- Get the result
SELECT
    r.*
    , c.column_max_length
    , c.column_type_name
    , c.type_max_length
FROM #RESULT r
INNER JOIN #COLUMNS c ON r.column_name = c.column_name
This will create the following result:
In the example above you can see that several columns have a CHAR data type of length 40 but are largely unused, and that wastes storage. By changing the column data type from CHAR to VARCHAR it was possible to save a lot of space, especially when you’re dealing with millions of rows.
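The change itself is a simple ALTER statement. The table and column below are made up, and you would of course test something like this before touching a production table:

-- Sketch only: shrink an oversized fixed-length column (hypothetical names)
ALTER TABLE dbo.Customer ALTER COLUMN country_code VARCHAR(40) NULL;

Keep in mind that existing CHAR values keep their trailing spaces after the conversion, so an UPDATE with RTRIM may be needed afterwards to actually reclaim the space.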
Hope this helps you analyze your tables a little faster.