I’ve spent far too much time analyzing tables for effective database modeling. I don’t like to do things twice, and I’d rather spend a little extra time developing something smart than do the same work over and over again. That was also the case when I had to analyze table usage for a set of tables in a database.
The question was to find the minimum and maximum length of every column in a table. Doing this for a single column is not hard, but doing it efficiently for every column at once takes a bit more work. Still not rocket science, but a few extra steps are needed to get things done.
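For a single column this is a straightforward aggregate query; the table and column names below are only an example:

-- Minimum and maximum length of one column (example names)
SELECT MIN(LEN(CarrierTrackingNumber)) AS min_length
     , MAX(LEN(CarrierTrackingNumber)) AS max_length
FROM SalesOrderDetail;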
The script below takes two arguments:
- table_name
- sample
The table_name argument is simply the name of the table you want to analyze. You could change the script so it executes against all tables, but that goes beyond the purpose of this script.
The sample argument lets you analyze a subset of the data instead of the entire table. I added it because I work with tables of more than 100 million rows, and reading everything would take a long time.
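Filling in the two arguments could look like this; the table name and sample size here are just placeholders for your own values:

-- Example values only; replace with your own table and sample size
SET @table_name = 'SalesOrderDetail'   -- table to analyze
SET @sample     = '100000'             -- number of rows to sample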
The script takes the requested number of rows and puts them in a temporary table, then retrieves all the columns of the specified table. After that, it builds the queries needed to get the lengths of each column and collects the results.
Finally, the results are combined with the column metadata.
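To give an idea of what the script generates, the dynamic query for a single column looks roughly like the statement below (the table and column names are made up for illustration):

-- Roughly what the script generates for one column (names are made up)
INSERT INTO #RESULT
SELECT 'SalesOrderDetail'
     , 'CarrierTrackingNumber'
     , ISNULL(MIN(LEN(CarrierTrackingNumber)), 0)
     , ISNULL(MAX(LEN(CarrierTrackingNumber)), 0)
FROM ##TEMP;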
DECLARE @table_name NVARCHAR(250)
DECLARE @query NVARCHAR(MAX)
DECLARE @sample NVARCHAR(100)

IF OBJECT_ID('tempdb..##TEMP') IS NOT NULL DROP TABLE ##TEMP
IF OBJECT_ID('tempdb..#COLUMNS') IS NOT NULL DROP TABLE #COLUMNS
IF OBJECT_ID('tempdb..#RESULT') IS NOT NULL DROP TABLE #RESULT

-- Replace the values in these variables
SET @table_name = '<table name>'
SET @sample = '<amount of records to test>'

-- Create the table for the results
CREATE TABLE #RESULT
(
    table_name VARCHAR(250)
    , column_name VARCHAR(250)
    , min_value INT
    , max_value INT
)

-- Get the sample from the table
EXECUTE ('SELECT TOP ' + @sample + ' * INTO ##TEMP FROM ' + @table_name + ';')

-- Get the information about the columns
SELECT
    c.name AS [column_name]
    , c.max_length AS [column_max_length]
    , t.name AS [column_type_name]
    , t.max_length AS [type_max_length]
INTO #COLUMNS
FROM sys.columns c
INNER JOIN sys.types t
    ON t.user_type_id = c.user_type_id
WHERE OBJECT_NAME(object_id) = @table_name
ORDER BY column_id

-- Reset the query variable
SELECT @query = ''

-- Make up the queries for getting the lengths of the columns
SELECT @query += ' INSERT INTO #RESULT SELECT ''' + @table_name + ''', ''' + column_name + ''' ,ISNULL(MIN(LEN(' + column_name + ')), 0) , ISNULL(MAX(LEN(' + column_name + ')), 0) FROM ##TEMP;'
FROM #COLUMNS

-- Execute the queries
EXECUTE sp_executesql @query

-- Get the result
SELECT
    r.table_name
    , r.column_name
    , c.column_type_name
    , c.column_max_length
    , r.min_value
    , r.max_value
    , c.type_max_length
FROM #RESULT r
INNER JOIN #COLUMNS c
    ON r.column_name = c.column_name
This will create the following result:
In the example above you can see that several columns have a CHAR data type of length 40 but are barely used, which wastes storage. By changing the column data type from CHAR to VARCHAR it was possible to save a lot of space, especially when you’re dealing with millions of rows.
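If you decide to change such a column, the fix is a simple ALTER TABLE statement; the table and column names below are just an example, and you should check dependencies such as indexes and constraints before altering a production table:

-- Example only: change an oversized fixed-length column to a variable-length one
ALTER TABLE SalesOrderDetail
ALTER COLUMN CarrierTrackingNumber VARCHAR(40) NULL;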
I hope this helps you analyze your tables a little faster.