Analyze table usage for present data

The time I’ve spent analyzing tables for effective database modeling is just too much. I don’t like doing things twice, and I’d rather spend a little more time developing something smart than doing the same work over and over. That was also the case when I had to analyze table usage for a set of tables in a database.

The question was to find the minimum and maximum length of the data in every column of a table. Doing this for a single column is not that hard, but doing it efficiently against all the columns at once is a bit harder. Still not rocket science, but it takes a few more steps to get things done.

The script below takes two arguments:

  • table_name
  • sample

The table_name argument is simply the name of the table you want to analyze. You could change the query so it executes against all tables, but that goes a little beyond the purpose of this script.

The sample argument lets you analyze a subset of rows instead of the entire table. I added it because I work with tables of more than 100 million rows, and reading all of them could take a very long time.
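
To illustrate the mechanism, here is a minimal T-SQL sketch of taking such a sample; the table name dbo.BigTable is hypothetical:

    -- Copy a sample of rows into a temp table so the length analysis
    -- reads the sample instead of scanning the full table.
    DECLARE @sample int = 100000;

    SELECT TOP (@sample) *
    INTO   #sample
    FROM   dbo.BigTable;   -- hypothetical table name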

The script first puts the sampled rows into a temporary table. It then retrieves all the columns of the specified table, generates the queries needed to measure the lengths per column, and executes them to build up a result.

Finally, the measured lengths are combined with the metadata of the columns.
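
The original script is not reproduced here, but the flow described above could look roughly like the following T-SQL sketch. Assumptions I'm making: a global temporary table ##sample is used so the sampled rows survive the dynamic-SQL batch that creates them, and every column is (implicitly convertible to) character data so LEN() applies; @table_name, dbo.BigTable, and the temp table names are illustrative.

    DECLARE @table_name sysname       = N'dbo.BigTable',  -- table to analyze
            @sample     int           = 100000,           -- rows to sample
            @sql        nvarchar(max);

    -- Step 1: put the sampled rows into a temporary table. A global temp
    -- table (##) is used because a local one created inside dynamic SQL
    -- would disappear when that batch ends.
    SET @sql = N'SELECT TOP (' + CAST(@sample AS nvarchar(20)) + N') * '
             + N'INTO ##sample FROM ' + @table_name + N';';
    EXEC sys.sp_executesql @sql;

    -- Step 2: build one MIN/MAX length query per column from the catalog.
    SET @sql = N'';
    SELECT @sql = @sql
        + CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
        + N'SELECT ' + QUOTENAME(c.name, '''') + N' AS column_name,'
        + N' MIN(LEN(' + QUOTENAME(c.name) + N')) AS min_length,'
        + N' MAX(LEN(' + QUOTENAME(c.name) + N')) AS max_length'
        + N' FROM ##sample'
    FROM sys.columns AS c
    WHERE c.object_id = OBJECT_ID(@table_name);

    -- Step 3: execute the generated queries and capture the lengths.
    CREATE TABLE #lengths (column_name sysname, min_length int, max_length int);
    INSERT INTO #lengths EXEC sys.sp_executesql @sql;

    -- Step 4: combine the measured lengths with the column metadata.
    SELECT  c.name       AS column_name,
            t.name       AS data_type,
            c.max_length AS defined_length,   -- defined length in bytes
            l.min_length,
            l.max_length
    FROM    sys.columns AS c
    JOIN    sys.types   AS t ON t.user_type_id = c.user_type_id
    JOIN    #lengths    AS l ON l.column_name  = c.name
    WHERE   c.object_id = OBJECT_ID(@table_name)
    ORDER BY c.column_id;

    DROP TABLE ##sample;

Generating the per-column queries from sys.columns is what makes this work for any table, without hand-writing a MIN/MAX pair for every column.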

This will create the following result:

[Screenshot: data_usage_result]

In the example above you can see that several columns have a CHAR(40) data type but are largely unused, and that wastes storage. By changing the data type from CHAR to VARCHAR it was possible to save a lot of space, especially when you’re dealing with millions of rows.
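
As an illustration, such a conversion could look like this; the table and column names are hypothetical, and the nullability should match the original definition:

    -- Convert a padded CHAR(40) column that the analysis showed to hold
    -- only short values; VARCHAR stores the actual data length instead.
    ALTER TABLE dbo.BigTable
    ALTER COLUMN SomeCode varchar(40) NULL;  -- match original NULL/NOT NULL

Note that existing values may keep their trailing padding after the conversion, so an UPDATE with RTRIM and a rebuild of the table or its indexes may be needed before the space is actually reclaimed.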

I hope this helps you analyze your tables a little faster.
