Commissioning an architecture to host a data warehouse can be tricky, and computing the storage requirements is probably my least favourite task. A full guide is probably best saved for another post, but when you have character (char, varchar, etc.) columns, you want to find the average length of the data stored in them in your source. There’s a simple piece of SQL for this:
select avg(len([ColumnName])) from [TableName]
Easy to execute a few time, but if you have to run it against a few hundred columns it’ll get painful to do manually. Here’s a script to do it all for you and return a nice table of results. It relies on creating a table which has the table and column names in it, as below.
(Hint – your source to target map spreadsheet has this already in it, doesn’t it?)
With this in place, just run the following script:
declare @table varchar(20), @column varchar(50)
declare @results table (TableName varchar(20), ColumnName varchar(50), ColumnLength int)
declare @sql varchar(1000)
declare c1 cursor read_only
for select TableName, ColumnName from CharacterColumns
open c1
fetch next from c1 into @table, @column
while @@fetch_status = 0
begin
set @sql =
'select ''' + @table + ''', ''' + @column + ''', avg(len(' + @column + ')) from ' + @table
fetch next from c1 into @table, @column
insert into @results exec (@sql)
end
close c1
deallocate c1
select * from @results
Go and make a cup of tea, or lunch, or whatever it takes depending on the number of columns you are working with and the number of rows in each table. When you get back, you’ve got a result set ready to be pasted into your sizing document.
