Practical Business Intelligence

December 5, 2010

Extracting Source Records by Time Window

Filed under: Business Intelligence, ETL, SSIS — Duncan Sutcliffe @ 2:32 pm

More and more source systems seem to include 24 hour operation – for example retailers who might once have worked nine to five now accept orders via the Internet.  At the same time, data warehouses are getting closer to ‘real time’ information provision by running load procedures more than once per day.

So it’s more important than ever to make sure you set a defined time period for which your ETL is loading records, with a start and end time.  This period needs to be respected for every extract in a run, so for example you need to make sure you extract order headers and order lines which cover exactly the same time period.

Here’s the simplest but most reliable way I have been able to find to do this.

The first step in the SSIS defines a new load and sets the date/time range which the load covers.  Subsequent steps use these time periods when selecting source records.

Simplified SSIS package showing setting a time period range with an Execute SQL task(In a “real” solution I’d expect the first step to be done in your ‘master’ package and the results passed to subsequent child packages.)

The Execute SQL step does two things: creates a new load and sets the time window it is working with, then places the information in package variables.  The variables required are a load ID, a start date/time and a finish date/time:

image

In the meta-database which logs and controls the ETL runs, there is a simple table which mirrors these three values.

CREATE TABLE [dbo].[LoadWindowControl]( [LoadID] [int] identity NOT NULL, [LoadFrom] [datetime] NOT NULL, [LoadTo] [datetime] NOT NULL ) ON [PRIMARY]; GO

The SQL task calls a stored procedure:

CREATE PROCEDURE [dbo].[uspSetLoadWindow] @LoadID int output, @LoadFrom datetime output, @LoadTo datetime output AS BEGIN SET NOCOUNT ON; declare @StartTime datetime; select @StartTime = dateadd(s, 1, coalesce(max(LoadTo), 0)) from LoadWindowControl; declare @EndTime datetime; set @EndTime = convert(datetime, convert(varchar(30), CURRENT_TIMESTAMP, 120)); insert into LoadWindowControl (LoadFrom, LoadTo) values (@StartTime, @EndTime) select @LoadID = @@IDENTITY, @LoadFrom = @StartTime, @LoadTo = @EndTime END; GO

This is very simple – just adding a new record into the list of loads with an end date of “now” and a start date of the last end plus one second.  All three of these values are passed back as output parameters so the SSIS package can put them into variables.  Calling the proc from SSIS is easy:

image

And the parameter mapping inserts the right value to the right variable:

image

The first time this package runs, the table gets a record like this:

image

To check these values are being passed into the SSIS variables, here’s a quick look at the Locals window for anyone confused by these dates, remember I’m in the UK so the format is dd/mm/yyyy):

image

With these parameter values now populated them you can use them as part of the SQL command in each source.  For example:

image

image

For each package you can pass these variables through as configurations, or you could just pass through the load ID and retrieve the related dates using an Execute SQL task.

About these ads

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Shocking Blue Green Theme Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: