More and more source systems seem to include 24 hour operation – for example retailers who might once have worked nine to five now accept orders via the Internet. At the same time, data warehouses are getting closer to ‘real time’ information provision by running load procedures more than once per day.
So it’s more important than ever to make sure you set a defined time period for which your ETL is loading records, with a start and end time. This period needs to be respected for every extract in a run, so for example you need to make sure you extract order headers and order lines which cover exactly the same time period.
Here’s the simplest but most reliable way I have been able to find to do this.
The first step in the SSIS defines a new load and sets the date/time range which the load covers. Subsequent steps use these time periods when selecting source records.
The Execute SQL step does two things: creates a new load and sets the time window it is working with, then places the information in package variables. The variables required are a load ID, a start date/time and a finish date/time:
In the meta-database which logs and controls the ETL runs, there is a simple table which mirrors these three values.
CREATE TABLE [dbo].[LoadWindowControl]( [LoadID] [int] identity NOT NULL, [LoadFrom] [datetime] NOT NULL, [LoadTo] [datetime] NOT NULL ) ON [PRIMARY]; GO
The SQL task calls a stored procedure:
CREATE PROCEDURE [dbo].[uspSetLoadWindow] @LoadID int output, @LoadFrom datetime output, @LoadTo datetime output AS BEGIN SET NOCOUNT ON; declare @StartTime datetime; select @StartTime = dateadd(s, 1, coalesce(max(LoadTo), 0)) from LoadWindowControl; declare @EndTime datetime; set @EndTime = convert(datetime, convert(varchar(30), CURRENT_TIMESTAMP, 120)); insert into LoadWindowControl (LoadFrom, LoadTo) values (@StartTime, @EndTime) select @LoadID = @@IDENTITY, @LoadFrom = @StartTime, @LoadTo = @EndTime END; GO
This is very simple – just adding a new record into the list of loads with an end date of “now” and a start date of the last end plus one second. All three of these values are passed back as output parameters so the SSIS package can put them into variables. Calling the proc from SSIS is easy:
And the parameter mapping inserts the right value to the right variable:
The first time this package runs, the table gets a record like this:
To check these values are being passed into the SSIS variables, here’s a quick look at the Locals window for anyone confused by these dates, remember I’m in the UK so the format is dd/mm/yyyy):
With these parameter values now populated them you can use them as part of the SQL command in each source. For example:
For each package you can pass these variables through as configurations, or you could just pass through the load ID and retrieve the related dates using an Execute SQL task.