Practical Business Intelligence

April 14, 2010

Unzipping and Zipping Files in SSIS

Filed under: Scripting, SSIS — Duncan Sutcliffe @ 5:46 pm

Got asked today whether SSIS can unzip files.  Of course it can, I thought – after all the Windows shell has had the ability to work with (simple) zip files since (I think) Windows 95.  So there must be some easily accessible methods I can use in a Script task.

Except – there aren’t.  You kind of can use some methods in the Windows base library but I don’t think the average SSIS developer really wants to be fishing around trying to sort that out.  So, there are some Java libraries you can use but that’s also rather awkward.  And then I came across a brilliant open source library for working with zip files on Codeplex called DotNetZip.  This is delivered as a managed code library so it can be added to the SSIS host then used in a Script task.  It took me only a few minutes to have this working – here’s how.

The first thing to do is to download the latest release of the library.  For development environments you will need to get the full DevKit release file as this includes the actual library plus the Intellisense XML file and various help files and other bits and pieces.

Open the downloaded zip and extract the following files to a directory (for example c:\DotNetZip\):

  • [Version]\Release\Ionic.Zip.dll
  • [Version]\Debug\Ionic.Zip.XML

You then need to copy them to your .NET framework directory (which is likely to be C:\Windows\Microsoft.NET\Framework\v2.0.nnnnn) and register them in the Global Assembly Cache using the command line instruction:

gacutil /i c:\DotNetZip\Ionic.Zip.dll

The location of gacutil.exe will vary depending on the version of Visual Studio you have installed.  On a Visual Studio 2008 installation it can be found at C:\Program Files\Microsoft SDKs\Windows\v6.0A\bin.
The library will now be available in any SSIS script task, in a control flow or a data flow.

Note that a script task will give you a failure error if you don’t register the dll in the GAC.

As an example, here’s how you can combine an unzip script task with a ForEach loop container in order to extract all the contents of every zip file.  This might be useful if you have source files delivered from an upstream system as zips, or if you retrieve zip files from an FTP server, for example.

You could actually loop through all the files in a directory just using a script task, but as the ForEach container is more familiar to most SSIS developers, I am going to go with that. Firstly, add a ForEach Loop container to a control flow and place a Script task inside it.

Set the properties of the loop container to work correctly with your environment – it’s a Foreach File Enumerator.

As the container loops, the current filename will be passed into the script which unzips the files.  This needs a variable, which is set on the Variable Mappings tab:

Then configure the script task, making sure the file name variable is being passed in as a ReadOnly Variable:

Open the script editor and add the DotNetZip library as a reference by right-clicking on the References node in the Project Explorer window. On the .NET tab, scroll down to the entry Ionic’s Zip Library (it will be there as long as the file was added to the framework directory as detailed above), highlight it and click OK.

With the reference added, you just need to include a using statement at the top of the file to use the methods provided by the library.

using Ionic.Zip;

You now have access to all the methods provided by the library.  For this example, we just want to unzip a file, so the code you would need is:

        {
            // Get the value of the current zip file in the directory
            Variables vars = Dts.Variables;
            String zipFile = vars["strZipFile"].Value.ToString();

            // Set the location for unzipped files - this could also be set by a variable of course.
            String targetDirectory = @"c:\test\unzipped";

            // Read the contents of the zip file
            ZipFile zip  = ZipFile.Read(zipFile);

            // Extract all the files in the zip to the target
            foreach (ZipEntry e in zip)
            {
                e.Extract(targetDirectory, ExtractExistingFileAction.OverwriteSilently);
            }

            // End of script
            Dts.TaskResult = (int)ScriptResults.Success;
        }

The library supports pretty much everything you could ever want to do with a zip file – creating, adding files, extracting files, handling passwords, compression levels etc.

Of course, there is another option – you could use the Execute Process task to call an external unzipper.  But that would rely on you having the right software installed on the SSIS machine, which may not fit with the corporate policy.  And it’s less elegant, which counts.

The Shocking Blue Green Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.