Had a requirement to establish whether a provided string in a source system matched a valid pattern for email addresses today. Not to actually try to send a mail to the address, just to check that it was theoretically valid.
There is a useful bit of regular expression code available from Microsoft at http://msdn.microsoft.com/en-us/library/01escwtf.aspx so all I had to do was work it into a script component. Here’s how.
Add a script component to the data flow and set the properties you’ll need. The input column will be the email address field you want to check, then add an output column with the DataType Boolean.
In the script itself, you need to add the regular expressions library to the script’s using statements:
using System; using System.Text.RegularExpressions;
Immediately after this, add a new class to actually parse the email address and use the regex to determine its validity:
public class RegexUtilities { public static bool IsValidEmail(string strIn) { // Return true if strIn is in valid e-mail format. return Regex.IsMatch(strIn, @"^(?("")("".+?""@)|(([0-9a-zA-Z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-zA-Z])@))" + @"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,6}))$"); } }
Finally, inside the ProcessInputRow add some code to execute the validation class on each row:
public override void Input0_ProcessInputRow(Input0Buffer Row) { string emailAddress; emailAddress = Row.EMAILADDRESS; bool valid = false; if (RegexUtilities.IsValidEmail(emailAddress)) valid = true; else valid = false; Row.ValidEmailAddress = valid; }
The field ValidEmailAddress will now be on the dataflow.
