Data taps are one of the new features in SSIS 2012 in the data and performance troubleshooting
category. In a nutshell, they allow us, at runtime, to choose a given path from a data flow and capture a copy of the data at that specific point of the data flow in a .csv file on a given execution instance of the package.
Let's break it down some more:
Run-time. Data taps are a runtime artifact, so as developer, you don't define them as you design and develop packages in SSDT. Rather, you do it after the packages are deployed to the server by way of running some stored procedures in SSMS.
Data Flow Path: Data taps are defined at the data flow level, and within a data flow, at the path level. Paths are the blue or red arrows in the data flow that connect inputs and outputs of data flow components. You can read the books online definition of a path here.
Execution Instance: A data tap captures data in a given data flow path and stores it in a .csv file. This happens within a single execution instance of the package. That means we need to add the data tap each time we run the package during our troubleshooting exercise.
You could think of data taps in a similar way you think about data viewers in
Setting up a Data Tap
Now, let’s see an example and some pictures to show how you can add a data tap to a data flow.First, we create a package with a data flow in it and deploy it to the SSIS server. For this example, I have a pretty straight forward data flow that gets a list of 4 products via OLE DB Source, does some string manipulation and then loads the data into an OLE DB destination. As you can see in the picture, this data flow has 2 paths (blue arrows linking data flow components). Note the value of IdenificationString property of the data flow path as we will use it later.
.
We will need some pieces of information about the package we want to tap data from before we can setup the data tap:
Folder Name: The folder within SSIS server where the package is deployed.
Project Name: The name of the SSIS project that holds the package
Package Name: well, just the package name.
Data Flow Path ID: The value of the IdentificationString property of the data flow path where we want to add the data tap (highlighted in yellow in the picture above).
Data flow task path: The path of the data flow task within the package.
Once we have this information, it is time to add the data tap(s) and execute the package. We do all this in SQL Server Management Studio by running 3 stored procedures that are built-in the SSIS catalog:
- Create a execution instance for the package by running [catalog].[create_execution] stored procedure
- Add the data tap(s) by running [catalog].[add_data_tap]
- Run the package by running [catalog].[start_execution] package
https://raw.github.com/gist/1558450/f0a254d68a047f6199e6f53566c195cfcf06e192/gistfile1.sql
If everything goes well, you will have the .csv files in the
Folder with data tap files |
Is That it?
Almost. The example in the post is quite simplistic and the bare minimum I could come up with to get a data tap working. There is additional information in books online about them that you may want to review:catalog.add_data_tap
catalog.remove_data_tap
catalog.execution_data_taps
catalog.execution_data_statistics
Conclusion
Troubleshooting data issues can get hairy at times and data taps are an extra tool in our belt that can help in cases where logging (which has also improve a lot in SSIS 2012) and dumps may not give us the required level of information. In the other hand, I would have appreciated a more seamless user experience, specially when getting the parameters needed by the stored procedures, and perhaps having more control over the file format and its content. As it stands, we may always need to open a copy of the package in SSDT to get the required metadata and the jump back to SSMS to complete the work. It would be great to have a point and click interface within SSMS that allows to navigate the structure of the package/dataflow and let us add a data fow with few clicks. I would like to see an interface like the 'package explorer' in BIDS for this, but unfortunately that suggestion did not get too much traction last time around and it was closed as 'won't fix'. may be in SSIS 2014?
Hello Rafael.
ReplyDeleteSorry for my poor english but i'm french and i'm angry with english language.
I just want to say you that i think the Data flow task path can be easily found in the data flow properties on the line "PackagePath".
Good job an good article
Best Regards,
David
Hi David,
ReplyDeleteThanks for your comment. You are absolutely right! The path to the data flow is part of its own properties. I missed that one.
I updated my post.
Rafael,
ReplyDeleteCan you provide your example scripts for create_execution, add_data_tap, and start_execution? I tried following your instructions but SQL Server tells me it cannot find any of those stored procedures.
Thanks,
Dave
Hi Dave,
DeleteIt looks like the new template I used for my blog removed the embedded code from the post. I have included a link to it. Let me know if it works. Thanks for pointing that out.
Rafael
Thanks! That's very helpful.
DeleteRafael,
DeleteI used your script, and the package executes successfully, but no data tap file is created. Any idea what I might be doing wrong (file permissions, etc.)?
Thanks,
Dave