Recently, I blogged about indirect configuration with SSIS. On my first SSIS project, I asked myself, and searched the web to find out, whether I should use direct or indirect configuration. Here are the pros and cons of each.
Direct configuration
Pros:
- Does not require creating or maintaining environment variables
- Scales well when multiple databases (e.g. TEST and Pre-Prod) are used on the same server
- Changes can be made to the configuration files (.dtsConfig) when the deployment is done with the SSIS deployment utility
Cons:
- The configuration file to use must be specified when the package is triggered with DTExec (/conf switch).
- If multiple layers of packages are used (parent/child packages), the configured values must be passed from the parent to the child package using parent package variables, which can be tricky: if one parent variable is missing, the rest of the parent package configurations (parameters) will not be transferred.
- The two cons above can be avoided by using the SSIS deployment wizard. In other words, if the configuration file switch (/conf) is not used with DTExec, the packages need to be deployed via the SSIS deployment wizard.
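To make the first con concrete, here is a minimal sketch of what a scheduler script has to do with direct configuration: it must know which configuration file goes with which run and pass it on the command line. The package and configuration paths are hypothetical, and the helper only builds the command; it does not run DTExec.

```python
def build_dtexec_command(package_path, config_path=None):
    """Return the argument list for a DTExec call.

    With direct configuration, the caller has to supply config_path,
    which is passed through the /Conf switch.
    """
    command = ["dtexec", "/File", package_path]
    if config_path is not None:
        command += ["/Conf", config_path]
    return command


# Hypothetical paths, for illustration only.
test_run = build_dtexec_command(
    r"C:\SSIS\LoadSales.dtsx",
    r"C:\SSIS\Config\Test.dtsConfig",
)
print(" ".join(test_run))
```

Every job that launches a package needs this extra argument, and it is the only thing that distinguishes a TEST run from a Pre-Prod run on the same server.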
Indirect configuration
Pros:
- All packages can reference the configuration file(s) via an environment variable
- Packages can be deployed simply with copy/paste or xcopy; no need to mess with the SSIS deployment utility
- Packages and applications do not depend on configuration switches when triggered with the DTExec utility (the command line is much simpler)
- Multiple layers (parent/child levels) scale better since every package has all the configuration values it needs to execute. Packages do not depend on their parent packages, and a child package can be used as a parent package without problems (no need to remove or add parent package configurations)
Cons:
- Requires environment variables to be created
- Does not easily support multiple databases (e.g. TEST and Pre-Prod) on the same server
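The lookup chain behind indirect configuration can be emulated in a few lines: an environment variable holds the path of a configuration file, and the file holds the values. This is only a sketch of the idea, not what the SSIS runtime actually executes. The variable name SSIS_CONFIG_PATH, the connection name and the connection string are made up, and the XML is a trimmed-down approximation of a .dtsConfig file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

ENV_VAR = "SSIS_CONFIG_PATH"  # hypothetical environment variable name

# Trimmed-down approximation of a .dtsConfig file (illustrative values).
SAMPLE_CONFIG = """<DTSConfiguration>
  <Configuration ConfiguredType="Property"
                 Path="\\Package.Connections[Warehouse].Properties[ConnectionString]"
                 ValueType="String">
    <ConfiguredValue>Data Source=TESTSRV;Initial Catalog=DW_TEST;</ConfiguredValue>
  </Configuration>
</DTSConfiguration>
"""


def resolve_configuration(env_var=ENV_VAR, environ=os.environ):
    """Follow the indirection: environment variable -> file -> values."""
    config_path = environ[env_var]           # step 1: variable gives the path
    root = ET.parse(config_path).getroot()   # step 2: file gives the values
    return {
        node.get("Path"): node.findtext("ConfiguredValue")
        for node in root.iter("Configuration")
    }


def demo():
    """Write the sample file, point the variable at it, and resolve it."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "Test.dtsConfig")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(SAMPLE_CONFIG)
        os.environ[ENV_VAR] = path
        return resolve_configuration()


for property_path, value in demo().items():
    print(property_path, "=", value)
```

Note that the variable maps to exactly one file. That is what keeps the command line simple, and it is also why a single machine-wide variable cannot serve two package sets on the same server.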
Using multiple package sets on the same server
While this is not common, it sometimes happens that the same server is used for both Test and Pre-Production. With indirect configuration, an environment variable holds the reference to a configuration file. Since a system environment variable can point to only one file, and there are multiple databases or package sets on the same server, indirect configuration cannot be used with system environment variables.
This problem can be circumvented by using user environment variables, but this is more complicated since it requires more than one user account to be created in order to launch the loads. Also, remote desktop is often used to let developers connect to test/pre-prod servers with their Active Directory (AD) account for debugging or deployment purposes. This approach would require multiplying AD accounts by the number of databases on the server.
Another approach would be to isolate the databases by using virtual machines. But with high-volume ETL loads, adding another layer between the OS and the package set can skew performance and tuning statistics, and therefore the strategies used to improve overall load performance. That said, having a slower machine for testing forces developers to improve the loading performance of their packages, which can only help when the application is deployed to the production server :-).
Which method should you use?
The answer to this question is: it depends. If you are confident that you are going to use the same path to store your configuration files on the Dev/Test/Pre-prod and production servers, direct configuration is an option.
The most flexible approach is definitely indirect configuration. Using this method, you can change the configuration file location and even its name. Also, I like having my packages behave like business objects: that is, they contain the logic necessary to access what they need to function properly. A package knows where to look to find its core configurations (connection strings, for example). It does not rely on a parent package for them.
I hope this article helps some of you decide which configuration method to use. On every project I have done (or am doing), I have to think about which method to use. It always depends on the architecture and the deployment method in place. But when I have the choice (nothing has been done yet), indirect configuration is my first choice.