Tuesday, August 9, 2011

Aggregator Transformation Overview


Aggregator Transformation Overview
The Aggregator transformation performs aggregate calculations, such as averages and sums.
The Integration Service performs aggregate calculations as it reads and stores data group and row data in an aggregate cache
The  Aggregator transformation to perform calculations on groups. The Expression transformation permits you to perform calculations on a row-by-row basis only.
After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.
The Aggregator is an active transformation, changing the number of rows in the pipeline. The Aggregator transformation has the following components and options:
Aggregate cache. The Integration Service stores data in the aggregate cache until it completes aggregate calculations. It stores group values in an index cache and row data in the data cache
Aggregate expression. Enter an expression in an output port. The expression can include non-aggregate expressions and conditional clauses.
Group by port. Indicate how to create groups. The port can be any input, input/output, output, or variable port. When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.
Sorted input. Select this option to improve session performance. To use sorted input, you must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.
Aggregator Transformation Properties
·         Cache Directory
·         Tracing Level
·         Sorted Input:
·         Aggregator Data Cache Size
·         Aggregator Index Cache Size
·         Transformation Scope
·          
Sorted Input: Indicates input data is presorted by groups. Select this option only if the mapping passes sorted data to the Aggregator transformation.

Aggregate Expressions
The Designer allows aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions. It can also include one aggregate function nested within another aggregate function

Nested Aggregate Functions
When you include single-level and nested functions in the same Aggregator transformation, the Designer marks the mapping or mapplet invalid. If you need to create both single-level and nested functions, create separate Aggregator transformations.

Conditional Clauses

Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
For example, use the following expression to calculate the total commissions of employees who exceeded their quarterly quota:
SUM( COMMISSION, COMMISSION > QUOTA )

Non-Aggregate Functions

You can also use non-aggregate functions in the aggregate expression.
The following expression returns the highest number of items sold for each item (grouped by item). If no items were sold, the expression returns 0.
IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0))

Null Values in Aggregate Functions

When you configure the Integration Service, you can choose how you want the Integration Service to handle null values in aggregate functions. You can choose to treat null values in aggregate functions as NULL or zero. By default, the Integration Service treats null values as NULL in aggregate functions.
If you use sorted input and do not presort data correctly, you receive unexpected results.

Sorted Input Conditions

Do not use sorted input if either of the following conditions are true:
·         The aggregate expression uses nested aggregate functions.
·         The session uses incremental aggregation.
·         Source data is data driven.
If you use sorted input and do not sort data correctly, the session fails.
Use sorted input to decrease the use of aggregate caches.
Sorted input reduces the amount of data cached during the session and improves session performance. Use this option with the Sorter transformation to pass sorted data to the Aggregator transformation.
Limit connected input/output or output ports.
Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
Filter the data before aggregating it.
If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to reduce unnecessary aggregation.

5 comments: