Tuesday, October 4, 2011

Performance Tuning Overview


The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks. To tune session performance, first identify a performance bottleneck, eliminate it, and then identify the next performance bottleneck until you are satisfied with the session performance. You can use the test load option to run sessions when you tune session performance.

If you tune all the bottlenecks, you can further optimize session performance by increasing the number of pipeline partitions in the session. Adding partitions can improve performance by utilizing more of the system hardware while processing the session.

Complete the following tasks to improve session performance:

Optimize the target. Enables the Integration Service to write to the targets efficiently.

Optimize the source. Enables the Integration Service to read source data efficiently.

Optimize the mapping. Enables the Integration Service to transform and move data efficiently.

Optimize the transformation. Enables the Integration Service to process transformations in a mapping efficiently.

Optimize the session. Enables the Integration Service to run the session more quickly.

Optimize the grid deployments. Enables the Integration Service to run on a grid with optimal performance.

Optimize the PowerCenter components. Enables the Integration Service and Repository Service to function optimally.

Optimize the system. Enables PowerCenter service processes to run more quickly.

Bottlenecks Overview
The first step in performance tuning is to identify performance bottlenecks. Performance bottlenecks can occur in the source and target databases, the mapping, the session, and the system. The strategy is to identify a performance bottleneck, eliminate it, and then identify the next performance bottleneck until you are satisfied with the performance.
Look for performance bottlenecks in the following order:

Target
Source
Mapping
Session
System

Use the following methods to identify performance bottlenecks:

Run test sessions. You can configure a test session to read from a flat file source or to write to a flat file target to identify source and target bottlenecks.

Analyze performance details. Analyze performance details, such as performance counters, to determine where session performance decreases.


Analyze thread statistics. Analyze thread statistics to determine the optimal number of partition points.

Monitor system performance. You can use system monitoring tools to view the percentage of CPU use, I/O waits, and paging to identify system bottlenecks. You can also use the Workflow Monitor to view system resource usage.

Using Thread Statistics
You can use thread statistics in the session log to identify source, target, or transformation bottlenecks. By default, the Integration Service uses one reader thread, one transformation thread, and one writer thread to process a session. The thread with the highest busy percentage identifies the bottleneck in the session.
The session log provides the following thread statistics:

Run time. Amount of time the thread runs.

Idle time. Amount of time the thread is idle. It includes the time the thread waits for other thread processing within the application. Idle time includes the time the thread is blocked by the Integration Service, but not the time the thread is blocked by the operating system.


Busy time. Percentage of the run time the thread is busy, according to the following formula:
(run time - idle time) / run time x 100
You can ignore a high busy percentage when the total run time is short, such as under 60 seconds; a high percentage over a short run does not necessarily indicate a bottleneck.
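The busy-time formula can be applied to thread timings pulled from a session log. A minimal sketch (the thread names follow the usual PowerCenter log convention, but the timing numbers here are invented for illustration):

```python
def busy_percentage(run_time, idle_time):
    """Busy time as a percentage of run time: (run - idle) / run * 100."""
    if run_time == 0:
        return 0.0
    return (run_time - idle_time) / run_time * 100

# Hypothetical thread timings (run time, idle time), in seconds.
threads = {
    "READER_1_1_1": (300.0, 240.0),
    "TRANSF_1_1_1": (300.0, 15.0),
    "WRITER_1_1_1": (300.0, 210.0),
}

for name, (run, idle) in threads.items():
    print(f"{name}: {busy_percentage(run, idle):.1f} percent busy")
# The thread with the highest busy percentage (here the transformation
# thread at 95.0 percent) identifies the likely bottleneck.
```

With these numbers the transformation thread dominates, so the mapping or a transformation in it would be the first place to look.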

Thread work time. The percentage of time the Integration Service takes to process each transformation in a thread. The session log shows the following information for the transformation thread work time:
Thread work time breakdown:
        <transformation name>: <number> percent
        <transformation name>: <number> percent
        <transformation name>: <number> percent
If a transformation takes a small amount of time, the session log does not include it. If a thread does not have accurate statistics, because the session ran for a short period of time, the session log reports that the statistics are not accurate.

Eliminating Bottlenecks Based on Thread Statistics
Complete the following tasks to eliminate bottlenecks based on thread statistics:

If the reader or writer thread is 100% busy, consider using string datatypes in the source or target ports. Non-string ports require more processing.

If a transformation thread is 100% busy, consider adding a partition point in the segment. When you add partition points to the mapping, the Integration Service increases the number of transformation threads it uses for the session. However, if the machine is already running at or near full capacity, do not add more threads.

If one transformation requires more processing time than the others, consider adding a pass-through partition point to the transformation.

Target Bottlenecks
The most common performance bottleneck occurs when the Integration Service writes to a target database. Small checkpoint intervals, small database network packet sizes, or problems during heavy loading operations can cause target bottlenecks.

Identifying Target Bottlenecks
To identify a target bottleneck, complete the following tasks:

Configure a copy of the session to write to a flat file target. If the session performance increases significantly, you have a target bottleneck. If a session already writes to a flat file target, you probably do not have a target bottleneck.

Read the thread statistics in the session log. When the Integration Service spends more time on the writer thread than the transformation or reader threads, you have a target bottleneck.

Eliminating Target Bottlenecks
Complete the following tasks to eliminate target bottlenecks:

  • Dropping Indexes and Key Constraints
  • Optimizing Flat File Targets
  • Increasing Database Checkpoint Intervals
  • Using Bulk Loads
  • Optimizing Oracle Target Databases
  • Increasing Database Network Packet Size

Source Bottlenecks

Performance bottlenecks can occur when the Integration Service reads from a source database. Inefficient queries or small database network packet sizes can cause source bottlenecks.

Identifying Source Bottlenecks

You can read the thread statistics in the session log to determine if the source is the bottleneck. When the Integration Service spends more time on the reader thread than the transformation or writer threads, you have a source bottleneck.
If the session reads from a relational source, use the following methods to identify source bottlenecks:


Filter transformation
Read test mapping
Database query

If the session reads from a flat file source, you probably do not have a source bottleneck. 

Using a Filter Transformation

You can use a Filter transformation in the mapping to measure the time it takes to read source data.
Add a Filter transformation after each source qualifier. Set the filter condition to false so that no data is processed past the Filter transformation. If the time it takes to run the new session remains about the same, you have a source bottleneck.

Using a Read Test Mapping

You can create a read test mapping to identify source bottlenecks. A read test mapping isolates the read query by removing the transformations in the mapping.
To create a read test mapping, complete the following steps:
1. Make a copy of the original mapping.
2. In the copied mapping, keep only the sources, source qualifiers, and any custom joins or queries.
3. Remove all transformations.
4. Connect the source qualifiers to a file target.
Run a session against the read test mapping. If the session performance is similar to the original session, you have a source bottleneck.

Using a Database Query

To identify source bottlenecks, execute the read query directly against the source database.
Copy the read query directly from the session log. Execute the query against the source database with a query tool such as isql. On Windows, you can load the result of the query in a file. On UNIX, you can load the result of the query in /dev/null.
Measure the query execution time and the time it takes for the query to return the first row. 
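The two measurements above can be taken with any query tool; as a stand-in sketch, here is the same idea against an in-memory SQLite database (in practice you would copy the read query from the session log and run it with a tool such as isql against the real source):

```python
import sqlite3
import time

# Stand-in source database with some throwaway rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(10000)])

query = "SELECT id, val FROM src ORDER BY id"  # the session's read query

start = time.perf_counter()
cursor = conn.execute(query)
first_row = cursor.fetchone()
first_row_time = time.perf_counter() - start   # time to return the first row

rows = cursor.fetchall()
total_time = time.perf_counter() - start       # full query execution time

print(f"first row after {first_row_time:.4f}s, "
      f"{len(rows) + 1} rows after {total_time:.4f}s")
```

A long time-to-first-row usually points at the query plan (missing index, bad join order); a short first row but long total time points at fetch volume or network packet size.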

Eliminating Source Bottlenecks


Complete the following tasks to eliminate source bottlenecks:

Set the number of bytes the Integration Service reads per line if the Integration Service reads from a flat file source.

Have the database administrator optimize database performance by optimizing the query.


Increase the database network packet size.

Configure index and key constraints.

Expression Transformation Complete Overview


Transformation type:
Passive
Connected

Use the Expression transformation to calculate values in a single row. For example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. You can also use the Expression transformation to test conditional statements before you pass the results to a target or other transformations.
Use the Expression transformation to perform non-aggregate calculations. To perform calculations involving multiple rows, such as sums or averages, use the Aggregator transformation.

Expression Transformation Components
You can create an Expression transformation in the Transformation Developer or the Mapping Designer.
An Expression transformation contains the following tabs:

Transformation. Enter the name and description of the transformation. The naming convention for an Expression transformation is EXP_TransformationName. You can also make the transformation reusable.

Ports. Create and configure ports.

Properties. Configure the tracing level to determine the amount of transaction detail reported in the session log file.

Metadata Extensions. Specify the extension name, datatype, precision, and value. You can also create reusable metadata extensions.

Friday, September 30, 2011

Infosys Interview Questions (23rd sep 2011)


1. Explain your project architecture.
2. What is a staging area, and why are you using one in your project?
3. If a source system contains records for three months (Jan, Feb, March) and you want to load the data into three targets (Jan, Feb, Mar), draw the mapping.
4. What is the difference between a static lookup and a dynamic lookup?
5. What are the different types of joins in the Joiner transformation?
6. In a Filter transformation, what happens if you set the filter condition to 4?
7. In an Aggregator transformation, what happens if you enable the Sorted Input option but do not pass sorted data?
8. Explain the Normalizer transformation and give one example of how it converts a single input record into multiple output records.
9. What is verbose initialization?

Thursday, September 22, 2011

What is the difference between a Primary Key and a Unique Index?


  1. Column(s) that make up the Primary Key of a table cannot be NULL since, by definition, the Primary Key uniquely identifies each record in the table. The column(s) that make up a unique index can be nullable. A note worth mentioning here is that different RDBMS treat this differently: while SQL Server and DB2 do not allow more than one NULL value in a unique index column, Oracle allows multiple NULL values. That is one of the things to look out for when designing, developing, or porting applications across RDBMS.
  2. There can be only one Primary Key defined on the table where as you can have many unique indexes defined on the table (if needed).
  3. Also, in the case of SQL Server, if you go with the default options then a Primary Key is created as a clustered index while the unique index (constraint) is created as a non-clustered index.  This is just the default behavior though and can be changed at creation time, if needed.
So, if the unique index is defined on not null column(s), then it is essentially the same as the Primary Key and can be treated as an alternate key meaning it can also serve the purpose of identifying a record uniquely in the table.
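The NULL-handling difference is easy to demonstrate with Python's built-in sqlite3 module (SQLite happens to treat unique-index NULLs as distinct, like Oracle; SQL Server and DB2 would reject the second NULL). This is a sketch, and note the explicit NOT NULL, which works around a legacy SQLite quirk where non-INTEGER primary keys are not implicitly NOT NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because of the SQLite quirk mentioned above;
# in most RDBMS a primary key is NOT NULL by definition.
conn.execute("CREATE TABLE t (pk TEXT NOT NULL PRIMARY KEY, alt TEXT)")
conn.execute("CREATE UNIQUE INDEX ux_alt ON t (alt)")

# A NULL primary key is rejected.
try:
    conn.execute("INSERT INTO t VALUES (NULL, 'x')")
except sqlite3.IntegrityError as e:
    print("primary key:", e)

# Multiple NULLs pass the unique index, because NULLs are treated as
# distinct values here (Oracle-like behavior).
conn.execute("INSERT INTO t VALUES ('a', NULL)")
conn.execute("INSERT INTO t VALUES ('b', NULL)")
print(conn.execute("SELECT COUNT(*) FROM t WHERE alt IS NULL").fetchone()[0])
```

Running the same two inserts against SQL Server or DB2 would fail on the second NULL row, which is exactly the portability trap described above.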

Tuesday, August 9, 2011

Joiner Transformation Overview


Transformation type:
Active
Connected
Use the Joiner transformation to join source data from two related heterogeneous sources residing in different locations or file systems. You can also join data from the same source. The Joiner transformation joins sources with at least one matching column. The Joiner transformation uses a condition that matches one or more pairs of columns between the two sources.
The two input pipelines include a master pipeline and a detail pipeline or a master and a detail branch. The master pipeline ends at the Joiner transformation, while the detail pipeline continues to the target.
To join more than two sources in a mapping, join the output from the Joiner transformation with another source pipeline. Add Joiner transformations to the mapping until you have joined all the source pipelines.
The following limitations apply to the pipelines you connect to the Joiner transformation:
·         You cannot use a Joiner transformation when either input pipeline contains an Update Strategy transformation.
·         You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly before the Joiner transformation.
To configure a Joiner transformation, you must configure the transformation properties, join type, and join condition.
You can configure the Joiner transformation for sorted input to improve Integration Service performance.
Joiner transformation properties:
Ø  Case-Sensitive String Comparison
Ø  Cache Directory
Ø  Join Type
Ø  Null Ordering in Master: Not applicable for this transformation type.
Ø  Null Ordering in Detail: Not applicable for this transformation type.
Ø  Tracing Level
Ø  Joiner Data Cache Size
Ø  Joiner Index Cache Size
Ø  Sorted Input
Ø  Master Sort Order
Ø  Transformation Scope

To improve performance for an unsorted Joiner transformation, use the source with fewer rows as the master source. To improve performance for a sorted Joiner transformation, use the source with fewer duplicate key values as the master.
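The master/detail advice follows from how an unsorted Joiner works: it caches all master rows, then streams the detail rows against that cache. A toy hash-join sketch in Python (not Informatica code; the sample data is invented) makes the asymmetry visible:

```python
def hash_join(master, detail, key):
    """Toy unsorted join: cache every master row, stream the detail rows.

    The cache size grows with the master source, which is why the
    smaller source should be designated the master.
    """
    cache = {}                      # master rows, keyed by the join column
    for row in master:
        cache.setdefault(row[key], []).append(row)

    joined = []
    for d in detail:                # detail rows are streamed, never cached
        for m in cache.get(d[key], []):
            joined.append({**m, **d})
    return joined

depts = [{"dept_id": 10, "dept": "SALES"}, {"dept_id": 20, "dept": "IT"}]
emps = [{"dept_id": 10, "name": "Amit"}, {"dept_id": 10, "name": "Ravi"},
        {"dept_id": 20, "name": "Sita"}]

# Smaller source (departments) as master: the cache holds only 2 rows,
# no matter how many employee rows stream through.
rows = hash_join(depts, emps, "dept_id")
print(len(rows))   # 3 joined rows
```

Swapping the arguments would still produce the same joined rows, but the cache would then hold every employee row instead of two department rows.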
By default, when you add ports to a Joiner transformation, the ports from the first source pipeline display as detail sources. Adding the ports from the second source pipeline sets them as master sources. To change these settings, click the M column on the Ports tab for the ports you want to set as the master source. This sets ports from this source as master ports and ports from the other source as detail ports.

Use one or more ports from the input sources of a Joiner transformation in the join condition. Additional ports increase the time necessary to join two sources. The order of the ports in the condition can impact the performance of the Joiner transformation. If you use multiple ports in the join condition, the Integration Service compares the ports in the order you specify.
The Designer validates datatypes in a condition. Both ports in a condition must have the same datatype. If you need to use two ports in the condition with non-matching datatypes, convert the datatypes so they match.
If you join Char and Varchar datatypes, the Integration Service counts any spaces that pad Char values as part of the string:
Char(40) = "abcd"
Varchar(40) = "abcd"
The Char value is “abcd” padded with 36 blank spaces, and the Integration Service does not join the two fields because the Char field contains trailing spaces.
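The padding behavior can be reproduced with plain strings (a sketch, not Integration Service code):

```python
# Char values are padded with spaces to the declared length; Varchar
# values are not. Comparing the padded and unpadded strings fails.
char_val = "abcd".ljust(40)     # Char(40): "abcd" plus 36 trailing spaces
varchar_val = "abcd"            # Varchar(40): stored without padding

print(char_val == varchar_val)           # False: trailing spaces count
print(char_val.rstrip() == varchar_val)  # True once the padding is trimmed
```

Trimming the trailing spaces (for example with RTRIM in an upstream expression) before the Joiner is the usual way to make such fields comparable.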
Note: The Joiner transformation does not match null values. For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service does not consider them a match and does not join the two rows. To join rows with null values, replace null input with default values, and then join on the default values.
Note: A normal or master outer join performs faster than a full outer or detail outer join.
If you pass unsorted or incorrectly sorted data to a Joiner transformation configured to use sorted data, the session fails and the Integration Service logs the error in the session log file.
Do not place any of the following transformations between the sort origin and the Joiner transformation:

Ø  Custom
Ø  Unsorted Aggregator
Ø  Normalizer
Ø  Rank
Ø  Union transformation
Ø  XML Parser transformation
Ø  XML Generator transformation
Ø  Mapplet, if it contains one of the above transformations
Joining Two Branches of the Same Pipeline
When you join data from the same source, you can create two branches of the pipeline. When you branch a pipeline, you must add a transformation between the source qualifier and the Joiner transformation in at least one branch of the pipeline. You must join sorted data and configure the Joiner transformation for sorted input.
For example, you have a source with the following ports:
* Employee
* Department
* Total Sales
In the target, you want to view the employees who generated sales that were greater than the average sales for their departments. To do this, you create a mapping with the following transformations:
* Sorter transformation. Sorts the data.
* Sorted Aggregator transformation. Averages the sales data and groups it by department. When you perform this aggregation, you lose the data for individual employees. To maintain employee data, you must pass a branch of the pipeline to the Aggregator transformation and pass a branch with the same data to the Joiner transformation to maintain the original data. When you join both branches of the pipeline, you join the aggregated data with the original data.
* Sorted Joiner transformation. Uses a sorted Joiner transformation to join the sorted aggregated data with the original data.
* Filter transformation. Compares the average sales data against the sales data for each employee and filters out employees whose sales are at or below their department average.
Source -> src_source -> str_src -+-- pipeline 1 ------------------> Jnr -> fil -> trg
                                 +-- pipeline 2 --> Agg ----------->
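The logic of that mapping (aggregate one branch, join it back, then filter) can be sketched in a few lines of Python with invented sales figures:

```python
# Sketch of the branch-and-join mapping: average one branch per
# department, join back to the original rows, keep above-average employees.
sales = [
    {"employee": "Amit", "department": "SALES", "total_sales": 120.0},
    {"employee": "Ravi", "department": "SALES", "total_sales": 80.0},
    {"employee": "Sita", "department": "IT",    "total_sales": 200.0},
    {"employee": "Rani", "department": "IT",    "total_sales": 150.0},
]

# Branch 2: the Aggregator averages sales, grouped by department.
totals, counts = {}, {}
for row in sales:
    dept = row["department"]
    totals[dept] = totals.get(dept, 0.0) + row["total_sales"]
    counts[dept] = counts.get(dept, 0) + 1
avg = {dept: totals[dept] / counts[dept] for dept in totals}

# Joiner + Filter: join the aggregate back on department and keep
# only employees above their department average.
above = [r["employee"] for r in sales
         if r["total_sales"] > avg[r["department"]]]
print(above)   # ['Amit', 'Sita']
```

The join back on department is what restores the per-employee detail that the aggregation discarded.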
Joining two branches might impact performance if the Joiner transformation receives data from one branch much later than the other branch. The Joiner transformation caches all the data from the first branch, and writes the cache to disk if the cache fills. The Joiner transformation must then read the data from disk when it receives the data from the second branch. This can slow processing.
Joining Two Instances of the Same Source
You can also join same source data by creating a second instance of the source. After you create the second source instance, you can join the pipelines from the two source instances. If you want to join unsorted data, you must create two instances of the same source and join the pipelines.
Note: When you join data using this method, the Integration Service reads the source data for each source instance, so performance can be slower than joining two branches of a pipeline.

Guidelines

Use the following guidelines when deciding whether to join branches of a pipeline or join two instances of a source:
* Join two branches of a pipeline when you have a large source or if you can read the source data only once. For example, you can read source data from a message queue only once.
* Join two branches of a pipeline when you use sorted data. If the source data is unsorted and you use a Sorter transformation to sort the data, branch the pipeline after you sort the data.
* Join two instances of a source when you need to add a blocking transformation to the pipeline between the source and the Joiner transformation.
* Join two instances of a source if one pipeline may process slower than the other pipeline.
* Join two instances of a source if you need to join unsorted data.
Perform joins in a database when possible.
Performing a join in a database is faster than performing a join in the session. In some cases, this is not possible, such as joining tables from two different databases or flat file systems. If you want to perform a join in a database, use the following options:
·         Create a pre-session stored procedure to join the tables in a database.
·         Use the Source Qualifier transformation to perform the join.

Expression Transformation Overview


Transformation type:
Passive
Connected
Use the Expression transformation to calculate values in a single row. For example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. You can also use the Expression transformation to test conditional statements before you pass the results to a target or other transformations.
Use the Expression transformation to perform non-aggregate calculations. To perform calculations involving multiple rows, such as sums or averages, use the Aggregator transformation.
The naming convention for an Expression transformation is EXP_TransformationName.
You can enter multiple expressions in a single Expression transformation by creating an expression for each output port.
Note: After you make the transformation reusable, you cannot copy ports from the source qualifier or other transformations. You can create ports manually within the transformation.