Tuesday, October 4, 2011

Performance Tuning Overview


Performance Tuning Overview

The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks. To tune session performance, first identify a performance bottleneck, eliminate it, and then identify the next performance bottleneck until you are satisfied with the session performance. You can use the test load option to run sessions when you tune session performance.

If you tune all the bottlenecks, you can further optimize session performance by increasing the number of pipeline partitions in the session. Adding partitions can improve performance by utilizing more of the system hardware while processing the session.

Complete the following tasks to improve session performance:

Optimize the target. Enables the Integration Service to write to the targets efficiently
Optimize the source. Enables the Integration Service to read source data efficiently.

Optimize the mapping. Enables the Integration Service to transform and move data efficiently..

Optimize the transformation. Enables the Integration Service to process transformations in a                      mapping efficiently.


Optimize the session. Enables the Integration Service to run the session more quickly.

Optimize the grid deployments. Enables the Integration Service to run on a grid with optimal performance..


Optimize the PowerCenter components. Enables the Integration Service and Repository Service to function optimally..

Optimize the system. Enables PowerCenter service processes to run more quickly.

Bottlenecks Overview
The first step in performance tuning is to identify performance bottlenecks. Performance bottlenecks can occur in the source and target databases, the mapping, the session, and the system. The strategy is to identify a performance bottleneck, eliminate it, and then identify the next performance bottleneck until you are satisfied with the performance.
Look for performance bottlenecks in the following order:

Target

Source
Mapping
Session
System


Use the following methods to identify performance bottlenecks:

Run test sessions. You can configure a test session to read from a flat file source or to write to a flat file target to identify source and target bottlenecks.

Analyze performance details. Analyze performance details, such as performance counters, to determine where session performance decreases.


Analyze thread statistics. Analyze thread statistics to determine the optimal number of partition points.

Monitor system performance. You can use system monitoring tools to view the percentage of CPU use, I/O waits, and paging to identify system bottlenecks. You can also use the Workflow Monitor to view system resource usage.

Using Thread Statistics
You can use thread statistics in the session log to identify source, target, or transformation bottlenecks. By default, the Integration Service uses one reader thread, one transformation thread, and one writer thread to process a session. The thread with the highest busy percentage identifies the bottleneck in the session.
The session log provides the following thread statistics:

Run time. Amount of time the thread runs.

Idle time. Amount of time the thread is idle. It includes the time the thread waits for other thread processing within the application. Idle time includes the time the thread is blocked by the Integration Service, but it not the time the thread is blocked by the operating system.


Busy time. Percentage of the run time the thread is by according to the following formula:
(run time - idle time) / run time X 100
You can ignore high busy percentages when the total run time is short, such as under 60 seconds. This does not necessarily indicate a bottleneck.

Thread work time. The percentage of time the Integration Service takes to process each transformation ina thread. The session log shows the following information for the transformation thread work time:
Thread work time breakdown:
        <transformation name>: <number> percent
        <transformation name>: <number> percent
        <transformation name>: <number> percent
If a transformation takes a small amount of time, the session log does not include it. If a thread does not have accurate statistics, because the session ran for a short period of time, the session log reports that the statistics are not accurate.

Eliminating Bottlenecks Based on Thread Statistics
Complete the following tasks to eliminate bottlenecks based on thread statistics:

If the reader or writer thread is 100% busy, consider using string datatypes in the source or target ports. Non-string ports require more processing.

If a transformation thread is 100% busy, consider adding a partition point in the segment. When you add partition points to the mapping, the Integration Service increases the number of transformation threads it uses for the session. However, if the machine is already running at or near full capacity, do not add more threads.

If one transformation requires more processing time than the others, consider adding a pass-through partition point to the transformation.

Target Bottlenecks
The most common performance bottleneck occurs when the Integration Service writes to a target database. Small checkpoint intervals, small database network packet sizes, or problems during heavy loading operations can cause target bottlenecks.

Identifying Target Bottlenecks
To identify a target bottleneck, complete the following tasks:

Configure a copy of the session to write to a flat file target. If the session performance increases significantly, you have a target bottleneck. If a session already writes to a flat file target, you probably do not have a target bottleneck.

Read the thread statistics in the session log. When the Integration Service spends more time on the writer thread than the transformation or reader threads, you have a target bottleneck.

Eliminating Target Bottlenecks
Complete the following tasks to eliminate target bottlenecks:

  • Dropping Indexes and Key Constraints
  • Optimizing Flat File Targets
  • Increasing Database Checkpoint Intervals
  • Using Bulk Loads
  • Optimizing Oracle Target Databases

  • Increasing Database Network Packet Size

Source Bottlenecks

Performance bottlenecks can occur when the Integration Service reads from a source database. Inefficient query or small database network packet sizes can cause source bottlenecks.

Identifying Source Bottlenecks

You can read the thread statistics in the session log to determine if the source is the bottleneck. When the Integration Service spends more time on the reader thread than the transformation or writer threads, you have a source bottleneck.
If the session reads from a relational source, use the following methods to identify source bottlenecks:


Filter transformation

Read test mapping


Database query

If the session reads from a flat file source, you probably do not have a source bottleneck. 

Using a Filter Transformation

You can use a Filter transformation in the mapping to measure the time it takes to read source data.
Add a Filter transformation after each source qualifier. Set the filter condition to false so that no data is processed passed the Filter transformation. If the time it takes to run the new session remains about the same, you have a source bottleneck.

Using a Read Test Mapping

You can create a read test mapping to identify source bottlenecks. A read test mapping isolates the read query by removing the transformation in the mapping.
To create a read test mapping, complete the following steps:
1.
Make a copy of the original mapping.
2.
In the copied mapping, keep only the sources, source qualifiers, and any custom joins or queries.

3.
Remove all transformations.
4.
Connect the source qualifiers to a file target.
Run a session against the read test mapping. If the session performance is similar to the original session, you have a source bottleneck.

Using a Database Query

To identify source bottlenecks, execute the read query directly against the source database.
Copy the read query directly from the session log. Execute the query against the source database with a query tool such as isql. On Windows, you can load the result of the query in a file. On UNIX, you can load the result of the query in /dev/null.
Measure the query execution time and the time it takes for the query to return the first row. 

Eliminating Source Bottlenecks


Complete the following tasks to eliminate source bottlenecks:

Set the number of bytes the Integration Service reads per line if the Integration Service reads from a flat file source.

Have the database administrator optimize database performance by optimizing the query.


Increase the database network packet size.

Configure index and key constraints.









Expression Transformation Complete Overview


Transformation type:
Passive
Connected

Use the Expression transformation to calculate values in a single row. For example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. You can also use the Expression transformation to test conditional statements before you pass the results to a target or other transformations.
Use the Expression transformation to perform non-aggregate calculations. To perform calculations involving multiple rows, such as sums or averages, use the Aggregator transformation.

Expression Transformation Components
You can create an Expression transformation in the Transformation Developer or the Mapping Designer.
An Expression transformation contains the following tabs:

Transformation. Enter the name and description of the transformation. The naming convention for an Expression transformation is EXP_TransformationName. You can also make the transformation reusable.

Ports. Create and configure ports.

Properties. Configure the tracing level to determine the amount of transaction detail reported in the session log file.

Metadata Extensions. Specify the extension name, datatype, precision, and value. You can also create reusable metadata extensions.

Friday, September 30, 2011

Infosys Interview Questions (23rd sep 2011)


1.Explain u r project architecture?
2.What is staging area and why u r using in ur project?
3.if u have a source system contain records of three months jan,feb,march I want to load the data in to  three targets jan,feb,mar draw the mapping?
4.diff b/w static lkp and dynamic lkp?
5.what is different types of joins in jnr t/r?
6.in filter t/r I can give filter condition to 4 what can happened?
7.In agg t/r I can use sorted input option but I cant pass sorted data what happened?
8.explain normalize t/r give one example how it can convert single input record to multiple o/p records?
9.what is verboseinitialization?

Thursday, September 22, 2011

What is the difference between a Primary Key and a Unique Index?


  1. Column(s) that make the Primary Key of a table cannot be NULL since by definition, the Primary Key cannot be NULL since it helps uniquely identify the record in the table. The column(s) that make up the unique index can be nullable. A note worth mentioning over here is that different RDBMS treat this differently –> while SQL Server and DB2 do not allow more than one NULL value in a unique index column, Oracle allows multiple NULL values. That is one of the things to look out for when designing/developing/porting applications across RDBMS.
  2. There can be only one Primary Key defined on the table where as you can have many unique indexes defined on the table (if needed).
  3. Also, in the case of SQL Server, if you go with the default options then a Primary Key is created as a clustered index while the unique index (constraint) is created as a non-clustered index.  This is just the default behavior though and can be changed at creation time, if needed.
So, if the unique index is defined on not null column(s), then it is essentially the same as the Primary Key and can be treated as an alternate key meaning it can also serve the purpose of identifying a record uniquely in the table.