Question 1. Mention What Is Abinitio?
Answer :
“Ab initio” is a Latin phrase meaning “from the beginning.” Abinitio is a tool used to extract, transform, and load (ETL) data. It is also used for data analysis, data manipulation, batch processing, and graphical-user-interface-based parallel processing.

Question 2. Explain What Is The Architecture Of Abinitio?
Answer :
Architecture of Abinitio includes

GDE (Graphical Development Environment)

Co-operating System

Enterprise meta-environment (EME)


Question 3. Mention What Is The Role Of Co-operating System In Abinitio?
Answer :
The Abinitio co-operating system provides features such as:

Managing and running Abinitio graphs and controlling the ETL processes

Providing Ab initio extensions to the operating system

Monitoring and debugging ETL processes

Managing meta-data and interacting with the EME

Question 4. Explain What Does Dependency Analysis Mean In Abinitio?
Answer :
In Ab initio, dependency analysis is the process through which the EME examines an entire project and traces how data is transferred and transformed from component to component and field by field, within and between graphs.

Question 5. Explain How Abinitio Eme Is Segregated?
Answer :
The Abinitio EME is logically divided into two segments:

Data Integration Portion

User Interface ( Access to the meta-data information)

Question 6. Mention How Can You Connect Eme To Abinitio Server?
Answer :
There are several ways to connect to the Ab initio server:

Log in to the EME web interface: http://serverhost:[serverport]/abinitio

Connect to the EME data-store through the GDE

Use air commands

Question 7. List Out The File Extensions Used In Abinitio?
Answer :
The file extensions used in Abinitio are 

.mp: It stores Ab initio graph or graph component

.mpc: Custom component or program

.mdc: Dataset or custom data-set component

.dml: Data manipulation language file or record type definition

.xfr: Transform function file

.dat: Data file (multifile or serial file)

Question 8. Mention What Information Does A .dbc File Extension Provides To Connect To The Database?
Answer :
A .dbc file provides the GDE with the information it needs to connect to the database:

Name and version number of the database to which you want to connect

Name of the computer on which the database instance or server runs, or on which the database remote access software is installed

Name of the server, database instance, or provider to which you want to connect

Question 9. Explain How You Can Run A Graph Infinitely In Ab Initio?
Answer :
To execute a graph infinitely, the graph's end script should call the .ksh file of the graph. For example, if the graph is named abc, its end script should call abc.ksh. This makes the graph run indefinitely.

Question 10. Mention What Is The Difference Between A “Lookup File” And A “Lookup” In Abinitio?
Answer :
A lookup file defines one or more serial files (flat files); it is the physical file where the data for the lookup is stored. A lookup is the component of an Abinitio graph where data can be stored and retrieved by using a key parameter.

Question 11. Mention What Are The Different Types Of Parallelism Used In Abinitio?
Answer :
The different types of parallelism used in Abinitio include:

Component parallelism: A graph with multiple processes executing simultaneously on separate data uses component parallelism.

Data parallelism: A graph that works with data divided into segments and operates on each segment simultaneously uses data parallelism.

Pipeline parallelism: A graph with multiple components executing simultaneously on the same data uses pipeline parallelism. Each component in the pipeline continuously reads from upstream components, processes data, and writes to downstream components, so the components can operate in parallel.

Question 12. Explain What Is Sort Component In Abinitio?
Answer :
The Sort component in Abinitio re-orders the data. It has two parameters, “Key” and “Max-core”.

Key: This parameter determines the collation order

Max-core: This parameter controls how often the Sort component dumps data from memory to disk

Question 13. Mention What The Dedup Component And Replicate Component Do?
Answer :

Dedup component: It is used to remove duplicate records

Replicate component: It combines the data records from the inputs into one flow and writes a copy of that flow to each of its output ports
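
The behaviour of these two components can be illustrated outside Ab Initio. The following Python sketch is only an analogy, not Ab Initio code: the function names `dedup_sorted` and `replicate` and the sample records are hypothetical, and it assumes the input is already sorted on the key and that the first record of each key group is the one kept.

```python
from itertools import groupby

def dedup_sorted(records, key):
    """Keep the first record of each run of equal keys (input sorted on key)."""
    return [next(group) for _, group in groupby(records, key=key)]

def replicate(records, n_outputs):
    """Return n_outputs independent copies of the same flow."""
    records = list(records)
    return [list(records) for _ in range(n_outputs)]

# Hypothetical (key, value) records, already sorted on the key.
rows = [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5)]
unique = dedup_sorted(rows, key=lambda r: r[0])
copies = replicate(unique, 2)
print(unique)  # [('a', 1), ('b', 3), ('c', 4)]
```

Each copy is a separate list, so a downstream step that changes one output flow does not affect the other.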

Question 14. Mention What Is A Partition And What Are The Different Types Of Partition Components In Abinitio?
Answer :
In Abinitio, partitioning is the process of dividing data sets into multiple sets for further processing. The different types of partition components include:

Partition by Round-Robin: Distributes data evenly, in block-size chunks, across the output partitions

Partition by Range: Divides data evenly among nodes, based on a set of partitioning ranges and a key

Partition by Percentage: Distributes data so that the output is proportional to fractions of 100

Partition by Load Balance: Distributes data using dynamic load balancing

Partition by Expression: Divides data according to a DML expression

Partition by Key: Groups data by a key
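
Two of the strategies above, partitioning by key and by range, can be sketched in Python. This is an illustration of the general techniques under stated assumptions, not Ab Initio's implementation: the function names are hypothetical, `zlib.crc32` is a stand-in hash chosen only because it is reproducible, and sending a key equal to a splitter to the higher partition is an arbitrary choice.

```python
import zlib
from bisect import bisect_right

def partition_by_key(records, key, n):
    """Send records with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        # Stand-in hash (assumption): any deterministic hash would do.
        index = zlib.crc32(str(key(rec)).encode()) % n
        parts[index].append(rec)
    return parts

def partition_by_range(records, key, splitters):
    """Send each record to the range its key falls in (splitters sorted)."""
    parts = [[] for _ in range(len(splitters) + 1)]
    for rec in records:
        parts[bisect_right(splitters, key(rec))].append(rec)
    return parts

ranges = partition_by_range([5, 12, 25, 30, 7], key=lambda x: x, splitters=[10, 20])
print(ranges)  # [[5, 7], [12], [25, 30]]
```

With key partitioning, the size of each partition depends on how the key values are distributed, which is why a single very frequent key produces skew.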

Question 15. Explain What Is Sandbox?
Answer :
A sandbox is a collection of graphs and related files that are stored in a single directory tree and behave as a group for the purposes of navigation, version control, and migration.

Question 16. Explain What Is De-partition In Abinitio?
Answer :
De-partitioning reads data from multiple flows or operations and re-joins the data records from those flows into one. Several de-partition components are available, including Gather, Merge, Interleave, and Concatenate.
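
The difference between three of these components lies in the order in which records are combined. The Python sketch below is an analogy only: the function names are hypothetical, Interleave is assumed to take one record per flow in turn, and Gather is left out because its output order depends on arrival time.

```python
import heapq
from itertools import chain, zip_longest

def concatenate(flows):
    """All records of the first flow, then all of the second, and so on."""
    return list(chain.from_iterable(flows))

def interleave(flows):
    """One record from each flow in turn (block size of 1 assumed)."""
    missing = object()
    return [rec
            for group in zip_longest(*flows, fillvalue=missing)
            for rec in group if rec is not missing]

def merge(flows, key):
    """Combine flows that are each sorted on key into one sorted flow."""
    return list(heapq.merge(*flows, key=key))

# Three hypothetical partitions, each already sorted.
flows = [[1, 2, 9], [3, 4], [5, 6, 7]]
print(concatenate(flows))               # [1, 2, 9, 3, 4, 5, 6, 7]
print(interleave(flows))                # [1, 3, 5, 2, 4, 6, 9, 7]
print(merge(flows, key=lambda x: x))    # [1, 2, 3, 4, 5, 6, 7, 9]
```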

Question 17. List Out Some Of The Air Commands Used In Abinitio?
Answer :
Air commands used in Abinitio include:

air object ls <EME path for the object-/Projects/edf/..> : Lists the objects in a directory inside the project

air object rm <EME path for the object-/Projects/edf/..> : Removes an object from the repository

air object versions -verbose <EME path for the object-/Projects/edf/..> : Gives the version history of the object

Other air commands for Abinitio include air object cat, air object modify, air lock show user, etc.

Question 18. Mention What Is Rollup Component?
Answer :
The Rollup component enables users to group records on certain field values. It is a multistage function consisting of 1. Initialize, 2. Rollup, and 3. Finalize.

Question 19. Mention What Is The Syntax For M_dump In Abinitio?
Answer :
The m_dump command is used to view the data in a multifile from the Unix prompt. Typical forms of the command include:

m_dump a.dml a.dat: This command prints the data as it appears in the GDE when we view data as formatted text

m_dump a.dml a.dat > b.dat: The output is redirected to b.dat, which acts as a serial file that can be referred to when required

Question 20. We Know Rollup Component In Abinitio Is Used To Summarize Group Of Data Record Then Why Do We Use Aggregation?
Answer :

Aggregate and Rollup are both used to summarize data.

Rollup is more convenient to use.

Rollup provides some additional functionality, such as input filtering and output filtering of records.

Aggregate does not expose the intermediate results held in main memory, whereas Rollup can.

Analyzing a particular summarization is much simpler with Rollup than with Aggregate.

Question 21. What Kind Of Layouts Does Abinitio Support?
Answer :

Abinitio supports serial and parallel layouts.

A graph can use both serial and parallel layouts at the same time.

A parallel layout depends on the degree of data parallelism.

For example, if a multi-file system is 4-way parallel, a component in a graph that uses it as its layout runs 4-way parallel.

Question 22. How Do You Add Default Rules In Transformer?
Answer :
The process to add default rules in a transformer is as follows:

Double-click the transform parameter on the parameter tab page in the component properties

Click the Edit menu in the Transform Editor

Select Add Default Rules from the drop-down list

Choose either Match Names or Wildcard from the options shown

Question 23. How To Run A Graph Infinitely?
Answer :
To run a graph infinitely:

The end script of the graph should call the graph's .ksh file.

For example, if the graph is named abc, the end script should call the abc.ksh file.

Question 24. What Is A Local Lookup?
Answer :

A local lookup file holds records that can be placed in main memory

Transform functions retrieve these records much faster than they could be retrieved from disk

Question 25. What Is A Look-up?
Answer :

A lookup file represents a set of serial files (flat files)

A lookup is a specific data set that is keyed

The key is used for mapping values based on the data available in a particular file

The data set can be static or dynamic

A hash join can be replaced by a Reformat with a lookup when one of the join inputs has a small number of records with a short record length

Abinitio provides functions for retrieving values from a lookup by key
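
The idea of a keyed, in-memory data set can be sketched with a Python dictionary. This is an analogy, not Ab Initio's lookup functions: `build_lookup`, `lookup`, and the sample records are hypothetical names and data chosen for illustration.

```python
def build_lookup(records, key_field):
    """Index the records of a small reference file by key, in memory."""
    return {rec[key_field]: rec for rec in records}

def lookup(table, key_value):
    """Return the matching record, or None when the key is absent."""
    return table.get(key_value)

# Small reference data set (few, short records), keyed on "code".
countries = [
    {"code": "IN", "name": "India"},
    {"code": "FR", "name": "France"},
]
table = build_lookup(countries, "code")

# Enrich each record of the large flow without a join.
orders = [{"id": 1, "code": "FR"}, {"id": 2, "code": "XX"}]
enriched = [
    {**order, "country": (lookup(table, order["code"]) or {}).get("name")}
    for order in orders
]
print(enriched[0]["country"])  # France
```

Because the reference data is held in memory and addressed by key, each record of the large input is enriched in a single pass, which is the reason a lookup can stand in for a join when one side is small.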

Question 26. What Is A Ramp Limit?
Answer :

Limit is an integer parameter that represents a number of reject events

Ramp is a real-number parameter that represents the rate of reject events per processed record

The formula is: number of bad records allowed = limit + (number of records x ramp)

Ramp is a fraction between 0 and 1

Together, these two parameters set the threshold for bad records
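
The formula above can be worked through in a few lines of Python. This is a minimal sketch: the function names are hypothetical, and treating the threshold as exceeded only when the reject count is strictly greater than it is an assumption.

```python
def reject_threshold(limit, ramp, records_processed):
    """Bad records tolerated so far: limit + records_processed * ramp."""
    return limit + records_processed * ramp

def should_abort(rejects, limit, ramp, records_processed):
    """True once the reject count exceeds the tolerated threshold."""
    return rejects > reject_threshold(limit, ramp, records_processed)

# With limit = 50 and ramp = 0.01, about 150 rejects are tolerated
# after 10,000 records have been processed (50 + 10000 * 0.01).
allowed = reject_threshold(limit=50, ramp=0.01, records_processed=10_000)
```

With ramp set to 0 the threshold is a fixed count (limit alone); with limit set to 0 it scales purely with the number of records processed.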

Question 27. What Is A Rollup Component? Explain About It.
Answer :

The Rollup component allows users to group records on certain field values.

It is a multistage function and contains

1. Initialize 2. Rollup 3. Finalize functions, which are mandatory

To count the records of a particular group, Rollup needs a temporary variable

The initialize function is invoked first for each group

The rollup function is called for each record in the group

The finalize function is called only once, after the last rollup call
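
The three-stage flow described above can be sketched in Python. This illustrates the initialize / rollup / finalize pattern only and is not DML: the field names `dept` and `amount`, the driver `run_rollup`, and the dictionary used as the temporary variable are all hypothetical.

```python
from itertools import groupby

def initialize(first_record):
    """Create the temporary (accumulator) value for a new group."""
    return {"count": 0, "total": 0}

def rollup(temp, record):
    """Fold one record of the group into the temporary value."""
    temp["count"] += 1
    temp["total"] += record["amount"]
    return temp

def finalize(temp, key_value):
    """Produce the single output record for the group."""
    return {"dept": key_value, "count": temp["count"], "total": temp["total"]}

def run_rollup(records, key):
    output = []
    # groupby needs its input sorted on the key, so sort first.
    for key_value, group in groupby(sorted(records, key=key), key=key):
        group = list(group)
        temp = initialize(group[0])          # once per group, first
        for record in group:
            temp = rollup(temp, record)      # once per record
        output.append(finalize(temp, key_value))  # once per group, last
    return output

rows = [
    {"dept": "hr", "amount": 10},
    {"dept": "it", "amount": 5},
    {"dept": "hr", "amount": 7},
]
summary = run_rollup(rows, key=lambda r: r["dept"])
print(summary)
# [{'dept': 'hr', 'count': 2, 'total': 17}, {'dept': 'it', 'count': 1, 'total': 5}]
```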

Question 28. How To Add Default Rules In Transformer?
Answer :

If it is not displayed, display the Transform Editor Grid

Click the Business Rule tab, then select Edit > Add Default Rules to open the Add Default Rules dialog box

Select Match Names to generate a set of rules that copy input fields to output fields with the same name

Select Use Wildcard (.*) Rule to generate a single rule that copies input fields to output fields with the same name

For a Reformat, nothing needs to be written in the .xfr file if no real transform is needed other than reducing the set of fields

Question 29. What Is The Difference Between Partitioning With Key / Hash And Round Robin?
Answer :
Partitioning by Key / Hash Partition:

This partitioning technique is used when the keys are diverse

Large data skew can occur when one key value is present in large volume

It is apt for parallel data processing

Round Robin Partition:

This partitioning technique distributes the data uniformly across the destination data partitions

When the number of records is divisible by the number of partitions, the skew is zero

For example, a pack of 52 cards dealt among 4 players in round-robin fashion gives each player 13 cards
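
The card-dealing example can be reproduced in Python. This is a sketch of the round-robin technique in general, with a block size of one record assumed; `round_robin` and `size_spread` are hypothetical names, and the spread between the largest and smallest partition is used here as a simplified stand-in for skew.

```python
def round_robin(records, n_partitions):
    """Deal records one at a time to partitions 0, 1, ..., n-1, 0, 1, ..."""
    parts = [[] for _ in range(n_partitions)]
    for i, rec in enumerate(records):
        parts[i % n_partitions].append(rec)
    return parts

def size_spread(parts):
    """Difference between the largest and smallest partition sizes."""
    sizes = [len(p) for p in parts]
    return max(sizes) - min(sizes)

deck = list(range(52))           # 52 cards
hands = round_robin(deck, 4)     # 4 players
print([len(h) for h in hands])   # [13, 13, 13, 13]
print(size_spread(hands))        # 0
```

When the record count is not divisible by the partition count, for example 53 records over 4 partitions, the partition sizes differ by at most one record.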

Question 30. Explain The Methods To Improve Performance Of A Graph?
Answer :
The following are ways to improve the performance of a graph:

Use a limited number of components in any particular phase

Use optimum max-core values for sort and join components

Use the minimum number of sort components

Use the minimum number of sorted join components and replace them with in-memory join / hash join where needed and possible

Pass only the needed fields through sort, reformat, and join components

Use phasing or flow buffers with merges or sorted joins

Use sorted join when both inputs are huge; otherwise use hash join

Question 31. What Is The Function That Transfers A String Into A Decimal?
Answer :
Use a decimal cast with the size in the transform function when the string and the decimal are the same size.
Ex: If the source field is defined as string(8).
– The destination is defined as decimal(8)
– Let us assume the field name is salary.
– The function is out.salary :: (decimal(8)) in.salary
– If the size of the destination field is smaller than that of the input, then the string_substring() function can be used
Ex: Say the destination field is decimal(5), then use
– out.salary :: (decimal(5)) string_lrtrim(string_substring(in.salary, 1, 5))
– The string_lrtrim function is used to remove leading and trailing spaces in the string

Question 32. Describe The Evaluation Of Parameters Order:
Answer :
Following is the order of evaluation:

Host setup script will be executed first

All common (that is, included) parameters are evaluated

All Sandbox parameters are evaluated

The project script – project-start.ksh is executed

All form parameters are evaluated

Graph parameters are evaluated

The Start Script of graph is executed

Question 33. Explain Pdl With An Example?
Answer :
PDL is used to make a graph behave dynamically

– Suppose there is a need to add a dynamic field to a predefined DML while executing the graph

– A graph-level parameter can then be defined

– This parameter is used when embedding the DML in the output port

For example: define a parameter named myfield with the value “string("|") name;”

Use ${myfield} when embedding the DML in the out port

Use $ substitution as the interpretation option
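
The effect of ${} substitution on an embedded record format can be imitated with Python's `string.Template`, which uses the same `${name}` syntax. This is an analogy rather than PDL itself: the surrounding record format, the `id` field, and the helper `resolve` are hypothetical, and the parameter value is assumed to be a delimited string field declaration.

```python
from string import Template

# Hypothetical record format with a placeholder for the dynamic field.
dml_template = Template(
    'record\n'
    '  decimal(",") id;\n'
    '  ${myfield}\n'
    'end;'
)

def resolve(template, **params):
    """Substitute ${name} references with parameter values."""
    return template.substitute(**params)

# The parameter value is supplied at run time, so the same template
# can yield different record formats on different runs.
dml = resolve(dml_template, myfield='string("|") name;')
print(dml)
```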

Question 34. State The Working Process Of Decimal_strip Function?
Answer :

decimal_strip extracts the decimal number from the data, discarding non-numeric characters

It trims any leading zeros

The result is a valid decimal number

decimal_strip(“-0184o”) := “-184”
decimal_strip(“oxyas97abc”) := “97”
decimal_strip(“+$78ab=-*&^*&%cdw”) := “78”
decimal_strip(“Honda”) := “0”
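
The four examples above can be reproduced with a rough Python emulation. This is a sketch built only from those examples, not the actual Ab Initio function: the name `decimal_strip_sketch` is hypothetical, only whole numbers are handled, and the rule that a minus sign counts only when it appears before the first digit is an assumption inferred from the third example.

```python
DIGITS = "0123456789"

def decimal_strip_sketch(text):
    """Keep the digits of text, trim leading zeros, return '0' if none."""
    first_digit = next((i for i, c in enumerate(text) if c in DIGITS), None)
    if first_digit is None:
        return "0"
    # Assumption: a '-' only makes the result negative if it precedes the digits.
    negative = "-" in text[:first_digit]
    digits = "".join(c for c in text if c in DIGITS).lstrip("0") or "0"
    return ("-" if negative and digits != "0" else "") + digits

for sample in ["-0184o", "oxyas97abc", "+$78ab=-*&^*&%cdw", "Honda"]:
    print(sample, "->", decimal_strip_sketch(sample))
```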

Question 35. State The First_defined Function With An Example?
Answer :

This function is similar to the NVL() function in Oracle database

It returns the first non-null value among its arguments, which can then be assigned to a variable

Example: A set of variables, say v1, v2, v3, v4, v5, v6, are assigned NULL.
Another variable num is assigned the value 340 (num = 340)
num = first_defined(NULL, v1, v2, v3, v4, v5, v6, num)
The result of num is 340
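
The same example can be written in Python, with `None` standing in for NULL. This is a sketch of the first-non-null idea, not the DML function; the Python `first_defined` below is a hypothetical stand-in, and returning `None` when every argument is null is an assumption.

```python
def first_defined(*values):
    """Return the first argument that is not None, or None if all are None."""
    return next((v for v in values if v is not None), None)

v1 = v2 = v3 = v4 = v5 = v6 = None
num = 340
num = first_defined(None, v1, v2, v3, v4, v5, v6, num)
print(num)  # 340
```

Note that a value such as 0 or an empty string is still defined, so it is returned rather than skipped.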

Question 36. What Is Max Core Of A Component?
Answer :

MAX CORE is the maximum amount of memory a component can use for its calculations

Different components have different MAX CORE values

Component performance is influenced by the MAX CORE setting

The process may slow down or speed up depending on the value; an inappropriate MAX CORE can degrade performance

Question 37. What Are The Operations That Support Avoiding Duplicate Record?
Answer :
Duplicate records can be avoided by using the following:

Using the Dedup Sorted component

Performing aggregation

Utilizing the Rollup component

Question 38. What Parallelisms Does Abinitio Support?
Answer :
AbInitio supports 3 types of parallelism. They are
Data Parallelism: Data is divided into segments, and the segments are processed in parallel within a single application
Component Parallelism: Different components work on different data in parallel within a single application
Pipeline Parallelism: Data is passed from one component to the next, and both components work on the data at the same time

Question 39. State The Relation Between Eme, Gde And Co-operating System?
Answer :

EME stands for Enterprise Metadata Environment

It is the repository for AbInitio. It holds transformations, database configuration files, metadata, and target information

GDE stands for Graphical Development Environment

It is an end-user environment in which graphs are developed

It provides a GUI for editing and executing AbInitio programs

Co-operating System:

The co-operating system is the server of AbInitio

It is installed on a specific OS platform, known as the native OS

All graphs generated in the GDE are later deployed and executed on the co-operating system

Question 40. What Is A Deadlock And How It Occurs?
Answer :

A graph or program hang is known as a deadlock

A program stops progressing when a deadlock occurs

Certain data flow patterns are likely to cause a deadlock

If the flows of a graph diverge and then converge within a single phase, there is potential for a deadlock

At the point where the flows converge, a component might wait for records to arrive on one flow while unread data accumulates on the others

In GDE version 1.8, deadlocks occur very rarely

Question 41. What Is The Difference Between Check Point And Phase?
Answer :
Check point:

A check point is a recovery point created so that a graph that fails in the middle of processing can be recovered

The rest of the process continues after the check point

After the error is corrected, data is fetched from the check point and execution continues from there

Phase:

If a graph is created with phases, each phase is assigned some part of memory, one after another

The phases run one by one

The intermediate files are deleted

