BI Quest [A blog on Informatica]

Share what you learn

All about lookups

Posted by Ankur Nigam on July 6, 2014

What is a lookup transformation? : Lookup is a transformation within Informatica which serves the purpose of referencing data. The data being referenced could live in a database table or a flat file, and the lookup transformation is built on top of this underlying source. Essentially it receives input values, matches them against the lookup columns, and returns the matching value through the Return port/Lookup ports. Up until Informatica 8.1 the lookup transformation was only a passive transformation, i.e. it returned at most one row for each input row. From 9.1 onwards it can be configured to act as either active or passive.

Why is it required? : It serves the purpose of reference. You may have some key values for which you want to find the associated values lying in a database or flat file; lookup comes to the rescue.

What will happen if I use it? : Once you have a lookup transformation, you can pass the key columns to it and seek the desired information. For example, if I pass an OrderID I may want to know the CustomerID of the customer who placed that order. It looks similar to a function call or a seek operation, and at its core that is really all it is. Things only get complicated when you add the other spices: sharing your lookups, deciding when the cache gets built, persistency and so on.

What gets created when I use a lookup? : If you have enabled the option of creating a lookup cache, a cache file is created. The cache file is a collection of data built by querying the source and storing the result in a directory on disk. Any expression requiring a search on the source table or file can use this cache file (via the lookup). Cache files are always an ordered set of data. Along with the data cache we also get an index file. We will get to it later and explore each of these terms in more detail.

However, if you have not selected the option of using a cache, then the lookup is performed by connecting to the source directly. Note that a lookup on a relational table can be performed without using a cache, but a flat file lookup has to be cached.

Caches – What are these? How do they help? : I would say that cache is a very loosely used term in Informatica. Traditionally a cache is a mechanism to temporarily store data in a file away from the source, so that a network round trip to the source can be avoided and the locally kept file of data can be used instead. In Informatica, however, the cache built by a lookup is a collection of not just data but an index as well. Recall, as I said above, that the cache file is an ordered set of data. Ordering happens on the key columns which identify a record uniquely. So suppose I build a lookup on the Orders table out of the data returned by the query Select OrderID, CustomerID from Orders, and I seek CustomerID on the basis of OrderID; then OrderID is the key column and the data in the cache file will be arranged by OrderID. A separate index file is built simultaneously to facilitate search in the data cache file. By default Informatica always indexes on the key columns, unless you override the SQL query and supply your own ORDER BY clause.
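For illustration, a lookup SQL override with your own ORDER BY might look like the sketch below. The Orders table and its columns are just my running example; the columns you order by must match the lookup condition ports. The trailing -- is the commonly used trick to comment out the ORDER BY that Informatica appends on its own, so that only the hand-written one remains.

SELECT ORDERS.ORDERID AS ORDERID, ORDERS.CUSTOMERID AS CUSTOMERID
FROM ORDERS
ORDER BY ORDERS.ORDERID --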

Use the cache and you gain : How? Well, if you do not use a cache, then every value requested via the lookup is fetched from the source directly. In the case of a relational table, a connection is established and a query is issued against the database for each row to get the value associated with the key columns. Seems messy? Of course it is. To avoid such a situation we use lookups which have caching enabled.

Sharing up the Lookup : If it seems that the information in the lookup can be used by another target load flow as well, you can very well share it. By default, when you add a lookup transformation, Informatica Designer enables caching, and the cache it creates is Static. Another type of cache is the Dynamic Cache, which is yet another cache file but with the added facility of having data updated or inserted into the cache. We will discuss dynamic caches in more detail later.

Sharing at mapping level and sharing at session level : If you want to share the cache file between different load orders in a single mapping, you simply have to enable caching and that’s it. However, if you want to enable sharing across mappings, then set the cache to be persistent and give the cache a prefix name. The prefix and persistency ensure that the already existing cache file is not rebuilt when other mappings use it.
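As a concrete sketch of these settings (the prefix name here is my own example): on the lookup, check Lookup caching enabled and Lookup cache persistent, and set Cache File Name Prefix to something like ORD_CUST. As far as I have seen, the cache is then saved under that prefix (ORD_CUST.dat and ORD_CUST.idx) in the cache directory, and any other session whose lookup uses the same prefix, with unchanged lookup and condition columns, reuses those files instead of rebuilding them.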

Different types of lookups and their (better halves) cache


  1. Types of lookups : There are mainly two types of lookups, Connected and Unconnected. Everything else is a cache setting which adds more flavor. These are summarized below and we will go through each of them one by one.
    1. Connected lookup : This type of lookup transformation is part of a pipeline and receives its input from an upstream transformation. It outputs the ports checked as lookup/output and connects them to the downstream transformation. (Remember, when I say upstream I mean, for the transformation in focus, the information flowing in from the preceding transformation, and downstream means the information flowing from this transformation on to the next transformation… it may look dumb to explain this here, but frankly I kept floating in these streams struggling to understand what they actually mean 🙂 ).
    2. Unconnected lookup : Here the lookup is part of the mapping but is not part of any pipeline flow. It acts much like a function call: it is invoked using the notation :LKP.<Lookup name> from any transformation. The rest is the same, in that you pass the key column values to seek the desired value. Note that an unconnected lookup can return one and only one value (the Return port); a small call example is sketched just after this list.
  2. Different Caches : Informatica supports updatable caches and static caches. These are explained below:
    1. Dynamic Cache : Informatica supports caches which are intelligent, in the sense that if a certain lookup value is not present then it will be inserted into the cache, so that the next time you do a lookup the value is returned from the cache. This type of cache, which can be updated, is known as a dynamic cache. In fact in a dynamic cache you can not only insert but also update values. Suppose, based on the key columns (the columns on which the lookup condition is defined), a certain value is fetched from the lookup cache, but that value is different from the value entering the lookup; then the cache is updated to hold the new value. An updatable lookup is set up by configuring the following properties:
      1. By “associating” each lookup port with the incoming port value.
      2. By unchecking the “Ignore in Comparison” property for the above-mentioned ports.
      3. By checking the “Ignore in Comparison” property for the other columns (mostly the key columns).
    2. Static Cache : In contrast to their dynamic counterparts, these are a fixed collection of values. They are the simplest form of cache. They are built from the data queried from the source, and their data file and index file are not updatable. If the value sought from the lookup does not exist, NULL is returned; otherwise whatever value is present in the cache is returned.
  3. Persistence / Sharing : The cache can remain alive after the mapping which created it finishes execution, if you set the Lookup cache persistent property. Coming to sharing the cache; well, that gets a bit tricky. It can be shared between different TLOs (Target Load Orders) within a mapping, or even between different mappings/sessions. However, the cache can be shared only if there is no change in the lookup columns and key columns; this is a thumb rule. Next, if you want to share the cache across sessions/mappings then it ought to have a name. This name can be set against the Cache File Name Prefix setting of the lookup transformation. If the lookup is not persistent, the cache file is deleted as soon as the session finishes execution.
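Here is the call example promised in the unconnected lookup point above: a minimal sketch of how such a lookup is invoked from an Expression transformation. LKP_GET_CUSTOMER is a hypothetical lookup name and ORDERID a hypothetical input port; the value comes back through the lookup's Return port.

O_CUSTOMERID (output port) = :LKP.LKP_GET_CUSTOMER(ORDERID)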

To summarize, given the above situations we end up with the following lookup and cache settings:-

  1. Named / Unnamed Cache – Affects the scope of sharing the cache file across sessions.
  2. Persistent / Non-Persistent Cache – Affects whether the lookup cache file stays alive after the session finishes.
  3. Static / Dynamic Cache – Affects whether the cache file is built once or is updatable.
  4. Connected / Unconnected lookups – Affects whether the lookup can return more than one port or not.

Collectively you can have the following settings:-

  1. Connected lookup with Static, Persistent, Named/Unnamed Cache
  2. Connected lookup with Static, Non-Persistent, Unnamed Cache
  3. Connected lookup with Dynamic, Persistent, Named/Unnamed Cache
  4. Connected lookup with Dynamic, Non-Persistent, Unnamed Cache
  5. Unconnected lookup with Static, Persistent, Named/Unnamed Cache
  6. Unconnected lookup with Static, Non-Persistent, Unnamed Cache


And you cannot have the following:-

  1. Unconnected lookup with Dynamic Cache (named, persistent or whatever)

As far as sharing of the cache goes, you can have the following:-

  1. Static Unnamed with Static Unnamed :- Shared within mapping by default
  2. Static Named with Static Named :- Shared within mapping and across mappings
  3. Dynamic with Dynamic Named :- Shared within mapping and across mappings
  4. Dynamic with Static Named :- Shared within mapping and across mappings

And you cannot have the following:-

  1. Dynamic with Static Unnamed
  2. Dynamic with Dynamic Unnamed.

One last thing – two identical dynamic lookups cannot exist in the same Target Load Order, because cache building is a continuous process until the entire target has been loaded. Put simply, until your Target Load Order has finished execution, a subsequent use of the dynamic cache might see values that differ from the previous cache values!

(So this seeds the next part, on how the lookup cache is actually created. I will write it shortly… hopeful about it 🙂 )

Posted in Informatica, Transformers

Loading Top/Bottom N records in target

Posted by Ankur Nigam on March 3, 2013

I ran into a problem again. This time the requirement was not from the boss but a scenario I often see on the blogs: loading the Top/Bottom N records into the target. People present a different solution for each of them. I took it a little further and tried to develop a single mapping which can load either the top or the bottom records, depending upon the value passed. Let’s say, for example, the top 3 records are required; then the parameter value would be 3, or it would be -3 in the case of loading the bottom records. The value is parameterized to enable full flexibility.


My mapping looks something like this:-

[Screenshot: the mapping]

In a nutshell I will explain what I have done:-

Step 1 : Read the records
Step 2 : Sequence them
Step 3 : In a Router create two groups: $$LoadTill > 0 (First Records) and $$LoadTill < 0 (Bottom Records)
Step 4 : Sort all the rows from the First Records group in ascending order
Step 5 : Sort all the rows from the Bottom Records group in descending order
(The steps below are common to both of the above-mentioned groups)
Step 6 : Pass the records from the Sorter to an Expression transformation and generate a sequence number using a variable port
Step 7 : In the same Expression deduce whether the record is eligible to be written to the target
Step 8 : Load the target only if the record counter is less than or equal to ABS($$LoadTill) (the absolute value, since the parameter is negative for bottom records)
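A minimal sketch of the Expression ports for steps 6 to 8 follows; the port names are my own invention, $$LoadTill is the mapping parameter mentioned above, and I assume a Filter transformation sits between the Expression and the target.

V_COUNTER (variable port)   = V_COUNTER + 1
O_COUNTER (output port)     = V_COUNTER
O_LOAD_FLAG (output port)   = IIF(V_COUNTER <= ABS($$LoadTill), 1, 0)

The Filter condition would then simply be O_LOAD_FLAG = 1.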

Posted in Informatica, Scenario Based Questions

FR_3002, FR_3019 – Operating system error message [The parameter is incorrect]

Posted by Ankur Nigam on August 26, 2011

You might get this error when running a session that uses a file list. There are many possible causes, but a very subtle one is that if even one of the file names is wrongly typed, the session will fail while reading the very first file in the list. This can lead to head scratching and you might end up a zombie. Ideally the wrong filename should be flagged in the logs; however, what you see is the very first file tagged against the error number.

I had my StockItems file looking something like this:
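(The original screenshot is not available; as an illustration, the list looked something like the lines below, the data file names being hypothetical reconstructions.)

Items_Electronics.txt
Items_Grocery.txt
Items_Apparel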

Here the last filename does not have .txt suffixed to it, hence the session failed while reading the very first file, even though that first file was in every sense correct.

 

Posted in Houston we have a problem!

Using File List as source to read from multiple flat files

Posted by Ankur Nigam on August 25, 2011

In a production scenario you can sometimes land in a situation where you are asked to read from multiple files having different data but the same structure. This is quite in contrast to situations where you need to output to different files. Informatica allows you to read from multiple files in a single go, provided the columns are the same across all the files. When we set Indirect as the source filetype in the session, Informatica reads the file names from the file list you provide and starts loading them one by one into the target.

In the scenario we are going to walk through, I will create a mapping that reads item data from different category-level files and pushes it into the items target table in Oracle.

The very first step is to import your source file and define its attributes.

For now don’t worry about properties other than the delimiters and column lengths. Next create a mapping something like the one given below:-

This is a fairly simple mapping wherein I am fetching the category ID from the category name received in the file and pushing all the data down to my target.

Now let’s open our session configuration and set the Source filetype property there.

Set it to Indirect, which means that whatever filename I provide (in the variable SourceInpFile) contains the names of the files holding the actual data. Below is an example of my StockItems file listing the names of the different data files. I have assigned this filename value to the source filename attribute.
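(The screenshot is not reproduced here; as an illustration, a file list simply holds one data file name per line, optionally with its full path. The names below are hypothetical.)

Items_Electronics.txt
Items_Grocery.txt
Items_Apparel.txt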

When I run the session, all the files are read one by one and the data is loaded into the target.

Posted in Session Labyrinth

Flat File Target & different File Names

Posted by Ankur Nigam on August 23, 2011

Often there is a requirement for deterministic output file names based on the input when you are writing to flat files. For example, you are required to write a mapping that produces sales output as CSV for every region in a separate file. The name of each file has to be different, and each such file should hold the sales data for that region only.

I had a scenario somewhat similar to this, wherein I had to output categorized items, i.e. I have a data model something like this:-

My Items table is joined to the category table on category ID, and I was required to output the inventory for each category in a separate file. The name of each file should be the name of the category.

This can be done through the Transaction Control transformation. It is a very useful but rarely used tool. Basically it helps you control when the commit has to be done; for example, you may want to commit at the point where a certain column value changes. In our case, whenever the category changes, the previous data belonging to the old category has to be written out: since data for a new category has arrived, the old category’s data still held in the buffer should now be written into the target.

In addition to the Transaction Control, we also need a mechanism to control what filename gets generated. This can be done through the FileName port available on flat file targets. This is a write-only port and you can only set a value into it. When the session runs, that value is used by the Integration Service to set the name of the file. The Output filename attribute will always be ignored if you have used the FileName port of the target flat file, in the sense that a zero-byte file with the Output filename attribute’s name is created with no data in it.

So on one hand we control when and what has to be written, and on the other hand we control what file name is given to the file the data is written to.

Following is the mapping which I have created for this scenario:-

Since I have already defined a relationship between the participating tables, the query is generated accordingly. Please also note that I have sorted the data on the basis of Category Name; the Number Of Sorted Ports property in this case is 1.

The main work is done by the Expression and the Transaction Control. A more detailed insight into the Expression is given below:-

The basic task of this expression is to check whether a new category has arrived or not. I have declared a variable VAR_IS_NEWITEM which is 1 if the incoming category name does not match the previous category name (stored in VAR_OLD_ITEM). Since the comparison is done before VAR_OLD_ITEM is assigned its new value, we can safely track changes as long as the expression receives sorted input (recall that sorted output is emitted from the Source Qualifier).

Coming down further, I have output the flag indicating whether the category is new or not in IS_NEWITEM, along with the desired filename, which in my case is <CATEGORY_NAME>.TXT.
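A minimal sketch of the Expression ports is given below; the layout of the original screenshot may differ slightly, but the idea is the same. Variable ports are evaluated top to bottom before the output ports, and on the very first row VAR_OLD_ITEM still holds its default initial value, so the flag comes out as 1, which is what we want for the first category.

CATEGORY_NAME (input port)
VAR_IS_NEWITEM (variable port) = IIF(CATEGORY_NAME != VAR_OLD_ITEM, 1, 0)
VAR_OLD_ITEM (variable port)   = CATEGORY_NAME
IS_NEWITEM (output port)       = VAR_IS_NEWITEM
FILENAME (output port)         = CATEGORY_NAME || '.TXT'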

Moving forward I have now directed this output into a Transaction Control.

Here I am checking that if the category is new then commit into the target. The Transaction Control expression can return the following values:-

TC_CONTINUE_TRANSACTION. No action is taken for the current row. This is the default behavior.

TC_COMMIT_BEFORE. All rows of the old transaction written before the current row are committed to the target, and a new transaction begins from the current row. In other words, the old transaction’s data boundary has been reached, so commit the old transaction’s data in the target and then start a new transaction boundary.

TC_COMMIT_AFTER. The current row is written, and the old transaction plus this new row is committed. Thereafter a new transaction boundary starts.

TC_ROLLBACK_BEFORE. Roll back all of the old transaction’s data from the target before this row, then begin a new transaction boundary from the current row.

TC_ROLLBACK_AFTER. The current row is written, and the old transaction plus this new row is rolled back. Thereafter a new transaction boundary starts. The current row is part of the rolled-back transaction.

Long story short, I need to commit only when a new category has arrived; until then, keep writing to the target. Since I am also passing the filename from the Expression into the FileName port of the target each time, the Integration Service will always pick up the value in the FileName port of the target as the output filename. In this case, where I have used a FileName port, the session-level Output Filename attribute of the target is ignored in the sense that the file will still be created, but with no data.
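The transaction control condition itself is a one-liner; a sketch using the IS_NEWITEM flag coming from the Expression (the exact wording in the original screenshot may differ):

IIF(IS_NEWITEM = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)

TC_COMMIT_BEFORE is used so that the rows of the previous category, already buffered, are committed to their file before the first row of the new category starts a fresh one.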

Following is the query which was issued to the database:

SELECT ITEM_CATEGORIES.CATEGORY_NAME, STOCK_ITEMS.ITEM_NAME, STOCK_ITEMS.UNIT_PRICE, STOCK_ITEMS.STOCK_QTY
FROM ITEM_CATEGORIES, STOCK_ITEMS
WHERE STOCK_ITEMS.CATEGORY_ID = ITEM_CATEGORIES.CATEGORY_ID ORDER BY ITEM_CATEGORIES.CATEGORY_NAME

which returned the following data:-

When the session ran, the data was neatly organized into different files per category, which is what I required. Note that a zero-byte file is also created, with no data in it:-

Posted in Transformers

Error occurred loading library [pmora8.dll] & invalid Win32 application

Posted by Ankur Nigam on August 18, 2011

Are you getting this error when connecting to Oracle while creating a repository?

Database driver event...Error occurred loading library [pmora8.dll].
System error encountered is 193. Error text is %1 is not a valid Win32 application

If so, then most probably you are connecting to a 64-bit database without having its 32-bit client installed on your machine. Your Informatica installation is 32-bit and will always use a 32-bit client to connect to the database. Since no 32-bit client is installed on your machine, Informatica cannot find one and therefore cannot connect to the 64-bit Oracle. The result is that you get this error.

The solution is to install the 32-bit Oracle client for your version of Oracle; get it from the Oracle download center. Make sure that the home of this 32-bit client is different from that of your originally installed 64-bit Oracle client, and that its home path is listed in the PATH environment variable.

You may also get an error when you run sqlplus after the install; the error could be a TNS protocol adapter error. Don’t worry, and check by going to Start > Programs > {Your 64 bit oracle} > sqlplus. It could be that the 32-bit client’s sqlplus got fired up instead.
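A quick way to check which client wins, assuming you are on Windows: open a command prompt and run

where sqlplus

The copy listed first is the one that starts when you simply type sqlplus, so you can tell whether the 32-bit or the 64-bit client is being picked up from the PATH.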

For more info on TNS read this post from EdStevens:-

https://forums.oracle.com/forums/thread.jspa?threadID=1982093

Posted in Houston we have a problem!

Understanding Treat Source Rows property and Target “Insert, Update…” properties

Posted by Ankur Nigam on August 17, 2011

Informatica has a plethora of options to perform IUD (Insert, Update, Delete) operations on tables. One of the most common methods is using the Update Strategy, while the underdog is using Treat Source Rows set to {Insert; Update; Delete} rather than Data Driven. I will be focusing on the latter in this topic.

In simple terms, when you set the Treat Source Rows property you indicate to Informatica that the rows have to be tagged as Insert, Update or Delete. This property, coupled with the target-level properties allowing Insert, Update and Delete, works wonders even in the absence of an Update Strategy. It also leads to a clear-cut mapping design. I am not opposing the use of Update Strategy, but in some situations this approach gives a certain openness to the mapping, wherein I don’t have to peek into the strategy expression to see what action it performs, e.g. IIF(ISNULL(PK)=1,DD_INSERT,DD_UPDATE).

Let’s buckle up our belts and go on a ride to understand the use of these properties.

Assume a scenario where I have the following table structure in the staging area:

Keeping things simple, the target table would be something like this:

As you can see the target has UserID as a surrogate key which I will populate through a sequence. Also note that Username is unique.

Now I have a scenario where I have to update the existing records and insert the new ones as supplied in the staging table.

Before we begin building, let’s first understand the TSA (Treat Source Rows As) and target properties in more detail. Treat Source Rows accepts four settings:

  1. Insert :- When I set this option, Informatica marks all rows read from the source as Insert. It means the rows will only be inserted.
  2. Update :- When I set this option, Informatica marks all rows read from the source as Update. It means that when the rows arrive at the target they are to be updated in it.
  3. Delete :- The rows are marked as “to be deleted” from the target once they have been read from the source.
  4. Data Driven :- This indicates to Informatica that we are using an Update Strategy to decide what has to be done with the rows, so no marking is done when rows are read from the source. In fact, what has to be done with a row arriving at the target is decided immediately before any IUD operation on the target.

However, setting TSA alone will not let you modify rows in the target. Each target should itself be able to accept, or I should say allow, IUD operations. So when you have set the TSA property, you also have to set the target-level properties that say whether rows can be inserted into, updated in or deleted from the target. This can be done in the following ways:-

Insert and Delete are self-explanatory; Update, however, is categorized into three options. Please note that setting any of them will allow updates on your tables:-

  1. Update as Update :- This simple option says that when a row arrives at the target, it has to be updated in the target. If you check the logs, Informatica will generate an update template something like “UPDATE INFA_TARGET_RECORDS SET EMAIL = ? WHERE USERNAME = ?”
  2. Update as Insert :- This means that when a row flagged for update arrives at the target, the update behaviour is to insert that row into the target. In this case Informatica will not generate any update template for the target; instead the incoming row will be inserted using the template “INSERT INTO INFA_TARGET_RECORDS(USERID,USERNAME,EMAIL)  VALUES ( ?, ?, ?)”
  3. Update else Insert :- Means that the incoming row flagged as update is either updated or inserted. In a nutshell, if a key column value of the incoming row already exists in the target then Informatica intelligently updates that row in the target; if the incoming key column value is not present in the target, the row is inserted.

PS :- The last two options require you to set the Insert property of the target as well, because if it is not checked then Update as Insert and Update else Insert will not work and the session will fail, stating that the target does not allow inserts. Why? Well, it’s simple: these update modes have an insert hidden in them.

OK, enough theory? Fine… let’s get our hands dirty. Coming back to our scenario, we have the rows read from the source and want them either inserted into or updated in the target, depending upon the status of the rows, i.e. whether they are already present in the target or not. My mapping looks something like this:

Here I have used a lookup to fetch the UserID for each username coming in from the stage. In the Router, the following groups have been set:-
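(The original screenshot is not reproduced here; as a sketch, and assuming the lookup returns the UserID into a port named LKP_USERID, the two router groups might be defined like this. The group names are hypothetical.)

NEW_RECORDS      : ISNULL(LKP_USERID)
EXISTING_RECORDS : NOT ISNULL(LKP_USERID)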

The output from the Router is sent to the respective instance of the target (INFA_TARGET_RECORDS) depending on whether the user exists or not: INFA_TARGET_RECORDS_NEW in case of new records and INFA_TARGET_RECORDS_UPD in case of existing records.

Once this is in place, I have to set the Treat Source Rows property to Update for this session. Also, to enable Informatica to insert into the table, I will have to:-

  1. Set the Insert and Update as Insert properties of the instance INFA_TARGET_RECORDS_NEW.
  2. Set the Update as Update property for the INFA_TARGET_RECORDS_UPD instance in the session.

What actually happens is that I have treated all rows from the source as flagged for update. Secondly, for the new-records instance I have modified the update behaviour and set it to Update as Insert; thanks to this property, the update flag actually allows me to insert rows into the target. When the session runs it will update the existing rows in the target and insert the new rows (actually update-as-insert).

Try it out and let me know if it works for you. I am not attaching a run demo because it’s better if you do it yourself and understand even more clearly what is happening behind the scenes.

Posted in Session Labyrinth

Understanding Domain & Nodes

Posted by Ankur Nigam on August 15, 2011

Informatica thrives on HA (High Availability) and SOA (Service-Oriented Architecture). The central ideas of the Informatica architecture are resilience to failure and load balancing. Keeping this in mind, Informatica scales its operations across the nodes created in a domain. A domain is simply a collection of nodes, whereas nodes are the physical machines performing the work. Domain information is typically stored in a database. Once a domain is in place you need to create a master gateway node. Without a master gateway node no entry point exists into the domain and you cannot use it; in other words, no Informatica operations can be performed since you cannot access the worker nodes.

How do you distinguish an ordinary worker node from a gateway node? Simply by the way it is configured: a node can be set as a gateway in the domain configuration section, under Log and Gateway Configuration.

Let’s stop here and discuss nodes a little more. As I mentioned, they are nothing more than physical machines, logically represented in the Informatica domain as nodes. When I add a node to the Informatica domain I have to configure its name, host and the port at which the Informatica Service Manager is listening. Wait a minute… a what manager?

Service Manager – It manages domain-related operations as well as application services. The Service Manager runs as a Windows service, or as a daemon on Unix, and it has to be present on each node, otherwise the node will not work. What kind of domain functions does it perform? Well, that varies from node to node: on a gateway node the Service Manager performs authentication, logging and so on, whereas on a worker node only logging is performed as part of the domain functions. Apart from this, the Service Manager also initiates the application-specific services.

Once a node has a Service Manager, and that means the Informatica Services, that node can perform gateway operations (if it is configured as a gateway) as well as application-specific operations. Application-specific operations are your Repository Services, Integration Services and so on.

So you have

Domain -> Node -> Gateway or Worker or Both

Hope this gives you an idea of the Informatica architecture. Do read the manuals for more details. Apart from that, fire up your Administration Console and study the various sections and configurations in there.

Posted in Architecture Level