Informatica MDM File Extract Customer

Prerequisites

The following are the prerequisite conditions.

Area Prerequisite

Task Group Execution

At least once, Informatica_MDM_Extract task group must be executed successfully.

View in IDP Database

The below view must be present in IDP Database:

@IDP_SCHEMA.VW_INFA_@INFA_ORG_ID_@MDM_DOMAIN_ENTITY,

Create this view if not already present. This is a one time activity.

To create the view, replace the IDP SCHEMA, INFA ORG ID and MDM DOMAIN parameters as per your environment and then run the below query in IDP environment.

create or replace view @IDP_SCHEMA.VW_INFA_@INFA_ORG_ID_@MDM_DOMAIN_ENTITY as select x.HCP_MASTER_ID OBJECT_ID, m.LASTUPDATEDATE, 'HCP' as OBJECT_TYPE, m.COUNTRY from odp_core_staging.INFA_HCP_XREF x inner join odp_core_staging.INFA_HCP_MAIN m on (x.HCP_MASTER_ID = m.BUSINESSID) union select x.HCO_MASTER_ID OBJECT_ID, m.LASTUPDATEDATE, 'HCO' as OBJECT_TYPE, m.COUNTRY from odp_core_staging.INFA_HCO_XREF x inner join odp_core_staging.INFA_HCO_MAIN m on (x.HCO_MASTER_ID = m.BUSINESSID);

Procedure in IDP Database

The below procedure must be present in your environment.

INFA_MDM_POPULATE_EXTRACT_PUB_CTRL

Create this procedure if not already present. This procedure uses the previously created view and this is a one time activity. Click on the link below to download the file.

INFA_MDM_POPULATE_EXTRACT_PUB_CTRL

Notification Channels At least one notification channel must be configured among Email IDMS Teams and Slack. For more information, see Adding Alert Notification.

MDM File Extract Customer Overview

The Informatica MDM Customer Data Extract process helps to download the data from Informatica orgs using the s3Connector and sqlExecutor plugins. The data extract process is given below.

  1. At the first stage, data from Informatica MDM is extracted to the IDP Platform in the Landing Tables and then to Staging Tables in JSON format.

  2. The Normal Views are created from the data available at the Staging Tables.

  3. The data extracted in the Views are exported into flat files which are stored in S3 Bucket.

  4. Data files can be extracted in two different modes using the parameter ExportDataRange. They are Full and Incremental.

  5. The Full extract mode extracts the entire data from Informatica MDM.

  6. In the case of incremental extract mode, the file extract process identifies the entities that are to be extracted to flat files with the help of Last Run Date in the extract normal views and stores the information of these entities in Publish Control Table (ODP_CORE_LOG.MDM_EXTRACT_PUB_CNTL) with the status as PENDING_EXTRACT.

  7. All the information (attributes, nested attributes, relations & merges) is extracted for each customer entity in the extract publish control table. After the completion of file extraction process, status is set to SUCCESSFUL.

  8. Files are extracted to s3 location (<defaultBucket>/infa_customer_extract/0Kt4sn9AatgcdPV0rBbtUg/).

Follow the below steps to execute the Customer Data Extract.

  1. Import the task group template.

  2. Configure the task group.

  3. Extract data from Informatica MDM.

Import the Task Group Template

To extract Customer Data from Informatica MDM, you need to first create a Task Group in IDP OA platform to execute the extract process. Follow the below steps to create a task group in IDP OA.

  1. Using WinSCP or s3 browser, connect to the IDP default s3 bucket and go to the folder <bucket_name>/templates/product.

  2. Find the template Informatica_MDM_File_Extract_US_Customer_<version>.json and download the latest version template to the local machine.

  3. Open the pipeline template Informatica_MDM_File_Extract_US_Customer_<version>.json in any text editor and replace all the occurrences of the below place holders and save the file.

    Placeholder

    Replaceable String

    Description

    Default Value

    <__INFA_ORG_ID__>

    Change the Informatica ORG ID

    E.g. 0Kt4sn9AatgcdPV0rBbtUg

      No default value. Should be provided with valid value.

    <DATABASE_NAME>

    Database Name

    E.g. IDP_IQVIADEV_OAIDP_DEV1_USV_ENV_DWH

     

    No default value. Should be provided with valid database name.

    Note:   

    IDP_SCHEMA_NAME is by default MDM_OUT_STAGING, can be changed if needed.

  4. Login to IDP OA platform and under the Data Management section, click Data Pipeline.

  5. On the Landing Page, click Data Pipeline tile to open the Task Group Pipeline Flow.

  6. Click Task Group from Template, select the latest downloaded template Informatica_MDM_File_Extract_Customer and then click OPEN.

  7. The pipeline task group for Informatica_MDM_File_Extract_US_Customer is created. This task group is used for executing the data extract process.

Configure the Task Group

Verify the below task group and task parameters. Configure them accordingly based on the requirement.

Parameter Name

Parameter Level

Default Value

Description

IDP_SCHEMA_NAME Task Group MDM_OUT_STAGING -
INFA_ORG_ID Task Group 0Kt4sn9AatgcdPV0rBbtUg -

IS_FULL_EXTRACT

Task Group

false

To extract full customer data to files, change the value of this parameter to true

Keep the value as false, to extract incremental (delta) data.

Example:

For value true, date used to extract the data to files is always between 1900-01-02 00:00:00 to current timestamp.

For value false, date used to extract the data to files is the previous extract date to current timestamp.

    File Extract Run1: 1900-01-02 00:00:00 to 2021-08-17 03:52:17

File Extract Run2: 2021-08-17 03:52:18 to 2021-08-18 09:33:46

File Extract Run3: 2021-08-18 09:33:47 to 2021-08-19 11:01:21

COUNTRY_LIST Task Group true

You can extract country specific data such as: US,DE,CA

(Mention the country codes that you want to extract each code separated by a comma).

COUNTRY_BASED_EXTRACT

CUSTOM_COUNTRY_LIST

Task Group true

When both COUNTRY BASED EXTRACT and CUSTOM COUNTRY LIST are true, you can extract country specific data in separate folders for each country that you give in the country list.

When COUNTRY BASED EXTRACT is true and CUSTOM COUNTRY LIST is false, data for all the countries are extracted into separate folders for each country.

When COUNTRY BASED EXTRACT is false and CUSTOM COUNTRY LIST is true, flat files with country data is extracted into a single folder for the countries that are listed.

Example: US,DE,CA

When both COUNTRY BASED EXTRACT and CUSTOM COUNTRY LIST are false, the universal data is extracted.

PIPELINE_CODE Task Group   The pipeline code is Informatica_MDM_File_Extract_Customer.
MDM_DOMAIN Task Group   MDM Domain is INFACUSTOMER
CUSTOMER_TYPE_LIST Task (File_Extract) \'HCO\',\'HCP\'

You can extract customer type list such as: \'HCO\',\'HCP\'

Mention the entity name followed by a backslash placed within the single quotation marks with each entity separated by a comma.

EXTRACT_FILE_FORMAT

Task (File_Extract)

CSV

Extract file format.

Can be any one of the below values.

CSV = pipe delimited

CSV_COMPRESSED = pipe delimited with gz compression.

PARQUET = parquet

PARQUET_COMPRESSED = parquet with snappy compression

EXTRACT_FILE_NAME_PREFIX

Task (File_Extract)

MDM_CM

Value of this parameter is used as a prefix name to all the extracted files. Change this value if a different prefix name is required.

Example:

MDM_CM_HCO_20210816092140.csv

MDM_CM_HCP_20210816092140.csv

MDM_CM_COMM_20210816092140.csv

MDM_CM_AFFILIATION_20210816092140.csv

DATABASE_NAME Task Group   IDP_IQVIADEV_OA_DEV_NEWSANDBOX_ENV_DWH
JSON Task (File_Extract)  

By default this parameter contains a json array config which contains list of sql queries that need to be extracted as flat files.

Task group users can add their list of queries to extract any extra data from outbound views based on requirement. Below json config is an example on how to extract HCP data from reltio outbound views to flat files.

{"exportQuery":"SELECT MDM_EXTRACT_PUB_CNTL.OBJECT_TYPE, HCO_Unit.UNIT_SOURCEPKEY, HCO_Unit.UNIT_PARENTID, HCO_Unit.UNIT_FIELDGROUP_STATE, HCO_Unit.UNIT_UNITNAME, HCO_Unit.UNIT_UNITOFFICIALCODE, HCO_Unit.UNIT_UNITVALUE, HCO_Unit.UNIT_UNITVALUEX, HCO_Unit.UNIT_UNITTYPE, HCO_Unit.DW_ID, HCO_Unit.BUSINESS_KEY, HCO_Unit.CREATE_DTT, HCO_Unit.UPDATE_DTT, HCO_Unit.SCD_EFF_DT, HCO_Unit.SCD_END_DT, HCO_Unit.CREATED_BY, HCO_Unit.UPDATED_BY, HCO_Unit.SRC_COUNTRY, HCO_Unit.SRC_DATA_ASSET_NAME FROM ODP_CORE_STAGING.HCO_Unit inner join ODP_CORE_LOG.MDM_EXTRACT_PUB_CNTL ON ODP_CORE_STAGING.HCO_Unit.UNIT_PARENTID = ODP_CORE_LOG.MDM_EXTRACT_PUB_CNTL.OBJECT_ID WHERE ODP_CORE_LOG.MDM_EXTRACT_PUB_CNTL.EXTRACT_STATUS = \'PENDING_EXTRACT\' AND ODP_CORE_LOG.MDM_EXTRACT_PUB_CNTL.MDM_DOMAIN = \'@MDM_DOMAIN\' AND ODP_CORE_LOG.MDM_EXTRACT_PUB_CNTL.object_type IN (@CUSTOMER_TYPE_LIST)","fileName":"HCO_Unit","emptyFileOption":"HEADERS"}] exportQuery This attribute should contain any valid query which joins a reltio outbound view with table ODP_CORE_LOG.MDM_EXTRACT_PUB_CNTL.

Always pub control table should be filtered with specific conditions as given in the above example.

EXTRACT_S3_BASE_FOLDER

Task (File_Extract)

customer_extract

This is the customer extract base folder where extract files are stored.

E.g. customer_extract/informatica_mdm_cm/20210816092140/

Extract Data from Informatica MDM

After the successful import, navigate to the task group Informatica_MDM_File_Extract_US_Customer.

Click the Informatica_MDM_File_Extract_US_Customer task group to open it. See figure below.

Note:   

The Status and Last Executed fields are marked as NA until you run the task group.

Navigate to the Tasks tab. The tasks associated with Reltio File Extract US Customer are listed in the Tasks tab.

There are four tasks present in this task group (MDM_File_Extract_US_Customer).

  1. Stage_Subscription_Data

  2. Populate_Extract_Publish_Control

  3. File_Extract

  4. Update_Extract_Publish_Control

The task Stage_Subscription_Data performs the below steps.

  1. This task will extract specific entities along with the regular data extract process.

  2. To extract the entities, create a folder with the name Informatica_MDM_Customer_Extract/cm_subscription_extract/input/ in S3 bucket under Informatica MDM folder.

  3. Place the list of IDs (reltio entity id) that you want to extract in cm_subscription_extract.dat file as per the format given in para 4 below and place it in the cm_subscription_extract/input folder in S3 bucket.

  4. The .dat file will only have one ID column with the list of IDs placed below it within double quotes as shown in the figure below:


  5. The file pattern must be cm_subscription_extract_*.dat

  6. After the successful execution, these files are extracted along with the current file extract process.

Note:   

This task is valid only for the current file extract process. Once the IDs are extracted successfully, the list is no longer valid. You will have to place a fresh cm_subscription_extract.dat file with the IDs that you want to extract in the subsequent file extract process. If the extract is unsuccessful in the current extract, then the IDs are extracted in the subsequent file extract process.

If the subscription data file is not placed in the dedicated s3 location during the regular extract process, it is assumed that the cm_subscription_extract.dat file is empty.

Task Populate_Extract_Publish_Control performs the below steps.

  1. Get the last run date from ODP_CORE_LOG.MDM_PROCESS_CONTROL_LOG.

  2. Identify the entities in outbound views with informatica update date between Last Run Date, Current Date and Time.

  3. Populate the identified entities to ODP_CORE_LOG.MDM_EXTRACT_PUB_CNTL with extract status as PENDING_EXTRACT.

  4. Sets the last run date in ODP_CORE_LOG.MDM_PROCESS_CONTROL_LOG with current date and time.

The task File_Extract performs the below steps.

  1. Extracts data from normal views based on the JSON parameter configuration to the flat files under the configured folder in s3 location for all the customer entities present in MDM_EXTRACT_PUB_CNTL.

  2. Each extract is placed in a folder with the folder name as date timestamp.

    (Exampleoaidp-dev-usv-iqviadev-odp/CM_extract/INFORMATICA_MDM_CM/20210817053631/).

Task Update_Extract_Publish_Control does the below step.

  1. Updates the PENDING_EXTRACT status to SUCCESSFUL in MDM_EXTRACT_PUB_CNTL table for all the Customer entities.

Login to IDP OA platform and find the task group Informatica_MDM_File_Extract_US_Customer and click RUN. After executing the task group successfully, the extract files are available at Informatica_MDM_Customer_Extract/DataExtract/ located in s3 folder.

Note:   

The extracted files location will depend on the ORG ID defined in the parameters section.

Example:   infa_customer_extract/0Kt4sn9AatgcdPV0rBbtUg/

Troubleshooting

  • Total four tasks are present in the task group Informatica_MDM_File_Extract_US_Customer.

  • In case of failure in any task, fix the error and restart the task from the failed task till the end. If unable to identify or fix the error, contact MDM support team.