MDM Reltio Extract
Prerequisites
The following prerequisites apply to the Generic Customer Multi Source Data Load to China-hosted Reltio:
Area | Prerequisite |
---|---|
Reltio Connection Properties | Add resolveRelationEdgeTypes to the options property of the OA Reltio connection properties, as in the following example. Skip this step if it is already present. Example: auth_host=auth.reltio.com;auth_path=/oauth/token;client_id=cmVsdGlvX3VpOm1ha2l0YQ==;svc_host=test.reltio.com;export_svc_path=/jobs/export;export_status_svc_path=/reltio;tenant_id=z2wSGtOLkuXY8T9;fileFormat=json;distributed=true;taskPartsCount=6;options=parallelExecution,resolveRelationEdgeTypes,resolveMergedEntities;partSize=100mb;s3Connection=s3_connector;s3Bucket=oaidp-dev-usv-iqviadev-odp;s3Folder=mdm/plugins;esbConnection=esb_default;rdmConnection=mdm_reltio_dev2_abcrdm If resolveRelationEdgeTypes is being enabled for the first time for a Reltio tenant, run the latest Reltio_MDM_Extract task group in full refresh (IDL) mode. |
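The check described in this prerequisite can be sketched in a few lines of Python. This is only an illustration: the helper names parse_connection_properties and has_option are hypothetical and not part of IDP or Reltio, and the sketch assumes the connection properties are a plain semicolon-separated list of key=value pairs, as in the example above.

```python
def parse_connection_properties(raw):
    """Split a 'key=value;key=value' string into a dict of properties."""
    props = {}
    for part in raw.split(";"):
        part = part.strip()  # tolerate stray spaces after ';'
        if not part:
            continue
        # Split on the first '=' only, so values such as a base64
        # client_id ending in '==' stay intact.
        key, _, value = part.partition("=")
        props[key.strip()] = value.strip()
    return props


def has_option(props, option):
    """Return True if `option` is listed in the comma-separated options property."""
    options = [o.strip() for o in props.get("options", "").split(",")]
    return option in options


# Shortened version of the example connection string from the table above.
example = (
    "auth_host=auth.reltio.com;client_id=cmVsdGlvX3VpOm1ha2l0YQ==;"
    "fileFormat=json;distributed=true;taskPartsCount=6;"
    "options=parallelExecution,resolveRelationEdgeTypes,resolveMergedEntities;"
    "partSize=100mb"
)
props = parse_connection_properties(example)
if not has_option(props, "resolveRelationEdgeTypes"):
    print("Add resolveRelationEdgeTypes to options, then run a full refresh (IDL).")
```

Splitting on the first "=" only is a deliberate choice here, since some values in the connection string contain "=" themselves.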
Setup Entities and Relations to be Ignored during Extract
It is recommended to exclude the Location entity type and the has address relation type, because neither is used. Address attributes are taken from the Address reference attributes on the entities themselves, so these objects are redundant.
- Log in to the IDP OA platform and, under Business Unit, select IT Support.
- Click Entity Collection.
- Click the icon of ODP.Admin.ReltioIgnoreObjects as shown in the image below.
- Create two entries as shown below. Replace ReltioConnectionName with the Reltio connection name configured in the connection strings.
Import the Reltio MDM Extract Pipeline Template
To import the Reltio MDM Extract pipeline template, follow these steps:
- Connect to the IDP default S3 bucket and go to the folder <bucket_name>/templates/product.
- Download the Reltio MDM Extract_<version>.json file to a local folder. If there are multiple files, download the latest version.
- Open the downloaded template JSON file in a text editor and replace the following placeholders with appropriate values.
Placeholder | Replaceable String |
---|---|
__SOURCE__ | Any source name; OV is recommended. |
__RELTIO_CONNECTION_NAME__ | Reltio connection name configured in the connection string. |
__COUNTRY__ | A country name such as US or, for a region, NA or EU. |
Note: |
It is recommended to use separate Reltio users for the Reltio connection and the RDM connection in the configuration settings. |
- Log in to the IDP OA platform and, under the Data Management section, click Data Pipeline.
- On the landing page, click the Data Pipeline tile to open the Task Group Pipeline Flow.
- Click Task Group from Template, select the latest downloaded template Reltio MDM Extract_<version>.json and then click OPEN.
- The pipeline task group for Reltio MDM Extract is created. This task group is used to execute the data load process.
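The placeholder replacement in the template file can also be scripted instead of done by hand in a text editor. The following is a minimal sketch, assuming the placeholders appear as literal strings in the JSON text. The function name fill_template, the sample template content and the connection name mdm_reltio_dev are hypothetical and used only for illustration.

```python
import json

PLACEHOLDERS = ("__SOURCE__", "__RELTIO_CONNECTION_NAME__", "__COUNTRY__")


def fill_template(template_text, values):
    """Replace the three template placeholders and confirm the result is valid JSON."""
    missing = [p for p in PLACEHOLDERS if p not in values]
    if missing:
        raise ValueError(f"No value supplied for: {missing}")
    filled = template_text
    for placeholder in PLACEHOLDERS:
        filled = filled.replace(placeholder, values[placeholder])
    json.loads(filled)  # raises ValueError if a replacement broke the JSON
    return filled


# Hypothetical stand-in for the downloaded template content.
sample = (
    '{"source": "__SOURCE__", '
    '"connection": "__RELTIO_CONNECTION_NAME__", '
    '"country": "__COUNTRY__"}'
)
filled = fill_template(
    sample,
    {
        "__SOURCE__": "OV",
        "__RELTIO_CONNECTION_NAME__": "mdm_reltio_dev",  # assumed name
        "__COUNTRY__": "US",
    },
)
print(filled)
```

For a real template, read the downloaded file into template_text and write the returned string back to disk before importing it through Task Group from Template.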
Extract Data from Reltio
Follow these steps to extract the data from Reltio:
- From the IDP OA platform, open the task group Reltio MDM Extract and then click the Tasks tab.
- The figure below shows the list of tasks and their task plugins displayed under the Tasks tab.
- To extract the data from Reltio, run the task group Reltio MDM Extract.
The task group Reltio_MDM_Extract contains two tasks:
- Reltio Outbound
- Post Staging Process
Below are the details of each task.
Reltio Outbound
This task extracts the data from the Reltio tenant into the configured S3 folder as gzip files, one set per object (entities, relations, merges and RDM). These files are then loaded into landing tables in the Redshift database, and from the landing tables the data is loaded into staging tables.
The steps in this task run sequentially, not in parallel.
Landing Tables | Staging Tables |
---|---|
ODP_CORE_LANDING.<RELTIO_CONN_NAME>_ENTITIES_LND | ODP_CORE_STAGING.<RELTIO_CONN_NAME>_ENTITIES_HIST, ODP_CORE_STAGING.<RELTIO_CONN_NAME>_ENTITIES |
ODP_CORE_LANDING.<RELTIO_CONN_NAME>_RELATIONS_LND | ODP_CORE_STAGING.<RELTIO_CONN_NAME>_RELATIONS_HIST, ODP_CORE_STAGING.<RELTIO_CONN_NAME>_RELATIONS |
ODP_CORE_LANDING.<RELTIO_CONN_NAME>_MERGES_LND | ODP_CORE_STAGING.<RELTIO_CONN_NAME>_MERGES_HIST, ODP_CORE_STAGING.<RELTIO_CONN_NAME>_MERGES |
ODP_CORE_LANDING.<RELTIO_CONN_NAME>_RDM_LND | ODP_CORE_STAGING.<RELTIO_CONN_NAME>_RDM_HIST, ODP_CORE_STAGING.<RELTIO_CONN_NAME>_RDM |
Note: |
Do not create dependent objects on the staging tables listed below in the Redshift database. If views on these tables are needed, create them with the no schema binding option (WITH NO SCHEMA BINDING). |
(ODP_CORE_STAGING.<RELTIO_CONN_NAME>_ENTITIES,
ODP_CORE_STAGING.<RELTIO_CONN_NAME>_RELATIONS,
ODP_CORE_STAGING.<RELTIO_CONN_NAME>_MERGES,
ODP_CORE_STAGING.<RELTIO_CONN_NAME>_RDM)
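As an illustration of the note above, the sketch below builds a Redshift late-binding view statement for one of the listed staging tables. The function name late_binding_view_ddl, the view schema REPORTING, the view name V_RELTIO_ENTITIES and the connection name mdm_reltio_dev are assumptions made for the example. Only the staging table naming pattern and the no schema binding requirement come from this page.

```python
STAGING_OBJECTS = ("ENTITIES", "RELATIONS", "MERGES", "RDM")


def late_binding_view_ddl(conn_name, object_type, view_schema, view_name):
    """Build a CREATE VIEW statement that does not bind to the staging table."""
    suffix = object_type.upper()
    if suffix not in STAGING_OBJECTS:
        raise ValueError(f"Unknown staging object: {object_type}")
    # Late-binding views must reference schema-qualified tables.
    table = f"ODP_CORE_STAGING.{conn_name}_{suffix}"
    return (
        f"CREATE OR REPLACE VIEW {view_schema}.{view_name} AS\n"
        f"SELECT * FROM {table}\n"
        f"WITH NO SCHEMA BINDING;"
    )


# Hypothetical connection, schema and view names.
ddl = late_binding_view_ddl("mdm_reltio_dev", "entities", "REPORTING", "V_RELTIO_ENTITIES")
print(ddl)
```

A view created this way is resolved when it is queried rather than when it is created, so it does not register as a dependent object on the staging table.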
Sync Objects and Attributes:
Open the task group Reltio MDM Extract and then click the task Reltio Outbound to open it. Navigate to the Steps tab and click Sync Objects and then click Save as shown in the figure below.
Open the entities step, click Sync Attributes and then click Save.
Note: |
Do not change any other properties. |
Repeat the above step for the relations, merges and RDM steps. The extract pipeline is then ready for execution.
Note: |
Make sure the user configured in the Reltio connection strings has the ROLE_ANALYTICS role. |
Post Staging Process
This task sets the Reltio end date for merge-loser entity URIs present in the entities table (ODP_CORE_STAGING.<RELTIO_CONN_NAME>_ENTITIES).
Enable Staging
In the Reltio Outbound task, for every step (entities, relations, merges and RDM), you can configure the Enable Staging checkbox. See the figure below.
- When you select the Enable Staging checkbox, the data is loaded through to the staging tables.
- When you clear the Enable Staging checkbox, the data is loaded only as far as the landing tables.
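The effect of the checkbox can be summarized with a small sketch that lists which tables a step would populate. It assumes the landing and staging naming pattern shown in the Reltio Outbound section and assumes that both the _HIST table and the current staging table are populated when staging is enabled. The function name tables_loaded and the connection name are hypothetical.

```python
OBJECT_TYPES = ("entities", "relations", "merges", "rdm")


def tables_loaded(conn_name, object_type, enable_staging):
    """List the tables a step populates, depending on the Enable Staging setting."""
    if object_type not in OBJECT_TYPES:
        raise ValueError(f"Unknown object type: {object_type}")
    suffix = object_type.upper()
    # The landing table is always loaded.
    tables = [f"ODP_CORE_LANDING.{conn_name}_{suffix}_LND"]
    if enable_staging:
        # Assumption: both staging tables are loaded when staging is enabled.
        tables.append(f"ODP_CORE_STAGING.{conn_name}_{suffix}_HIST")
        tables.append(f"ODP_CORE_STAGING.{conn_name}_{suffix}")
    return tables


for table in tables_loaded("mdm_reltio_dev", "merges", enable_staging=True):
    print(table)
```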
Troubleshooting
The following are the troubleshooting details:
- During an IDL extract from Reltio to IDP, if Reltio holds a large amount of data, the Reltio extract job may not finish within 24 hours. IDP currently limits long-running jobs to 24 hours. Although the IDP job is abandoned when it reaches this limit, Reltio continues writing the extract to the S3 bucket. There are a few ways to resolve the failure:
  - If tasks are in WAITING_FOR_RESOURCES and cannot run in parallel, create a ticket with Reltio to increase the resources. To check, call {{ReltioEnvUrl}}/jobs/{{TenantId}}/tasks and review the running tasks. If any old jobs are still running, ask Reltio to kill them or stop them with tasks/_stop or tasks/_force_stop.
  - Once that is done, rerun the job and verify whether the issue persists.
  - If the job still fails, proceed with the following steps to complete the load.
  - Wait for Reltio to finish the job. When the job finishes, {{ReltioEnvUrl}}/jobs/{{TenantId}}/tasks reports its completion status, or the S3 folder that receives the extract contains an empty _SUCCESS file.
  - Once Reltio finishes the job, run the stored procedure for each Reltio object type one at a time, or only for the types that failed. Before running it, set the parameters RELTIO_CONN_NAME (the same connection that was used to extract the data) and S3_INPUT_FOLDER (the folder where the Reltio extract finished). To run the different profiles:
    - call ODP_CORE_LOG.MDM_RELTIO_DATA_EXTRACT_LOAD('<Reltio Connection Name>', '<S3 input folder>', 'entities');
    - call ODP_CORE_LOG.MDM_RELTIO_DATA_EXTRACT_LOAD('<Reltio Connection Name>', '<S3 input folder>', 'relations');
    - call ODP_CORE_LOG.MDM_RELTIO_DATA_EXTRACT_LOAD('<Reltio Connection Name>', '<S3 input folder>', 'merges');
    - call ODP_CORE_LOG.MDM_RELTIO_DATA_EXTRACT_LOAD('<Reltio Connection Name>', '<S3 input folder>', 'rdm');
  - After the stored procedure completes (this can take time, depending on the amount of data), run the Outbound Staging process to push the data to the staging tables and finish the load.
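The manual recovery steps above can be sketched as two small helpers: one that checks for the _SUCCESS marker in a list of S3 object keys, and one that builds the stored procedure calls for the object types that failed. This is a sketch under assumptions: the helper names are hypothetical, the _SUCCESS file is assumed to sit directly under the extract folder, and the key listing (for example from an S3 client) and the execution of the statements against Redshift are left out.

```python
OBJECT_TYPES = ("entities", "relations", "merges", "rdm")


def extract_finished(object_keys, s3_folder):
    """Return True if the extract folder contains the _SUCCESS marker file."""
    marker = s3_folder.rstrip("/") + "/_SUCCESS"
    return marker in object_keys


def recovery_calls(conn_name, s3_folder, failed_types=OBJECT_TYPES):
    """Build one stored procedure call per object type that needs reloading."""
    def quote(value):
        # Double any single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    calls = []
    for object_type in failed_types:
        if object_type not in OBJECT_TYPES:
            raise ValueError(f"Unknown object type: {object_type}")
        calls.append(
            "call ODP_CORE_LOG.MDM_RELTIO_DATA_EXTRACT_LOAD("
            f"{quote(conn_name)}, {quote(s3_folder)}, {quote(object_type)});"
        )
    return calls


# Hypothetical folder listing and connection name.
keys = ["mdm/plugins/run1/part-0000.json.gz", "mdm/plugins/run1/_SUCCESS"]
if extract_finished(keys, "mdm/plugins/run1/"):
    for statement in recovery_calls("mdm_reltio_dev", "mdm/plugins/run1", ["merges", "rdm"]):
        print(statement)
```

Run the generated statements one at a time, then run the Outbound Staging process as described in the last step.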