Additional Tables Flattening And Transliteration

Introduction

The streaming data can be used to derive multiple functionalities performed on the Reltio-based MDM System.

Reading Additional Tables

The additional tables which were populated with streaming data have the same structure as the EVENT_RTI.

table. It contains a column called event_payload, which is a json of the reltio event data. Filter the table data by object_name like 'configuration/entityTypes/' and you can get entity data. The OBJECT_ID has the entity_uri.

Flatten the JSON and get the required columns from it to populate the transliteration table.

Sample Transliteration Flattened Table Example

Table Name: Transliterate_Names

Column Name	Data type	Description
Event_id	varchar	Id from event_rti table
Entity_Uri	varchar	Uri of the entity being transliterated
Created By	varchar	User id
firstName	varchar	source transliterated firstName
lasName	varchar	source transliterated lastName
firstNameX	varchar	target transliterated firstName
lastNameX	varchar	target transliterated lastName
data_source	varchar	source providing the entity data, example. configuration/sources/OK
crosswalk_value	varchar	crosswalk value used to identify the entity for update
source_table	varchar	source table to be used for crosswalk

Sample Query to Populated Flatten Table

insert into Transliterate_Names (enitity_uri,createdBy,firstName,lastName)

select object_id as uri,

parse_json(EVENT_PAYLOAD):object:createdBy::String as createdBy,

parse_json(EVENT_PAYLOAD):object:attributes:firstName[0]:value::String as firstName,

parse_json(EVENT_PAYLOAD):object:attributes:lastName[0]:value::String as lastName,

parse_json(EVENT_PAYLOAD):object:attributes:Country[0]:value::String as Country

from ODP_CORE_STREAM.MDM_RELTIO_DEV2_V3_EVENT_RTI

where object_name like 'configuration/entityTypes%'

After the transliteration tables have been populated, a pipeline can be created to transliterate and populate the target columns The below section explains the creation and configuration of this pipeline.

Create/Configure Transliteration Pipeline

Prerequisites

Access to ODP s3 bucket
Permission to run a data pipeline

To Create/Configure Transliteration Pipeline

Go to the data pipeline URL: <application url>/datapipeline/task-group.

Example: https://iqviadev-dev-sandbox.dev-usv.iqodp.io/datapipeline/task-group
Create a new task group or use an existing task group. Click Task Group.
Click Add Task to add a new task.
Choose the Python Executor plugin type.
Copy the script folder from the S3 ODP bucket to your S3 location. The script folder should be in s3://<odp bucket>/Tools/transliteration_python_script

Example: s3://oa-dev-sandbox-env-iqviadev-odp/Tools/transliteration_python_script

Alternatively the script folder can be downloaded from Gitlab

https://gitlab.ims.io/End2EndPlatfrom/idp/data-mgmt-scripts/-/tree/dev/Scripts/Tools/transliteration_python_script.

Go to the Python task created from step 3.

Click Add Step to add a new step.

Name	Description
Name	Name of step.
Connection	Name of s3 connection that has access to the script you copied.
Python Folder	Folder path of script folder. Example. root/Folder 1/transliteration_python_script
Main Script File	Should be main.py.
Requirements File	Should be requirements.txt
Timeout in Minutes	Increase this if you are transliterating a large number of rows(above 10,000+).

Environment Variables

Environment variables control how the transliteration script is run. They are key value pairs using = and are separated by ;.

Example: DATABASE=;SCHEMA=ODP_CORE_LANDING;TABLE=TEST_TABLE;COLUMN=ORIGINAL_LANG,ORIGINAL_LANG_2;TRANSLITERATE_COLUMN=PYICU_TRANSLITERATE,PYICU_TRANSLITERATE_2;CONNECTION=database-default

Name	Required	Description
DATABASE	N	Name of database to use. By default it can use the default database specified in the connection.
SCHEMA	Y	The name of the schema your table is in.
TABLE	Y	The name of your table
COLUMN	Y	Name of column to transliterate. These can be a comma separated value. Example: ORIGINAL_LANG,ORIGINAL_LANG_2
TRANSLITERATE_COLUMN	Y	Name of column to create for transliteration. These can be comma separate values. Example: RESULT_COLUMN_1,RESULT_COLUMN_2
CONNECTION	N	Name of database connection to use. By default this is database-default.
BATCH_SIZE	N	Number of rows to pull from snowflake. By default this is 500. Lower this if you run into memory issues.
THREAD_POOL_SIZE	N	Number of threads to use. Default is 15. Lower if you run into memory issues. Increase for faster performance.

Run the Task Group.

If you select from the table, after the task group is run, the columns defined in TRANSLITERATE_COLUMN should be present with the transliterated value in Latin script(English alphabet).