Additional Tables Flattening And Transliteration

Introduction

The streaming data can be used to drive additional functionality on the Reltio-based MDM system, such as the name transliteration described here.

Reading Additional Tables

The additional tables populated with streaming data have the same structure as the EVENT_RTI table. Each contains a column called EVENT_PAYLOAD, which holds the Reltio event data as JSON. Filter the table by OBJECT_NAME like 'configuration/entityTypes/%' to get entity data; the OBJECT_ID column holds the entity URI.

Flatten the JSON and extract the required columns from it to populate the transliteration table.
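For reference, the relevant part of the event payload looks roughly as follows, and flattening it amounts to walking a few JSON paths. The payload below is a trimmed, illustrative example, not a complete Reltio event:

```python
import json

# Trimmed, illustrative event payload; a real Reltio event carries many more fields.
event_payload = json.dumps({
    "object": {
        "createdBy": "mdm.service.user",
        "attributes": {
            "firstName": [{"value": "Дмитрий"}],
            "lastName": [{"value": "Иванов"}],
        },
    }
})

# Flatten: pull the scalar columns out of the nested JSON.
obj = json.loads(event_payload)["object"]
row = {
    "createdBy": obj["createdBy"],
    "firstName": obj["attributes"]["firstName"][0]["value"],
    "lastName": obj["attributes"]["lastName"][0]["value"],
}
```

The same paths (object → attributes → firstName[0] → value) appear in the sample SQL query further down.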

Sample Transliteration Flattened Table Example

Table Name: Transliterate_Names

Column Name       Data Type   Description
Event_id          varchar     ID from the EVENT_RTI table
Entity_Uri        varchar     URI of the entity being transliterated
Created By        varchar     User ID
firstName         varchar     Source firstName (value to be transliterated)
lastName          varchar     Source lastName (value to be transliterated)
firstNameX        varchar     Target firstName (transliterated value)
lastNameX         varchar     Target lastName (transliterated value)
data_source       varchar     Source providing the entity data, e.g. configuration/sources/OK
crosswalk_value   varchar     Crosswalk value used to identify the entity for update
source_table      varchar     Source table to be used for the crosswalk

Sample Query to Populate the Flattened Table

insert into Transliterate_Names (entity_uri, createdBy, firstName, lastName)
select object_id as entity_uri,
       parse_json(EVENT_PAYLOAD):object:createdBy::String as createdBy,
       parse_json(EVENT_PAYLOAD):object:attributes:firstName[0]:value::String as firstName,
       parse_json(EVENT_PAYLOAD):object:attributes:lastName[0]:value::String as lastName
from ODP_CORE_STREAM.MDM_RELTIO_DEV2_V3_EVENT_RTI
where object_name like 'configuration/entityTypes%'

After the transliteration tables have been populated, a pipeline can be created to transliterate the source columns and populate the target columns. The following section explains how to create and configure this pipeline.

Create/Configure Transliteration Pipeline

Prerequisites

  • Access to ODP s3 bucket

  • Permission to run a data pipeline

To Create/Configure Transliteration Pipeline

  1. Go to the data pipeline URL: <application url>/datapipeline/task-group.

    Example: https://iqviadev-dev-sandbox.dev-usv.iqodp.io/datapipeline/task-group

  2. Click Task Group to create a new task group, or select an existing one.

  3. Click Add Task to add a new task.

  4. Choose the Python Executor plugin type.

  5. Copy the script folder from the ODP S3 bucket to your S3 location. The script folder is located at s3://<odp bucket>/Tools/transliteration_python_script.

    Example: s3://oa-dev-sandbox-env-iqviadev-odp/Tools/transliteration_python_script

    Alternatively, the script folder can be downloaded from GitLab:

    https://gitlab.ims.io/End2EndPlatfrom/idp/data-mgmt-scripts/-/tree/dev/Scripts/Tools/transliteration_python_script

  6. Go to the Python task created in step 3.

    Click Add Step to add a new step.

    Name                 Description
    Name                 Name of the step.
    Connection           Name of the S3 connection that has access to the script you copied.
    Python Folder        Folder path of the script folder, for example root/Folder 1/transliteration_python_script.
    Main Script File     Should be main.py.
    Requirements File    Should be requirements.txt.
    Timeout in Minutes   Increase this if you are transliterating a large number of rows (10,000+).

Environment Variables

Environment variables control how the transliteration script runs. They are key=value pairs separated by semicolons (;).

Example: DATABASE=;SCHEMA=ODP_CORE_LANDING;TABLE=TEST_TABLE;COLUMN=ORIGINAL_LANG,ORIGINAL_LANG_2;TRANSLITERATE_COLUMN=PYICU_TRANSLITERATE,PYICU_TRANSLITERATE_2;CONNECTION=database-default
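Assuming the script splits that string on ; and then on the first =, the parsing could look like the sketch below (parse_env_config is an illustrative name, not the actual script's function):

```python
def parse_env_config(config):
    # Split "KEY=VALUE;KEY=VALUE;..." into a dict. An empty value
    # (e.g. "DATABASE=") means "fall back to the default".
    pairs = (item.split("=", 1) for item in config.split(";") if item)
    return {key: value for key, value in pairs}

settings = parse_env_config(
    "DATABASE=;SCHEMA=ODP_CORE_LANDING;TABLE=TEST_TABLE;"
    "COLUMN=ORIGINAL_LANG,ORIGINAL_LANG_2;"
    "TRANSLITERATE_COLUMN=PYICU_TRANSLITERATE,PYICU_TRANSLITERATE_2;"
    "CONNECTION=database-default"
)
```

Note that comma-separated values such as COLUMN stay intact, because only semicolons separate entries.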

Name                   Required   Description
DATABASE               N          Name of the database to use. Defaults to the default database specified in the connection.
SCHEMA                 Y          Name of the schema your table is in.
TABLE                  Y          Name of your table.
COLUMN                 Y          Name of the column(s) to transliterate; may be a comma-separated list. Example: ORIGINAL_LANG,ORIGINAL_LANG_2
TRANSLITERATE_COLUMN   Y          Name of the column(s) to create for the transliterated output; may be a comma-separated list. Example: RESULT_COLUMN_1,RESULT_COLUMN_2
CONNECTION             N          Name of the database connection to use. Default is database-default.
BATCH_SIZE             N          Number of rows to pull from Snowflake at a time. Default is 500. Lower this if you run into memory issues.
THREAD_POOL_SIZE       N          Number of threads to use. Default is 15. Lower this if you run into memory issues; increase it for faster performance.
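BATCH_SIZE and THREAD_POOL_SIZE trade memory for throughput. A minimal sketch of how a script might combine them is shown below; the transliterate stand-in only strips accents using the standard library, whereas the actual script's transliterator (presumably PyICU, given the PYICU_TRANSLITERATE column names above) handles arbitrary scripts:

```python
from concurrent.futures import ThreadPoolExecutor
import unicodedata

BATCH_SIZE = 500        # rows fetched from Snowflake per round-trip
THREAD_POOL_SIZE = 15   # worker threads transliterating in parallel

def transliterate(text):
    # Simplified stand-in: decompose characters (NFKD) and drop combining
    # marks, turning e.g. "José" into "Jose".
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def process_rows(rows):
    # Work through the result set one batch at a time to bound memory use,
    # fanning each batch out across the thread pool. pool.map preserves
    # the input order, so results line up with rows.
    results = []
    with ThreadPoolExecutor(max_workers=THREAD_POOL_SIZE) as pool:
        for start in range(0, len(rows), BATCH_SIZE):
            batch = rows[start:start + BATCH_SIZE]
            results.extend(pool.map(transliterate, batch))
    return results
```

Lowering BATCH_SIZE reduces how many rows are held in memory at once; raising THREAD_POOL_SIZE increases parallelism at the cost of memory.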

Run the Task Group.

After the task group has run, select from the table: the columns defined in TRANSLITERATE_COLUMN should now contain the transliterated values in Latin script (English alphabet).

Sample File