Move Staging to Canonical
Overview
Staging to Canonical Copy process moves data from IDP staging to appropriate canonical schema tables using snowflake database stored procedure called MOVE_STG_TO_CANONICAL based on the high level JSON configuration.
Features
The following are the features of Staging to Canonical:
-
Not limited to any specific schema. It helps to copy data from any schema to any other schema or same schema.
-
Simple string, constant, date formats, any function call.
-
Filter data on source table or view.
-
Reference data, fill the specific fields from other table reference. (reference can be a view or table).
-
One source to many target table can be configured but data is writing based on one to one mapping. (Example, few fields from HCP table can be mapped to PROFESSION table and rest can be mapped to HCP table).
-
RDM validation is supported based on configuration. while moving it can do following action based on the reject reason configured:
-
It nullifies the attributes.
-
It rejects the child records.
-
It rejects the parent and child.
-
-
Incremental copy can be configured per last run date.
Sample JSON Configuration
{
"dataSource": "GIF",
"sourceSchema": "ODP_CORE_STAGING",
"targetSchema": "MDM_CANONICAL",
"rdmSchema": "MDM_OUT_STAGING.RDM_SourceMappings",
"enableRdmLookup": true,
"dateFormat": "MM-DD-YYYY",
"mapping": [
{
"sourceTable": "MDM_GENERICIMPORT_COMM",
"targetTable": "C_COMM",
"isLookupTable": true,
"parentTable": "MDM_GENERICIMPORT_HCP|MDM_GENERICIMPORT_HCO",
"parentKey": "CUSTOMER_ID",
"fields": [
{
"sourceField": "DW_ID",
"targetField": "COMM_ID",
"targetFieldType": "key"
},
{
"sourceField": "DW_ID",
"targetField": "CUSTOMER_ID",
"targetFieldType": "reference",
"referenceTable": "select DW_ID,CUSTOMER_ID from ODP_CORE_STAGING.MDM_GENERICIMPORT_HCP union select DW_ID,CUSTOMER_ID from ODP_CORE_STAGING.MDM_GENERICIMPORT_HCO",
"referenceAlias": "a",
"referenceConstraint": "a.CUSTOMER_ID = st.CUSTOMER_ID"
},
{
"sourceField": "BUSINESS_KEY",
"targetField": "SRC_COMM_ID",
"targetFieldType": "String"
},
{
"sourceField": "CUSTOMER_TYPE",
"targetField": "CUSTOMER_TYPE",
"targetFieldType": "String"
},
{
"sourceField": "NUMBER",
"targetField": "VALUE",
"targetFieldType": "String"
},
{
"sourceField": "TYPEIMS",
"targetField": "TYPE",
"targetFieldType": "String",
"lookupHeader": "LKUP_IMS_COMMUNICATION_TYPE",
"rdmValidationFailureAction": "Error-Attribute"
},
{
"sourceField": "ACTIVE",
"targetField": "ACTIVE",
"targetFieldType": "String"
},
{
"sourceField": "DW_ID",
"targetField": "AFFILIATION_ID",
"targetFieldType": "reference",
"referenceTable": "select DW_ID,AFFILIATION_ID from ODP_CORE_STAGING.MDM_GENERICIMPORT_RELATIONS",
"referenceAlias": "b",
"referenceConstraint": "b.AFFILIATION_ID = st.AFFILIATION_ID"
},
{
"sourceField": "DW_ID",
"targetField": "CUSTOMER_ADDRESS_ID",
"targetFieldType": "reference",
"referenceTable": " select DW_ID,CUSTOMER_ID,ADDRESS_ID from ODP_CORE_STAGING.MDM_GENERICIMPORT_CUST_ADDRESS",
"referenceAlias": "c",
"referenceConstraint": " c.CUSTOMER_ID = st.CUSTOMER_ID and c.ADDRESS_ID = st.ADDRESS_ID "
},
{
"sourceField": "AREACODE",
"targetField": "AREA_CODE",
"targetFieldType": "String"
},
{
"sourceField": "COUNTRYCODE",
"targetField": "COUNTRY_CODE",
"targetFieldType": "String"
},
{
"sourceField": "DIGITCOUNT",
"targetField": "DIGIT_COUNT",
"targetFieldType": "String"
},
{
"sourceField": "EXTENSION",
"targetField": "EXTENSION",
"targetFieldType": "String"
},
{
"sourceField": "GEOAREA",
"targetField": "GEO_AREA",
"targetFieldType": "String"
},
{
"sourceField": "GEOCOUNTRY",
"targetField": "GEO_COUNTRY",
"targetFieldType": "String"
},
{
"sourceField": "LINETYPE",
"targetField": "LINE_TYPE",
"targetFieldType": "String"
},
{
"sourceField": "LOCALNUMBER",
"targetField": "LOCAL_NUMBER",
"targetFieldType": "String"
},
{
"sourceField": "RANK",
"targetField": "RANK",
"targetFieldType": "String"
},
{
"sourceField": "VALIDATIONSTATUS",
"targetField": "VALIDATION_STATUS",
"targetFieldType": "String",
"lookupHeader": "LKUP_IMS_VAL_STATUS",
"rdmValidationFailureAction": "Error-Attribute"
}
]
}
]
}
Attribute Definitions
sourceSchema |
Schema name where you want to copy the data from. |
targetSchema |
Schema name where you want to copy the data into. |
enableRdmLookup |
Flag to turn ON/OFF the rdm lookup. valid values are true/false. |
rdmSchema |
Schema name where you want to validate the RDM codes. |
dateFormat |
Source date field format- note, date fields should adhere to this configuration otherwise data cannot be moved. |
Table level |
|
sourceTable |
Source schema table baiscally this table can have alias name as st. |
targetTable |
Target schema table |
isLookupTable |
RDM validation pertaining to this table mapping. |
parentTable |
If you want to enforce RDM validation on reference filed, then define that references tables at top level for RDM processing. |
parentKey |
Key to refer the records in above parentTable. |
condition |
Filter source staging table based on conditions. conditions can be constant value or sub query based. for example: "condition": "st.ID_TYPE = ''ID''" "condition": "st.dw_id in (select customer_id from mdm_canonical.c_customer) and st.dw_id in (select customer_id from mdm_canonical.c_customer) ", "condition": "a.DW_ID is not null and b.DW_ID is not null and a.dw_id in (select customer_id from mdm_canonical.c_customer) and b.dw_id in (select customer_id from mdm_canonical.c_customer) " |
Field level |
|
sourceField |
Source table column |
targetField |
Target table column. |
targetFieldType |
|
key |
Key type target field values are used to identify the operation type, basically if the key exist in target table then update fires otherwise insert query fires. |
date |
Root node date format is used for conversion. |
string |
Data is copied as it is to the target. |
constant |
Target field is assigned with this constant value provided in the source field. |
Function |
Define the sourcetable column name with alias name st.<COLUMN_NAME>, for example. st.EMAILOPTOUT in the below example. Copy
|
reference |
Target field is assigned with inline query formed using the referenceTable, referenceAlias and refereceConstraint attributes. primary key, referenceConstraint mapping needs to be configured appropriately. since it is generic procedure, keys can't be validated while transforming. |
Copy
If you see the above example, referenceTable: It has DW_ID,CUSTOMER_ID in select clause which is nothing but the sourceField and referenceConstraint values. referenceAlias: This alias name is used in the referenceConstraint a.CUSTOMER_ID = st.CUSTOMER_ID. configure this value unique when you have more than one reference type configuration. referenceConstraint: Basically the join condition for the parent and the child table. here parent table is sourceTable that can have alias name as always st("sourceTable": "MDM_GENERICIMPORT_COMM") and child table is referenceTable. |
Mapping Configuration
Mapping JSON string has to be passed as procedure parameter value in the pipeline job. For each entity/sub entity, there is a configuration created for better UI presentation, however procedure supports the all configurations under one JSON mapping for processing. Client team needs to add/remove fields as per their configuration and update those specific parameter accordingly.