To work with the Splice Machine target instance transform in a data flow, follow the steps below.


Option I: Selecting the Splice Machine Data Object to be added as target instance transform

In the data flow canvas, go to the Data Flow pane and open the Target menu. Here, you can either select the Splice Machine target or drag and drop it onto the canvas.

If Splice Machine data objects are available, the Create New Splice Machine Target window prompts you to select a Splice Machine data object; otherwise, the message 'No Data Object exists in Splice Machine' is displayed.

  • To search for a specific project, enter a keyword in the Project search bar; the drop-down displays the matching results. Select the required project and click OK.
  • Select the Splice Machine data object from the local project. If you have access to any Global projects, the Project drop-down also lists them; select a Global project from the list if the data object must be chosen from there.

Note: The data object to be used as the target must already exist in Diyotta for it to appear in the list. For creating or importing the data object, refer to Creating new data object.

After making the selection, click OK; the data flow canvas displays the selected Splice Machine data object as a target instance.

Link the source instance transform to the Splice Machine target instance transform.

Option II: You can also create a target by using the Create As Target option.

In the canvas, select the source data object and click the ellipsis; a drop-down lists the available options. From the list, select Create As Target.

Alternatively, you can select Create As Target from the Actions menu.

In both scenarios, the Create As Target window appears and prompts you to provide the details for the Splice Machine target instance transform. Upon confirmation, the target instance transform appears in the canvas. For more information, refer to Creating a target from transform in Data flow.


Configuring Splice Machine target instance transform 

In the General tab, edit the basic details associated with the Splice Machine target instance transform.

Note: In general, you can view only the Source/Target Properties and Script tabs; in a target instance, by default, you can also view the Data tab, which is used for debugging purposes.

1. The Name field is auto-populated with the target instance name, prefixed with TGT; the name is editable.

2. In the Description text box, optionally provide a description.

3. The Data Object Name field displays the associated data object name; clicking the name navigates you to the respective data object canvas.

  • To change the associated data object, click Change. The Change Target Data Object window appears and lists the available Splice Machine data objects. From the list, select the required data object and click OK.

  • To search for a specific project, enter a keyword in the Project search bar; the drop-down displays the matching results. Select the required project and click OK.
  • If you have access to any Global projects, the Project drop-down lists them. Select a Global project from the list if the data objects need to be chosen from there.

4. The Data Point field displays the data point associated with the target instance transform; clicking the name navigates you to the respective data point canvas.

5. The Transient Object field displays whether it is a transient object or not.

6. The Database Type field displays the type of database to which the target is associated.

7. The Schema Name field displays the schema name associated with the target.

Configuring properties

The Target Properties tab displays the target-level options that can be applied when loading the data.

1. Truncate Before Load:

Select this option if the database table associated with the target instance transform must be truncated before the data is loaded.
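When this option is enabled, a truncate statement is issued against the target before the load begins. As a sketch, assuming a hypothetical target table SALES.ORDERS:

```sql
-- Illustrative only; the table name is hypothetical.
-- Issued on the target data point before the load starts.
TRUNCATE TABLE SALES.ORDERS;
```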

2. Create Target:

Select this option to create table before loading data in the database based on the attributes in the associated data object. The table is created only if it does not exist in the database.
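Conceptually, the generated DDL is derived from the attributes of the associated data object, and the tool creates the table only if it is not already present. A minimal sketch, with a hypothetical table and columns:

```sql
-- Illustrative sketch; the table and columns are hypothetical and would
-- be derived from the attributes of the associated data object.
CREATE TABLE SALES.ORDERS (
    ORDER_ID  INTEGER,
    CUST_ID   INTEGER,
    AMOUNT    DECIMAL(12,2),
    ORDER_DT  DATE
);
```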

3. Pre-SQL:

The Pre-SQL option can be used to run SQL on the target system before the load process begins. The SQL executes on the data point associated with the target instance transform.

  • To add a Pre-SQL query, click the Expression Editor icon beside the field.
  • The Expression Editor window opens; add the SQL here.
  • After adding the Pre-SQL, click Validate to verify that there are no syntax errors. Upon successful validation, a success message appears. Once done, click OK.
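A Pre-SQL statement is typically used for housekeeping on the target before the load, such as recording the start of a run. A minimal sketch, assuming a hypothetical audit table ETL.LOAD_AUDIT:

```sql
-- Hypothetical example of a Pre-SQL statement: record the start of the
-- load in an audit table before any data is written to the target.
INSERT INTO ETL.LOAD_AUDIT (JOB_NAME, STATUS, START_TS)
VALUES ('ORDERS_LOAD', 'STARTED', CURRENT_TIMESTAMP);
```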

4. Post-SQL: 

The Post-SQL option can be used to run SQL on the target system after the load process completes. The SQL executes on the data point associated with the target instance transform.

  • To add a Post-SQL query, click the Expression Editor icon beside the field.
  • The Expression Editor window opens; add the SQL here.
  • After adding the Post-SQL, click Validate to verify that there are no syntax errors. Upon successful validation, a success message appears. Once done, click OK.

The expressions defined in Pre-SQL and Post-SQL can include Splice Machine database functions, Parameters, Functions, Reusable expressions, and UDFs.

  • The list of functions can be seen by selecting Functions from the drop-down. The functions that can be used in the SQL are not limited to the list shown; all Splice Machine database functions can be used.
  • The list of parameters can be seen by selecting Parameters from the drop-down. It shows the parameters that can be used in the data flow: Data Flow Parameters, Data Flow SQL Parameters, Project Parameters, and System Parameters can be used in the SQL. For more information, refer to Working with Data Flow Parameters, Working with Data Flow SQL Parameters, Working with Project Parameters, and Diyotta System Parameters.
  • The lists of reusable expressions and UDFs can be seen under the corresponding headers in the drop-down.
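As an illustration, a Post-SQL statement often closes out the work begun by the Pre-SQL, for example marking an audit record complete. The table and column names below are hypothetical:

```sql
-- Hypothetical Post-SQL: mark the most recent load run as completed
-- in the same audit table used by the Pre-SQL example above.
UPDATE ETL.LOAD_AUDIT
   SET STATUS = 'COMPLETED',
       END_TS = CURRENT_TIMESTAMP
 WHERE JOB_NAME = 'ORDERS_LOAD'
   AND STATUS = 'STARTED';
```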

Viewing the Attributes in Target Instance Data Flow

You can view the attributes that are in the associated data object. The attributes listed are not editable. The attributes can be edited only in the associated data object.

Mapping the Attributes in Target Instance Data Flow

In the grid, the attributes of the associated data object are displayed under Target Attribute; you can map these, either manually or automatically, to attributes from the connected transforms. The mapped attributes are displayed under Source Attribute.

Manual mapping: You can manually map an attribute to a target attribute by selecting it from the Source Attribute drop-down against the specific target attribute. The list displays all the attributes that can be mapped.

Auto mapping: You can automatically map all the target attributes by clicking the Auto Map icon.

All the target attributes are mapped by matching the target attribute name with the attribute names in the linked transforms. If there are multiple transforms prior to the target, the attribute with the same name that is closest to the target instance is mapped. If no attribute matches the name of a target attribute, the corresponding source attribute remains blank and must be mapped manually.

To modify or remove a mapping, select the specific attribute and click the Unlink target attribute icon.

You can also unmap all the associated source attributes by clicking the Unlink all target attributes icon.

Configuring the load type

You can specify the type of operation to perform when loading data into the target table. Choose the operation from the Load Type drop-down.

Note: 

  • The Load Type drop-down lists only the Insert operation if the target instance transform is external to the data flow.
  • For native target instance transforms, the Load Type drop-down displays the Insert, Update, Upsert, Delete, and SCD operations.

Load Type: Insert

To insert data into the target table based on the mapped attributes, select the load type Insert. During execution, an Insert Into statement is generated against the target instance transform. If a target attribute is not mapped in the attribute mapper, the attribute is excluded from the insert statement and a database NULL value is loaded into it.
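As a sketch of the kind of statement this produces, assuming a hypothetical target table SALES.ORDERS fed from a source transform's output STAGE.ORDERS_SRC:

```sql
-- Illustrative sketch of the generated Insert; all names are hypothetical.
-- Unmapped target columns are omitted from the column list and receive NULL.
INSERT INTO SALES.ORDERS (ORDER_ID, CUST_ID, AMOUNT)
SELECT SRC.ORDER_ID, SRC.CUST_ID, SRC.AMOUNT
FROM STAGE.ORDERS_SRC SRC;
```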

You can view the generated SQL in the Script tab.

Load Type: Update

To update the values of columns in the target table based on the value of one or more columns in the source, select the load type Update. Select the attributes on which the target table should be matched by enabling the Unique Key checkbox against them. During execution, an Insert Overwrite statement is generated, and only those rows for which the key column values match between the target table and the connected source transform output are written back to the table. If a target attribute is not mapped in the attribute mapper, the target attribute itself is used in the insert overwrite statement to load the data back into the target table.
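The net effect is a keyed update. Conceptually, and with hypothetical names (a single non-key column, AMOUNT, is shown for brevity; ORDER_ID is the attribute marked as Unique Key):

```sql
-- Conceptual equivalent of the Update load type; names are hypothetical.
-- Only rows whose key matches a row in the source output are changed.
UPDATE SALES.ORDERS TGT
   SET AMOUNT = (SELECT SRC.AMOUNT
                 FROM STAGE.ORDERS_SRC SRC
                 WHERE SRC.ORDER_ID = TGT.ORDER_ID)
 WHERE EXISTS (SELECT 1 FROM STAGE.ORDERS_SRC SRC
               WHERE SRC.ORDER_ID = TGT.ORDER_ID);
```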

You can view the generated SQL in the Script tab.

Load Type: Upsert

To insert a row if it does not exist and update it if it does, using a single target instance transform, select the load type Upsert. Select the attributes that determine whether a row should be inserted or updated by enabling the Unique Key checkbox against them. During execution, an Update statement is generated, and only those rows for which the key column values match between the target table and the connected source transform output are written back to the table. An Insert statement is also generated, and only those rows for which the key column values do not match are written into the target table. If a target attribute is not mapped in the attribute mapper, the attribute is excluded from the insert statement and a database NULL value is loaded into it; in the update, the target attribute itself is used to load the data back into the target table. The attributes selected under Insert are included in the Insert SQL.
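A conceptual sketch of the update/insert pair, with hypothetical names (ORDER_ID is the attribute marked as Unique Key):

```sql
-- Conceptual sketch of the Upsert load type; all names are hypothetical.

-- 1) Update rows whose keys already exist in the target.
UPDATE SALES.ORDERS TGT
   SET AMOUNT = (SELECT SRC.AMOUNT
                 FROM STAGE.ORDERS_SRC SRC
                 WHERE SRC.ORDER_ID = TGT.ORDER_ID)
 WHERE EXISTS (SELECT 1 FROM STAGE.ORDERS_SRC SRC
               WHERE SRC.ORDER_ID = TGT.ORDER_ID);

-- 2) Insert rows whose keys are not yet present in the target.
INSERT INTO SALES.ORDERS (ORDER_ID, CUST_ID, AMOUNT)
SELECT SRC.ORDER_ID, SRC.CUST_ID, SRC.AMOUNT
FROM STAGE.ORDERS_SRC SRC
WHERE NOT EXISTS (SELECT 1 FROM SALES.ORDERS TGT
                  WHERE TGT.ORDER_ID = SRC.ORDER_ID);
```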

You can view the generated SQL in the Script tab.

Load Type: Delete

To delete rows in the target table based on the value of one or more columns in the source, select the load type Delete. Select the attributes on which the target table rows should be matched for deletion by enabling the Unique Key checkbox against them. During execution, an Insert Overwrite statement is generated, and only those rows for which the key column values do not match between the target table and the connected source transform output are written back to the table. The mapping of non-key columns does not matter for this operation.
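In effect, rows whose keys appear in the source output are removed from the target. Conceptually, with hypothetical names (ORDER_ID is the attribute marked as Unique Key):

```sql
-- Conceptual equivalent of the Delete load type; names are hypothetical.
DELETE FROM SALES.ORDERS TGT
 WHERE EXISTS (SELECT 1 FROM STAGE.ORDERS_SRC SRC
               WHERE SRC.ORDER_ID = TGT.ORDER_ID);
```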

You can view the generated SQL in the Script tab.

Load Type: SCD (Slowly Changing Dimension/Type 2 data load)

To load data into a Type 2, or Slowly Changing Dimension, table, select the load type SCD. To perform this operation during the load, configure the required options as described below.

In the attribute mapper, map the target attributes to the source. Mark the fields to be considered as the unique keys and the indicative keys used to manage the change records. Select the attributes by checking the box against them.

1. Unique: Select the attributes that determine whether an update or insert operation is performed on the target table rows.

2. SCD Key: Select the attributes that should be used to determine a change to a record. It is mandatory to select an attribute for this if a hash key is not selected.

The SCD load based on a unique key and an SCD key is used when the source has incremental data: the unique key determines which record needs to be compared, and the SCD key determines whether there is a change. When the unique key value does not exist in the target, the record is inserted as a new record with the default start and end dates. If the record exists and the SCD key values do not match, the existing record in the target is closed by setting its end date, and a new record with the updated values is inserted with a new start date and the default end date.
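The steps above can be sketched as a close-then-insert sequence. All names below are hypothetical: CUST_ID stands for the Unique Key, ADDRESS for the SCD Key, EFF_START/EFF_END for SCD Start/End, and CURR_IND for the Current Indicator. The end-date and start-date values shown are simplified; in practice they come from the Default Start/End Date and SCD Offset settings.

```sql
-- Conceptual sketch of the SCD (Type 2) load; all names are hypothetical.

-- 1) Close the current version of records whose SCD key value changed.
UPDATE DW.CUSTOMER_DIM TGT
   SET EFF_END = CURRENT_DATE,
       CURR_IND = 0
 WHERE TGT.CURR_IND = 1
   AND EXISTS (SELECT 1 FROM STAGE.CUSTOMER_SRC SRC
               WHERE SRC.CUST_ID = TGT.CUST_ID
                 AND SRC.ADDRESS <> TGT.ADDRESS);

-- 2) Insert new and changed records as the current version.
INSERT INTO DW.CUSTOMER_DIM (CUST_ID, ADDRESS, EFF_START, EFF_END, CURR_IND)
SELECT SRC.CUST_ID, SRC.ADDRESS, CURRENT_DATE, DATE('9999-12-31'), 1
FROM STAGE.CUSTOMER_SRC SRC
WHERE NOT EXISTS (SELECT 1 FROM DW.CUSTOMER_DIM TGT
                  WHERE TGT.CUST_ID = SRC.CUST_ID
                    AND TGT.CURR_IND = 1
                    AND TGT.ADDRESS = SRC.ADDRESS);
```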

3. SCD Start: Select the attribute which will be used as record start date column. The attribute marked as SCD Start should be mapped to source attribute that has value to be used as start date for the change records. It is mandatory to select an attribute for this.

4. SCD End: Select the attribute which will be used as record end date column. The attribute marked as SCD End should not be mapped to any source attribute. It is mandatory to select an attribute for this.

5. Current Indicator: Select the attribute which will be used to indicate the latest change record for a unique key. The value of this column is either 0 or 1 and only numeric fields can be selected from target attribute list to update this value.

SCD Properties

  • Click the arrow next to SCD to open and configure the SCD Properties.

  • The SCD Properties Editor wizard then appears, with the following options:

  1. SCD Type: Specify how the changes in the records should be managed. Only the DATE type is supported.
  2. Default Start Date Value: Specify what value should be used as the start date for the new record. To provide this value, click the icon beside the field to open the Expression Editor. You can input the value and validate it.
  3. Default End Date Value: Specify what value should be used as the end date for the latest change record. To provide this value, click the icon beside the field to open the Expression Editor. You can input the value and validate it.
  4. SCD Offset Value: If the attribute selected in the target mapper for SCD Start and SCD End is of the timestamp data type, the specified value is treated as a number of seconds; if it is of the date data type, it is treated as a number of days. This value is subtracted from the SCD start value of the new change record and used as the end date for the old record. By default, the SCD Offset Value is -1. A value less than or equal to zero is considered zero, and a value greater than zero is used as is.

You can view the generated SQL in the Script tab.

Editing Extract Properties in Target Instance

The extract properties are displayed only when the Splice Machine target instance is external. The extract properties shown are those of the native data point that the data flow uses to extract the data.

To change the extract properties for the Splice Machine target, click Extract Properties tab.

By default, these properties are set to the recommended/default values from the data point; the values can be overridden here. To work with the extract properties of a specific data flow native data point type, refer to Working with Data Point and view the extract properties for that data point.

Note: To revert the changes to the default values, click Reset All to Default.

Editing Load Properties in Splice Machine Target Instance

The load properties are displayed only when the target instance is external and the data flow is not generic. The load properties shown are those of the target data point used to create the target data object.

To change the load properties for the Splice Machine target, click Load Properties tab.

By default, these properties are set to the recommended/default values from the data point; the values can be overridden here. To work with the load properties, refer to Editing Load Properties in Splice Machine Data Point.

Note: To revert the changes to the default values, click Reset All to Default.

Data tab

The Data section displays the output of the logic implemented in the data flow. When Data is clicked, the transformations starting from the source instance transform are executed, and the data that would be loaded into the selected target instance transform is displayed. The data is not actually loaded into the target.