Venn group transformation is used to perform union and minus operation on the output of the connected transforms. This transform allows following operations.

  • Union: The Union operator is used to combine the output from the transforms connected to the Venn group transform. It removes the duplicate rows between the input data.
  • Union All: The Union All operator is used to combine the output from the transforms connected to the Venn group transform. It retains the duplicate rows between the input data. 
  • Minus: The Minus operator is used to return all rows from the connected primary transform that are not present in the output of the second connected transform. Hive native data flow does not support Minus operation.
  • Minus AllThe Minus All operator is used to return all rows from the connected primary transform that are not present in the output of the second connected transform. It retains the duplicate rows between the input data. This operator is supported only by Teradata native data flows.
  • Union Distinct: The Union All operator is used to combine the output from the transforms connected to the Venn group transform. It retains the duplicate rows between the input data. This operator is supported only by Big Query native data flows.
  • Intersect Distinct: This operator returns rows that are found in the result sets of both the primary and secondary transform objects connected. This operator is supported only by Big Query native data flows.

  • Except Distinct: This operator returns rows from the primary transform object that are not present in the secondary transform objects connected. This operator is supported only by Big Query native data flows.

To work with Venn group transform in data flow,

Note:

Selecting the Venn group transform to be added to data flow

In the data flow canvas move to Data Flow pane and navigate to Transforms menu. Here, you can either select or drag and drop the Venn group transform to canvas. Now, link the required transform to Venn group Transform.

Configuring Venn group transform

General Tab: Provide the basic details for the Venn group transform.

1. The Name field auto populates the transform name and its editable.

2. In the Description text-box, provide a description and is optional.

3. By default, Diyotta does not create temporary tables during execution for the transformation in the data flow. If the temporary table needs to be created for a transform during execution, then, enable the checkbox Persist Data. The temporary table created will be dropped once the data flow executes successfully.

Properties tab: The properties tab displays the linked transforms to the Venn group.

  1. Primary Object: Displays the transform that will be treated as the primary data set. By default it shows the transform that was linked first to the Venn group transform. You can assign any other linked transform as primary by selecting from the drop-down.
  2. Secondary Object: Lists the other transforms linked to Venn group transform. If there are multiple transforms under secondary object list and you want to change the order in which secondary objects should be used in the generated SQL during execution then, use the drop-down next to the secondary object name and select the appropriate transform. You can also select a secondary object and use.
  3. Venn type: Select the type of operation to be performed between the objects. The operation specified against the secondary object in first position is applied against the primary object and the secondary object in first positionThe operation specified against the secondary object in second position is applied against the secondary object in first position and the secondary object in second position. If there are more secondary objects in the list then this sequence follows.

Derived Attributes tab: You can add attributes in the Venn group transform which will be the fields from the connected transforms.

Adding new attributes from primary object:

  • When there are no attributes present, then click on Click Here to add new attribute(s).

  • Then Select Venn Group Attributes wizard appears and displays the list of attributes available in the transforms assigned as primary object. Choose required attributes and click OK.

The selected attributes get listed in the Attributes tab under the field marked for primary object.

Adding attributes from primary object:

  • When the attributes are already present, then click on down arrow next to +(Add) and select the object for which the attribute needs to be selected.

The +(Add) button open the wizard listing the attributes from the primary object.

  • Then Select Venn Group Attributes wizard appears and displays the list of attributes available in the transforms assigned as primary object. Choose required attributes and click OK.

  • The selected attributes get listed in the Attributes tab under the field marked for primary object.

Adding attributes from secondary object:

  • After adding the attributes from the primary object you need to add the attributes from the secondary objects. For this you can either auto map or manually add the required attributes.

Auto Mapping: Automatically map the attributes by selecting the Auto Map icon.

This will pick the attributes with the same name from the secondary objects as selected from the primary object. If any selected attribute from primary object is not matched in secondary then that field is left blank and it needs to be mapped manually.

Manually mapping Secondary Objects: To manually map the attributes from the secondary objects, click on down arrow next to +(Add). It will list the connected objects and you can click on a secondary object to select the attributes from. For reference, here SRC_MODEL_1 is selected. The +(Add) button open the wizard listing the attributes from the primary object.

The Select Venn Group Attributes wizard appears and displays the list of corresponding attributes available in the transform. Choose mapping attributes and click OK.

The selected attributes appear in the list under the secondary object for which selection is made.

Note:

  • The attribute name to be associated with the output of Venn group can be changed by editing the alias name field against the attributes from the transforms listed.
  • Using the same procedure map the remaining secondary objects.

Modifying the mapped Attributes:

  • To modify the attribute selected from any object, first unlink it. For this, select the attribute, and click Unlink Attribute(s) icon.
  • Then, click on down arrow next to +(Add) and select the object for which the attribute needs to be selected. The +(Add) button open the wizard listing the attributes from the primary object.
  • The Select Venn Group Attributes wizard appears and displays the list of attributes in the transform. Select the attribute that needs to be mapped and click OK.

Note: The attributes already selected from transform are Italicized.
  • The selected attributes appear in the list under the object for which selection is made.

Note:

  • From the list of Venn group attributes, multiple Venn group attributes can be selected and we can perform/apply these operations.
  • The attributes mapped from connected transform against an alias name should have same data type, precision, and scale to make the transform valid.
  • Following operations are allowed on the attributes: Add, Select, Auto Map, Up, Down, and Delete.

Runtime Properties tab:

The runtime properties are displayed only when the native data point type for data flow is Hadoop or Spark.

To change the Venn Transform runtime properties, click Runtime Properties tab.

By default these properties are set to recommended/default values from data point and the values can be overridden here. To work with runtime properties, refer Editing Runtime Properties in Hadoop Data Point.

Note: 
  • To revert the changes to the default values, click Reset All to Default.
  • To search for a specific property, enter the keyword in the search bar, and the grid displays the related properties.

Viewing the Script generated for the transform

The Script tab allows you to view the SQL generated for the transform. The script is generated based on the configuration of the transformation.

  • To view the generated script, navigate to Script tab.