You can use the Filter transform to filter the results from the transform connected to it. The filter can be based on the values of one or more attributes in the connected transform.
To work with the Filter transform in a data flow, follow the steps below.
Selecting the Filter transform to be added to data flow
On the data flow canvas, go to the Data Flow pane and open the Transforms menu. Select the Filter transform, or drag and drop it onto the canvas. Then link the required transform to the Filter transform.
Configuring Filter transform
General tab: Provide the basic details for the Filter transform.
1. The Name field is auto-populated with the transform name and is editable.
2. In the Description text box, optionally provide a description.
3. By default, Diyotta does not create temporary tables during execution for the transforms in a data flow. To create a temporary table for this transform during execution, select the Persist Data checkbox. The temporary table is dropped once the data flow executes successfully.
Properties tab: Specify the filter condition on the Properties tab.
Filter Condition: Provide the condition used to filter the data from the connected transform.
- To add a filter condition, click the Expression Editor icon beside the Filter Condition field.
- The Expression Editor window opens, where you can enter the condition.
- After entering the filter condition, click Validate to verify that there are no syntax errors. On successful validation, a success message appears. Once done, click OK.
The filter condition can include attributes from the transform, Hive database functions, parameters, functions, reusable expressions, UDFs, and sequences.
- To list the attributes from the transform, select Transforms from the drop-down. Click an attribute in the list to add it to the editor.
- To list the functions, select Functions from the drop-down. The functions that can be used in the SQL are not limited to the list shown; all Hive database functions can be used.
- To list the parameters, select Parameters from the drop-down. Only the parameters that can be used in a data flow are displayed: Data Flow Parameter, Data Flow SQL Parameter, Project Parameter, and System Parameter. For more information, refer to Working with Data Flow Parameters, Working with Data Flow SQL Parameters, Working with Project Parameters, and Diyotta System Parameters.
- Expressions, UDFs, and sequences are listed under their corresponding headers in the drop-down.
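As an illustration of what a filter condition does to the incoming rows, here is a minimal Python sketch. It assumes the condition behaves like a SQL WHERE predicate over the connected transform's attributes. The table `src_orders` and its columns are hypothetical, and SQLite stands in for Hive only so that the example runs locally; the condition itself uses plain SQL (`UPPER`, `AND`, comparison operators) that Hive also supports.

```python
import sqlite3

# Hypothetical output of the connected (upstream) transform.
# Table and column names are illustrative, not taken from Diyotta.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src_orders (order_id INTEGER, order_status TEXT, order_amount REAL)"
)
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [
        (1, "shipped", 250.0),
        (2, "CANCELLED", 80.0),
        (3, "SHIPPED", 40.0),
        (4, "Shipped", 120.0),
    ],
)

# The kind of text you might enter in the Expression Editor:
# two attributes combined with one built-in function.
filter_condition = "UPPER(order_status) = 'SHIPPED' AND order_amount > 100"

# Assumption: the condition is applied as a WHERE predicate on the incoming rows.
rows = conn.execute(
    f"SELECT order_id FROM src_orders WHERE {filter_condition} ORDER BY order_id"
).fetchall()
filtered_ids = [row[0] for row in rows]
print(filtered_ids)
```

Only the rows that satisfy every part of the condition pass through to the next transform; the others are dropped.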
Runtime Properties tab:
To change the runtime properties of the Filter transform, click the Runtime Properties tab.
By default, these properties are set to the recommended/default values from the data point; you can override the values here. To work with runtime properties, refer to Editing Runtime Properties in Hadoop Data Point.
- To revert the changes to the default values, click Reset All to Default.
- To search for a specific property, enter the keyword in the search bar, and the grid displays the related properties.
Script tab: The Script tab lets you view the runtime script of the transform. The script is generated based on the condition specified in the transform.
- To view the generated script, navigate to the Script tab in the pane below.
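This guide does not show the exact script that Diyotta generates, so the following Python sketch only illustrates the general idea of deriving a script from a condition. The helper `build_filter_script`, the source name `src_orders`, and the column names are all hypothetical, and the real script shown on the Script tab may be structured differently.

```python
def build_filter_script(source, attributes, condition):
    """Hypothetical helper: compose a SELECT that applies `condition` to `source`.

    This is not Diyotta's generator; it only shows how a filter condition
    could end up as the WHERE clause of a generated query.
    """
    columns = ",\n    ".join(attributes)
    return f"SELECT\n    {columns}\nFROM {source}\nWHERE {condition}"


script = build_filter_script(
    source="src_orders",  # hypothetical upstream transform
    attributes=["order_id", "order_status", "order_amount"],
    condition="order_amount > 100",  # text entered as the filter condition
)
print(script)
```

Changing the condition in the transform would change only the final clause in this sketch, which is why reviewing the script is a useful check after editing the condition.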