Data Flow Instance Jobs are used in Job Flows to execute Data Flows. You can add multiple data flow instances within a job flow and set dependencies between them. You can also add other jobs to the same job flow as needed.

Note: To create a new Job Flow, refer to Creating New Job Flow.

To create a Data Flow Instance Job, follow the steps below.

Step I: From Data Flow Instance under the Job Flow menu, drag and drop the native data point type that was used to create the data flow. For reference, this example creates a data flow instance for a data flow that uses a Hadoop data point as its native platform.

Step II: The window displays the list of Hadoop Data Flows in the different layers within the projects. Select the required layer, and click one or more Data Flow names to create them as data flow instance jobs.

Note: To create a Data Flow, refer to Creating New Data Flow.

If the user has access to any Global project, the window displays the Project drop-down, which lists all the global projects. Choose the required global project from the Project drop-down, and then select the global data flow.

Step III: Provide the General details of the Data Flow Instance.

On the Canvas, select the Data Flow Instance, and then under Properties, provide the General details.

Name - Displays the name of the job. To edit the job name, click the arrow next to the Name field.

Description - Optionally, provide a description in the text box.

Data Flow Name - Displays the name of the Data Flow.

Disable task - Check Disable task if the job should not be executed as part of the Job Flow but you do not want to delete the job.

Step IV: Optionally override the Connection details for the Data Flow Instance.

Under Properties, select the Connections tab. By default, the native data point field displays the data point chosen while creating the data flow, and the source and target data objects display the databases assigned to them. If required, these data points can be overridden.

Overriding the Native Data Point associated with the Data Flow

1. The Native Data Point displays the native Data Point chosen while creating the Data Flow. To change the native Data Point, click Change.

2. The window lists all the available Data Points of the type that was originally used to create the data flow. Choose the required Data Point.

If the user has access to any Global project, the window displays the Project drop-down, which lists all the global projects. Choose the required global project from the Project drop-down, and then select the global data point.

3. Once the Data Point is changed, all the transformation SQLs will be issued against the new data point.

Note: When a data flow instance job is created for a Hadoop or Spark native data flow, the data flow Runtime Properties and the transform-specific Runtime Properties are inherited from those set in the data flow. When the associated native data point is changed, the Runtime Properties from the new data point must be propagated manually in order to apply them. To do this, click the Propagate icon that appears next to the data point name. The Runtime Properties from the new data point are then applied to the data flow instance and to the transform instances in Dataflow Transforms. For more details on Hadoop and Spark native data flows, refer to the respective pages under Working with Data Flow.

Overriding the source and target instance properties

The list displays all the source and target transform instances defined in the data flow.

1. Overriding the Data Point - The database field displays the database assigned to the source/target instance. By default, this is the database associated with the data object of the source/target instance. You can override it by clicking the Edit icon next to the database name. Once changed, the extract and load commands will be issued against the new data point.

2. Overriding extract and load properties - The Properties field displays the Extract and Load properties of the external data object instances in the Data Flow. By default, the Extract and Load properties assigned in the data flow are displayed here. To override the Extract properties, click E. Similarly, to override the Load properties, click L. When the Job Flow is executed, the extract and load properties set in the Data Flow Instance job are used.

  • Extract and Load Properties are displayed against external data objects. For more information about the extract and load properties for a specific data point type, refer to the respective pages under Working with Data Point.

  • When the data point for the external source or target instance is changed, the extract and load properties set for the external source or target in the data flow are overridden with those from the new data point.
  • When the native data point associated with the data flow is changed, the extract and load properties set for the native source or target in the data flow are overridden with those from the new native data point.

3. Overriding the job partition - To add the extract partition, refer to Adding extract partition in Data Flow Instance job.

When a Data Flow instance is used in a Job Flow, by default it inherits all the properties defined at the Data Flow level.

Step V: Modify the properties of the Data Flow Instance.

Under Properties, select the Properties tab.

Manage the following properties of the Data Flow from the Properties tab.

1. Logging Level 

You can choose the level at which logs are generated during the execution of the Data Flow Instance. These logs can be viewed in the Diyotta Monitor.

The logging options are described below.

ERROR - Displays only the error events in the log.

WARN - Displays only the messages corresponding to potentially harmful situations.

INFO - Displays informational messages that provide the progress of the application. This includes error and warning messages as well. This is the default level.

DEBUG - Displays fine-grained informational events that are most useful to debug an application. This includes INFO level logs as well.

TRACE - Displays finer-grained informational events at a level lower than DEBUG. This includes DEBUG level logs as well.

The log level chosen here must be lower than or equal to the level set in the Admin. For example, if the log level in Admin is set to INFO, then DEBUG or TRACE cannot be set here; only ERROR, WARN, and INFO can be set.
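
This constraint can be read as a simple ordering check. The sketch below is an illustration only (not Diyotta code); it assumes the five levels above, ordered from least to most verbose, and accepts an instance-level setting only when it is no more verbose than the Admin level.

    import java.util.List;

    public class LogLevelCheck {
        // Levels ordered from least to most verbose, matching the list above.
        static final List<String> LEVELS = List.of("ERROR", "WARN", "INFO", "DEBUG", "TRACE");

        // The instance-level setting is allowed only if it is not more verbose than the Admin level.
        static boolean isAllowed(String instanceLevel, String adminLevel) {
            return LEVELS.indexOf(instanceLevel) <= LEVELS.indexOf(adminLevel);
        }

        public static void main(String[] args) {
            System.out.println(isAllowed("DEBUG", "INFO")); // false: DEBUG is more verbose than INFO
            System.out.println(isAllowed("WARN", "INFO"));  // true: WARN is allowed when Admin is INFO
        }
    }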

2. TForm Cleanup

Set this option to specify whether the temporary tables created during the execution of the associated data flow should be dropped after the Data Flow completes successfully.

It is recommended to always set this to Yes. Set it to No only when intermediate results need to be reviewed, and revert it to Yes once the data flow has been finalized.
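
Conceptually, this option controls whether a cleanup step runs after a successful execution. The sketch below is an illustration only (not Diyotta code); the cleanup method and the temporary table names are hypothetical.

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.List;

    class TformCleanupSketch {
        // Drops the intermediate (TForm) tables after a successful run when cleanup is enabled.
        static void cleanup(Connection conn, List<String> tempTables, boolean tformCleanup) throws SQLException {
            if (!tformCleanup) {
                return; // No: leave the intermediate tables in place for review
            }
            try (Statement stmt = conn.createStatement()) {
                for (String table : tempTables) {
                    stmt.executeUpdate("DROP TABLE IF EXISTS " + table); // hypothetical temporary table name
                }
            }
        }
    }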

3. JDBC Transform Control

This option is applicable when loading data into a target system using the JDBC load type. If the load fails midway without inserting all the records, the partial records that were inserted are rolled back from the target table. If there are multiple pipelines in the data flow and any pipeline fails at any point, the inserted records in all the target tables are rolled back. The rollback is applicable only for target loads that use JDBC as the load type. The rollback is ignored for targets using a bulk load type.

You can set the JDBC transaction control to Yes or No. The two settings are illustrated in the sketch after the options below.

  • Yes - To roll back the data in case of failure.
  • No - To commit the partial data in case of failure. This is the default setting.
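
The sketch below is an illustration only (not Diyotta code) of the difference between the two settings for a JDBC load; the connection URL and the target table name are hypothetical. With transaction control enabled, the inserts run in a single transaction that is rolled back on any failure; with it disabled, rows inserted before the failure remain committed.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    class JdbcTransactionControlSketch {
        static void load(String jdbcUrl, boolean transactionControl, String[][] rows) throws Exception {
            try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
                // Yes -> one transaction for the whole load; No -> each insert commits immediately.
                conn.setAutoCommit(!transactionControl);
                try (PreparedStatement ps = conn.prepareStatement("INSERT INTO target_table VALUES (?, ?)")) {
                    for (String[] row : rows) {
                        ps.setString(1, row[0]);
                        ps.setString(2, row[1]);
                        ps.executeUpdate();
                    }
                    if (transactionControl) {
                        conn.commit(); // rows become visible only if every insert succeeded
                    }
                } catch (Exception e) {
                    if (transactionControl) {
                        conn.rollback(); // Yes: partial rows are rolled back from the target table
                    }
                    throw e; // No: rows inserted before the failure stay committed
                }
            }
        }
    }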

4. Optionally specify the retry attempts

Retry Enabled - Check the retry option if you want to enable retry attempts for Data Flow Instance job execution.

  • No. of Retry Attempts: Specify the number of attempts to retry the data flow instance job execution if the job fails to execute. By default, the number of retry attempts is set to 2.
  • Retry Wait Time (in Seconds): Specify the duration, in seconds, to wait before the next retry attempt. By default, the duration is set to 60 seconds. If the job fails to execute, it is retried after the specified wait time, as shown in the sketch below.
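
The sketch below is an illustration only (not Diyotta code) of how the two retry settings interact: the job is retried up to the configured number of attempts, waiting the configured number of seconds between attempts. The defaults of 2 attempts and a 60-second wait follow the description above.

    class RetrySketch {
        // Runs the job, retrying up to retryAttempts times and waiting retryWaitSeconds between attempts.
        static void runWithRetry(Runnable job, int retryAttempts, int retryWaitSeconds) throws InterruptedException {
            int attemptsLeft = retryAttempts;
            while (true) {
                try {
                    job.run();
                    return; // success: no further attempts needed
                } catch (RuntimeException e) {
                    if (attemptsLeft-- == 0) {
                        throw e; // all retry attempts exhausted; the job fails
                    }
                    Thread.sleep(retryWaitSeconds * 1000L); // wait before the next attempt
                }
            }
        }

        public static void main(String[] args) throws InterruptedException {
            // Example with the documented defaults: 2 retry attempts, 60-second wait.
            runWithRetry(() -> System.out.println("running job"), 2, 60);
        }
    }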

Note:

  • To save the Job Flow, on the Actions menu, click Save. For more information, refer to Saving Job Flow.
  • To revert the changes before saving the Job Flow, on the Actions menu, click Revert. For more information, refer to Reverting changes in Job Flow.
  • To execute an individual job in the Job Flow, on the Actions menu, click Run Job. For more information, refer to Executing individual job in Job Flow.
  • To execute the Job Flow, on the Actions menu, click Run. For more information, refer to Executing Job Flow.
  • Once the job is created and the changes are saved, close or unlock the Job Flow so that it is editable by other users. For more information, refer to Closing Job Flow and Unlocking Job Flow.