To create a new data flow, follow the steps below.

Step I: Navigate to the New Data Flow window in one of the following ways.

Option I: On the Navigation pane, hover over the Data Flow tab, and then click New.

Option II: On the Navigation pane, click Data Flow. Then, on the Actions menu, click New and select New to create a data flow.

The New Data Flow window appears and prompts you to provide the details for the new data flow.

Step II: Select the data platform to be associated with the data flow.

For every data flow, you can select either a native or a generic platform, whichever matches your need.

  • If a native data platform is selected, it determines where the transformations defined in the data flow are executed.
  • If a generic data platform is selected, the transformations are not applied and the data is loaded directly from source to target.
Only the data platform types included in the Diyotta license are displayed for selection.
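The difference between the two platform types can be sketched in plain Python. This is purely illustrative — the function and names below are hypothetical and not part of Diyotta — but it captures the rule above: a native platform executes the transformations, while a generic platform loads the data through unchanged.

```python
# Hypothetical sketch (not Diyotta's API) of how the platform choice
# affects execution.

def run_data_flow(platform_type, records, transforms):
    """Apply the transforms only when the platform is native."""
    if platform_type == "native":
        for transform in transforms:
            records = [transform(r) for r in records]
        return records
    # Generic platform: data is loaded directly, transforms are skipped.
    return records

rows = [{"amount": 10}, {"amount": 20}]
double = lambda r: {"amount": r["amount"] * 2}

print(run_data_flow("native", rows, [double]))   # transforms applied
print(run_data_flow("generic", rows, [double]))  # direct source-to-target load
```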

1. The Data Platform tab in the wizard displays the list of applicable native data platforms. Select the required data platform from the list (here, Hadoop is used for reference).

Note: To search for a specific platform, enter a keyword in the search bar; the window displays the search results. Select the required data platform.

2. Upon selecting a data platform, the Data Points tab displays the list of data points available for the selected data platform. Select the required data point and click Next.

  • If there are any Global projects that the user has access to, the Project drop-down lists them. Select a Global project from the list if the data point must be chosen from it. After selection, the canvas displays the respective global data points for the data platform type.
  • From the list of data points, you can also select the Generic data point type; when selected, additional details for defining derived attributes are displayed.

Note:

  • To move back to the previous step in the wizard, click Back.
  • To close the New Data Flow window, click Cancel.

3. Upon selecting the data point, the General Properties tab displays the general properties for the data flow.

  • The Name field displays the default data flow name. Provide a suitable name for the data flow here.
  • The Layer field lists the available layers in the project. Select a layer to create the data flow under it.

4. Click Create to navigate to the Data Flow workspace.

Step III: Configure the data flow

In the General tab, edit the basic details associated with the data flow.

1. The Name field auto-populates the data flow name prefixed with df, and is editable. Rename the data flow if required.

2. In the Description text box, optionally provide a description.

3. Provide a value in Data View Limit to limit the number of records displayed as output for a transform when viewing output in an interactive data flow. For more details on interactive data flows, refer to Working with Interactive Data Flow.

4. The Process Platform field displays the native or generic platform based on which the associated data points, data objects, data flows, and job flows are created.

5. The Data Point field displays the data point associated with the data flow.

To change the associated data point, click Change. The Change Native Data Point window displays the list of data points available for the selected data platform. Select the required data point and click Ok.

  • If there are any Global projects that the user has access to, the Project drop-down lists them. Select a Global project from the list if the data point must be chosen from it. After selection, the canvas displays the respective global data points for the data platform type.
  • Upon selecting a Global data point, the data point name in the data object list page is appended with the letter 'G'.
  • Upon selecting a Global data object as a source in a normal data flow, the source name in the canvas is appended with the letter 'G'.

6. The Last updated by field displays the user who last updated the data flow, along with the date and time of the last save.

Note: The tabs that appear on the canvas differ based on the data point selected for the data flow, which can be either native or generic.

Step IV: Add transforms and create the pipeline (applicable only for native data platforms)

Drag and drop transforms from the Transforms tab into the data flow canvas to implement the business logic. To create a valid data flow, at least one source instance transform must be linked to at least one target instance transform. You can add other transforms between the source and target instances to apply the logic to be implemented.
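The validity rule above — at least one linked source instance and at least one linked target instance — can be expressed as a small check. The structures and names here are hypothetical illustrations, not Diyotta's actual data model:

```python
# Hypothetical sketch of the data flow validity rule: a valid flow has
# at least one source instance and one target instance among the linked
# transforms. (Illustrative only, not Diyotta's API.)

def is_valid_data_flow(transforms, links):
    """transforms: dict mapping transform name -> kind
    ('source', 'target', or another transform type);
    links: set of (from_name, to_name) pairs."""
    sources = {n for n, kind in transforms.items() if kind == "source"}
    targets = {n for n, kind in transforms.items() if kind == "target"}
    linked = {n for pair in links for n in pair}
    return bool(sources & linked) and bool(targets & linked)

flow = {"src_orders": "source", "filter_paid": "filter", "tgt_orders": "target"}
links = {("src_orders", "filter_paid"), ("filter_paid", "tgt_orders")}
print(is_valid_data_flow(flow, links))  # True: source and target are linked
print(is_valid_data_flow(flow, set())) # False: nothing is linked yet
```

A fuller check would also verify that each source actually reaches a target through the links; this sketch only covers the minimal rule stated above.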

  • For detailed information on the transforms, refer to Working with Data Flow Transforms.
  • You can create a copy of a transform from a different data flow or within the same data flow. For more details, refer to Creating copy of transform in Data Flow.
  • You can view the data at each transform step by enabling the interactive data flow feature. For more details, refer to Working with Interactive Data Flow.
  • You can have multiple pipelines in a data flow. A pipeline is a set of linked transforms starting from a source instance and ending at a target instance. You can define the order in which each pipeline is executed. For more details, refer to Editing load order in Data Flow.
  • You can auto-arrange the pipelines created in a data flow into a default pattern. For more details, refer to Arranging Data Flow.
  • You can validate the created data flow to check for errors. For more details, refer to Validating Data Flow.
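The load-order behavior for multiple pipelines can be illustrated with a minimal sketch. The names below are hypothetical, not Diyotta's API; the point is simply that each pipeline is a sequence of linked transforms and the pipelines run in a user-defined order:

```python
# Hypothetical sketch of pipeline load order: each pipeline is a list of
# transform steps, and an explicit order controls which pipeline's steps
# run first. (Illustrative only, not Diyotta's API.)

def run_pipelines(pipelines, load_order):
    """pipelines: dict mapping pipeline name -> list of step names;
    load_order: pipeline names in the order they should execute."""
    executed = []
    for name in load_order:
        executed.extend(pipelines[name])  # run this pipeline's steps in sequence
    return executed

pipelines = {
    "p1": ["src_a", "tgt_a"],
    "p2": ["src_b", "agg_b", "tgt_b"],
}
print(run_pipelines(pipelines, ["p2", "p1"]))  # p2's steps run before p1's
```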

Step V: Optionally edit the data flow properties and use parameters

Step VI: Save the data flow 

To save the changes made to the data flow, refer to Saving Data Flow.

Note: 

  • If the changes made to the data flow need to be reverted rather than saved, refer to Reverting changes in Data Flow.
  • Once the data flow has been created and the changes have been saved, Close or Unlock the data flow so that it becomes editable by other users. For more information, refer to Closing Data Flow and Unlocking Data Flow.