Controller

The Purplecube Controller is a Java-based program that is installed on a Linux/Unix server. The Controller can be installed on any cloud-based platform or on a server behind the network firewall. The network port through which the web service is made available, called the “startup port”, must be open over the network so that the Purplecube web interface and the command line interface can be reached. The main components of the Purplecube Controller are the Service Manager, the Compile Manager and the Execution Manager.
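Because the startup port has to be reachable over the network, a quick connectivity check from a client machine can be useful before troubleshooting anything else. The following is a minimal sketch in Python, not a Purplecube utility; the function name is hypothetical, and the host and port shown in the comment are placeholders for the real Controller host and startup port.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call (placeholder values, replace with the real host and startup port):
# port_is_reachable("controller.example.com", 8080)
```

A result of False only shows that the TCP connection could not be established from this machine; it does not say whether the cause is a firewall rule, a stopped service or a wrong port number.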

Service Manager: This component manages the external interface to the Controller, which can be the web user interface or the command line interface. If an instruction from the user interface only views or modifies Purplecube code or administrative components, the Service Manager interacts with the metadata repository to fetch or update the details. If the instruction relates to the execution of jobs, the Service Manager passes it on to the Compile Manager and the Execution Manager.
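The routing rule described for the Service Manager can be illustrated with a small sketch. This is not Purplecube code: the class names, the request dictionary and its "kind" values are assumptions made for the example, and the metadata repository is replaced by an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRepository:
    """Stand-in for the metadata repository: a simple in-memory store."""
    objects: dict = field(default_factory=dict)


@dataclass
class ServiceManager:
    metadata: MetadataRepository
    submitted_jobs: list = field(default_factory=list)

    def handle(self, request: dict):
        kind = request["kind"]  # hypothetical request types
        if kind == "view":
            # View requests are answered from the metadata repository.
            return self.metadata.objects.get(request["name"])
        if kind == "modify":
            # Modify requests update the metadata repository.
            self.metadata.objects[request["name"]] = request["definition"]
            return "updated"
        if kind == "execute":
            # Execution requests would be handed to the Compile and Execution
            # managers; here the job name is only recorded.
            self.submitted_jobs.append(request["name"])
            return "submitted"
        raise ValueError(f"unknown request kind: {kind}")
```

For example, a "modify" request followed by a "view" request for the same name returns the stored definition, while an "execute" request leaves the metadata untouched and only queues the job.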

Compile Manager: When a job is executed, the Compile Manager compiles the Purplecube code. As part of compilation, the logical flow of data is converted into system-specific commands. These commands can be SQL statements that interact with relational databases, RESTful API calls issued as HTTP requests, or end-system-specific commands available through the system's proprietary libraries. The compiled code is stored in the metadata and is read by the Execution Manager to execute the job. Compilation happens only during the first execution of a job; in subsequent executions the Execution Manager reads the compiled code directly from the metadata. If any component of the job is modified after the first execution, the job is recompiled during its next execution.
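The compile-once behaviour can be pictured as a cache keyed by the job, which is invalidated when the job definition changes. The sketch below is an illustration only. How Purplecube actually detects a modified job is not described here, so the use of a content hash is an assumption, as are the class name and the placeholder "compilation" step.

```python
import hashlib


class CompileCache:
    """Compiles a job definition once and reuses the result until it changes."""

    def __init__(self):
        self._store = {}  # job name -> (fingerprint, compiled commands)
        self.compile_count = 0

    @staticmethod
    def _fingerprint(definition: str) -> str:
        # Assumption: a change is detected by hashing the job definition.
        return hashlib.sha256(definition.encode("utf-8")).hexdigest()

    def _compile(self, definition: str) -> list:
        # Placeholder "compilation": one command per non-empty line of the flow.
        self.compile_count += 1
        return [f"RUN {step.strip()}" for step in definition.splitlines() if step.strip()]

    def get_compiled(self, job: str, definition: str) -> list:
        fingerprint = self._fingerprint(definition)
        cached = self._store.get(job)
        if cached is None or cached[0] != fingerprint:
            # First execution, or the job was modified: compile and store.
            self._store[job] = (fingerprint, self._compile(definition))
        return self._store[job][1]
```

Calling get_compiled twice with an unchanged definition compiles only once; changing the definition triggers one further compilation on the next call.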

Execution Manager: The Execution Manager is responsible for executing Purplecube jobs. It reads the compiled code from the metadata and submits the instructions to the Agent for execution, in the order defined in the flow. It keeps track of the status of the submitted instructions. The status changes, error messages and statistics returned by the Agent are written to the metadata to maintain operational statistics.
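The ordered submission and status tracking can be sketched as a simple loop. This is an illustration under stated assumptions, not the product's implementation: the function name, the shape of the result dictionary and the decision to stop at the first failed step are all hypothetical, and a plain list stands in for the operational statistics written to the metadata.

```python
from typing import Callable


def run_job(instructions: list, submit: Callable[[str], dict]) -> list:
    """Submit instructions in flow order and record the status of each one.

    `submit` stands in for the Agent: it takes one instruction and returns a
    dict with a "status" key and optional "rows" or "error" keys.
    """
    log = []  # stand-in for operational statistics written to metadata
    for position, instruction in enumerate(instructions, start=1):
        result = submit(instruction)
        log.append({"step": position, "instruction": instruction, **result})
        if result["status"] != "success":
            break  # assumption: stop at the first failed step
    return log
```

With a stub Agent that fails on the second of three instructions, the returned log contains two entries, and the third instruction is never submitted.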

Metadata Repository

A PostgreSQL database can optionally be installed as part of the Controller installation, in which case the Controller uses it as the metadata repository. When the PostgreSQL database is installed along with the Controller, the Controller is said to use “embedded metadata”. If an existing database is used as the metadata repository, the Controller is said to use “external metadata”.

Agent

The Purplecube Controller installation includes an Agent component by default, known as the “default agent”. The Agent can also be installed separately on Linux/Unix or Windows machines, on any cloud-based platform or on a server behind the network firewall. The number of Agents and their proximity to the source and target systems are mainly governed by firewall access to the data, the geographical distribution of the data, fail-over capabilities and the volume of data. The Controller communicates with the Agent through a network port on the Controller server called the “controller broker startup port”. This port must be accessible to the Agent service from wherever the Agent is installed.

Agent - Agent communication: When extracting data from a source, the Agent stages the data as a file in a stage location. Once all the data has been extracted from the source, based on the filters and the selected attributes, the Agent loads the data into the target system. If a different Agent is linked to the target system, the data is first transferred to the target Agent and then loaded into the target system. The target Agent receives the file from the source Agent through a dedicated port, the “data receiver port”. The firewall must allow the source Agent to reach the data receiver port, and the target must accept connections from the source. Data is transferred using TCP or HTTP, which can be set as part of the configuration. The data can be compressed and encrypted at the source Agent before transfer, and then decrypted and uncompressed at the target Agent before loading.
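The compress-before-transfer and uncompress-before-load steps can be sketched with Python's standard zlib module. The function names are hypothetical, and the compression format Purplecube uses is not stated here, so zlib is only a stand-in. Encryption is left out of the sketch because the Python standard library has no general-purpose cipher; in the flow described above it would be applied after compression at the source and reversed before decompression at the target.

```python
import zlib


def pack_for_transfer(staged: bytes) -> bytes:
    """Source side: compress the staged file contents before sending."""
    return zlib.compress(staged, 6)


def unpack_after_transfer(payload: bytes) -> bytes:
    """Target side: restore the original bytes before loading."""
    return zlib.decompress(payload)
```

Repetitive staged data, such as delimited rows with recurring values, typically shrinks considerably, and the round trip returns exactly the bytes that were staged.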

Controller Broker

The Controller and the Agent communicate through a broker service. The Agents registered with the Controller are subscribed to this broker, and the Controller publishes instructions to it. The Agent reads and executes the instructions it receives and sends the associated responses, such as status, statistics and logs, back to the Controller through the same broker. The Controller submits data extraction instructions (such as SQL, REST commands and file reads), load commands (such as native bulk load and insert statements using JDBC), and transformation SQL to be executed on the processing platform. The SQL and commands are generated by the Controller based on the system from which the data is extracted or into which it is loaded. The transformation SQL is generated based on the processing platform assigned to the data flow used to transform the data.
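The publish and subscribe pattern between the Controller and the Agents can be illustrated with an in-memory stand-in. This sketch does not reflect the actual broker technology or message format, neither of which is described here; the class, method names and message fields are assumptions.

```python
from collections import defaultdict, deque


class Broker:
    """Minimal in-memory broker with one instruction queue per Agent."""

    def __init__(self):
        self._instructions = defaultdict(deque)  # agent name -> pending instructions
        self.responses = deque()                 # messages sent back to the Controller

    def publish(self, agent: str, instruction: str) -> None:
        """Controller side: queue an instruction for a registered Agent."""
        self._instructions[agent].append(instruction)

    def next_instruction(self, agent: str):
        """Agent side: fetch the oldest pending instruction, or None if idle."""
        queue = self._instructions[agent]
        return queue.popleft() if queue else None

    def respond(self, agent: str, status: str, detail: str = "") -> None:
        """Agent side: report status, statistics or log text to the Controller."""
        self.responses.append({"agent": agent, "status": status, "detail": detail})
```

Instructions published for one Agent are delivered to that Agent in the order they were published, and an Agent with nothing pending receives None.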

Data Processing Platform

Purplecube performs the transformation of data in the Data Processing Platform assigned in the Purplecube code. The processing platform is usually a big data platform such as Hadoop, Teradata, Snowflake, Redshift or Google BigQuery. When data from multiple heterogeneous sources needs to be brought together and massaged to create useful data, Purplecube brings the data from these sources into the processing platform and applies the transformation defined in the logical flow. The transformed data can either reside in the same platform or be loaded into a different target system. The Compile Manager converts the transformation logic defined in the logical flow into platform-specific SQL, which the Execution Manager uses during the execution of the job. The SQL is run in the processing platform by the Agent. The results of the intermediate steps are stored temporarily in the processing platform itself, and these temporary tables are dropped after the job completes execution.
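The pattern of staging intermediate results in temporary tables and dropping them when the job finishes can be sketched with SQLite, used here purely as a stand-in for a processing platform. The table names, columns and SQL are invented for the example and are not SQL generated by Purplecube.

```python
import sqlite3


def run_transformation(conn: sqlite3.Connection) -> list:
    """Run a two-step transformation and drop the intermediate table afterwards."""
    cur = conn.cursor()
    try:
        # Intermediate step: stage the joined rows in a temporary table.
        cur.execute(
            "CREATE TEMP TABLE tmp_joined AS "
            "SELECT o.customer_id, c.name, o.amount "
            "FROM orders o JOIN customers c ON c.id = o.customer_id"
        )
        # Final step: aggregate from the intermediate result.
        cur.execute(
            "SELECT name, SUM(amount) FROM tmp_joined GROUP BY name ORDER BY name"
        )
        return cur.fetchall()
    finally:
        # Clean up the temporary table whether or not the steps succeeded.
        cur.execute("DROP TABLE IF EXISTS tmp_joined")
```

Placing the cleanup in a finally block means the temporary table is removed even when one of the steps raises an error, so a failed run does not leave intermediate data behind.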

User Interface

The Purplecube User Interface is included as part of the Controller installation. The web interface can be accessed from any web browser using the Purplecube access URL, which is normally http://<controller hostname>:<controller startup port>. The command line utility is installed by default with the Controller, and it can also be installed independently on any Linux/Unix-based system. Purplecube also provides access through a RESTful API, which is served from the same URL as the web interface.

The web user interface components are:

Admin: All the system and user level administrative functionalities are available through this module.

Studio: This is the main development module where all the business logic is implemented.

Monitor: This module provides the capability to monitor Purplecube jobs and to view run logs and history.

Scheduler: This module is used to orchestrate process flows based on specific dates and times or on certain events.

Lineage: This module shows the end-to-end flow of data entities according to the logic implemented in Studio.

The command line interface components are:

dicmd command: This command provides options to interact with Purplecube to perform specific tasks.

REST API: This provides the same functionality as dicmd. It can be accessed using the endpoint http://<hostname>:<startup port>/Purplecube-rest/ and can be called by any utility that supports REST API requests.
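Since the web interface and the REST API share the same host and startup port, both base URLs can be derived from those two values. The helper names below are hypothetical, and the host and port in the usage note are placeholders. No specific REST resources are shown because they are not listed in this section.

```python
def web_url(host: str, startup_port: int) -> str:
    """Base URL of the web interface on the Controller's startup port."""
    return f"http://{host}:{startup_port}"


def rest_base_url(host: str, startup_port: int) -> str:
    """Base URL of the REST API, served from the same host and port."""
    return f"{web_url(host, startup_port)}/Purplecube-rest/"
```

For a Controller at controller.example.com with startup port 8080, the two helpers return http://controller.example.com:8080 and http://controller.example.com:8080/Purplecube-rest/ respectively.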