The BigQuery data point is used to configure connectivity to a BigQuery database. A separate data point must be created for each instance (project) of BigQuery. Multiple datasets within the same BigQuery instance can be added to a single data point. The BigQuery data point is associated with any BigQuery data object created and with data flows that define BigQuery as the native processing platform.
To work with a BigQuery data point, follow the steps below:
Step I: Create a New Data Point
- To open and edit an existing data point, refer to Opening Data Point.
- To create a new data point, refer to Create New Data Point.
Step II: Provide connection details
A. To connect to the BigQuery API, provide the following details in the Properties tab.
- Project ID: The unique identifier assigned to the project.
- Authentication Type: Either Service Account or Access Token.
1. For Service Account, the following details are displayed on the screen:
- Service Account Key File: Browse to and select the location of the service account key file.
2. For Access Token, the following details are displayed on the screen:
- Access Token: To authenticate based on an access token generated in Google and specific to this project, choose Access Token as the Authentication Type and provide the following details:
- Obtain access token from external service: Enable this option to use SSL connectivity between the Agent and the BigQuery server. Upon enabling it, the following fields are displayed on the screen; provide these details to use SSL connectivity.
- External Service Command: Specify the external command that generates the access token.
- Response Type: Select the response type; the default is JSON.
- Access Token Key Name: Specify the name of the field containing the access token in the generated JSON response.
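The external-service flow above can be sketched in Python. The command below is a placeholder (a real setup might call a custom script or a CLI such as gcloud); a literal echo stands in so the sketch is self-contained and runnable:

```python
import json
import subprocess

# Hypothetical External Service Command. In practice this would be a script
# or CLI that prints a JSON response containing the access token; here a
# literal echo stands in so the sketch runs anywhere.
external_service_command = ["echo", '{"access_token": "ya29.sample-token"}']
access_token_key_name = "access_token"  # the Access Token Key Name setting

# Run the command and capture its output (the JSON Response Type).
result = subprocess.run(external_service_command,
                        capture_output=True, text=True, check=True)
response = json.loads(result.stdout)

# Pick the token out of the response using the configured key name.
access_token = response[access_token_key_name]
print(access_token)
```

The key point is that only the command and the key name are configured; the token itself is produced fresh on each invocation.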
- Mandatory field names are suffixed with *. To establish the connection, provide values for all mandatory fields.
- All fields in the Properties tab can be parameterized using project parameters. To parameterize the fields, refer to Working with Project Parameters.
The following details are the same for both authentication types:
B. Query Priority: Specify how the query is scheduled to run; it can be either Interactive or Batch.
Batch is selected by default. Batch queries are queued, and if a batch query does not start within the default time period, its priority is changed to Interactive.
C. JDBC Options: Specify the options that should be appended to the JDBC URL when connecting to BigQuery.
- For example, the following details could be provided in JDBC Options: user=diyotta, password=****, db=TEST_DB.
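As an illustration, the options can be thought of as key=value pairs appended to the JDBC URL. The base URL and the semicolon separator below are assumptions for the sketch; the exact syntax depends on the BigQuery JDBC driver in use:

```python
# Hypothetical base JDBC URL; the real value depends on the driver.
base_url = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443"

# Options as entered in the JDBC Options field.
jdbc_options = {"user": "diyotta", "password": "****", "db": "TEST_DB"}

# Append the options to the URL as semicolon-separated key=value pairs.
jdbc_url = base_url + ";" + ";".join(
    f"{key}={value}" for key, value in jdbc_options.items()
)
print(jdbc_url)
```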
D. Proxy: To connect through an external network proxy, enable the Use Proxy option and set the proxy protocol, address, and the port on which to reach it.
- Proxy Host: Specify the hostname or the IP address of the proxy server.
- Proxy Port: Specify the port on which the proxy server listens.
- Protocol: Either http or https.
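The three proxy settings combine into a single proxy URL that an HTTP client can use. The host and port below are hypothetical values for illustration:

```python
import os

# Hypothetical proxy details matching the fields above.
protocol = "http"                 # Protocol: http or https
proxy_host = "proxy.example.com"  # Proxy Host
proxy_port = 3128                 # Proxy Port

proxy_url = f"{protocol}://{proxy_host}:{proxy_port}"

# Many clients pick the proxy up from standard environment variables.
os.environ["HTTPS_PROXY"] = proxy_url
print(proxy_url)
```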
E. Assign Agent: To assign or change the associated agent, click Change. The Agent window appears and displays the list of available agents. From the list, click the required agent name.
- If a default agent is assigned to the project, it is automatically associated with the newly created data point.
- If no default agent is assigned to the project, no agent is assigned automatically and an appropriate agent must be assigned to the data point.
- When connecting to the Agent server, the agent installation user must have appropriate privileges to access the path where files will be placed.
- When connecting to a remote server, the firewall must be opened from the Agent server to that server, and the user specified for the connection must have appropriate privileges to access the path where files will be placed.
Step III: Test the data point connection
- To validate that the data point can connect to the BigQuery database using the details provided, refer to Test Data Point Connection.
Step IV: Enter the Datasets details
Manage the datasets that need to be accessed through the Datasets tab. Multiple datasets can be added here.
1. To add a new dataset, click Click Here.
A new dataset entry is generated.
- The Name field is a friendly name assigned to the dataset for easy identification. This name is displayed when a dataset needs to be chosen from the data point and when the dataset's association with other components is shown.
- The Project ID is the unique identifier of the project and is specified in the Properties tab; when a new dataset is created, the corresponding Project ID is auto-populated. To enter it manually instead, click the entry, enter the Project ID, and validate it by clicking the forward arrow icon beside the Project ID field.
- Provide the physical name of the dataset in the Datasets field. Clicking the entry in the Datasets field displays a drop-down with the list of datasets in the system. Select the dataset name from this drop-down list or enter it manually.
- To assign a dataset to be used for creating temporary tables while processing data (generally referred to as the tform dataset), select the check box under the Transforms field.
- The Transforms field is available only for data point types that Diyotta supports as a data processing platform.
- Assigning a transform dataset in the data point is mandatory when that data point needs to be assigned as the processing platform during data flow creation.
- Testing the connection is a prerequisite for viewing the dataset drop-down. For more information, refer to Test Data Point Connection.
- To search for a specific dataset, enter a keyword in the search bar; the page displays the matching datasets.
2. You can also copy datasets from another data point of the same type and paste them here. To do so, click the drop-down on the tab to see the Paste option.
- The following operations are allowed on dataset entries: Add, Cut, Copy, Paste, Up, Down, Delete, Search, and Fetch DB/Schema.
- Multiple datasets can be selected from the list and these operations performed on them.
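The Project ID validation described in this step can be sketched as a simple client-side check. The pattern below reflects Google Cloud's published project ID rules (6-30 characters; lowercase letters, digits, and hyphens; starting with a letter and not ending with a hyphen); treat it as an illustrative pre-check, not a substitute for the forward-arrow validation in the UI:

```python
import re

# Google Cloud project IDs: 6-30 characters, lowercase letters, digits and
# hyphens; must start with a letter and must not end with a hyphen.
PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")

def is_valid_project_id(project_id: str) -> bool:
    """Lightweight shape check before validating against BigQuery itself."""
    return bool(PROJECT_ID_RE.fullmatch(project_id))
```

For example, `is_valid_project_id("my-project-123")` passes, while an ID starting with a digit or shorter than six characters does not.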
Step V: Save the data point
- To save the changes made to the data point, refer to Saving Data Point.
- To revert unsaved changes made to the data point, refer to Reverting changes in Data Point.
- Once the data point has been created and the changes saved, close or unlock the data point so that other users can edit it. For more information, refer to Closing Data Point and Unlocking Data Point.
Step VI: Modify the configured Extract and Load properties
When moving data from one system to another, the data is extracted from the source system, moved over the network, and loaded into the target system. The SQLs and commands generated to extract and load data during job execution are based on the properties defined for them. These properties should depend on the format, performance requirements, and variety of the data being moved, and they vary with the environment and the type of system. Diyotta ships with default properties that cover most known scenarios.
- The default values for the extract and load properties can be configured in the Admin module, and these properties are reflected in the Studio module.
- The extract and load properties set in the data point are used by default in the source and target instances of data flows and job flows.
- It is good practice to set the extract and load properties in the data point according to company standards.
- However, any specific property can be overridden in a data flow or job flow if needed.
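The layering of defaults and overrides described above can be sketched as a dictionary merge. The property names here are hypothetical, invented for the sketch; only the precedence (data flow over data point) follows the text:

```python
# Hypothetical extract/load properties set at the data point level.
datapoint_defaults = {"extract.format": "csv", "load.mode": "append"}

# A data flow overrides just the properties it needs to change.
dataflow_overrides = {"load.mode": "truncate"}

# The effective properties: data point defaults, with data flow
# overrides taking precedence where both define the same key.
effective = {**datapoint_defaults, **dataflow_overrides}
print(effective)
```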