Snowflake data point is used to configure connectivity to Snowflake database. For each instance of Snowflake warehouse, a separate data point will have to be created. Multiple databases within the same Snowflake warehouse can be added in single Snowflake data point. The Snowflake data point will be associated to any Snowflake data object created and data flows defining Snowflake as the native processing platform.
To work with Snowflake Data Point, follow below steps:
Step I: Create a New Data Point
- To open and edit an existing data point, refer Opening Data Point.
- To create a new data point, refer Create New Data Point.
Step II: Provide connection details
1. To connect to Snowflake database following details need to be provided in the Properties tab.
- App User: Specify the user id that will be used to connect to the Snowflake system. This user should have necessary privileges to access the data in the databases.
- Password: Specify the password for the App User. To use the project parameter for the password, check the Use Project Parameters option, and you can view and select the required Project Parameter from the Password drop-down.
- Jdbc Options: Specify the options that should be used along with JDBC URL to connect to Snowflake.
For example, following details are provided in JDBC Options to connect to Snowflake: user=diyotta, password=****, db=TEST_DB.
- Account: Specify the account used to connect to snowflake database
- Cloud Platform: Specifies which cloud platform is being used, you can either select AWS, GCS or Microsoft Azure.
- Private Link: To specify the Snowflake instance as private link, select the Private Link check box.
- Region: Specifies the location on the globe and based upon the type of Cloud Platform the locations(options) may vary. For Microsoft Azure the default region is West US 2, for GCS the default region is US-Central, and for AWS the default region is US West.
- Client Session Keep Alive: Indicates whether to force a user to log in again after a period of inactivity in a JDBC or ODBC session. When set to True, Snowflake keeps the session active indefinitely, even if there is no activity from the user. When set to False, the user must log in again after four hours of inactivity.
Show Query ID in logs: Specify if snowflake query_id should be displayed in the monitor logs for every execution.
- Warehouse: Specify the Snowflake warehouse to be used for the processing of data.
- Mandatory field names are suffixed with *. To establish the connection, provide all the mandatory property field values.
- All the fields in the Properties tab can be parameterized using project parameters. To parameterize the fields, refer Working with Project Parameters.
Once you click the refresh button next to the Warehouse field, the Warehouse is assigned and the details pertaining to the particular warehouse are displayed.
Following details are displayed on the screen:
- Status: Specifies whether the warehouse is active/running (STARTED) or inactive (SUSPENDED).
- Type: Warehouse type; STANDARD is the only currently supported type.
- Size: Size of the warehouse (X-Small, Small, Medium, Large, X-Large, etc.)
- Auto Suspend: Period of inactivity, in seconds, after which a running warehouse will automatically suspend and stop using credits; a null value indicates the warehouse never automatically suspends.
- Auto Resume: Whether the warehouse, if suspended, automatically resumes when a query is submitted to the warehouse.
- Created On: Date and time when the warehouse was created.
- Resumed On: Date and time when the warehouse was last started or restarted.
- Owner: Role that owns the warehouse.
- Comment: Comment for the warehouse.
2. Assign Agent: To assign or change the associated agent click Change. The Change Agent window appears and displays the list of available Agents. From the list, select the required Agent Name.
Note: To search for a specific agent, enter the keyword in search bar and the window displays the search result list. Select the required agent and click Ok.
- If Default agent is assigned to the Project then automatically, the Default agent will be associated with the new Data point created.
- If Default agent is not assigned to the Project then, no agent will be assigned automatically and appropriate agent needs to be assigned to the data point.
- When connecting to the Agent server then, the agent installation user should have appropriate privilege to access the path where file will be placed.
- When connecting to the remote server then, the firewall needs to be opened from the Agent server to it and user specified to connect should have appropriate privilege to access the path where file will be placed.
Step III: Test the data point connection
- To validate that the data point is able to connect to the Snowflake data point database using the details provided, refer Test Data Point Connection.
Step IV: Provide the database connection details
Manage the databases required to be accessed through the Databases tab. Multiple databases can be added here.
1. To add a new database, click on Click Here.
New entry for the database is generated and a success message is displayed at the bottom right corner of the screen.
- The "Name" field is a friendly name that can be assigned to the database for easy identification. This friendly name is displayed when a database needs to be chosen from the data point and when the database association with other components is displayed.
- Provide the physical name of the snowflake database in the "Database" field. When clicked on the entry in the "Database" field a drop-down appears with the list of snowflake databases in the system. As you enter the database keyword, the drop-down shows the specific database. The snowflake database name can either be selected from this drop-down list or you can manually enter.
- Provide the physical name of the schema in the "Schema" field. The schema name must be manually provided.
- To assign a database to be used for creating temporary tables as part of processing the data (generally referred to as tform database), select the checkbox under Transforms field.
- The Transforms field is available only for those type of data points which are supported by Diyotta as data processing platform.
- It is mandatory to assign a database as transform in the data point when, that data point needs to be assigned during the data flow creation and used as the processing platform.
- To view the database drop-down, it is a prerequisite to test the connection. For more information, refer Test Data Point Connection.
- To search for a specific database, enter the keyword in the search bar, and the page displays the related databases.
2. You can also copy database from another Data Point of same type and paste here, to do that:
Click the drop down arrow on Click Here, to see the paste option.
- Following operations are allowed on the database entries: Add, Cut, Copy, Paste, Up, Down, Delete, and Search.
- From the list of databases, multiple databases can be selected and we can perform/apply these operations.
Step V: Save the data point
- To save the changes made to the data point, refer Saving a Data Point.
- If the changes made to the data point need to be reverted and not saved then, refer Reverting changes in Data Point.
- Once the data point has been created and the changes have been saved then, Close or Unlock the data point so that it is editable by other users. For more information, refer Closing Data Point and Unlocking Data Point.
Step VI: Modify the configured Extract and Load properties
When moving data from one system to another the data is extracted from the source system, moved over the network, and loaded into the target system. The SQLs and commands generated during execution of the jobs to extract and load data are generated based on the properties defined for these. The properties associated with the extraction and load of data should depend on the format, performance and variety of the data being moved. These properties vary based on the environment and the type of the system. Diyotta comes with default properties that covers most of the known scenarios.
To modify these properties, refer Editing Extract Properties in Snowflake Data Point and Editing Load Properties in Snowflake Data Point.
- The default values for extract and load properties can be configured in the Admin module and these properties reflect in the Studio module.
- The extract and load properties set in data point are by default used in the source and target instance of the data flow and the job flows.
- It is a good practice to set the extract and load properties as per the company standards in the data point.
- However, if needed any specific property can be overridden in the data flow or job flow.