This guide explains how Octopai administrators can set up metadata extraction from Databricks to build data lineage in Octopai. There are two options, depending on your needs: enable lineage via Unity Catalog using Octopai Client extraction, or build lineage for specific notebooks.
Before you begin, make sure your Databricks environment uses a cluster type that supports Unity Catalog. This is required for extracting metadata through Unity Catalog.
For details, see "Set up and manage Unity Catalog" in the Databricks documentation.
Whichever option you choose, make sure permissions and configurations are set correctly so that Octopai can maintain accurate and complete data lineage.
Important: Missing or insufficient permissions can result in broken lineage.
How to set up the permissions
Option 1: Supporting Lineage via Unity Catalog Using Octopai Client Extraction
Step 1: Ensure You Have the Correct Cluster Type
Make sure your Databricks environment is using a cluster type that supports Unity Catalog. This is essential for extracting metadata using Unity Catalog.
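If you want to check this from a script rather than the UI, one option is to look at the cluster's access mode. The Python sketch below is a minimal example and is not part of Octopai: it fetches a cluster definition from the Databricks Clusters API (`GET /api/2.0/clusters/get`) and treats the `data_security_mode` values listed in `UC_ACCESS_MODES` as Unity Catalog capable. That list is an assumption based on current Databricks access modes, so confirm it against the Databricks documentation for your workspace. The environment variable names are placeholders.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumption: access modes treated as Unity Catalog capable. Verify this list
# against the current Databricks documentation for your workspace.
UC_ACCESS_MODES = {
    "SINGLE_USER",
    "USER_ISOLATION",
    "DATA_SECURITY_MODE_STANDARD",
    "DATA_SECURITY_MODE_DEDICATED",
}


def supports_unity_catalog(cluster: dict) -> bool:
    """Return True if the cluster's access mode is one of UC_ACCESS_MODES."""
    return cluster.get("data_security_mode", "NONE") in UC_ACCESS_MODES


def fetch_cluster(host: str, token: str, cluster_id: str) -> dict:
    """Fetch one cluster definition from the Databricks Clusters API."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    req = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/clusters/get?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Placeholder environment variable names; the check only runs when all are set.
if __name__ == "__main__" and all(
    name in os.environ for name in ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "CLUSTER_ID")
):
    cluster = fetch_cluster(
        os.environ["DATABRICKS_HOST"],
        os.environ["DATABRICKS_TOKEN"],
        os.environ["CLUSTER_ID"],
    )
    print(cluster.get("cluster_name"), supports_unity_catalog(cluster))
```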
Step 2: Configuring Permissions in Databricks
Proper permissions are crucial for allowing Octopai to access and extract metadata from your Databricks environment:
Locate the Workspace:
- Open your Databricks workspace with admin privileges.
- Locate the Unity Catalog cluster that holds the metadata you want to extract.
Manage Permissions:
- In your Databricks workspace, navigate to the permissions settings.
- Open the permissions dialog by selecting "Sharing permissions".
- Add the users or groups who need access to this metadata.
Assign Permissions:
- Add Users or Groups: In the Sharing Permissions dialog, you can add individual users or groups and grant them permissions.
- Click the "Add user" or "Add group" button.
- Enter the name of the user or group you want to add, then choose the correct entity from the dropdown list that appears.
- For each user or group you add, set specific permissions using the dropdown menu next to their name. Available permissions include "Can View", "Can Run", "Can Edit", and "Is Owner". Select the appropriate permission level for each user or group.
We recommend adding a group and setting its permission to "Can Manage".
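The same grant can be scripted with the Databricks Permissions REST API. This is a minimal sketch under stated assumptions: it targets a cluster, uses the API's own level names (for clusters, `CAN_ATTACH_TO`, `CAN_RESTART`, and `CAN_MANAGE`), which differ from the dialog labels listed above, and only prepares the request so you can review it before sending.

```python
import json
import urllib.request

# Permission levels the Databricks Permissions API accepts for clusters.
CLUSTER_LEVELS = {"CAN_ATTACH_TO", "CAN_RESTART", "CAN_MANAGE"}


def build_group_grant(group_name: str, level: str = "CAN_MANAGE") -> dict:
    """Build a Permissions API body that grants one group a cluster permission."""
    if level not in CLUSTER_LEVELS:
        raise ValueError(f"unsupported cluster permission level: {level}")
    return {"access_control_list": [{"group_name": group_name, "permission_level": level}]}


def grant_request(host: str, token: str, cluster_id: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a PATCH request, which adds to existing grants."""
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/permissions/clusters/{cluster_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="PATCH",
    )
```

For example, with a hypothetical group named `octopai-readers`, `urllib.request.urlopen(grant_request(host, token, cluster_id, build_group_grant("octopai-readers")))` would add `CAN_MANAGE` for that group. `PATCH` is used instead of `PUT` because `PUT` replaces the object's entire access control list.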
Save and Verify:
- Save your changes.
- Optionally, verify the settings: click "Sharing permissions" again to reopen the Permissions dialog and review the permissions listed.
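To review the result without reopening the dialog, you can read the access control list back through the Databricks Permissions API. In this minimal sketch, `object_type` is the API path segment (for example `clusters` or `notebooks`), and `levels_for` gathers every level a group holds, including inherited ones.

```python
import json
import urllib.request


def fetch_permissions(host: str, token: str, object_type: str, object_id: str) -> dict:
    """GET the access control list of a workspace object, e.g. ("clusters", id)."""
    req = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/permissions/{object_type}/{object_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def levels_for(acl_response: dict, group_name: str) -> set:
    """Collect every permission level, direct or inherited, held by one group."""
    levels = set()
    for entry in acl_response.get("access_control_list", []):
        if entry.get("group_name") == group_name:
            for perm in entry.get("all_permissions", []):
                levels.add(perm["permission_level"])
    return levels
```

For example, `"CAN_MANAGE" in levels_for(fetch_permissions(host, token, "clusters", cluster_id), "octopai-readers")` checks the grant for a hypothetical group named `octopai-readers`.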
Option 2: Building Lineage for Specific Notebooks
Step 1: Identify the Notebooks for Lineage
Determine which specific notebooks within your Databricks environment need to be included in the data lineage.
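If the notebooks live in known workspace folders, you can list them with the Databricks Workspace API to collect their paths and numeric `object_id` values, which the Permissions API uses to identify notebooks. This is a minimal sketch: it lists a single folder without recursing, and `/Shared/etl` is a hypothetical path.

```python
import json
import os
import urllib.parse
import urllib.request


def list_request(host: str, token: str, path: str) -> urllib.request.Request:
    """Prepare a Workspace API request that lists one folder (non-recursive)."""
    query = urllib.parse.urlencode({"path": path})
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/workspace/list?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def notebook_ids(listing: dict) -> dict:
    """Map each notebook path in a listing to its numeric object_id."""
    return {
        obj["path"]: obj["object_id"]
        for obj in listing.get("objects", [])
        if obj.get("object_type") == "NOTEBOOK"
    }


if __name__ == "__main__" and all(
    name in os.environ for name in ("DATABRICKS_HOST", "DATABRICKS_TOKEN")
):
    # "/Shared/etl" is a hypothetical folder; replace it with your own path.
    req = list_request(os.environ["DATABRICKS_HOST"], os.environ["DATABRICKS_TOKEN"], "/Shared/etl")
    with urllib.request.urlopen(req) as resp:
        for path, object_id in notebook_ids(json.load(resp)).items():
            print(object_id, path)
```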
Step 2: Configuring Permissions for Notebooks
Access the Notebook Workspace:
- Open the Databricks workspace(s) that contain the notebooks, using admin privileges.
- Locate the specific notebook(s) you want to include in the data lineage.
Manage Permissions:
- In your Databricks workspace, navigate to the permissions settings.
- Open the permissions dialog by selecting "Sharing permissions".
- Add the users or groups who need access to this metadata.
Assign Permissions:
- Add Users or Groups: In the Sharing Permissions dialog, you can add individual users or groups and grant them permissions on the notebook.
- Click the "Add user" or "Add group" button.
- Enter the name of the user or group you want to add, then choose the correct entity from the dropdown list that appears.
- For each user or group you add, set specific permissions using the dropdown menu next to their name. Available permissions include "Can View", "Can Run", "Can Edit", and "Is Owner". Select the appropriate permission level for each user or group.
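For many notebooks, the grants can be applied through the Databricks Permissions API instead of one dialog at a time. This sketch assumes you already have the notebooks' numeric object IDs; it uses the API's notebook levels (`CAN_READ`, `CAN_RUN`, `CAN_EDIT`, `CAN_MANAGE`) rather than the dialog labels, and it treats any principal containing `@` as a user, which is a simplification of this example.

```python
import json
import urllib.request

# Permission levels the Databricks Permissions API accepts for notebooks.
NOTEBOOK_LEVELS = {"CAN_READ", "CAN_RUN", "CAN_EDIT", "CAN_MANAGE"}


def acl_entry(principal: str, level: str) -> dict:
    """Build one ACL entry. Simplifying assumption: a principal containing '@'
    is a user (user_name); anything else is a group (group_name)."""
    if level not in NOTEBOOK_LEVELS:
        raise ValueError(f"unsupported notebook permission level: {level}")
    key = "user_name" if "@" in principal else "group_name"
    return {key: principal, "permission_level": level}


def notebook_grant_requests(host: str, token: str, notebook_ids: list, entries: list) -> list:
    """Prepare one PATCH request per notebook; PATCH adds to existing grants."""
    body = json.dumps({"access_control_list": entries}).encode("utf-8")
    return [
        urllib.request.Request(
            f"{host.rstrip('/')}/api/2.0/permissions/notebooks/{notebook_id}",
            data=body,
            headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
            method="PATCH",
        )
        for notebook_id in notebook_ids
    ]
```

For example, `notebook_grant_requests(host, token, [42], [acl_entry("octopai-readers", "CAN_READ")])` prepares one request for a hypothetical notebook ID `42` and group `octopai-readers`; pass each request to `urllib.request.urlopen` to send it.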
Save and Verify:
- Save the permissions settings and double-check them for accuracy.
Setting up Databricks Metadata Source
To configure a Databricks Metadata Source in the Octopai Client (OC), follow these instructions:
- Connection Name - Assign a meaningful name to the connection; this is the name users will see on the Octopai platform.
- Databricks server URL - The customer's Databricks server URL (e.g., https://abc-1234.5.azuredatabricks.net).
- Token - The workspace token. This token must be generated in advance on the Databricks server. For detailed instructions on creating a token, see "Databricks personal access token authentication" in the Databricks documentation.
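Before saving the connection, it can help to confirm that the URL and token work together. The sketch below is an illustrative check run outside the Octopai Client: `normalize_host` reduces a pasted URL to the `https://<host>` form shown in the example above (an assumption about what the connection form expects), and `check_token` calls the Databricks SCIM `Me` endpoint, which returns the identity the token belongs to. Read the token from an environment variable rather than hard-coding it; the variable names here are placeholders.

```python
import json
import os
import urllib.error
import urllib.parse
import urllib.request


def normalize_host(url: str) -> str:
    """Reduce a pasted workspace URL to https://<host>, dropping path and query.
    Assumption: the connection form expects this bare form."""
    candidate = url.strip()
    if "://" not in candidate:
        candidate = "https://" + candidate
    parsed = urllib.parse.urlparse(candidate)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"expected an https workspace URL, got {url!r}")
    return f"https://{parsed.netloc}"


def token_check_request(host: str, token: str) -> urllib.request.Request:
    """Prepare a request to the SCIM 'Me' endpoint for the token's own identity."""
    return urllib.request.Request(
        f"{normalize_host(host)}/api/2.0/preview/scim/v2/Me",
        headers={"Authorization": f"Bearer {token}"},
    )


def check_token(host: str, token: str) -> str:
    """Return the user name the token authenticates as; raise if it is rejected."""
    try:
        with urllib.request.urlopen(token_check_request(host, token), timeout=30) as resp:
            return json.load(resp).get("userName", "")
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            raise PermissionError(f"token rejected (HTTP {err.code})") from err
        raise


if __name__ == "__main__" and all(
    name in os.environ for name in ("DATABRICKS_HOST", "DATABRICKS_TOKEN")
):
    print(check_token(os.environ["DATABRICKS_HOST"], os.environ["DATABRICKS_TOKEN"]))
```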