K Knowledge Base
Databricks (via Direct Connect method)

This page will guide you through the setup of Databricks in K using the direct connect method.

Integration details

  Scope                     Included    Comments

  Metadata                  YES         See below for known limitations
  Lineage                   YES
  Usage                     YES
  Sensitive Data Scanner    Alpha

Known limitations with this integration:

  1. Hive catalogues are currently not supported.


Step 1) Databricks access

  1. Ensure Unity Catalog is enabled for your workspace

    1. https://community.databricks.com/t5/bangalore/how-do-we-enable-unity-catalog-for-our-workspace/td-p/73258

  2. Enable System Schemas for access & queries

    1. Follow the documentation below to enable the system schemas:

      1. https://docs.databricks.com/en/admin/system-tables/index.html#enable

        1. https://kb.databricks.com/unity-catalog/find-your-metastore-id

        2. Run the following commands to enable:

          curl -v -X PUT -H "Authorization: Bearer <PAT TOKEN>" "https://<YOUR WORKSPACE>.cloud.databricks.com/api/2.0/unity-catalog/metastores/<METASTORE ID>/systemschemas/access"
          curl -v -X PUT -H "Authorization: Bearer <PAT TOKEN>" "https://<YOUR WORKSPACE>.cloud.databricks.com/api/2.0/unity-catalog/metastores/<METASTORE ID>/systemschemas/query"
          
  3. SQL Warehouse

    1. Create a new SQL Warehouse, or choose an existing one, for K to use for the extraction.

    2. Go to the SQL Warehouse page. Select the SQL Warehouse to use.

      1. Go to the connection details

      2. Note down the Server hostname and HTTP path

  4. PAT Token

    1. Create a PAT token for the user that will be used to connect; this token is used for authentication.

    2. https://docs.databricks.com/en/dev-tools/auth/pat.html
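If you prefer to script the system-schema enablement from step 2, the two curl calls can be generated programmatically. A minimal sketch in Python (the helper names are illustrative, not part of K or Databricks, and the placeholders must be replaced with your own values):

```python
# Hypothetical helpers that render the two system-schema enablement
# calls shown in step 2. <YOUR WORKSPACE>, <METASTORE ID> and
# <PAT TOKEN> are placeholders to substitute with your own values.

def system_schema_url(workspace_host: str, metastore_id: str, schema: str) -> str:
    """Return the Unity Catalog endpoint that enables one system schema."""
    return (
        f"https://{workspace_host}/api/2.0/unity-catalog"
        f"/metastores/{metastore_id}/systemschemas/{schema}"
    )

def curl_command(url: str, pat_token: str) -> str:
    """Render the equivalent curl PUT call for copy and paste."""
    return f'curl -v -X PUT -H "Authorization: Bearer {pat_token}" "{url}"'

# Build the two calls from step 2 -- one per system schema to enable.
for schema in ("access", "query"):
    url = system_schema_url(
        "<YOUR WORKSPACE>.cloud.databricks.com", "<METASTORE ID>", schema
    )
    print(curl_command(url, "<PAT TOKEN>"))
```

Running the script prints the two curl commands, ready to copy into a terminal.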

From the above, record the following for use in the setup:

  1. Databricks account URL

    1. e.g. adb-<workspaceId>.<instance>.azuredatabricks.net

  2. PAT token

  3. Server host name

  4. HTTP path
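Before entering these details in K, you can sanity-check them with a short script. A minimal sketch, assuming the databricks-sql-connector package (pip install databricks-sql-connector) is available; the helper names and placeholder values are illustrative, not part of K:

```python
# Hypothetical pre-flight check for the details recorded above.
# Requires: pip install databricks-sql-connector
import re

def looks_like_warehouse_path(http_path: str) -> bool:
    """Rough sanity check that the HTTP path points at a SQL warehouse."""
    return re.fullmatch(r"/sql/1\.0/warehouses/[0-9a-f]+", http_path) is not None

def check_connection(server_hostname: str, http_path: str, pat_token: str) -> None:
    """Open a session and run a trivial query to confirm the details work."""
    from databricks import sql  # imported lazily so the sketch loads without it

    with sql.connect(
        server_hostname=server_hostname,
        http_path=http_path,
        access_token=pat_token,
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            print(cursor.fetchone())

# Example invocation (fill in the values recorded above):
# check_connection("adb-<workspaceId>.<instance>.azuredatabricks.net",
#                  "/sql/1.0/warehouses/<warehouseId>", "<PAT token>")
```

If the query returns, the Server hostname, HTTP path and PAT token are valid for the K source setup below.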


Step 2) Connecting K to Databricks

  • Select Platform Settings in the side bar

  • In the pop-out side panel, under Integrations click on Sources

  • Click Add Source and select Databricks

  • Select Direct Connect and add your Databricks details and click Next

  • Fill in the Source Settings and click Next

    • Name: The name you wish to give your Databricks source in K

    • Host: Your Databricks account location. This can be seen in the URL when you log in to your Databricks account.

      • e.g. adb-<workspaceId>.<instance>.azuredatabricks.net

    • Server Hostname: The SQL Warehouse Server hostname

      • e.g. adb-<workspaceId>.<instance>.azuredatabricks.net

    • HTTP Path: The SQL Warehouse HTTP path, pointing either to a DBSQL endpoint or to a DBR interactive cluster

      • e.g. /sql/1.0/warehouses/<warehouseId>

  • Add the Connection details and click Save & Next when the connection is successful

    • PAT Token: Add the PAT token created in Step 1

  • Test your connection and click Save


Step 3) Manually run an ad hoc load to test the integration

  • Next to your new Source, click on the Run manual load icon

  • Confirm how you want the source to be loaded

  • After the source load is triggered, a pop-up bar will appear that takes you to the Monitor tab of the Batch Manager page. This is the page you will usually visit to view the progress of source loads

A manual source load will also require a manual run of the following jobs to load all metrics and indexes with the manually loaded metadata. These can be found in the Batch Manager page:

  • DAILY

  • GATHER_METRICS_AND_STATS

Troubleshooting failed loads

  • If the job failed at the extraction step

    • Check the error. Contact KADA Support if required.

    • Rerun the source job

  • If the job failed at the load step, the failed directory in the landing folder will contain the file with issues.

    • Find the bad record and fix the file

    • Rerun the source job