K Knowledge Base
Databricks (via Direct Connect method)

This page will guide you through the setup of Databricks in K using the direct connect method.

Integration details

  Scope                     Included    Comments

  Metadata                  YES         See below for known limitations
  Lineage                   YES
  Usage                     YES
  Sensitive Data Scanner    Alpha

Known limitations with this integration:

  1. Hive catalogues are currently not supported.


Step 1) Databricks access

  1. Ensure Unity Catalog is enabled for your workspace

    1. https://community.databricks.com/t5/bangalore/how-do-we-enable-unity-catalog-for-our-workspace/td-p/73258

  2. Enable System Schemas for access & queries

    1. Follow the documentation below to enable the system schemas:

      1. https://docs.databricks.com/en/admin/system-tables/index.html#enable

        1. https://kb.databricks.com/unity-catalog/find-your-metastore-id

        2. Run the following commands to enable:

          curl -v -X PUT -H "Authorization: Bearer <PAT TOKEN>" "https://<YOUR WORKSPACE>.cloud.databricks.com/api/2.0/unity-catalog/metastores/<METASTORE ID>/systemschemas/access"
          curl -v -X PUT -H "Authorization: Bearer <PAT TOKEN>" "https://<YOUR WORKSPACE>.cloud.databricks.com/api/2.0/unity-catalog/metastores/<METASTORE ID>/systemschemas/query"
          
  3. SQL Warehouse

    1. Create a new SQL Warehouse, or choose an existing one, for K to use for the extraction.

    2. Go to the SQL Warehouse page. Select the SQL Warehouse to use.

      1. Go to the connection details

      2. Note down the Server hostname and HTTP path

  4. PAT Token

    1. Create a PAT token for the user that will be used to connect; this token is used for authentication.

    2. https://docs.databricks.com/en/dev-tools/auth/pat.html
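If you prefer to script the system-schema enablement from step 2, the two curl calls can be generated programmatically. A minimal sketch in Python (the helper names are illustrative, not part of K or Databricks, and the placeholders must be replaced with your own values):

```python
# Hypothetical helpers that render the two system-schema enablement
# calls shown in step 2. <YOUR WORKSPACE>, <METASTORE ID> and
# <PAT TOKEN> are placeholders to substitute with your own values.

def system_schema_url(workspace_host: str, metastore_id: str, schema: str) -> str:
    """Return the Unity Catalog endpoint that enables one system schema."""
    return (
        f"https://{workspace_host}/api/2.0/unity-catalog"
        f"/metastores/{metastore_id}/systemschemas/{schema}"
    )

def curl_command(url: str, pat_token: str) -> str:
    """Render the equivalent curl PUT call for copy and paste."""
    return f'curl -v -X PUT -H "Authorization: Bearer {pat_token}" "{url}"'

# Build the two calls from step 2 -- one per system schema to enable.
for schema in ("access", "query"):
    url = system_schema_url(
        "<YOUR WORKSPACE>.cloud.databricks.com", "<METASTORE ID>", schema
    )
    print(curl_command(url, "<PAT TOKEN>"))
```

Running the script prints the two curl commands, ready to copy into a terminal.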

From the above, record the following for use in the setup:

  1. Databricks account URL

    1. e.g. adb-<workspaceId>.<instance>.azuredatabricks.net

  2. PAT token

  3. Server host name

  4. HTTP path
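Before entering these details in K, you can sanity-check them with a short script. A minimal sketch, assuming the databricks-sql-connector package (pip install databricks-sql-connector) is available; the helper names and placeholder values are illustrative, not part of K:

```python
# Hypothetical pre-flight check for the details recorded above.
# Requires: pip install databricks-sql-connector
import re

def looks_like_warehouse_path(http_path: str) -> bool:
    """Rough sanity check that the HTTP path points at a SQL warehouse."""
    return re.fullmatch(r"/sql/1\.0/warehouses/[0-9a-f]+", http_path) is not None

def check_connection(server_hostname: str, http_path: str, pat_token: str) -> None:
    """Open a session and run a trivial query to confirm the details work."""
    from databricks import sql  # imported lazily so the sketch loads without it

    with sql.connect(
        server_hostname=server_hostname,
        http_path=http_path,
        access_token=pat_token,
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            print(cursor.fetchone())

# Example invocation (fill in the values recorded above):
# check_connection("adb-<workspaceId>.<instance>.azuredatabricks.net",
#                  "/sql/1.0/warehouses/<warehouseId>", "<PAT token>")
```

If the query returns, the Server hostname, HTTP path and PAT token are valid for the K source setup below.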


Step 2) Connecting K to Databricks

  • Select Platform Settings in the side bar

  • In the pop-out side panel, under Integrations click on Sources

  • Click Add Source and select Databricks

  • Select Direct Connect and add your Databricks details and click Next

  • Fill in the Source Settings and click Next

    • Name: The name you wish to give your Databricks source in K

    • Host: Your Databricks account location. This can be seen in the URL when you log in to your Databricks account.

      • e.g. adb-<workspaceId>.<instance>.azuredatabricks.net

    • Server Hostname: The SQL Warehouse Server hostname

      • e.g. adb-<workspaceId>.<instance>.azuredatabricks.net

    • HTTP Path: The SQL Warehouse HTTP path, pointing either to a DBSQL endpoint or to a DBR interactive cluster

      • e.g. /sql/1.0/warehouses/<warehouseId>

  • Add the Connection details and click Save & Next when the connection is successful

    • PAT Token: Add the PAT token created in Step 1

  • Test your connection and click Save


Step 3) Manually run an ad hoc load to test the integration

  • Next to your new Source, click on the Run manual load icon

  • Confirm how you want the source to be loaded

  • After the source load is triggered, a pop-up bar will appear that takes you to the Monitor tab of the Batch Manager page. This is the page you will usually visit to view the progress of source loads

A manual source load will also require a manual run of the following jobs to load all metrics and indexes with the manually loaded metadata. These can be found in the Batch Manager page:

  • DAILY

  • GATHER_METRICS_AND_STATS

Troubleshooting failed loads

  • If the job failed at the extraction step

    • Check the error. Contact KADA Support if required.

    • Rerun the source job

  • If the job failed at the load step, the failed directory in the landing folder will contain the file with issues.

    • Find the bad record and fix the file

    • Rerun the source job