
Azure Data Factory

This page walks through the setup of Azure Data Factory in K using the direct connect method.

Integration details

Scope                  | Included | Comments
---------------------- | -------- | ----------
Metadata               | YES      | See below
Lineage                | YES      |
Usage                  | YES      |
Sensitive Data Scanner | N/A      |

Known limitations

  • Not all sources and destinations are included in the metadata extraction. Improvements are planned to provide wider coverage.

  • Sources implemented:

    • SNOWFLAKE


Step 1) Enabling Azure Data Factory Admin APIs to be accessible to an AD Group

This step is performed by the Azure Data Factory Admin

  • Under Azure services click on Data factories

  • Locate the Data Factory that you would like to connect to K

  • Click on Overview and copy the following details for a later step (these can also be confirmed programmatically; see the sketch after this list):

    • Factory name

    • Resource group name

    • Subscription ID
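
These values can also be confirmed programmatically. The sketch below is a minimal, optional check that lists the Data Factories visible to your own Azure login; it assumes the requests and azure-identity Python packages are installed, that you are already signed in (for example via az login), and that SUBSCRIPTION_ID is replaced with your own value.

```python
# Optional sketch: list the Data Factories in a subscription to confirm
# the Factory name and Resource group name before sending them to the K Admin.
# Assumes `pip install azure-identity requests` and an existing Azure sign-in
# (e.g. `az login`). SUBSCRIPTION_ID is a placeholder, not a K-provided value.
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION_ID = "<your-subscription-id>"

# DefaultAzureCredential picks up your existing CLI or environment login.
token = DefaultAzureCredential().get_token(
    "https://management.azure.com/.default"
).token

resp = requests.get(
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
    "/providers/Microsoft.DataFactory/factories",
    params={"api-version": "2018-06-01"},
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

for factory in resp.json().get("value", []):
    # The resource id embeds the resource group name:
    # /subscriptions/<sub>/resourceGroups/<rg>/providers/.../factories/<name>
    print(factory["name"], "->", factory["id"])
```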


Step 2) Registering Azure Data Factory App in Azure AD

This step is performed by the Azure AD Admin

  • Log in to your company’s Azure Portal and open the Azure Active Directory page

  • Select App registrations in the side panel and click New registration

  • Complete the registration form

    • Name: Enter a name for the integration e.g. KADA Azure Data Factory API Integration

    • Supported account types: Select Accounts in this organizational directory only

    • Redirect URI: Select Web and add https://www.kada.ai

  • Click Register to complete the registration

  • Click on the newly created KADA Azure Data Factory API Integration App

  • Save the Application (client) ID and Directory (tenant) ID for use in a later step

  • Click on Endpoints and save the URL for the OpenID Connect metadata document for use in a later step

  • Select Certificates & secrets in the side panel and click New client secret

  • Complete the new secret form and save the Secret Value for use in a later step

Make sure you send all of the following information from Step 1 and Step 2 to the K Admin so that they can complete Step 4. You can optionally verify the credentials first; see the sketch after the list.

  • Factory name

  • Resource group name

  • Subscription ID

  • Application (client) ID

  • Directory (tenant) ID

  • Secret Value
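
Before handing these values over, you can optionally confirm that the registration works by requesting a management token with the OAuth2 client credentials flow. This is a minimal sketch, assuming the requests Python package; the placeholders are the values saved above.

```python
# Optional sketch: confirm the app registration can obtain an Azure
# management token via the OAuth2 client credentials flow.
# The placeholders are the Directory (tenant) ID, Application (client) ID
# and Secret Value saved in Step 2. Assumes `pip install requests`.
import requests

TENANT_ID = "<directory-tenant-id>"
CLIENT_ID = "<application-client-id>"
CLIENT_SECRET = "<secret-value>"

resp = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        # Scope for the Azure Resource Manager APIs that front ADF.
        "scope": "https://management.azure.com/.default",
    },
)
resp.raise_for_status()
print("Token acquired; expires in", resp.json()["expires_in"], "seconds")
```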


Step 3) Update your Azure Data Factory access control

This step is performed by the Azure Data Factory Admin

To allow K to connect to your Azure Data Factory, you will need to grant the App you registered in Step 2 the correct Role Assignment on each Data Factory.

  • Follow Step 1 to navigate to the Data Factory you wish to profile. You will need to perform the following steps for each Data Factory you wish to profile.

    Open a Data Factory

  • Click on Access control (IAM) in the panel and click Add

  • Select Data Factory Contributor

  • Click Select Members.

    In the side panel, search for the Service Application you created in Step 2. Click Select to add it.

    Click Review + Assign to finish adding the role assignment; you can verify it with the optional sketch below.
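
Once the role is assigned (allow a few minutes for Azure to propagate it), you can optionally verify that the Service Application can reach the factory by listing its pipelines through the Data Factory REST API. This is a minimal sketch using requests; the placeholders are the values gathered in Steps 1 and 2.

```python
# Optional sketch: verify the Data Factory Contributor assignment by
# listing the factory's pipelines as the Service Application.
# All placeholders come from Steps 1 and 2. Assumes `pip install requests`.
import requests

TENANT_ID = "<directory-tenant-id>"
CLIENT_ID = "<application-client-id>"
CLIENT_SECRET = "<secret-value>"
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group-name>"
FACTORY_NAME = "<factory-name>"

# Client credentials flow, as in the Step 2 sketch.
token = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "scope": "https://management.azure.com/.default",
    },
).json()["access_token"]

# A 200 response listing pipelines confirms the role assignment works.
resp = requests.get(
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.DataFactory"
    f"/factories/{FACTORY_NAME}/pipelines",
    params={"api-version": "2018-06-01"},
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print([p["name"] for p in resp.json().get("value", [])])
```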


Step 4) Add Azure Data Factory as a New Source

This step is performed by the K Admin

  • Select Platform Settings in the side bar

  • In the pop-out side panel, under Integrations click on Sources

  • Click Add Source and select AZURE_DATA_FACTORY

  • Select Direct Connect

  • Fill in the Source Settings and click Save & Next

    • Name: Give the Azure Data Factory source a name in K. If you have multiple ADFs, each one will need to have a unique name

    • Host: Enter the URL, e.g. adf.azure.com

    • Timeout: The default is 10. The API can sometimes take longer to respond, so we recommend increasing this to 20

    • Update the Host / Database mapping - see Host / Database Mapping. This step can be completed after the initial load via the guided workflow.

    • Select Enable Workspace Filtering if you wish to load only selected Workspaces

  • Add Connection Details and click Save & Next

    • Tenant ID: Add the Directory (tenant) ID copied from Step 2

    • Client ID: Add the Application (client) ID copied from Step 2

    • Client Secret: Add the Secret Value copied from Step 2

  • Test your connection and click Next

  • If you selected Enable Workspace Filtering, select the Workspaces you want to load. If you have many workspaces, the list may take some time to load.

  • Click Finish Setup


Step 5) Schedule Azure Data Factory source load

  • Select Platform Settings in the side bar

  • In the pop-out side panel, under Integrations click on Sources

  • Locate your new Azure Data Factory Source and click on the Schedule Settings (clock) icon to set the schedule

Note that scheduling a source can take up to 15 minutes to propagate the change.


Step 6) Manually run an ad hoc load to test Azure Data Factory

  • Next to your new Source, click on the Run manual load icon

    Confirm how you want the source to be loaded

  • After the source load is triggered, a pop-up bar will appear taking you to the Monitor tab in the Batch Manager page. This is the usual page you visit to view the progress of source loads.

A manual source load will also require a manual run of the following jobs to load all metrics and indexes with the manually loaded metadata. Both can be found in the Batch Manager page.

  • DAILY

  • GATHER_METRICS_AND_STATS

 

Troubleshooting failed loads

  • If the job failed at the extraction step

    • Check the error. Contact KADA Support if required.

    • Rerun the source job

  • If the job failed at the load step, the failed directory in the landing folder will contain the file with issues.

    • Find the bad record and fix the file

    • Rerun the source job
