K Knowledge Base

BigQuery (via Collector method) - v3.0.0

About Collectors


Pre-requisites

Collector Server Minimum Requirements

BigQuery Requirements

  • Access to BigQuery


Step 1: Establish BigQuery Access

This step is performed by the Google Cloud Admin.

  • Create a Service Account in the Google Cloud console

    • Give the Service Account a name (e.g. KADA BQ Integration)

    • Select the Projects that include the BigQuery instance(s) that you want to catalog

    • Click Save

  • Create a key for the Service Account (this JSON key is the json_credentials value used in Step 5)

    • Click on the Service Account

    • Select the Keys tab. Click on Create new key

    • Select the JSON option. After clicking 'CREATE', the JSON file will automatically download to your device.

  • Grant permissions to the Service Account on the IAM page

    • Click on ADD

    • Add the Service Account to the 'New principals' field.

    • Grant the following roles to this principal:

      • BigQuery Job User

      • BigQuery Metadata Viewer

      • BigQuery Read Session User

      • BigQuery Resource Viewer

    • Click SAVE
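
Before moving on, it can help to confirm that the downloaded key file is a well-formed service account key. The sketch below is illustrative only: the helper names check_service_account_key and load_service_account_key are hypothetical and not part of any KADA library. It checks a handful of fields that Google service account key files contain.

```python
import json

# Fields found in a Google service account key file; this is a minimal subset.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key: dict) -> list:
    """Return a list of problems found in a parsed key (empty if none)."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)
    ]
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type!r}")
    return problems


def load_service_account_key(path: str) -> dict:
    """Parse the downloaded JSON key and raise ValueError if it looks wrong."""
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    problems = check_service_account_key(key)
    if problems:
        raise ValueError("; ".join(problems))
    return key
```

Calling load_service_account_key with the path of the downloaded file returns the parsed key, or raises a ValueError that lists what is missing. The check does not contact Google, so it cannot confirm that the roles above were granted.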


Step 2: Create the Source in K

Create a BigQuery source in K

  • Go to Settings, select Sources and click Add Source

  • Select the "Load from File system" option

  • Give the source a Name - e.g. BigQuery Production

  • Add the Host name for the BigQuery Server

  • Click Finish Setup


Step 3: Getting Access to the Source Landing Directory


Step 4: Install the Collector

You can download the latest Core Library and whl via Platform Settings → Sources → Download Collectors

Run the following command to install the collector:

pip install kada_collectors_extractors_<version>-none-any.whl

You will also need to install the common library kada_collectors_lib for this collector to function properly:

pip install kada_collectors_lib-<version>-none-any.whl

Under the covers, this collector uses the BigQuery Client API, which may have OS dependencies. See https://cloud.google.com/bigquery/docs/reference/libraries
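
After installing, a quick check that Python can locate the packages may save time later. This is a minimal sketch: is_importable is a hypothetical helper, kada_collectors is the package name imported by the wrapper script in Step 6, and google.cloud.bigquery is the import name of the BigQuery client library.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the named module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. "google") is itself missing.
        return False


for name in ("kada_collectors", "google.cloud.bigquery"):
    status = "ok" if is_importable(name) else "MISSING"
    print(f"{name}: {status}")
```

A MISSING line means the corresponding pip install did not land in the Python environment you are running.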


Step 5: Configure the Collector

| FIELD | FIELD TYPE | DESCRIPTION | EXAMPLE |
|---|---|---|---|
| regions | list<string> | List of valid regions to inspect | "us" |
| projects | list<string> | List of project ids to inspect across the regions specified | "kada-data" |
| host | string | This is the host that was onboarded in K for BigQuery | "bigquery" |
| json_credentials | JSON | Service account credentials JSON | {"type": "service_account", "project_id": "...", ...} |
| output_path | string | Absolute path to the output location | "/tmp/output" |
| mask | boolean | To enable masking or not | true |
| compress | boolean | To gzip the output or not | true |

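Save the configuration as kada_bigquery_extractor_config.json. By default, the wrapper script in Step 6 looks for this file in its own directory.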

{
    "regions": [],
    "projects": [],
    "host": "",
    "json_credentials": {},
    "output_path": "/tmp/output",
    "mask": true,
    "compress": true
}
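
A filled-in config can be sanity-checked before the collector runs. The sketch below is illustrative and not part of the collector: validate_config is a hypothetical helper, the expected types follow the field table above, and treating empty lists, strings and credentials as problems is an assumption. The example values are the ones shown in the table.

```python
import json

# Expected Python type for each field, taken from the field table.
EXPECTED_TYPES = {
    "regions": list,
    "projects": list,
    "host": str,
    "json_credentials": dict,
    "output_path": str,
    "mask": bool,
    "compress": bool,
}


def validate_config(config: dict) -> list:
    """Return a list of problems with the config (empty if it looks valid)."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in config:
            problems.append(f"{field}: missing")
        elif not isinstance(config[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
        elif expected in (list, str, dict) and not config[field]:
            # Assumption: the empty placeholders in the template must be filled in.
            problems.append(f"{field}: empty")
    return problems


example = {
    "regions": ["us"],
    "projects": ["kada-data"],
    "host": "bigquery",
    "json_credentials": {"type": "service_account", "project_id": "..."},
    "output_path": "/tmp/output",
    "mask": True,
    "compress": True,
}
print(validate_config(example))       # []
print(json.dumps(example, indent=4))  # JSON text for the config file
```

Running validate_config on the unedited template reports regions, projects, host and json_credentials as empty.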

Step 6: Run the Collector

This is the wrapper script: kada_bigquery_extractor.py

import os
import argparse
from kada_collectors.extractors.utils import load_config, get_hwm, publish_hwm, get_generic_logger
from kada_collectors.extractors.bigquery import Extractor

# Configure logging
get_generic_logger('root')

_type = 'bigquery'
# Default config file: kada_bigquery_extractor_config.json next to this script
dirname = os.path.dirname(__file__)
filename = os.path.join(dirname, 'kada_{}_extractor_config.json'.format(_type))

# Both the config path and the collector name can be overridden on the command line
parser = argparse.ArgumentParser(description='KADA BigQuery Extractor.')
parser.add_argument('--config', '-c', dest='config', default=filename)
parser.add_argument('--name', '-n', dest='name', default=_type)
args = parser.parse_args()

# Get the high water mark window for this run
start_hwm, end_hwm = get_hwm(args.name)

# Build the extractor from the config, check the connection, then extract
ext = Extractor(**load_config(args.config))
ext.test_connection()
ext.run(**{"start_hwm": start_hwm, "end_hwm": end_hwm})

# Reached only if the run above did not raise: publish the new high water mark
publish_hwm(args.name, end_hwm)

Step 7: Check the Collector Outputs

K Extracts

A set of files (e.g. metadata, databaselog, linkages, events) will be generated in the output_path directory.
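
One simple way to review the extracts is to list what landed in output_path and flag anything that is zero bytes. This is a rough sketch with hypothetical helper names. It makes no assumptions about the file names the collector produces, and a gzipped file with no rows is not zero bytes, so the check is coarse.

```python
from pathlib import Path


def summarise_outputs(output_path: str) -> dict:
    """Map each file name in output_path to its size in bytes."""
    directory = Path(output_path)
    return {
        entry.name: entry.stat().st_size
        for entry in sorted(directory.iterdir())
        if entry.is_file()
    }


def empty_files(sizes: dict) -> list:
    """Return the names of files that are zero bytes long."""
    return [name for name, size in sizes.items() if size == 0]
```

For example, empty_files(summarise_outputs("/tmp/output")) returns the names of any zero-byte files in the output location used in the config example.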

High Water Mark File

A high water mark file called bigquery_hwm.txt is created.
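
For readers new to the pattern, a high water mark records where the last successful run stopped, so that the next run only covers newer activity. The sketch below is a generic illustration of that idea and not the kada_collectors implementation: the helper names, the timestamp format and the file contents are all assumptions, and the real format of bigquery_hwm.txt is not described here.

```python
from datetime import datetime
from pathlib import Path

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, for illustration only


def read_hwm(path: Path, default: str) -> str:
    """Return the stored high water mark, or default on the first run."""
    return path.read_text(encoding="utf-8").strip() if path.exists() else default


def write_hwm(path: Path, value: str) -> None:
    """Persist the new high water mark once a run has succeeded."""
    path.write_text(value, encoding="utf-8")


def next_window(path: Path, now: datetime, default: str = "1970-01-01 00:00:00"):
    """Return the (start, end) window the next incremental run should cover."""
    return read_hwm(path, default), now.strftime(TIMESTAMP_FORMAT)
```

The wrapper script in Step 6 follows the same shape: get_hwm supplies the window, the extractor runs over it, and publish_hwm records the end of the window afterwards.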


Step 8: Push the Extracts to K

Once the files have been validated, you can push the files to the K landing directory.


Example: Using Airflow to orchestrate the Extract and Push to K