K Knowledge Base

DB2 (via Collector method) - v3.1.0

About Collectors


Prerequisites

Collector Server Minimum Requirements

DB2 Requirements

  • The DB2 user that the collector will be using must have SELECT access to the following tables:

    1. syscat.tables

    2. syscat.views

    3. syscat.columns

    4. syscat.procedures

    5. syscat.functions

    6. syscat.roleauth

    7. syscat.tableauth

    8. sysibm.sqlforeignkeys
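The required grants can be scripted against the list above. A minimal sketch, assuming the collector's DB2 user is named `kada_user` (a placeholder, substitute your own user):

```python
# Catalog tables/views the collector reads (from the prerequisites above).
CATALOG_OBJECTS = [
    "syscat.tables",
    "syscat.views",
    "syscat.columns",
    "syscat.procedures",
    "syscat.functions",
    "syscat.roleauth",
    "syscat.tableauth",
    "sysibm.sqlforeignkeys",
]

def grant_statements(user: str) -> list:
    """Build GRANT SELECT statements for the collector's DB2 user."""
    return ["GRANT SELECT ON {} TO USER {}".format(obj, user) for obj in CATALOG_OBJECTS]

for stmt in grant_statements("kada_user"):
    print(stmt)
```

Run the generated statements as a user with sufficient privileges on the DB2 database.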

Enabling DB2 Audit

To capture usage information, auditing needs to be enabled in DB2.

See https://www.ibm.com/docs/en/db2/11.1?topic=facility-audit-policies

KADA audit policy guidelines

  1. KADA recommends starting with the WITHOUT DATA directive to limit logging. However, if dynamic SQL is used, WITH DATA may need to be enabled.

  2. KADA only requires the successful EXECUTE events.

CREATE AUDIT POLICY KADA CATEGORIES EXECUTE WITHOUT DATA STATUS SUCCESS ERROR TYPE NORMAL COMMIT
AUDIT DATABASE USING POLICY KADA COMMIT

After the logs are captured, they need to be decoded and loaded into DB2 tables. KADA will extract the usage information from the audit tables. Follow the guide: https://www.ibm.com/docs/en/db2/11.1?topic=logs-creating-tables-db2-audit-data
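Once the audit data is loaded, usage can be read back with an ordinary query over the audit table. A sketch of building such a query; the column names (USERID, STMT_TEXT, TIMESTAMP) are illustrative only, check the actual table definition produced by the IBM guide above:

```python
def usage_query(audit_schema: str, audit_table: str) -> str:
    """Build a parameterised query over the decoded EXECUTE audit table.

    Column names here are illustrative; verify them against the tables
    created by the IBM db2audit load procedure.
    """
    return (
        "SELECT USERID, STMT_TEXT, TIMESTAMP "
        "FROM {}.{} "
        "WHERE TIMESTAMP BETWEEN ? AND ?"
    ).format(audit_schema, audit_table)

print(usage_query("audit", "execute"))
```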


Step 2: Create the Source in K

Create a DB2 source in K

  • Go to Settings, Select Sources and click Add Source

  • Select DB2 Source Type

  • Select "Load from File system" option

  • Give the source a Name - e.g. DB2 Production

  • Add the Host name for the DB2 Server

  • Click Finish Setup


Step 3: Getting Access to the Source Landing Directory


Step 4: Install the Collector

You can download the latest Core Library and whl via Platform Settings → Sources → Download Collectors

Run the following command to install the collector

pip install kada_collectors_extractors_<version>-none-any.whl

You will also need to install the common library kada_collectors_lib for this collector to function properly.

pip install kada_collectors_lib-<version>-none-any.whl

Step 5: Configure the Collector

The DB2 collector currently only supports meta_only=true; do not set this to false.

| FIELD | FIELD TYPE | DESCRIPTION | EXAMPLE |
| --- | --- | --- | --- |
| server | string | DB2 server. If using a custom port, append it with a comma, e.g. 10.1.1.23,5678 | "10.1.18.19" |
| username | string | Username to log into the DB2 account | "myuser" |
| password | string | Password to log into the DB2 account | |
| database_name | string | The DB2 database to connect to | "db2inst" |
| output_path | string | Absolute path to the output location | "/tmp/output" |
| mask | boolean | To enable masking or not | true |
| compress | boolean | To gzip the output or not | true |
| meta_only | boolean | Extract metadata only | true |
| host_name | string | The host value that you have onboarded or will onboard the source into K as | db2prod |
| audit_schema | string | The schema for the audit tables; default is audit | audit |
| audit_table | string | The table name for the audit table; default is execute | execute |
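The `server` value's optional comma-separated port can be split out as follows. A sketch, not part of the collector itself; the 50000 default is the common DB2 instance port and may differ in your environment:

```python
def parse_server(server: str, default_port: int = 50000) -> tuple:
    """Split a DB2 server string of the form 'host' or 'host,port'."""
    host, sep, port = server.partition(",")
    return (host, int(port) if sep else default_port)

print(parse_server("10.1.1.23,5678"))  # ('10.1.1.23', 5678)
print(parse_server("10.1.18.19"))      # ('10.1.18.19', 50000)
```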

kada_db2_extractor_config.json

{
    "server": "",
    "username": "",
    "password": "",
    "database_name": "",
    "output_path": "/tmp/output",
    "mask": true,
    "compress": true,
    "meta_only": true,
    "host_name": "",
    "audit_schema": "audit",
    "audit_table": "execute"
}
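Before running the collector, the config file can be sanity-checked. A minimal sketch (not part of the KADA library): the key list mirrors the table above, and the `meta_only` check enforces the constraint noted at the top of this step:

```python
import json

# Keys the DB2 collector config is expected to carry (see the field table above).
REQUIRED_KEYS = {
    "server", "username", "password", "database_name", "output_path",
    "mask", "compress", "meta_only", "host_name",
    "audit_schema", "audit_table",
}

def check_config(raw: str) -> dict:
    """Parse the JSON config and enforce the DB2-specific constraints."""
    cfg = json.loads(raw)
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError("missing config keys: {}".format(sorted(missing)))
    if cfg["meta_only"] is not True:
        raise ValueError("the DB2 collector currently only supports meta_only=true")
    return cfg
```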

Step 6: Run the Collector

This is the wrapper script: kada_db2_extractor.py

import os
import argparse
from kada_collectors.extractors.utils import load_config, get_hwm, publish_hwm, get_generic_logger
from kada_collectors.extractors.db2 import Extractor

get_generic_logger('root')

_type = 'db2'

# Default to a kada_db2_extractor_config.json file located next to this script
dirname = os.path.dirname(__file__)
filename = os.path.join(dirname, 'kada_{}_extractor_config.json'.format(_type))

parser = argparse.ArgumentParser(description='KADA DB2 Extractor.')
parser.add_argument('--config', '-c', dest='config', default=filename)
parser.add_argument('--name', '-n', dest='name', default=_type)
args = parser.parse_args()

# Fetch the extraction window from the stored high water mark
start_hwm, end_hwm = get_hwm(args.name)

ext = Extractor(**load_config(args.config))
ext.test_connection()
ext.run(**{"start_hwm": start_hwm, "end_hwm": end_hwm})

# Record the new high water mark so the next run only extracts new activity
publish_hwm(args.name, end_hwm)

Step 7: Check the Collector Outputs

K Extracts

A set of files (e.g. metadata, databaselog, linkages, events) will be generated in the output_path directory.

High Water Mark File

A high water mark file is created called db2_hwm.txt.
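The high water mark is what makes repeat runs incremental: each run extracts only activity after the recorded timestamp. A conceptual sketch; the real file is managed by `get_hwm`/`publish_hwm` in the KADA library, and the ISO-timestamp format shown here is illustrative:

```python
from datetime import datetime, timezone
from pathlib import Path

def read_hwm(path: Path, default: str = "1970-01-01T00:00:00+00:00") -> datetime:
    """Read the last successful extraction timestamp, or a default on the first run."""
    if path.exists():
        return datetime.fromisoformat(path.read_text().strip())
    return datetime.fromisoformat(default)

def write_hwm(path: Path, ts: datetime) -> None:
    """Persist the end-of-run timestamp so the next run starts from here."""
    path.write_text(ts.isoformat())
```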


Step 8: Push the Extracts to K

Once the files have been validated, you can push the files to the K landing directory.


Example: Using Airflow to orchestrate the Extract and Push to K