About Collectors
Collectors are extractors that you, as a K customer, develop and manage.
KADA provides Python libraries that you can use to deploy a Collector quickly.
Why you should use a Collector
There are several reasons to use a collector instead of the direct connect extractor:
- You are using the KADA SaaS offering and it cannot connect to your sources due to firewall restrictions
- For security reasons, you want to push metadata to KADA rather than allow it to pull data
- You want to inspect the metadata before pushing it to K
Using a collector requires you to manage:
- Deploying and orchestrating the extract code
- Managing a high water mark so the extract only pulls the latest metadata
- Storing and pushing the extracts to your K instance
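The high water mark responsibility listed above can be illustrated with a minimal sketch. It assumes the mark is an ISO 8601 timestamp kept in a small text file; the file name `example_hwm.txt` and the helper names are hypothetical and are not part of the KADA libraries.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def read_hwm(path: Path, default_lookback_days: int = 1) -> datetime:
    """Return the stored high water mark, or a default start time if none exists."""
    if path.exists():
        return datetime.fromisoformat(path.read_text().strip())
    return datetime.now(timezone.utc) - timedelta(days=default_lookback_days)


def write_hwm(path: Path, value: datetime) -> None:
    """Persist the new high water mark after a successful extract."""
    path.write_text(value.isoformat())


def extract_window(path: Path, now: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) window the next extract should cover."""
    return read_hwm(path), now
```

A run would call `extract_window`, extract only metadata inside that window, and call `write_hwm` with the window end once the extract has succeeded, so a failed run is retried from the same starting point.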
Pre-requisites
Collector Server Minimum Requirements
For the collector to operate effectively, it needs to be deployed on a server with the following minimum specifications:
- CPU: 2 vCPU
- Memory: 8GB
- Storage: 30GB (depends on historical data extracted)
- OS: Unix distribution preferred (e.g. RHEL); Windows Server also works
- Python 3.10.x or later
- Access to K landing directory
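Some of the requirements above can be checked before installing anything. The sketch below uses only the Python standard library; the function name `check_host` is hypothetical, and treating the 30GB storage figure as free space on the given path is an assumption. Memory is not checked because the standard library has no portable call for it.

```python
import os
import shutil
import sys


def check_host(path=".", min_python=(3, 10), min_cpus=2, min_free_gb=30.0):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {found} is older than {min_python[0]}.{min_python[1]}")
    cpus = os.cpu_count() or 0  # cpu_count() may return None
    if cpus < min_cpus:
        problems.append(f"{cpus} CPUs found, {min_cpus} required")
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(f"{free_gb:.1f} GB free at {path}, {min_free_gb} GB required")
    return problems


if __name__ == "__main__":
    for problem in check_host():
        print("WARNING:", problem)
```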
SQL Server Requirements
- Access to SQL Server - SQL Server 2012 or later
- Create a SQL login (non-domain login) with the following permissions: read-only access to system views, view definition on objects, and impersonate any login
Step 1: SQL Server Setup
To configure SQL Server for use with the collector, you need to enable SQL Server authentication. You also need to create a dedicated SQL Server login and configure it with appropriate permissions.
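As a rough illustration of the login setup, the sketch below renders T-SQL that a DBA could review and run. It is not the official KADA script: the login name is hypothetical, mapping "view definition on objects" to the server-level `VIEW ANY DEFINITION` permission is an assumption, and any additional grants needed for system views are left out. Check that each permission exists on your SQL Server version. Enabling SQL Server authentication itself is done in the server settings and is not covered here.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value: str) -> str:
    """Quote a T-SQL Unicode string literal, doubling any single quote."""
    return "N'" + value.replace("'", "''") + "'"


def render_login_sql(login: str, password: str) -> str:
    """Render statements that create a SQL login and grant server-level permissions."""
    ident = quote_ident(login)
    statements = [
        "USE master;",
        f"CREATE LOGIN {ident} WITH PASSWORD = {quote_literal(password)};",
        f"GRANT VIEW ANY DEFINITION TO {ident};",    # assumed mapping, see lead-in
        f"GRANT IMPERSONATE ANY LOGIN TO {ident};",
    ]
    return "\n".join(statements)


if __name__ == "__main__":
    # "kada_collector" is a hypothetical login name.
    print(render_login_sql("kada_collector", "change-me"))
```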
Step 2: Create the Source in K
Create a SQL Server source in K
- Go to Settings, select Sources and click Add Source
- Select the "Load from File" option
- Give the source a Name - e.g. SQLServer Production
- Add the Host name for the SQL Server
- Click Finish Setup
Step 3: Getting Access to the Source Landing Directory
When using a Collector you will push metadata to a K landing directory.
To find your landing directory you will need to:
- Go to Platform Settings - Settings. Note down the value of these settings:
  - If using Azure: storage_azure_storage_account
  - If using AWS:
    - storage_root_folder - the AWS S3 bucket
    - storage_aws_region - the region where the AWS S3 bucket is hosted
- Go to Sources - Edit the Source you have configured. Note down the landing directory in the About this Source section.
To connect to the landing directory you will need:
- If using Azure: a SAS token to push data to the landing directory. Request this from KADA Support (support@kada.ai)
- If using AWS:
  - An Access key and Secret. Request this from KADA Support (support@kada.ai)
  - OR provide your IAM role to KADA Support to provision access.
Step 4: Install the Collector
You can download the latest core library and collector whl via Platform Settings → Sources → Download Collectors.
Run the following command to install the collector:
pip install kada_collectors_extractors_<version>-none-any.whl
You will also need to install the common library kada_collectors_lib for this collector to function properly.
pip install kada_collectors_lib-<version>-none-any.whl
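After installing both wheels, you may want to confirm that they are visible to the Python environment the collector will run in. This sketch lists installed distributions whose name starts with "kada" using the standard library's `importlib.metadata`; the prefix match is an assumption based on the wheel file names above, and the function name is hypothetical.

```python
from importlib import metadata


def installed_kada_packages() -> dict[str, str]:
    """Map installed distribution names starting with 'kada' to their versions."""
    found = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name") or ""
        except Exception:
            continue  # skip distributions whose metadata cannot be read
        if name.lower().startswith("kada"):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    packages = installed_kada_packages()
    if not packages:
        print("No KADA packages found in this environment")
    for name, version in sorted(packages.items()):
        print(f"{name}=={version}")
```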
Step 5: Configure the Collector
| FIELD | FIELD TYPE | DESCRIPTION | EXAMPLE |
|---|---|---|---|
| user | string | User to connect to SQL Server | "sqluser" |
| password | string | Password to connect to SQL Server | "password" |
| server | string | Server name or IP of the SQL Server | "localhost" |
| host_name | string | The onboarded host in K for the SQL Server | "localhost" |
| database_name | string | The onboarded database name in K for the SQL Server | "sqldb" |
| output_path | string | Absolute path to the output location | "/tmp/output" |
| mask | boolean | Whether to enable masking | true |
| compress | boolean | Whether to gzip the output | true |
kada_sqlserver_extractor_config.json
{
  "user": "",
  "password": "",
  "server": "",
  "host_name": "",
  "database_name": "",
  "output_path": "/tmp/output",
  "mask": true,
  "compress": true
}
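Before running the collector, it can help to check the config file against the field table above. The sketch below is a standalone validator using only the standard library; `validate_config` is a hypothetical helper and not part of the KADA libraries. Rejecting empty strings is an assumption made so that the blank template values are caught.

```python
import json
from pathlib import Path

# Field names and types taken from the configuration table.
EXPECTED_FIELDS = {
    "user": str,
    "password": str,
    "server": str,
    "host_name": str,
    "database_name": str,
    "output_path": str,
    "mask": bool,
    "compress": bool,
}


def validate_config(path: str) -> list[str]:
    """Return a list of problems in the config file; empty means it looks valid."""
    config = json.loads(Path(path).read_text())
    errors = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in config:
            errors.append(f"missing field: {field}")
        elif not isinstance(config[field], expected_type):
            errors.append(f"wrong type: {field} should be {expected_type.__name__}")
        elif expected_type is str and not config[field]:
            errors.append(f"empty value: {field}")
    output_path = config.get("output_path")
    if isinstance(output_path, str) and output_path and not Path(output_path).is_absolute():
        errors.append("output_path must be an absolute path")
    return errors
```

For example, `validate_config("kada_sqlserver_extractor_config.json")` run against the unmodified template reports an empty value for each of the five blank string fields.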
Step 6: Run the Collector
See Collector Integration General Notes for how to run the collector and for an example script.
Step 7: Check the Collector Outputs
K Extracts
A set of files (e.g. metadata, databaselog, linkages, events) will be generated in the output_path directory.
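A quick way to inspect the generated files is to list them with their sizes and confirm that compressed files can be opened. This is a minimal sketch; it assumes gzip output carries a `.gz` suffix, and `summarize_outputs` is a hypothetical helper name.

```python
import gzip
from pathlib import Path


def summarize_outputs(output_path: str) -> list[tuple[str, int, bool]]:
    """Return (file name, size in bytes, readable) for each file in output_path."""
    summary = []
    for file in sorted(Path(output_path).iterdir()):
        if not file.is_file():
            continue
        readable = True
        if file.suffix == ".gz":
            try:
                # Reading one byte is enough to validate the gzip header.
                with gzip.open(file, "rb") as handle:
                    handle.read(1)
            except (OSError, EOFError):
                readable = False
        summary.append((file.name, file.stat().st_size, readable))
    return summary


if __name__ == "__main__":
    for name, size, readable in summarize_outputs("/tmp/output"):
        print(f"{name}\t{size} bytes\t{'ok' if readable else 'UNREADABLE'}")
```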
High Water Mark File
A high water mark file called sqlserver_hwm.txt is created.
Step 8: Push the Extracts to K
Once the files have been validated, you can push the files to the K landing directory.
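For an AWS-hosted landing directory, the push could look like the sketch below. It assumes the landing directory noted in Step 3 is a key prefix inside the storage_root_folder bucket, that credentials are available to boto3 through its usual mechanisms (environment variables, profile or IAM role), and that the third-party `boto3` package is installed. The function names are hypothetical. An Azure landing directory would need the Azure SDK and the SAS token instead.

```python
from pathlib import Path


def plan_uploads(output_path: str, landing_prefix: str) -> list[tuple[Path, str]]:
    """Pair each local extract file with its destination key under the landing prefix."""
    prefix = landing_prefix.strip("/")
    return [
        (file, f"{prefix}/{file.name}")
        for file in sorted(Path(output_path).iterdir())
        if file.is_file()
    ]


def push_to_s3(output_path: str, bucket: str, landing_prefix: str, region: str) -> int:
    """Upload every planned file to S3 and return the number of files pushed."""
    import boto3  # imported lazily so plan_uploads works without boto3 installed

    client = boto3.client("s3", region_name=region)
    uploads = plan_uploads(output_path, landing_prefix)
    for file, key in uploads:
        client.upload_file(str(file), bucket, key)
    return len(uploads)
```

Calling `plan_uploads` on its own gives a dry run of which files would be sent and where, which is useful for a final check before the actual push.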