
Athena

This page walks through the setup of Athena in K using the direct connect method.

Integration details

Scope                  | Included | Comments
Metadata               | YES      | See below
Lineage                | YES      |
Usage                  | YES      |
Sensitive Data Scanner | N/A      |

Known limitations

  • TBC


Step 1: Establish Athena Access

We advise creating a new role and a separate S3 bucket for the service user provided to KADA, with a policy that allows the access listed below. See Identity and access management in Athena - Amazon Athena.

The service user/account/role requires permissions to do the following:

  1. Execute queries against Athena with access to the INFORMATION_SCHEMA, in particular the following tables:

    1. information_schema.views

    2. information_schema.tables

    3. information_schema.columns

  2. Executing queries in Athena requires an S3 bucket to temporarily store results.
    The policy must allow read, write and list access to objects within that bucket, and the bucket policy must in turn allow the service user/account/role the same access.

  3. Call the following Athena APIs

    1. list_databases

    2. list_table_metadata

    3. list_query_executions

    4. list_work_groups

    5. batch_get_query_execution

    6. start_query_execution

    7. get_query_execution

  4. The service user/account/role needs permission to access all workgroups in order to extract all data. If you omit a workgroup, its information will not be extracted and you may not see the complete picture in K.

  5. See IAM policies for accessing workgroups - Amazon Athena for how to add policy entries for fine-grained control at the workgroup level. Note that the extractor runs queries on Athena. If you choose to restrict workgroup access, ensure that query-based actions (e.g. StartQueryExecution) are allowed for the workgroup that the service user/account/role is associated with.
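Before onboarding, it can help to confirm that the service user/account/role can actually make the calls listed above. The following is a minimal sketch, not the extractor's own code. It assumes `athena` is a boto3 Athena client (for example `boto3.client("athena", region_name="ap-southeast-2")`), that the catalog is the default `AwsDataCatalog`, and that the workgroup is `primary`. The helper names `try_call`, `check_list_apis`, `probe_sql` and `run_query` are hypothetical, and pagination is ignored for brevity.

```python
import time

# The three INFORMATION_SCHEMA tables named in the list above.
INFO_SCHEMA_TABLES = ("views", "tables", "columns")


def try_call(results, name, fn):
    """Run fn(); record "ok" on success or the error text on failure."""
    try:
        value = fn()
    except Exception as exc:  # e.g. an AccessDenied error returned by AWS
        results[name] = f"failed: {exc}"
        return {}
    results[name] = "ok"
    return value


def check_list_apis(athena, catalog="AwsDataCatalog"):
    """Call each listing API once; return {api_name: "ok" | "failed: ..."}."""
    results = {}
    groups = try_call(results, "list_work_groups", athena.list_work_groups)
    dbs = try_call(results, "list_databases",
                   lambda: athena.list_databases(CatalogName=catalog))
    for db in dbs.get("DatabaseList", [])[:1]:  # one database is enough to probe
        try_call(results, "list_table_metadata",
                 lambda: athena.list_table_metadata(CatalogName=catalog,
                                                    DatabaseName=db["Name"]))
    for group in groups.get("WorkGroups", []):
        name = group["Name"]
        listed = try_call(results, f"list_query_executions:{name}",
                          lambda: athena.list_query_executions(WorkGroup=name))
        ids = listed.get("QueryExecutionIds", [])[:50]  # the batch call takes at most 50 ids
        if ids:
            try_call(results, f"batch_get_query_execution:{name}",
                     lambda: athena.batch_get_query_execution(QueryExecutionIds=ids))
    return results


def probe_sql(table):
    """Build a one-row probe query for one of the INFORMATION_SCHEMA tables."""
    if table not in INFO_SCHEMA_TABLES:
        raise ValueError(f"unexpected table: {table}")
    return f"SELECT * FROM information_schema.{table} LIMIT 1"


def run_query(athena, sql, output_location, workgroup="primary",
              poll_seconds=1.0, max_polls=60):
    """Start a query and poll get_query_execution until it reaches a final state."""
    started = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},  # the s3:// results path
        WorkGroup=workgroup,
    )
    query_id = started["QueryExecutionId"]
    for _ in range(max_polls):
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_seconds)
    return "TIMED_OUT"
```

A workgroup that is missing from the `list_work_groups` result, or whose entries come back as `failed`, is one whose usage would not be extracted. Running `run_query(athena, probe_sql(t), "s3://...")` for each table in `INFO_SCHEMA_TABLES` exercises both the query permissions and write access to the results bucket.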

Note that usage is associated with workgroups rather than individual users. These workgroups are published as users in K in the form “athena_workgroup_<name>”.
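To illustrate the naming convention, here is a small sketch of how query activity could be rolled up per workgroup under the published user name. How K aggregates usage internally is not documented here, so the function names and the roll-up itself are assumptions. The only facts taken from this page are the `athena_workgroup_<name>` form and the workgroup-level association. Each input item is assumed to carry a `WorkGroup` key, as the query execution objects returned by the Athena API do.

```python
from collections import Counter

USER_PREFIX = "athena_workgroup_"


def workgroup_user(workgroup_name):
    """Return the user name under which a workgroup appears in K."""
    return f"{USER_PREFIX}{workgroup_name}"


def usage_by_user(query_executions):
    """Count queries per published user (hypothetical roll-up, for illustration)."""
    return Counter(workgroup_user(q["WorkGroup"]) for q in query_executions)
```

For example, two queries run in the `primary` workgroup by different people would both count towards `athena_workgroup_primary`.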

Below is an example role policy that allows Athena access with least privilege for actions. The example allows the ACCOUNT ARN to assume the role. Replace the variables [ACCOUNT ARN] and [ATHENA RESULTS BUCKET NAME] with your own values. You may instead assign the policy directly to a new user and use that user without assuming a role. If you do wish to assume a role, note down the role ARN to use when onboarding/extracting.

CODE
AWSTemplateFormatVersion: "2010-09-09"
Description: 'AWS IAM Role - Athena and Cloudtrail Access to KADA'
Resources: 
  KadaAthenaRole: 
    Type: "AWS::IAM::Role"
    Properties: 
      RoleName: "KadaAthenaRole"
      MaxSessionDuration: 43200
      Path: "/"
      AssumeRolePolicyDocument: 
        Version: "2012-10-17"
        Statement: 
        - Effect: "Allow"
          Principal:
            AWS: "[ACCOUNT ARN]"
          Action: "sts:AssumeRole"

  KadaAthenaPolicy: 
    Type: 'AWS::IAM::Policy'
    Properties:
      PolicyName: root
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action: 
              - athena:BatchGetQueryExecution
              - athena:GetQueryExecution
              - athena:GetQueryResults
              - athena:GetQueryResultsStream
              - athena:ListQueryExecutions
              - athena:StartQueryExecution
              - athena:ListWorkGroups
              - athena:ListDataCatalogs
              - athena:ListDatabases
              - athena:ListTableMetadata
            Resource: '*'
          - Effect: Allow
            Action: 
              - s3:GetBucketLocation
              - s3:GetObject
              - s3:ListBucket
              - s3:ListBucketMultipartUploads
              - s3:ListMultipartUploadParts
              - s3:AbortMultipartUpload
              - s3:PutObject
              - s3:PutBucketPublicAccessBlock
              - s3:DeleteObject
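            Resource:
              # Bucket-level actions (e.g. s3:ListBucket) apply to the bucket ARN
              - arn:aws:s3:::[ATHENA RESULTS BUCKET NAME]
              # Object-level actions (e.g. s3:GetObject, s3:PutObject) apply to objects in the bucket
              - arn:aws:s3:::[ATHENA RESULTS BUCKET NAME]/*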
      Roles:
        - !Ref KadaAthenaRole

After this step you should have the following information:

  • Athena User

  • Role

  • Key

  • Secret

  • Athena S3 bucket location
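As a quick sanity check before moving on to Step 2, the collected values can be validated in one place. This is a sketch only: the `AthenaAccess` class and its field names are hypothetical, and the role is treated as optional because the policy may be attached directly to a user. The `s3://` check mirrors the full-path requirement for the results bucket, and the role ARN check is deliberately loose.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AthenaAccess:
    user: str
    key: str
    secret: str
    results_bucket: str              # full path, e.g. "s3://my-athena-results/kada/"
    role_arn: Optional[str] = None   # only needed when a role is assumed

    def problems(self) -> List[str]:
        """Return a list of human-readable issues; an empty list means no issues found."""
        issues = []
        for field_name in ("user", "key", "secret"):
            if not getattr(self, field_name).strip():
                issues.append(f"{field_name} is empty")
        bucket = self.results_bucket
        if not bucket.startswith("s3://") or len(bucket) <= len("s3://"):
            issues.append("results_bucket must be a full path starting with s3://")
        if self.role_arn is not None:
            # Loose shape check only; it does not confirm that the role exists.
            if not (self.role_arn.startswith("arn:") and ":role/" in self.role_arn):
                issues.append("role_arn does not look like an IAM role ARN")
        return issues
```

Calling `problems()` on a filled-in instance returns an empty list when nothing obvious is missing.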


Step 2: Create the Source in K

Create an Athena source in K

  • Select Platform Settings in the side bar

  • In the pop-out side panel, under Integrations click on Sources

  • Click Add Source and select Athena

  • Select Direct Connect and add your Athena details

    • Name: Give the Athena source a name in K.

    • Host: Enter a hostname for your Athena instance

    • Region: Set the AWS region where Athena is hosted, e.g. ap-southeast-2

    • Athena Results bucket: Bucket location used to temporarily store Athena query results. Use the full path starting with s3://

  • Add Connection Details and click Save & Next

    • Assume Role: Add the Role from Step 1

    • Key: Add the Key from Step 1

    • Secret: Add the Secret from Step 1

  • Test your connection and click Next
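If the connection test fails, it can be useful to check the role, key and secret outside of K. The sketch below is not how K performs its test. It only shows the standard AWS STS flow for assuming a role, under the assumption that `sts` is a boto3 STS client created with the key and secret from Step 1 (for example `boto3.client("sts", aws_access_key_id=KEY, aws_secret_access_key=SECRET)`). The function name, session name and one-hour duration are illustrative choices.

```python
def assume_role_credentials(sts, role_arn, session_name="kada-athena-check",
                            duration_seconds=3600):
    """Assume the role and return keyword arguments for a new boto3 client."""
    response = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,
    )
    creds = response["Credentials"]
    # These keys match the credential parameters accepted by boto3.client(...).
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

The returned dictionary can be passed on, for example `boto3.client("athena", region_name="ap-southeast-2", **credentials)`. An access-denied error from `assume_role` usually points to the trust policy on the role rather than to the Athena permissions.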


Step 3: Schedule Athena source load

  • Select Platform Settings in the side bar

  • In the pop-out side panel, under Integrations click on Sources

  • Locate your new Athena Source and click on the Schedule Settings (clock) icon to set the schedule


Step 4: Manually run an ad hoc load to test Athena

  • Next to your new Source, click on the Run manual load icon

  • Confirm how you want the manual run to be completed

  • After the source load is triggered, a pop up bar will appear taking you to the Monitor tab in the Batch Manager page. This is the usual page you visit to view the progress of source loads

A manual source load also requires a manual run of the following jobs, which load all metrics and indexes with the manually loaded metadata. These can be found in the Batch Manager page:

  • DAILY

  • GATHER_METRICS_AND_STATS

 

Troubleshooting failed loads

  • If the job failed at the extraction step

    • Check the error. Contact KADA Support if required.

    • Rerun the source job

  • If the job failed at the load step, the failed directory in the landing folder will contain the file with issues.

    • Find the bad record and fix the file

    • Rerun the source job
