This page provides a list of administrator queries for solving unique problems that are yet to be available via the UI.
Dataset upstream source extract
There may be a time when you need to see all the upstream objects for datasets in K.
A current workaround is to follow the below instructions to generate an extract
Extract details
|
Columns |
Description |
Example |
|---|---|---|
|
name |
Name of the dataset |
Customer model |
|
object_type |
Type of the dataset |
Dataset |
|
object_id |
ID of the dataset |
753f5c32-5cf3-3af6-a499-364d749344e5 |
|
source_name |
Name of the dataset source |
Power BI |
|
upstream_name |
Name of the upstream object |
dim_customer |
|
upstream_object_type |
Type of the upstream object |
Table |
|
upstream_object_signature |
Fully qualified location fo the upstream object |
source.database.schema.table |
|
upstream_id |
ID of the upstream object |
8a05f40a-0e6a-3ab1-93eb-9a6db10e0601 |
|
upstream_source_name |
Name of the upstream source |
Snowflake |
Instructions to produce the extract
# CONNECT to the postgres pod and start a psql session
kubectl exec -it postgres-statefulset-0 -- psql -U postgres -d cerebrum
# Run this query
COPY (
SELECT DISTINCT
ds.name AS name,
dsr.name AS object_type,
ds.id AS object_id,
dss.name AS source_name,
r.name AS upstream_name,
rr.name AS upstream_object_type,
r.signature AS upstream_object_signature,
r.id AS upstream_id,
rs.name AS upstream_source_name
FROM node ds
INNER JOIN node_ref dsr ON dsr.id = ds.node_ref_id
INNER JOIN source dss ON dss.id = ds.source_id
INNER JOIN edge ON edge.source_node_id = ds.id AND edge.source_node_ref_id = ds.node_ref_id
INNER JOIN node r ON r.id = edge.target_node_id AND r.node_ref_id = edge.target_node_ref_id
INNER JOIN source rs ON rs.id = r.source_id
INNER JOIN node_ref rr ON rr.id = r.node_ref_id
WHERE ds.node_ref_id IN (15,27)
AND edge.edge_ref_id IN (6,26,32)
AND edge.source_node_ref_id IN (15,27)
AND edge.target_node_ref_id IN (3,4)
AND r.node_ref_id IN (3,4)
ORDER BY source_name, object_type, name, upstream_source_name, upstream_object_type, upstream_object_signature
) TO '/tmp/extract.csv' DELIMITER ',' CSV HEADER;
# Exist out of postgres pod.
# Use kubectl to copy the csv out of the pod.
kubectl cp postgres-statefulset-0:/tmp/extract.csv extract_table_size.csv