Metadata-Driven ELT Framework in dbt + Snowflake for REST API Integration

Learn how to build a dynamic ELT framework that reads from REST APIs using metadata tables to control endpoints, methods, and transformations—scalable and fully dbt-integrated.

DATA INTEGRATION & AUTOMATION

Kiran Yenugudhati

11/23/2024 · 2 min read

This blog walks through how to build a scalable, metadata-driven ELT framework to extract data from REST APIs into Snowflake using:

  • ❄️ Snowflake Snowpark Python functions for API calls

  • 🔐 External Access + Secrets for secure connectivity

  • 📄 Metadata tables to manage all API configurations

  • 🧱 dbt incremental models to automate ingestion and transformation

🚫 No external ingestion tools
🚫 No manual coding per endpoint
✅ Just metadata + modular components = fully reusable framework

💡 Why Metadata-Driven?

Traditional ingestion pipelines:

  • Require a dedicated Python script or DAG per endpoint

  • Involve hardcoding filters, auth, and pagination logic

  • Need rework for every change or new object

That doesn’t scale.

With a metadata-first approach, you can:

  • Reuse a single Snowpark function for all API calls

  • Maintain configs in Snowflake tables

  • Add or update integrations by inserting metadata

  • Let dbt handle incremental loads declaratively

And you leave behind:

  • Hardcoded URLs, headers, and filters

  • Separate scripts per API/module

  • Manual scheduling or orchestration

🧰 Framework Components

  • Metadata Tables (Snowflake): Configure all APIs, objects, and fields

  • Snowpark Python Function (UDF/UDTF): Calls REST APIs securely from Snowflake

  • External Access Integration: Controls internet access + auth

  • Secrets Management: Stores credentials securely

  • dbt Incremental Models: Ingest, filter, and transform API data

🧪 Metadata Table Structure

You manage everything via metadata stored in Snowflake. Here's a high-level summary:

  • CONFIG_DATA_SOURCES => Stores API base config (source name, credentials, connection type).

  • CONFIG_DATA_SOURCE_OBJECTS => Defines individual API modules/resources (e.g., Users, Invoices), filters, and active status.

  • CONFIG_DATA_SOURCE_OBJECT_FIELD_MAPPINGS => Describes fields, friendly names, data types, primary keys — used for dynamic SELECT clauses.

🛠️ Sample schemas and population examples coming soon.
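
In the meantime, here's a minimal sketch of what the three tables could look like. All column names below are assumptions for illustration, not the final schemas:

```sql
-- Illustrative schemas only; the column names are assumptions.
CREATE TABLE IF NOT EXISTS config_data_sources (
    source_name     VARCHAR,   -- e.g. 'ACME_CRM'
    base_url        VARCHAR,   -- API root, e.g. 'https://api.example.com/v2'
    connection_type VARCHAR,   -- e.g. 'REST'
    secret_name     VARCHAR    -- name of the Snowflake SECRET holding credentials
);

CREATE TABLE IF NOT EXISTS config_data_source_objects (
    source_name    VARCHAR,
    object_name    VARCHAR,    -- e.g. 'Users', 'Invoices'
    endpoint_path  VARCHAR,    -- e.g. '/invoices'
    object_filters VARCHAR,    -- default query-string filters
    is_active      BOOLEAN
);

CREATE TABLE IF NOT EXISTS config_data_source_object_field_mappings (
    source_name    VARCHAR,
    object_name    VARCHAR,
    field_name     VARCHAR,    -- raw JSON key
    friendly_name  VARCHAR,    -- column alias used in dynamic SELECT clauses
    data_type      VARCHAR,
    is_primary_key BOOLEAN
);

-- Onboarding a new endpoint then reduces to a single insert:
INSERT INTO config_data_source_objects
VALUES ('ACME_CRM', 'Invoices', '/invoices', 'status=posted', TRUE);
```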

🧠 Snowpark Function: Highlights

The core component is a Snowflake UDTF (User Defined Table Function) that:

  • Accepts parameters like OBJECT_NAME, OBJECT_FILTERS

  • Authenticates securely using stored credentials

  • Builds dynamic REST API URLs

  • Returns a stream of JSON results (VARIANT)

  • Logs API response codes and errors to Snowflake logs

✅ All built using Snowpark Python with requests, logging, and _snowflake.get_username_password() for credential retrieval
✅ Compatible with external access integrations and Snowflake secrets
✅ Flexible for future APIs or vendors
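
As a concrete sketch (not the production function), here's roughly what such a UDTF can look like. The integration name, secret name, and base URL are placeholders, and pagination/retry logic is omitted for brevity:

```sql
-- Sketch only: integration, secret, and URL names are assumed placeholders.
CREATE OR REPLACE FUNCTION fetch_api_data(object_name VARCHAR, object_filters VARCHAR)
RETURNS TABLE (payload VARIANT)
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
HANDLER = 'ApiReader'
EXTERNAL_ACCESS_INTEGRATIONS = (api_access_integration)
SECRETS = ('cred' = api_credentials)
PACKAGES = ('requests')
AS
$$
import logging

import _snowflake  # Snowflake-provided module for reading attached SECRETS
import requests

logger = logging.getLogger("api_reader")

# In the full framework the base URL comes from CONFIG_DATA_SOURCES.
BASE_URL = "https://api.example.com/v2"


class ApiReader:
    def process(self, object_name: str, object_filters: str):
        # Pull the username/password pair from the secret bound as 'cred'.
        creds = _snowflake.get_username_password("cred")

        url = f"{BASE_URL}/{object_name.lower()}"
        params = dict(p.split("=", 1) for p in object_filters.split("&") if p)

        resp = requests.get(
            url,
            params=params,
            auth=(creds.username, creds.password),
            timeout=30,
        )
        # Response codes land in the Snowflake event table when logging is enabled.
        logger.info("GET %s -> %s", url, resp.status_code)
        resp.raise_for_status()

        # One VARIANT row per record in the response.
        for record in resp.json():
            yield (record,)
$$;
```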

⚙️ External Access Setup (Once Per API)

CREATE NETWORK RULE ...
CREATE SECRET ...
CREATE EXTERNAL ACCESS INTEGRATION ...

✅ Enables the Snowpark function to securely call external APIs from within Snowflake.
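
Spelled out, the one-time setup could look like the following; every object name and the host are placeholders:

```sql
-- One-time setup per API; all names and the host below are placeholders.
CREATE OR REPLACE NETWORK RULE api_network_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('api.example.com');

CREATE OR REPLACE SECRET api_credentials
  TYPE = PASSWORD
  USERNAME = 'service_account'
  PASSWORD = '<redacted>';

CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION api_access_integration
  ALLOWED_NETWORK_RULES = (api_network_rule)
  ALLOWED_AUTHENTICATION_SECRETS = (api_credentials)
  ENABLED = TRUE;
```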

📥 Ingesting Data with dbt Incremental Models

Each API module is handled by a dbt model configured as:

  • materialized: incremental

  • Uses a macro that:

    • Reads metadata for the object

    • Calculates a dynamic last_modified filter

    • Calls the Snowpark UDTF with correct filters

    • Outputs structured rows into the RAW layer

  • Supports time zone shifts, DST, and deduplication

💡 unique_key, merge_exclude_columns, and field selections are all metadata-driven — no hardcoded SQL.

🛠️ dbt macro samples coming soon.
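
Until then, here's a minimal sketch of what such a model can look like, with the metadata-driven parts hardcoded for illustration. fetch_api_data is the UDTF sketched above, and the modified_since filter parameter is an assumption:

```sql
-- models/raw/raw_invoices.sql
-- Sketch only: in the real framework the column list, unique_key, and the
-- last-modified filter all come from the metadata tables via a macro.
{{
    config(
        materialized = 'incremental',
        unique_key = 'id',
        merge_exclude_columns = ['_loaded_at']
    )
}}

{% if is_incremental() %}
    {# Compute the high-water mark up front: the UDTF argument must be a
       constant expression, so it cannot embed a subquery. #}
    {% set max_ts = run_query(
        "select coalesce(max(last_modified), '1970-01-01')::string from " ~ this
    ).columns[0].values()[0] %}
{% endif %}

select
    payload:id::number                   as id,
    payload:last_modified::timestamp_ntz as last_modified,
    payload                              as raw_payload,
    current_timestamp()                  as _loaded_at
from table(
    fetch_api_data(
        'Invoices',
        {% if is_incremental() %} 'modified_since={{ max_ts }}' {% else %} '' {% endif %}
    )
)
```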

🎯 Benefits

  • No external tool required => Entirely Snowflake + dbt

  • Easy to extend => Add new API endpoints via metadata rows

  • Centralized config => Everything lives in Snowflake (auditable)

  • Incremental by design => Only fetches new/changed records

  • Secure => Uses built-in secrets + access controls

  • Declarative & testable => Follows dbt best practices

  • Scalable => Single pattern supports any API

🔍 Ideal Use Cases

  • REST API integrations across multiple applications

  • Partner or platform data (e.g., CRM, finance, booking)

  • External systems with complex or paginated endpoints

  • Projects that require no external orchestrator

🧩 Bonus: Future-Ready Features

  • Retry-on-failure and error JSON responses

  • Row-level logging with response metadata

  • Streamlit UI for managing object configs

  • dbt exposures and metrics built on top of clean layers

  • Usage tracking per API or module

📎 Artefacts

  • GitHub repo with full dbt + Snowpark implementation

  • Metadata table sample data + config UI (Streamlit optional)

  • Visual diagrams showing end-to-end flow

  • Audit log setup using Snowflake event table

  • Best practices for flattening and late-binding joins