Master Data Management Using Snowflake + Streamlit Apps

Implement an efficient master data management workflow using Snowflake for logic and Streamlit for business-friendly UI.

MASTER & REFERENCE DATA MANAGEMENTADVANCED DATA ANALYTICS & VISUALISATION

Kiran Yenugudhati

12/15/20242 min read

Learn how to build an internal Master Data Management (MDM) solution using Snowflake for logic and audit tracking, and Streamlit for a self-service business UI.

This low-code, cost-effective approach enables teams to:

  • Clean and group inconsistent values

  • Confirm or correct fuzzy-matched suggestions

  • Track approvals with full audit trail

  • Drive consistent reporting in Tableau, Power BI, and dbt models

All without paying for a full-scale MDM product.

πŸ’¬ Why This Matters

In almost every business, free-text or poorly maintained master data causes:

  • Duplicated records

  • Messy dashboard filters

  • Misaligned reporting metrics

  • Manual cleanup before each reporting cycle

Whether you're standardising investment channels, client types, hotel amenities, or partner names, the need is the same:
πŸ“¦ Raw values in β†’ Cleaned values out β†’ Auditable, confirmed, and trusted.

πŸ’‘ Why Not Use a Traditional MDM Tool?

There are many great commercial Master Data Management tools (like Informatica MDM, Reltio, Ataccama, etc.) β€” but they come with:

  • πŸ’Έ High licensing costs

  • πŸ› οΈ Complex implementations

  • 🧠 Specialised skillsets and vendor lock-in

  • πŸ•°οΈ Long timelines to onboard and scale

As a result, many growing companies still rely on:

  • Excel spreadsheets

  • CSV-based mapping files

  • Manual adjustments during reporting

This is fragile, error-prone, and not scalable.

βœ… A Practical Alternative

With Snowflake + Streamlit, you can build:

  • βœ… A governed, auditable master data layer

  • βœ… A low-code UI your business teams can actually use

  • βœ… At zero licensing cost (compute only)

This approach is perfect for teams that need:

  • 🧩 Lightweight governance

  • 🧹 Self-service cleanup workflows

  • ⚑ Fast time-to-value β€” without vendor lock-in

πŸ§ͺ Use Case: Interactive Grouping for Investment Channels

Let’s say your CRM or onboarding flow captures free-text for "Channel":

"private_banker", "PB", "PrivateBank", "Retail", "direct", "Web-Direct", "Online Booking"

Your team needs to group and clean these into:

  • "Private Banking"

  • "Retail"

  • "Online Direct"

The MDM app enables business users to:

  • View raw values

  • See system-suggested display names

  • Edit or override them

  • Confirm values with one click

  • Push results to reporting β€” with full audit trail

πŸ–₯️ Key App Features

  • Tab-based UI per master data domain

  • Filters for confirmed/unconfirmed status

  • Search on raw or display name

  • Edit suggestions inline

  • Checkboxes to confirm

  • Audit columns for user + timestamp

  • Snowflake-native β€” no external infra or storage needed

πŸ› οΈ Full app UI, code, and SQL snippets coming soon.

🧭 Types of MDM Workflows

1. Static Mapping (Controlled Reference Mapping)

For simple, fixed relationships where raw values should always resolve to the same group or label.

Examples:

  • "EMEA" β†’ "Europe, Middle East, and Africa"

  • "US" β†’ "United States"

βœ… Best for: low-cardinality fields, controlled vocabularies, reference codes.

2. Interactive Grouping & Cleanup (Fuzzy Matching)

For messy, free-text fields with high variation across systems.

Supports:

  • System-suggested β€œgroup” based on similarity (e.g., Soundex, Levenshtein, co-occurrence)

  • Business user confirmation

  • Inline editing of display names

  • Logging of who confirmed what, and when

Example:

β†’ Raw values: "JP Morgan", "JPMorgan", "JPM", "JP Morgan Bank"

β†’ Suggested group: "J.P. Morgan"

βœ… Best for: long-tail values, unstructured entry points (forms, CRM, legacy systems)

3. Hybrid and Contextual Mappings

You can mix both methods based on:

  • Field type

  • Source system

  • Region or business unit

Example:

  • Use static mapping for known internal codes

  • Use interactive grouping for external vendor names

  • Apply filters (e.g., geography) to scope reviews

πŸ–₯️ App Experience (Business UI)

The Streamlit app provides:

  • Filterable list of values (confirmed/unconfirmed)

  • Search by raw value or display name

  • Inline editing and checkbox to confirm

  • Real-time audit of who made changes

  • Role-specific views (optional)

πŸ› οΈ Screenshots and code templates coming soon.

πŸ” Governance & Audit Trail

All confirmations are saved with:

  • display_name

  • suggested_group (optional)

  • confirmed_by (Snowflake CURRENT_USER)

  • confirmed_date (timestamp)

  • Optional: source_system, last_seen_date, is_active

πŸ“Š Analytics Integration

Downstream tools like Tableau, Power BI, or dbt models can:

  • Pull only confirmed values (is_confirmed = TRUE)

  • Highlight or exclude unconfirmed records

  • Join to this table for trusted labels or groupings

🎯 Key Benefits

  • βœ… Business-friendly interface for self-service cleanup

  • βœ… Easy to extend across data domains

  • βœ… Zero dependency on third-party tools or vendors

  • βœ… Auditable, traceable, and real-time inside Snowflake

  • βœ… Faster, cheaper, and more transparent than a big MDM rollout

πŸ“Ž Artefacts

  • GitHub repo with full Streamlit app template

  • Fuzzy matching logic using Snowflake

  • Bulk mapping upload/import and admin review tools

  • Role-based access per data domain