Master Data Management Using Snowflake + Streamlit Apps
Implement an efficient master data management workflow using Snowflake for logic and Streamlit for business-friendly UI.
MASTER & REFERENCE DATA MANAGEMENTADVANCED DATA ANALYTICS & VISUALISATION
Kiran Yenugudhati
12/15/20242 min read
Learn how to build an internal Master Data Management (MDM) solution using Snowflake for logic and audit tracking, and Streamlit for a self-service business UI.
This low-code, cost-effective approach enables teams to:
Clean and group inconsistent values
Confirm or correct fuzzy-matched suggestions
Track approvals with full audit trail
Drive consistent reporting in Tableau, Power BI, and dbt models
All without paying for a full-scale MDM product.
π¬ Why This Matters
In almost every business, free-text or poorly maintained master data causes:
Duplicated records
Messy dashboard filters
Misaligned reporting metrics
Manual cleanup before each reporting cycle
Whether you're standardising investment channels, client types, hotel amenities, or partner names, the need is the same:
π¦ Raw values in β Cleaned values out β Auditable, confirmed, and trusted.
π‘ Why Not Use a Traditional MDM Tool?
There are many great commercial Master Data Management tools (like Informatica MDM, Reltio, Ataccama, etc.) β but they come with:
πΈ High licensing costs
π οΈ Complex implementations
π§ Specialised skillsets and vendor lock-in
π°οΈ Long timelines to onboard and scale
As a result, many growing companies still rely on:
Excel spreadsheets
CSV-based mapping files
Manual adjustments during reporting
This is fragile, error-prone, and not scalable.
β A Practical Alternative
With Snowflake + Streamlit, you can build:
β A governed, auditable master data layer
β A low-code UI your business teams can actually use
β At zero licensing cost (compute only)
This approach is perfect for teams that need:
π§© Lightweight governance
π§Ή Self-service cleanup workflows
β‘ Fast time-to-value β without vendor lock-in
π§ͺ Use Case: Interactive Grouping for Investment Channels
Letβs say your CRM or onboarding flow captures free-text for "Channel":
"private_banker", "PB", "PrivateBank", "Retail", "direct", "Web-Direct", "Online Booking"
Your team needs to group and clean these into:
"Private Banking"
"Retail"
"Online Direct"
The MDM app enables business users to:
View raw values
See system-suggested display names
Edit or override them
Confirm values with one click
Push results to reporting β with full audit trail
π₯οΈ Key App Features
Tab-based UI per master data domain
Filters for confirmed/unconfirmed status
Search on raw or display name
Edit suggestions inline
Checkboxes to confirm
Audit columns for user + timestamp
Snowflake-native β no external infra or storage needed
π οΈ Full app UI, code, and SQL snippets coming soon.
π§ Types of MDM Workflows
1. Static Mapping (Controlled Reference Mapping)
For simple, fixed relationships where raw values should always resolve to the same group or label.
Examples:
"EMEA" β "Europe, Middle East, and Africa"
"US" β "United States"
β Best for: low-cardinality fields, controlled vocabularies, reference codes.
2. Interactive Grouping & Cleanup (Fuzzy Matching)
For messy, free-text fields with high variation across systems.
Supports:
System-suggested βgroupβ based on similarity (e.g., Soundex, Levenshtein, co-occurrence)
Business user confirmation
Inline editing of display names
Logging of who confirmed what, and when
Example:
β Raw values: "JP Morgan", "JPMorgan", "JPM", "JP Morgan Bank"
β Suggested group: "J.P. Morgan"
β Best for: long-tail values, unstructured entry points (forms, CRM, legacy systems)
3. Hybrid and Contextual Mappings
You can mix both methods based on:
Field type
Source system
Region or business unit
Example:
Use static mapping for known internal codes
Use interactive grouping for external vendor names
Apply filters (e.g., geography) to scope reviews
π₯οΈ App Experience (Business UI)
The Streamlit app provides:
Filterable list of values (confirmed/unconfirmed)
Search by raw value or display name
Inline editing and checkbox to confirm
Real-time audit of who made changes
Role-specific views (optional)
π οΈ Screenshots and code templates coming soon.
π Governance & Audit Trail
All confirmations are saved with:
display_name
suggested_group (optional)
confirmed_by (Snowflake CURRENT_USER)
confirmed_date (timestamp)
Optional: source_system, last_seen_date, is_active
π Analytics Integration
Downstream tools like Tableau, Power BI, or dbt models can:
Pull only confirmed values (is_confirmed = TRUE)
Highlight or exclude unconfirmed records
Join to this table for trusted labels or groupings
π― Key Benefits
β Business-friendly interface for self-service cleanup
β Easy to extend across data domains
β Zero dependency on third-party tools or vendors
β Auditable, traceable, and real-time inside Snowflake
β Faster, cheaper, and more transparent than a big MDM rollout
π Artefacts
GitHub repo with full Streamlit app template
Fuzzy matching logic using Snowflake
Bulk mapping upload/import and admin review tools
Role-based access per data domain
ACTUVATE PTY LTD
Delivering strategic data architecture, cloud engineering, and AI-driven solutions
Connect
Insights
contact@actuvate.com.au
Β© 2024. All rights reserved.