PP

Medallion Data Architecture

Enterprise Data Ecosystem · End-to-End Pipeline · Series B Ready

Q1 2026 Implementation GCP · BigQuery · PostgreSQL 16-Week Roadmap

End-to-End Data Architecture

DATA SOURCES External Systems & Integrations
📁
MLS / Midwest
SFTP · Daily 6:00 AM
🔗
DigiFi
Webhooks · Real-time
✍️
DocuSign
Webhooks · Events
⚙️
Internal Apps
REST APIs · On-demand
🥉 BRONZE Raw Data Zone — Immutable Source Records
☁️
Google Cloud Storage
gs://porchpass-bronze
📂 sftp/mls/{date}/daily_payments.csv
📂 webhooks/digifi/{date}/events.json
📂 webhooks/docusign/{date}/envelopes.json
🗄️
Ingestion Metadata
PostgreSQL Tracking
📋 sftp_file_ingests (file tracking)
K8s CronJob @ 6:00 AM CST
Checksum validation & dedup
🥈 SILVER Cleansed & Conformed — Validated Business Entities
🐘
PostgreSQL (Cloud SQL)
Operational Data Store
🔗 Foreign key relationships enforced
🔒 PII encryption at rest
📊 Standardized date/amount formats
Go Parser Service
Data Transformation
✔️ Schema validation
🔄 Deduplication logic
📝 Business rule application
📋
loans
P0 · NEW
💰
loan_payments
P0 · NEW
📁
sftp_file_ingests
P0 · NEW
✏️
loan_corrections
P2 · LCA
📸
loan_snapshots
P2 · Daily
🥇 GOLD Business-Ready — Analytics & Reporting
📊
BigQuery
Enterprise Data Warehouse
🎯 Pre-aggregated business metrics
Optimized for BI queries
📈 Historical trend analysis
🔄
Apache Airflow
ETL Orchestration
📅 Daily snapshot DAGs
🔁 PostgreSQL → BigQuery sync
📆 Monthly remittance jobs
📑
fact_remittance_monthly
📈
dim_loan_performance
💵
fact_daily_balances
🏪
dim_dealer_scorecard
📤
investor_loan_tape
CONSUMERS Business Intelligence & Reporting
📊
Superset / Looker
Interactive dashboards & reports
💼
Investor Portal
Self-service loan tapes & data room
📋
Remittance Reports
Automated monthly TXCO/RAMS
🔌
REST APIs
Internal apps & integrations
Ingestion Layer
SFTP Client Go + K8s CronJob
Webhook Handler Go Echo + Pub/Sub
Raw Storage GCS (Standard)
Archive Policy 90d→Coldline, 365d→Archive
🗄️
Storage Layer
Operational DB Cloud SQL (PostgreSQL)
Data Warehouse BigQuery
Object Storage Google Cloud Storage
Region us-central1
🔧
Orchestration
ETL Engine Apache Airflow
Container Platform GKE (2 nodes)
GitOps Deploy ArgoCD
Monitoring Datadog + Sentry

16-Week Implementation Roadmap

1
Foundation
Weeks 1-4
  • Complete remittance PR
  • Create loans table
  • Create loan_payments table
  • Deploy SFTP CronJob
  • GCS Bronze bucket setup
2
Integration
Weeks 5-8
  • Airflow → BigQuery ETL
  • Historical data migration
  • sftp_file_ingests tracking
  • loan_corrections (LCA)
3
Automation
Weeks 9-12
  • Remittance report automation
  • Data quality framework
  • Superset/Looker dashboards
  • Exception alerting
4
Optimization
Weeks 13-16
  • Investor data room API
  • Self-service reporting
  • Performance tuning
  • Series B documentation