PP
Medallion Data Architecture
Enterprise Data Ecosystem · End-to-End Pipeline · Series B Ready
Q1 2026
Implementation
GCP ·
BigQuery
· PostgreSQL
16-Week
Roadmap
End-to-End Data Architecture
DATA SOURCES
External Systems & Integrations
📁
MLS / Midwest
SFTP · Daily 6:00 AM
🔗
DigiFi
Webhooks · Real-time
✍️
DocuSign
Webhooks · Events
⚙️
Internal Apps
REST APIs · On-demand
🥉 BRONZE
Raw Data Zone — Immutable Source Records
☁️
Google Cloud Storage
gs://porchpass-bronze
📂
sftp/mls/{date}/daily_payments.csv
📂
webhooks/digifi/{date}/events.json
📂
webhooks/docusign/{date}/envelopes.json
🗄️
Ingestion Metadata
PostgreSQL Tracking
📋
sftp_file_ingests (file tracking)
⏰
K8s CronJob @ 6:00 AM CST
✅
Checksum validation & dedup
🥈 SILVER
Cleansed & Conformed — Validated Business Entities
🐘
PostgreSQL (Cloud SQL)
Operational Data Store
🔗
Foreign key relationships enforced
🔒
PII encryption at rest
📊
Standardized date/amount formats
⚡
Go Parser Service
Data Transformation
✔️
Schema validation
🔄
Deduplication logic
📝
Business rule application
📋
loans
P0 · NEW
💰
loan_payments
P0 · NEW
📁
sftp_file_ingests
P0 · NEW
✏️
loan_corrections
P2 · LCA
📸
loan_snapshots
P2 · Daily
🥇 GOLD
Business-Ready — Analytics & Reporting
📊
BigQuery
Enterprise Data Warehouse
🎯
Pre-aggregated business metrics
⚡
Optimized for BI queries
📈
Historical trend analysis
🔄
Apache Airflow
ETL Orchestration
📅
Daily snapshot DAGs
🔁
PostgreSQL → BigQuery sync
📆
Monthly remittance jobs
📑
fact_remittance_monthly
📈
dim_loan_performance
💵
fact_daily_balances
🏪
dim_dealer_scorecard
📤
investor_loan_tape
CONSUMERS
Business Intelligence & Reporting
📊
Superset / Looker
Interactive dashboards & reports
💼
Investor Portal
Self-service loan tapes & data room
📋
Remittance Reports
Automated monthly TXCO/RAMS
🔌
REST APIs
Internal apps & integrations
⚡
Ingestion Layer
SFTP Client
Go + K8s CronJob
Webhook Handler
Go Echo + Pub/Sub
Raw Storage
GCS (Standard)
Archive Policy
90d→Coldline, 365d→Archive
🗄️
Storage Layer
Operational DB
Cloud SQL (PostgreSQL)
Data Warehouse
BigQuery
Object Storage
Google Cloud Storage
Region
us-central1
🔧
Orchestration
ETL Engine
Apache Airflow
Container Platform
GKE (2 nodes)
GitOps Deploy
ArgoCD
Monitoring
Datadog + Sentry
16-Week Implementation Roadmap
1
Foundation
Weeks 1-4
Complete remittance PR
Create loans table
Create loan_payments table
Deploy SFTP CronJob
GCS Bronze bucket setup
2
Integration
Weeks 5-8
Airflow → BigQuery ETL
Historical data migration
sftp_file_ingests tracking
loan_corrections (LCA)
3
Automation
Weeks 9-12
Remittance report automation
Data quality framework
Superset/Looker dashboards
Exception alerting
4
Optimization
Weeks 13-16
Investor data room API
Self-service reporting
Performance tuning
Series B documentation