Technology and infrastructure
Data and analytics infrastructure
Data and analytics infrastructure
Your loan management system handles daily operations. Your data infrastructure handles everything else: historical analysis, cohort performance, investor reporting, and the analytics that let you actually understand your portfolio. As your ABF operations grow, the quality of your data infrastructure determines whether you can answer lender questions in hours or weeks.
Why you need both LMS and data warehouse
Your LMS is optimized for transactions, not analysis. Running complex queries against your production servicing system slows down operations and risks data corruption. A separate data warehouse gives you:
Historical snapshots: Loan tape as of every month-end, so you can rebuild any historical report. When a lender asks “what did the 2023Q2 vintage look like six months after origination,” you have the answer.
Performance tracking: Roll rates, transition matrices, vintage curves over time. These require comparing snapshots across periods, which transaction systems don’t handle well.
Flexible analysis: Ad-hoc queries your LMS can’t handle. “Show me all loans in Texas with FICO below 650 that paid late in month 3 but are now current” shouldn’t require a development project.
Safe experimentation: Analysts can run anything without affecting production. No risk of locking up your servicing system during month-end close.
Core data architecture
A functional data warehouse has three layers:
Staging layer
Raw data extracted from source systems (LMS, payment processor, bank accounts). No transformation, just capture. This preserves the source data exactly as it was received, which matters for audit and debugging.
Key principles:
- Extract daily or more frequently for critical data
- Never modify data in staging
- Keep full extraction history
Transformation layer
Data cleaning, validation, and business logic. This is where you:
- Standardize formats across sources
- Calculate derived fields (days past due, cohort assignments, eligibility flags)
- Apply business rules (concentration calculations, covenant definitions)
- Flag exceptions and data quality issues
This layer is where most of the work happens. Document your transformation logic carefully because lenders will ask how you calculate things.
Production layer
Clean, validated data ready for reporting and analysis. This is what your reports pull from. Data here should be:
- Consistent across time periods
- Reconciled to source systems
- Ready for self-service queries
What to store
At minimum, maintain historical snapshots of:
| Data Type | Frequency | Retention |
|---|---|---|
| Loan tape (static) | Monthly | Life of facility + 7 years |
| Payment transactions | Daily | Life of facility + 7 years |
| Collection activity | Daily | Life of facility + 7 years |
| Modification records | As modified | Life of facility + 7 years |
| Borrowing base | Monthly | Life of facility |
| Covenant calculations | Monthly/Quarterly | Life of facility |
The 7-year retention covers most regulatory and contractual requirements, but check your facility agreements for specific terms.
Loan tape snapshots
Store a complete loan tape as of each month-end. This should include:
- All static loan attributes (origination data, credit, collateral)
- Current status (balance, payment status, days past due)
- Cumulative performance (total paid, total charged off)
- Facility-specific fields (eligibility flags, concentration buckets)
Transaction-level data
Don’t just store summaries. Keep every:
- Payment received (date, amount, application method)
- Fee assessed or waived
- Status change
- Modification or workout action
- Collection attempt and result
Transaction-level data lets you reconstruct any point in time and investigate discrepancies.
Cloud data platforms
Cloud data warehouses are the standard choice now. You don’t need to manage servers; you pay for storage and compute. Cost scales with usage.
| Platform | Strengths | Typical Cost |
|---|---|---|
| Snowflake | Best-in-class, easy to use | $1-5K/month at modest scale |
| Google BigQuery | Cost-effective, serverless | $500-3K/month |
| AWS Redshift | AWS ecosystem integration | $1-5K/month |
| Databricks | Analytics + ML combined | $2-8K/month |
Illustrative pricing. See pricing disclaimer.
For most originators under $500M AUM, a cloud data warehouse plus a BI tool (Tableau, Looker, Power BI, or even well-structured Excel connected to the warehouse) is sufficient.
Choosing a platform
Snowflake is the default choice for most ABF operations. Easy to use, well-documented, handles structured data well. The consumption-based pricing is predictable for loan portfolio analytics.
BigQuery works well if you’re already in the Google ecosystem or want the lowest cost at modest scale. Serverless model means no capacity planning.
Redshift makes sense if you’re heavily invested in AWS. Tight integration with other AWS services but requires more administration than Snowflake.
Databricks is overkill for most originators unless you’re doing advanced analytics or machine learning. The platform combines data warehousing with compute for complex analytics.
Analytics you need to run
Your capital providers will expect you to produce regular analytics. Build your infrastructure to generate these efficiently.
Monthly requirements
Loan tape with full history: The complete picture of every loan, updated monthly. Format per your facility agreement.
Delinquency stratification: Current, 30, 60, 90+, charge-off buckets. Show counts, balances, and percentages.
Concentration analysis: Geography, credit tier, loan size, any other facility-specific concentrations. Compare to limits.
Borrowing base calculation: Eligible assets, advance rate application, concentration haircuts. Should reconcile exactly to your funding request.
Covenant compliance: All financial covenants with calculation backup. Include cure periods and headroom.
Quarterly requirements
Static pool performance: Loss and prepayment curves by vintage. Show actual vs. expected where applicable.
Roll rate analysis: Transition probabilities between status buckets. Compare current period to historical averages.
Vintage comparison: How is 2024 vintage tracking vs. 2023? Isolate performance differences.
Portfolio stratification trends: How are concentrations changing over time? Early warning of drift.
Annual requirements
Full portfolio review materials: Comprehensive performance analysis for annual lender meetings.
Cohort-level economics: Yield, loss, net margin by origination period. Required for facility renewals and new lender discussions.
Benchmark comparison: Your performance vs. public ABS or lender expectations. Context for your results.
Building data pipelines
Data doesn’t move itself. You need automated pipelines that:
- Extract data from source systems on schedule
- Apply transformations consistently
- Load results to production tables
- Alert on failures or anomalies
Pipeline tools
dbt (data build tool): Industry standard for transformation logic. Version-controlled, testable, documented. Free core version works for most needs.
Fivetran/Airbyte: Managed data extraction from common sources. Reduces integration development but adds subscription cost.
Airflow/Dagster: Workflow orchestration for complex pipelines. Needed when you have multi-step dependencies.
For most ABF operations, dbt plus scheduled queries handles the bulk of pipeline needs. Add orchestration tools when complexity requires it.
Data quality management
Bad data undermines everything. Build quality checks into your pipelines:
Reconciliation checks: Does the sum of loans in your warehouse match your LMS? Does UPB tie?
Completeness checks: Are required fields populated? Are all expected records present?
Consistency checks: Do derived fields calculate correctly? Do status transitions make sense?
Anomaly detection: Are there unusual patterns (sudden spikes, missing days, outlier values)?
When checks fail, the pipeline should alert your team, not silently continue. Lender confidence depends on data you can trust.
Common data infrastructure mistakes
Starting too late. By the time you need cohort analysis for a lender conversation, it’s too late to build the infrastructure. Start capturing historical snapshots early.
Letting perfect be the enemy of good. A well-maintained set of Excel workbooks pulling from a clean database beats a half-implemented data warehouse that nobody trusts. Start simple and add sophistication.
Skipping documentation. When you leave, or your analyst leaves, will the next person understand what the transformations do? Document your logic.
Ignoring data lineage. When a number looks wrong, can you trace it back to the source? Build pipelines that maintain clear data lineage.
Related topics
- Loan management systems - your source system for most data
- Investor portals and integration - how this data reaches your lenders
- Technology and infrastructure - overview and implementation roadmap