Data and analytics infrastructure

Your loan management system handles daily operations. Your data infrastructure handles everything else: historical analysis, cohort performance, investor reporting, and the analytics that let you actually understand your portfolio. As your ABF operations grow, the quality of your data infrastructure determines whether you can answer lender questions in hours or weeks.

Why you need both LMS and data warehouse

Your LMS is optimized for transactions, not analysis. Running complex queries against your production servicing system slows down operations and risks data corruption. A separate data warehouse gives you:

Historical snapshots: Loan tape as of every month-end, so you can rebuild any historical report. When a lender asks “what did the 2023Q2 vintage look like six months after origination,” you have the answer.

Performance tracking: Roll rates, transition matrices, vintage curves over time. These require comparing snapshots across periods, which transaction systems don’t handle well.

Flexible analysis: Ad-hoc queries your LMS can’t handle. “Show me all loans in Texas with FICO below 650 that paid late in month 3 but are now current” shouldn’t require a development project.

Safe experimentation: Analysts can run anything without affecting production. No risk of locking up your servicing system during month-end close.

Core data architecture

A functional data warehouse has three layers:

Staging layer

Raw data extracted from source systems (LMS, payment processor, bank accounts). No transformation, just capture. This preserves the source data exactly as it was received, which matters for audit and debugging.

Key principles:

Extract daily or more frequently for critical data
Never modify data in staging
Keep full extraction history

Transformation layer

Data cleaning, validation, and business logic. This is where you:

Standardize formats across sources
Calculate derived fields (days past due, cohort assignments, eligibility flags)
Apply business rules (concentration calculations, covenant definitions)
Flag exceptions and data quality issues

This layer is where most of the work happens. Document your transformation logic carefully because lenders will ask how you calculate things.

Production layer

Clean, validated data ready for reporting and analysis. This is what your reports pull from. Data here should be:

Consistent across time periods
Reconciled to source systems
Ready for self-service queries

What to store

At minimum, maintain historical snapshots of:

Data Type	Frequency	Retention
Loan tape (static)	Monthly	Life of facility + 7 years
Payment transactions	Daily	Life of facility + 7 years
Collection activity	Daily	Life of facility + 7 years
Modification records	As modified	Life of facility + 7 years
Borrowing base	Monthly	Life of facility
Covenant calculations	Monthly/Quarterly	Life of facility

The 7-year retention covers most regulatory and contractual requirements, but check your facility agreements for specific terms.

Loan tape snapshots

Store a complete loan tape as of each month-end. This should include:

All static loan attributes (origination data, credit, collateral)
Current status (balance, payment status, days past due)
Cumulative performance (total paid, total charged off)
Facility-specific fields (eligibility flags, concentration buckets)

Transaction-level data

Don’t just store summaries. Keep every:

Payment received (date, amount, application method)
Fee assessed or waived
Status change
Modification or workout action
Collection attempt and result

Transaction-level data lets you reconstruct any point in time and investigate discrepancies.

Cloud data platforms

Cloud data warehouses are the standard choice now. You don’t need to manage servers; you pay for storage and compute. Cost scales with usage.

Platform	Strengths	Typical Cost
Snowflake	Best-in-class, easy to use	$1-5K/month at modest scale
Google BigQuery	Cost-effective, serverless	$500-3K/month
AWS Redshift	AWS ecosystem integration	$1-5K/month
Databricks	Analytics + ML combined	$2-8K/month

Illustrative pricing. See pricing disclaimer.

For most originators under $500M AUM, a cloud data warehouse plus a BI tool (Tableau, Looker, Power BI, or even well-structured Excel connected to the warehouse) is sufficient.

Choosing a platform

Snowflake is the default choice for most ABF operations. Easy to use, well-documented, handles structured data well. The consumption-based pricing is predictable for loan portfolio analytics.

BigQuery works well if you’re already in the Google ecosystem or want the lowest cost at modest scale. Serverless model means no capacity planning.

Redshift makes sense if you’re heavily invested in AWS. Tight integration with other AWS services but requires more administration than Snowflake.

Databricks is overkill for most originators unless you’re doing advanced analytics or machine learning. The platform combines data warehousing with compute for complex analytics.

Analytics you need to run

Your capital providers will expect you to produce regular analytics. Build your infrastructure to generate these efficiently.

Monthly requirements

Loan tape with full history: The complete picture of every loan, updated monthly. Format per your facility agreement.

Delinquency stratification: Current, 30, 60, 90+, charge-off buckets. Show counts, balances, and percentages.

Concentration analysis: Geography, credit tier, loan size, any other facility-specific concentrations. Compare to limits.

Borrowing base calculation: Eligible assets, advance rate application, concentration haircuts. Should reconcile exactly to your funding request.

Covenant compliance: All financial covenants with calculation backup. Include cure periods and headroom.

Quarterly requirements

Static pool performance: Loss and prepayment curves by vintage. Show actual vs. expected where applicable.

Roll rate analysis: Transition probabilities between status buckets. Compare current period to historical averages.

Vintage comparison: How is 2024 vintage tracking vs. 2023? Isolate performance differences.

Portfolio stratification trends: How are concentrations changing over time? Early warning of drift.

Annual requirements

Full portfolio review materials: Comprehensive performance analysis for annual lender meetings.

Cohort-level economics: Yield, loss, net margin by origination period. Required for facility renewals and new lender discussions.

Benchmark comparison: Your performance vs. public ABS or lender expectations. Context for your results.

Building data pipelines

Data doesn’t move itself. You need automated pipelines that:

Extract data from source systems on schedule
Apply transformations consistently
Load results to production tables
Alert on failures or anomalies

Pipeline tools

dbt (data build tool): Industry standard for transformation logic. Version-controlled, testable, documented. Free core version works for most needs.

Fivetran/Airbyte: Managed data extraction from common sources. Reduces integration development but adds subscription cost.

Airflow/Dagster: Workflow orchestration for complex pipelines. Needed when you have multi-step dependencies.

For most ABF operations, dbt plus scheduled queries handles the bulk of pipeline needs. Add orchestration tools when complexity requires it.

Data quality management

Bad data undermines everything. Build quality checks into your pipelines:

Reconciliation checks: Does the sum of loans in your warehouse match your LMS? Does UPB tie?

Completeness checks: Are required fields populated? Are all expected records present?

Consistency checks: Do derived fields calculate correctly? Do status transitions make sense?

Anomaly detection: Are there unusual patterns (sudden spikes, missing days, outlier values)?

When checks fail, the pipeline should alert your team, not silently continue. Lender confidence depends on data you can trust.

Common data infrastructure mistakes

Starting too late. By the time you need cohort analysis for a lender conversation, it’s too late to build the infrastructure. Start capturing historical snapshots early.

Letting perfect be the enemy of good. A well-maintained set of Excel workbooks pulling from a clean database beats a half-implemented data warehouse that nobody trusts. Start simple and add sophistication.

Skipping documentation. When you leave, or your analyst leaves, will the next person understand what the transformations do? Document your logic.

Ignoring data lineage. When a number looks wrong, can you trace it back to the source? Build pipelines that maintain clear data lineage.

Loan management systems - your source system for most data
Investor portals and integration - how this data reaches your lenders
Technology and infrastructure - overview and implementation roadmap