Collateral analysis
Loan tape fundamentals
Loan tape fundamentals
A loan tape is the foundation of every ABF transaction. Before you can assess credit quality, model cash flows, or negotiate terms, you need to understand what you’re working with. This page covers what loan tapes contain, how to interpret them, and the first checks to run when a new tape lands in your inbox.
What is a loan tape
A loan tape is a loan-level data export, typically delivered as a CSV, Excel file, or fixed-width text file. Each row represents one loan; each column represents an attribute. The tape is a snapshot of the portfolio at a specific point in time (the “tape date” or “cut-off date”).
Think of the loan tape as the source of truth for collateral analysis. Everything you calculate—stratifications, performance metrics, concentrations—derives from this data. If the tape is wrong, your analysis is wrong.
Common delivery formats
CSV files: Most common for automated delivery. Check for embedded commas in text fields, inconsistent quoting, and encoding issues (UTF-8 vs. ASCII). Large files may truncate or lose precision on loan IDs.
Excel files: Convenient but dangerous. Watch for truncated loan IDs (Excel converts long numbers to scientific notation), date formatting inconsistencies, and hidden rows or columns. Maximum row limits (1,048,576) can silently truncate large pools.
Fixed-width files: Common from legacy systems. Require a layout specification documenting field positions; even one character offset corrupts everything downstream. These demand more careful handling but rarely have formatting ambiguity issues.
Database exports / API delivery: Growing in prevalence. Large originators may provide tape access via SQL query or API rather than file export. This enables real-time access but requires technical integration.
Standard loan tape fields
While field names vary by originator and asset class, certain categories appear in virtually every tape.
Loan identification
| Field | Description | Example |
|---|---|---|
| Loan ID / Account number | Unique identifier for the loan | LN-2024-001234 |
| Borrower ID | May differ from loan ID for repeat borrowers | BR-98765 |
| Origination channel | How the loan was originated | Retail, wholesale, correspondent |
| Product type | The loan product | Term loan, revolving, balloon |
The loan ID is your primary key for all analysis. It must be unique and consistently formatted across monthly tape deliveries. When loan IDs change format (e.g., after a system migration), reconciliation becomes painful.
Balance and terms
| Field | Description | Notes |
|---|---|---|
| Original principal balance | Loan amount at origination | Should never change month to month |
| Current principal balance (UPB) | Outstanding balance as of tape date | Key metric for pool sizing |
| Interest rate | Note rate, not APR | Fixed or current rate for variable |
| Original term | Loan term in months | 12, 24, 36, 48, 60, 72, 84 typical |
| Remaining term | Months until maturity | Should decline each month |
| Origination date | When the loan was funded | Critical for vintage analysis |
| Maturity date | Scheduled payoff date | Should equal origination + term |
| Payment amount | Scheduled monthly payment | P&I for amortizing, may be interest-only |
Critical relationship: For amortizing loans, the relationship between balance, rate, term, and payment should be mathematically consistent. A 60-month, $25,000 loan at 8% should have a payment of approximately $507. If the tape shows $425, something is wrong.
Performance fields
| Field | Description | Notes |
|---|---|---|
| Payment status | Current, delinquent, default, paid off | Terminology varies by originator |
| Days past due (DPD) | Days since payment was due | 0 = current |
| Date of last payment | When borrower last paid | For projecting behavior |
| Cumulative principal paid | Total principal collected to date | For audit and reconciliation |
| Cumulative interest paid | Total interest collected to date | Should align with rate/time |
The status field and DPD field should be consistent. A loan showing “current” status but 45 DPD indicates either a data error or an unusual definition of “current” (some servicers call loans current until 60 DPD).
Borrower attributes
| Field | Description | Notes |
|---|---|---|
| Credit score at origination | FICO, VantageScore, or internal score | May also include current score |
| Income | Stated or verified income | May be missing for stated-income products |
| DTI | Debt-to-income ratio | Critical for payment affordability |
| Employment status | Employed, self-employed, retired | If captured |
| State / geography | Borrower location | Required for concentration analysis |
For consumer loans, the FICO score at origination is typically the most predictive field for default probability. Income and DTI matter for ability-to-pay assessment but are often less reliably captured.
Collateral attributes (secured products)
| Field | Description | Notes |
|---|---|---|
| Collateral type | Asset securing the loan | Vehicle, equipment, property |
| Collateral value at origination | Appraised or invoice value | Basis for LTV calculation |
| LTV at origination | Loan-to-value ratio | Higher LTV = more loss severity risk |
| Current LTV | Updated value if available | Rare outside mortgage |
| Collateral identifier | VIN, serial number, property address | For title/lien verification |
For auto loans, the collateral type (new vs. used, make/model) significantly affects depreciation and recovery values. Equipment loans depend heavily on whether collateral is titled (vehicles, heavy equipment) vs. soft (office equipment, technology).
The data dictionary
Always request the data dictionary alongside the tape. This document defines:
- Exact field names and their meanings: “STAT_CD” vs. “STATUS” vs. “PaymentStatus” could all mean the same thing
- Code values: What does “CU” mean? “30”? “PD”? Without documentation, you’re guessing
- Null/blank handling conventions: Does blank mean “not applicable,” “not available,” or “zero”?
- Date formats: MM/DD/YYYY vs. DD/MM/YYYY vs. YYYYMMDD
- Update frequency: Are balances updated monthly, weekly, or real-time?
Without a data dictionary, you’re reverse-engineering field meanings from context. A “status” field with values 1-6 could mean anything.
Building your own data dictionary
When the originator doesn’t provide documentation (common with smaller originators), build your own:
- List all fields with sample values
- Identify obvious meanings (any field called “balance” probably is one)
- Check distributions for categorical fields (what unique values exist?)
- Test numeric fields for reasonableness (are rates in decimal or percentage form?)
- Send the dictionary back to the originator for confirmation
This takes time but prevents misinterpretation downstream.
The first five checks
When a loan tape arrives, resist the urge to build elaborate models immediately. Run these five checks first. They take 30 minutes and tell you whether the tape is worth deeper analysis.
1. Data completeness check
Calculate the percentage of populated fields for each column. Focus on critical fields first:
| Field category | Critical fields | Acceptable missing % |
|---|---|---|
| Identity | Loan ID, Borrower ID | 0% |
| Balance | Original balance, Current balance | 0% |
| Terms | Interest rate, Original term, Origination date | <1% |
| Performance | Payment status, Days past due | <1% |
| Borrower | Credit score, Income (if underwritten) | <5% |
| Collateral | Property/asset details | <5% |
If critical fields show >5% missing data, stop and request a clean tape before proceeding.
Python example:
# Quick completeness check
df = pd.read_csv('loan_tape.csv')
completeness = df.notna().mean() * 100
print(completeness.sort_values())
2. Eligibility screening
Apply the basic eligibility criteria from the term sheet (or standard market criteria if you’re screening cold). What percentage of loans pass?
Example criteria for a consumer unsecured pool:
- Current balance > $0
- Remaining term > 0
- FICO at origination >= 600
- Not currently 60+ days delinquent
- Originated within last 36 months
If only 70% of loans pass basic eligibility, you’re analyzing a different pool than you thought. Clarify with the originator which loans are intended for the facility.
3. Balance distribution
Calculate basic statistics:
| Metric | Value | What it tells you |
|---|---|---|
| Total UPB | $50.0M | Pool size |
| Loan count | 2,000 | Granularity |
| Average balance | $25,000 | Typical loan size |
| Median balance | $18,000 | Central tendency |
| Min balance | $500 | Smallest loan |
| Max balance | $150,000 | Largest loan / concentration |
A large gap between average and median indicates skew. A $50M pool with 2,000 loans averaging $25K but median $8K likely has a few very large loans driving concentration risk. Those large loans warrant individual review.
4. Delinquency snapshot
Group loans by days past due (DPD) as of the tape date:
| DPD bucket | Loan count | UPB | % of pool |
|---|---|---|---|
| Current (0) | 1,700 | $42.5M | 85.0% |
| 1-29 DPD | 160 | $4.0M | 8.0% |
| 30-59 DPD | 80 | $2.0M | 4.0% |
| 60-89 DPD | 40 | $1.0M | 2.0% |
| 90+ DPD | 20 | $0.5M | 1.0% |
| Total | 2,000 | $50.0M | 100% |
This 7% total DQ rate (30+ DPD) would be concerning for prime auto (expect 2-3%) but normal for subprime consumer (expect 6-10%). Context matters.
5. Age distribution
Calculate weighted average seasoning and examine the vintage mix:
| Origination year | Loan count | UPB | % of pool | Avg seasoning (mo) |
|---|---|---|---|---|
| 2024 | 1,000 | $25M | 50% | 6 |
| 2023 | 600 | $15M | 30% | 18 |
| 2022 | 300 | $7.5M | 15% | 30 |
| 2021 | 100 | $2.5M | 5% | 42 |
A pool dominated by very recent originations (6 months or less) hasn’t experienced enough seasoning to demonstrate performance. You’re buying underwriting guidelines, not proven loans.
Format-specific handling
Excel pitfalls to avoid
Scientific notation: Excel converts loan IDs like “123456789012345” to “1.23457E+14”, losing precision. Import as text, not number.
Date ambiguity: Is “01/02/2024” January 2 or February 1? Check originator’s locale and verify against other date fields.
Hidden data: Check for hidden rows, columns, or sheets. Filter for hidden rows: right-click row headers, check for “Unhide” option.
Truncation: Excel’s row limit is 1,048,576. A 2 million loan tape silently drops half your data.
CSV best practices
Specify encoding when importing: UTF-8 is standard but not universal.
Check for embedded delimiters: If address fields contain commas, ensure proper quoting.
Validate row counts: Compare row count in import to row count reported by originator.
Fixed-width handling
Fixed-width files require precise layout documentation. A sample layout:
| Field | Start position | Length | Type |
|---|---|---|---|
| Loan ID | 1 | 15 | Text |
| Balance | 16 | 12 | Numeric (2 implied decimals) |
| Rate | 28 | 5 | Numeric (3 implied decimals) |
| … | … | … | … |
Without this layout, the file is unusable. If the layout document is missing or outdated, request it before proceeding.
Key takeaways
-
The loan tape is the foundation. Every analysis you perform derives from this data. Invest time understanding it.
-
Request the data dictionary. Field names vary wildly between originators. Don’t guess at meanings.
-
Run the first five checks before deep analysis. Completeness, eligibility, balance distribution, delinquency, and age distribution take 30 minutes and tell you if the tape is worth analyzing.
-
Format issues cause real problems. Excel truncates, CSV encoding breaks, fixed-width offsets corrupt. Handle each format appropriately.
-
Document what you learn. The mapping table and data notes you create become valuable institutional knowledge for subsequent tape reviews.