This article is a work in progress. If you have any questions, thoughts, or corrections, contact us.

Collateral analysis

Loan tape fundamentals

Loan tape fundamentals

A loan tape is the foundation of every ABF transaction. Before you can assess credit quality, model cash flows, or negotiate terms, you need to understand what you’re working with. This page covers what loan tapes contain, how to interpret them, and the first checks to run when a new tape lands in your inbox.


What is a loan tape

A loan tape is a loan-level data export, typically delivered as a CSV, Excel file, or fixed-width text file. Each row represents one loan; each column represents an attribute. The tape is a snapshot of the portfolio at a specific point in time (the “tape date” or “cut-off date”).

Think of the loan tape as the source of truth for collateral analysis. Everything you calculate—stratifications, performance metrics, concentrations—derives from this data. If the tape is wrong, your analysis is wrong.

Common delivery formats

CSV files: Most common for automated delivery. Check for embedded commas in text fields, inconsistent quoting, and encoding issues (UTF-8 vs. ASCII). Large files may truncate or lose precision on loan IDs.

Excel files: Convenient but dangerous. Watch for truncated loan IDs (Excel converts long numbers to scientific notation), date formatting inconsistencies, and hidden rows or columns. Maximum row limits (1,048,576) can silently truncate large pools.

Fixed-width files: Common from legacy systems. Require a layout specification documenting field positions; even one character offset corrupts everything downstream. These demand more careful handling but rarely have formatting ambiguity issues.

Database exports / API delivery: Growing in prevalence. Large originators may provide tape access via SQL query or API rather than file export. This enables real-time access but requires technical integration.


Standard loan tape fields

While field names vary by originator and asset class, certain categories appear in virtually every tape.

Loan identification

FieldDescriptionExample
Loan ID / Account numberUnique identifier for the loanLN-2024-001234
Borrower IDMay differ from loan ID for repeat borrowersBR-98765
Origination channelHow the loan was originatedRetail, wholesale, correspondent
Product typeThe loan productTerm loan, revolving, balloon

The loan ID is your primary key for all analysis. It must be unique and consistently formatted across monthly tape deliveries. When loan IDs change format (e.g., after a system migration), reconciliation becomes painful.

Balance and terms

FieldDescriptionNotes
Original principal balanceLoan amount at originationShould never change month to month
Current principal balance (UPB)Outstanding balance as of tape dateKey metric for pool sizing
Interest rateNote rate, not APRFixed or current rate for variable
Original termLoan term in months12, 24, 36, 48, 60, 72, 84 typical
Remaining termMonths until maturityShould decline each month
Origination dateWhen the loan was fundedCritical for vintage analysis
Maturity dateScheduled payoff dateShould equal origination + term
Payment amountScheduled monthly paymentP&I for amortizing, may be interest-only

Critical relationship: For amortizing loans, the relationship between balance, rate, term, and payment should be mathematically consistent. A 60-month, $25,000 loan at 8% should have a payment of approximately $507. If the tape shows $425, something is wrong.

Performance fields

FieldDescriptionNotes
Payment statusCurrent, delinquent, default, paid offTerminology varies by originator
Days past due (DPD)Days since payment was due0 = current
Date of last paymentWhen borrower last paidFor projecting behavior
Cumulative principal paidTotal principal collected to dateFor audit and reconciliation
Cumulative interest paidTotal interest collected to dateShould align with rate/time

The status field and DPD field should be consistent. A loan showing “current” status but 45 DPD indicates either a data error or an unusual definition of “current” (some servicers call loans current until 60 DPD).

Borrower attributes

FieldDescriptionNotes
Credit score at originationFICO, VantageScore, or internal scoreMay also include current score
IncomeStated or verified incomeMay be missing for stated-income products
DTIDebt-to-income ratioCritical for payment affordability
Employment statusEmployed, self-employed, retiredIf captured
State / geographyBorrower locationRequired for concentration analysis

For consumer loans, the FICO score at origination is typically the most predictive field for default probability. Income and DTI matter for ability-to-pay assessment but are often less reliably captured.

Collateral attributes (secured products)

FieldDescriptionNotes
Collateral typeAsset securing the loanVehicle, equipment, property
Collateral value at originationAppraised or invoice valueBasis for LTV calculation
LTV at originationLoan-to-value ratioHigher LTV = more loss severity risk
Current LTVUpdated value if availableRare outside mortgage
Collateral identifierVIN, serial number, property addressFor title/lien verification

For auto loans, the collateral type (new vs. used, make/model) significantly affects depreciation and recovery values. Equipment loans depend heavily on whether collateral is titled (vehicles, heavy equipment) vs. soft (office equipment, technology).


The data dictionary

Always request the data dictionary alongside the tape. This document defines:

  • Exact field names and their meanings: “STAT_CD” vs. “STATUS” vs. “PaymentStatus” could all mean the same thing
  • Code values: What does “CU” mean? “30”? “PD”? Without documentation, you’re guessing
  • Null/blank handling conventions: Does blank mean “not applicable,” “not available,” or “zero”?
  • Date formats: MM/DD/YYYY vs. DD/MM/YYYY vs. YYYYMMDD
  • Update frequency: Are balances updated monthly, weekly, or real-time?

Without a data dictionary, you’re reverse-engineering field meanings from context. A “status” field with values 1-6 could mean anything.

Building your own data dictionary

When the originator doesn’t provide documentation (common with smaller originators), build your own:

  1. List all fields with sample values
  2. Identify obvious meanings (any field called “balance” probably is one)
  3. Check distributions for categorical fields (what unique values exist?)
  4. Test numeric fields for reasonableness (are rates in decimal or percentage form?)
  5. Send the dictionary back to the originator for confirmation

This takes time but prevents misinterpretation downstream.


The first five checks

When a loan tape arrives, resist the urge to build elaborate models immediately. Run these five checks first. They take 30 minutes and tell you whether the tape is worth deeper analysis.

1. Data completeness check

Calculate the percentage of populated fields for each column. Focus on critical fields first:

Field categoryCritical fieldsAcceptable missing %
IdentityLoan ID, Borrower ID0%
BalanceOriginal balance, Current balance0%
TermsInterest rate, Original term, Origination date<1%
PerformancePayment status, Days past due<1%
BorrowerCredit score, Income (if underwritten)<5%
CollateralProperty/asset details<5%

If critical fields show >5% missing data, stop and request a clean tape before proceeding.

Python example:

# Quick completeness check
df = pd.read_csv('loan_tape.csv')
completeness = df.notna().mean() * 100
print(completeness.sort_values())

2. Eligibility screening

Apply the basic eligibility criteria from the term sheet (or standard market criteria if you’re screening cold). What percentage of loans pass?

Example criteria for a consumer unsecured pool:

  • Current balance > $0
  • Remaining term > 0
  • FICO at origination >= 600
  • Not currently 60+ days delinquent
  • Originated within last 36 months

If only 70% of loans pass basic eligibility, you’re analyzing a different pool than you thought. Clarify with the originator which loans are intended for the facility.

3. Balance distribution

Calculate basic statistics:

MetricValueWhat it tells you
Total UPB$50.0MPool size
Loan count2,000Granularity
Average balance$25,000Typical loan size
Median balance$18,000Central tendency
Min balance$500Smallest loan
Max balance$150,000Largest loan / concentration

A large gap between average and median indicates skew. A $50M pool with 2,000 loans averaging $25K but median $8K likely has a few very large loans driving concentration risk. Those large loans warrant individual review.

4. Delinquency snapshot

Group loans by days past due (DPD) as of the tape date:

DPD bucketLoan countUPB% of pool
Current (0)1,700$42.5M85.0%
1-29 DPD160$4.0M8.0%
30-59 DPD80$2.0M4.0%
60-89 DPD40$1.0M2.0%
90+ DPD20$0.5M1.0%
Total2,000$50.0M100%

This 7% total DQ rate (30+ DPD) would be concerning for prime auto (expect 2-3%) but normal for subprime consumer (expect 6-10%). Context matters.

5. Age distribution

Calculate weighted average seasoning and examine the vintage mix:

Origination yearLoan countUPB% of poolAvg seasoning (mo)
20241,000$25M50%6
2023600$15M30%18
2022300$7.5M15%30
2021100$2.5M5%42

A pool dominated by very recent originations (6 months or less) hasn’t experienced enough seasoning to demonstrate performance. You’re buying underwriting guidelines, not proven loans.


Format-specific handling

Excel pitfalls to avoid

Scientific notation: Excel converts loan IDs like “123456789012345” to “1.23457E+14”, losing precision. Import as text, not number.

Date ambiguity: Is “01/02/2024” January 2 or February 1? Check originator’s locale and verify against other date fields.

Hidden data: Check for hidden rows, columns, or sheets. Filter for hidden rows: right-click row headers, check for “Unhide” option.

Truncation: Excel’s row limit is 1,048,576. A 2 million loan tape silently drops half your data.

CSV best practices

Specify encoding when importing: UTF-8 is standard but not universal.

Check for embedded delimiters: If address fields contain commas, ensure proper quoting.

Validate row counts: Compare row count in import to row count reported by originator.

Fixed-width handling

Fixed-width files require precise layout documentation. A sample layout:

FieldStart positionLengthType
Loan ID115Text
Balance1612Numeric (2 implied decimals)
Rate285Numeric (3 implied decimals)

Without this layout, the file is unusable. If the layout document is missing or outdated, request it before proceeding.


Key takeaways

  1. The loan tape is the foundation. Every analysis you perform derives from this data. Invest time understanding it.

  2. Request the data dictionary. Field names vary wildly between originators. Don’t guess at meanings.

  3. Run the first five checks before deep analysis. Completeness, eligibility, balance distribution, delinquency, and age distribution take 30 minutes and tell you if the tape is worth analyzing.

  4. Format issues cause real problems. Excel truncates, CSV encoding breaks, fixed-width offsets corrupt. Handle each format appropriately.

  5. Document what you learn. The mapping table and data notes you create become valuable institutional knowledge for subsequent tape reviews.