Estimating the Probability of Default

With Lending Club, an investor can invest in a portfolio of loans.  But how will those loans perform, and which loans should we pick?  In order to say anything meaningful about what loans to choose, we must first estimate how loans will do over time.  What percentage of loans will default?  What percentage of loans will get paid off in full?

Loans in repayment can transition through the following states:

  • Current,
  • In Grace Period,
  • Late (16-30 days),
  • Late (31-120 days),
  • Default,
  • Charged Off,
  • Fully Paid

From one month to the next, a loan could move, for example, from Current -> In Grace Period, or Current -> Fully Paid, or Late -> Current.  

When selecting any given loan, we want to know the probability that the loan will end up in one of two end states: Charged Off (bad loans) or Fully Paid (good loans).

The simplest approach would be to just count up the number of loans with those two end states.  As of April 1, 2011, we have 23,007 total loans.  Of those, 1,152 have been Charged Off and 3,436 have been Fully Paid.  So 25% of the loans that have reached an end state are Charged Off.  Not good.  But wait, those Charged Off loans represent only 5% of our total loan data.  Most loans have not yet reached either end state, but we expect them to be paid off or charged off eventually.
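The arithmetic behind those two percentages can be checked with a few lines of Python.  This is only a sketch using the counts quoted above; the variable names are my own.

```python
# Loan counts quoted in the post, as of April 1, 2011.
total_loans = 23007
charged_off = 1152
fully_paid = 3436

# Loans that have already reached one of the two end states.
finished = charged_off + fully_paid

# Share of finished loans that were charged off (the alarming number).
share_of_finished = charged_off / finished

# Share of ALL loans that were charged off (the reassuring number).
share_of_all = charged_off / total_loans

# Loans whose final outcome is still unknown.
unresolved = total_loans - finished

print(f"Charged Off among finished loans: {share_of_finished:.1%}")
print(f"Charged Off among all loans:      {share_of_all:.1%}")
print(f"Loans not yet finished:           {unresolved} ({unresolved / total_loans:.1%})")
```

Neither percentage answers the real question, because the outcome of most loans is still unknown.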

A better way is to look at how the loans evolve over time.  Take a loan that is Current in February.  We want to know the probability that it will be Current in March.  Using snapshots of the loan data over time, we can construct a Transition Matrix.
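As a rough sketch of how such a matrix could be built from monthly snapshots: count every observed month-to-month move, pooled over all consecutive snapshot pairs, then divide each row by its total.  The function name, the snapshot layout (one dict per month mapping a loan id to its status) and the toy data below are all assumptions for illustration, not the real Lending Club data or the exact procedure used for the matrix in this post.

```python
from collections import Counter, defaultdict


def transition_matrix(snapshots):
    """Estimate transition probabilities from monthly snapshots.

    snapshots: list of dicts {loan_id: status}, ordered by month.
    Returns {from_status: {to_status: probability}}.
    """
    counts = defaultdict(Counter)
    # Pair each snapshot with the one taken the following month.
    for earlier, later in zip(snapshots, snapshots[1:]):
        for loan_id, status in earlier.items():
            if loan_id in later:  # skip loans missing from the next snapshot
                counts[status][later[loan_id]] += 1

    matrix = {}
    for status, row in counts.items():
        total = sum(row.values())
        # Normalize counts so each row sums to 1.
        matrix[status] = {nxt: n / total for nxt, n in row.items()}
    return matrix


# Made-up toy snapshots (four loans, three months).
feb = {"A": "Current", "B": "Current", "C": "In Grace Period", "D": "Current"}
mar = {"A": "Current", "B": "Fully Paid", "C": "Late (31-120 days)", "D": "Current"}
apr = {"A": "Current", "B": "Fully Paid", "C": "Current", "D": "In Grace Period"}

matrix = transition_matrix([feb, mar, apr])
for status, row in matrix.items():
    print(status, row)
```

With real data, each row would be estimated from thousands of observed moves rather than a handful.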

Transition Matrix as of April 2011: All Loans

How to interpret this matrix?  Well, if we select a loan that has a status of Current (left side), then it has a 94.84% chance of remaining current in the next month, and a 2.79% chance of being Fully Paid in the next month.  A loan with a status of In Grace Period has a 42.62% chance of returning to a Current status in the next month, and a 30.35% chance of moving to Late (31-120 days).  

Why such a low probability for moving from In Grace Period to Late (16-30 days)?  Because we take a snapshot only once per month.  A loan that is In Grace Period in February will most likely either receive a payment within the next 30 days or fall another 30 days behind.  Those 30 days, plus the days already spent in the grace period, put the loan more than 30 days late by the next snapshot, so it skips the Late (16-30 days) category.
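The timing argument can be sketched in code.  The assumptions here are mine: In Grace Period means 1 to 15 days past due (inferred from the Late (16-30 days) label), anything beyond 120 days is bucketed as Default, and the borrower makes no payment between snapshots.  The function names are hypothetical.

```python
def status_for_days_late(days_late):
    """Map days past due to a status bucket (cutoffs are assumptions)."""
    if days_late <= 0:
        return "Current"
    if days_late <= 15:
        return "In Grace Period"
    if days_late <= 30:
        return "Late (16-30 days)"
    if days_late <= 120:
        return "Late (31-120 days)"
    return "Default"


def next_snapshot_statuses(gap_days):
    """Status at the next snapshot for each possible grace-period day,
    assuming no payment is made in between."""
    return {d: status_for_days_late(d + gap_days) for d in range(1, 16)}


after_30 = next_snapshot_statuses(30)  # snapshots 30 days apart
after_28 = next_snapshot_statuses(28)  # e.g. a 28-day February

print(set(after_30.values()))
print([d for d, s in after_28.items() if s == "Late (16-30 days)"])
```

Under these assumptions, a 30-day gap sends every unpaid grace-period loan straight to Late (31-120 days).  Only with a shorter gap, such as 28 days, do loans that were 1 or 2 days late land in Late (16-30 days), which is consistent with that transition being rare but not impossible.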

Notice how quickly a loan's prospects degrade once it is late.  A loan In Grace Period has a 42.62% chance of going back to Current status in the next month.  But a Late (31-120 days) loan has only a 5.80% chance of going back to Current status in the next month.  Late loans tend to default.  Loans in Default either remain in Default or are Charged Off in the next month.

Charged Off and Fully Paid are end states, which is why each shows a 100% probability of staying where it is.  Once a loan enters one of those two states, it cannot return to any other state.

This Transition Matrix allows us to create estimates of future loan performance for any given loan.  The example here is admittedly simple, and there are a lot of modifications that we might consider to make our estimate more sophisticated.  For example, we might want to make separate matrices for each Credit Letter: A loans through G loans.  We would expect to see a different set of probabilities for loans in different credit letter categories. 
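One way to turn a transition matrix into an estimate for a single loan is to start with the loan's current status and apply the matrix once per month.  The sketch below does that with a deliberately collapsed four-state matrix.  The probabilities are made up for illustration and are NOT the April 2011 figures, and the function names are hypothetical.

```python
STATES = ["Current", "Late", "Charged Off", "Fully Paid"]

# Illustrative probabilities only, NOT taken from the April 2011 matrix.
# Each row gives the chances of moving from one state to each state.
P = [
    [0.95, 0.02, 0.00, 0.03],  # from Current
    [0.10, 0.70, 0.20, 0.00],  # from Late
    [0.00, 0.00, 1.00, 0.00],  # Charged Off is an end state
    [0.00, 0.00, 0.00, 1.00],  # Fully Paid is an end state
]


def step(dist, matrix):
    """Advance a probability distribution over states by one month."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def project(start_state, months, matrix=P):
    """Probability of being in each state after the given number of months."""
    dist = [1.0 if s == start_state else 0.0 for s in STATES]
    for _ in range(months):
        dist = step(dist, matrix)
    return dict(zip(STATES, dist))


after_36 = project("Current", 36)    # a 36-month horizon
eventual = project("Current", 2000)  # long enough for every loan to finish

for state, prob in after_36.items():
    print(f"{state:12s} {prob:.2%}")
```

Because the two end states keep everything that enters them, running the projection far enough forward pushes all of the probability into Charged Off and Fully Paid.  The split between the two is the estimate we were after in the first place.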

Future posts will explore that possibility, and also go into more detail about how one constructs a transition matrix.  


2 Responses to Estimating the Probability of Default

  1. melondonkey says:

    So just to clarify, this transition matrix was constructed using multiple months or just February to March? I ask because there could be seasonal factors influencing the matrix (tax returns, for example). I’m going to start archiving data so I can have a better view of loan performance with monthly snapshots of performance.

    • Rhead Enion says:

      This transition matrix was constructed using multiple months. All months where I had loan data (which spanned more than 1 year’s worth).

      You are probably correct that seasonal factors could influence the matrix. I would imagine that there are two types of seasonal factors:
      1. Vintage: The month / year that the loan was approved could matter. For example, people may be more desperate for a loan (and thus less able to pay it back) during bad economic times.
      2. Seasonal: As you mention, tax returns or after-holiday bills could influence payments on a monthly basis.

      The age of the loan is most certainly another factor that you can estimate given enough data. New loans tend to be paid off or default at higher rates initially. Defaults tend to take a while to percolate through, so older vintage loans see a higher default rate than younger loans.
