Why Loan Descriptions and Q&A matter

Recently, Lending Club instituted a new policy concerning investor interactions with potential borrowers. Previously, investors could ask potential borrowers almost anything, and borrowers could respond at will. The new policy, presumably intended to protect borrower privacy, restricts investor questions to a predetermined, limited list.

Some investors are outraged. Why? Well, as Peter at Social Lending explained, some investors use Q&A to feel out potential borrowers. The goal of such careful vetting is to determine which borrowers are serious and which are likely to flake on the loan. Many investors used Q&A to explain the importance of Lending Club to investors, and to remind borrowers that investors at Lending Club are real people, not just a faceless bank. And it makes intuitive sense that a borrower who takes time to carefully describe their goals for a loan and detail their expenses will be a better bet than a borrower who acts uninterested or seems confused about their financial status.

So is there empirical evidence to support the idea that borrower descriptions of their loans matter? Lending Club loan data includes any description of the loan by the borrower. It looks like the data includes borrower responses to questions, but I am not certain about that. But there is enough data there for a rough analysis.

Comparing estimated percentage return for all loans, I can break out loans by the number of characters in the description.  A very short description indicates that the borrower did not really engage with investor questions or did not try to “advertise” the loan by offering a thorough description of the loan purpose.

Here is where things get interesting. Each circle represents a loan from the Lending Club data. Most loans result in a positive return on investment, so we see a cluster of loans at the top of this graph. But look at the clumping of loans along the left side. Loans with descriptions of fewer than, say, 1000 characters are much more likely to show a negative return.

Another way to look at this is to take an arbitrary cutoff of characters. Say we reject all loans with fewer than 100 characters in the description and invest in all the rest. Is that return better, on average, than investing in all loans equally? The next two figures suggest that it is better to invest in loans with longer descriptions:

Plot of estimated percentage loan gains for all loans

The effect is somewhat subtle, because we are dealing with thousands of loans. But the result is strong: a shift in investment return from 5.12% (median) for all loans to 6.35% (median) for loans with longer descriptions. In fact, this indicator—longer descriptions—is one of the strongest indicators I have identified thus far for choosing good loans.
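The cutoff comparison described above can be sketched in a few lines. This is a minimal Python illustration, not the R code behind the figures: the `loans` records and the `median_return` helper are hypothetical, and the return values are invented rather than taken from Lending Club data.

```python
from statistics import median

# Hypothetical records: (description text, estimated percentage return).
# These values are invented for illustration, not Lending Club data.
loans = [
    ("", -42.0),
    ("Debt consolidation", 3.1),
    ("x" * 150, 6.0),
    ("y" * 400, 7.2),
    ("z" * 90, -10.5),
    ("w" * 1200, 5.5),
]

def median_return(records, min_chars=0):
    """Median return over loans whose description has at least min_chars characters."""
    kept = [ret for desc, ret in records if len(desc) >= min_chars]
    return median(kept)

# Compare investing in everything against rejecting short descriptions.
all_loans = median_return(loans)
long_only = median_return(loans, min_chars=100)
print(f"all loans: {all_loans:.2f}%  descriptions >= 100 chars: {long_only:.2f}%")
```

Using the median rather than the mean keeps a handful of total losses from dominating the comparison, which matches the medians quoted above.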

This data tells me that loans with more thorough descriptions are better investments, on average, than loans with little or no description. By limiting Q&A, Lending Club is restricting the one tool that investors have to seek out quality borrowers.

Posted in Uncategorized | 8 Comments

April 2011: Loans By Credit Letter Density Plots

In a previous post, I examined density plots of all Lending Club loan data as of April 1, 2011.  I used past loan data to estimate the rate of return for each previously available loan at Lending Club.  Now I want to do the same thing for subsets of loans.  Specifically, this post will examine subsets by credit letter.

First I looked at only loans graded “A” by Lending Club. The x axis represents the estimated percentage return on investment for each A grade loan. The y axis represents the number of loans with that estimated return, as a percentage of all A loans.

The median is 5.51% return. Notice that A loans seldom default, giving a positive return for almost every loan in the subset.  But there is a steep drop-off in return. The initial peak represents loans paid off early, and the second peak represents loans paid off much later, probably at the end of the 36-month loan term.

With B grade loans, you can begin to see the influence of additional defaults, with increased loans to the left of the red line (0% gain). Again, a double peak likely indicates a significant number of early-payment loans.

With C grade loans, fewer loans are paid off early than with A and B (no sharp peak to the right of the red line). More loans return higher gains (between 10 and 20%) but these are offset by additional defaults. The result is a median estimated gain of only 4.86%: less than either A or B loans.

D, E, F and G loans show greater dichotomy: a high proportion of these loans go bad (negative returns) while a significant number are paid off, with resulting high returns. This is good: if you believe you can pick winning loans, then these loan categories offer a promise of high returns if you successfully avoid loans that will end in default. And there are a significant number of lower grade loans that do not default.

Posted in Uncategorized | Leave a comment

Estimating the Probability of Default

With Lending Club, an investor can invest in a portfolio of loans.  But how will those loans perform, and which loans should one pick?  In order to say anything meaningful about what loans to choose, we must first estimate how loans will do over time.  What percentage of loans will default?  What percentage of loans will get paid off in full?

Loans in repayment can transition through the following states:

  • Current,
  • In Grace Period,
  • Late (16-30 days),
  • Late (31-120 days),
  • Default,
  • Charged Off,
  • Fully Paid

From one month to the next, a loan could move, for example, from Current -> In Grace Period, or Current -> Fully Paid, or Late -> Current.  

When selecting any given loan, we want to know the probability that the loan will end up in one of two end states: Charged Off (bad loans) or Fully Paid (good loans).

The simplest approach would be to just count up the number of loans with those two end states.  So as of April 1, 2011, we have 23,007 total loans. Of those, 1,152 have been Charged Off and 3,436 have been Fully Paid.  So 25% of the finished loans (1,152 of 4,588) are Charged Off.  Not good. But wait, those Charged Off loans only represent 5% of our total loan data.  Many more loans have yet to be paid off, but we expect them to be paid or charged off eventually.
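The two percentages come from dividing the same numerator by different denominators. A quick Python check, using only the counts quoted above (the variable names are mine):

```python
total_loans = 23007   # all loans as of April 1, 2011
charged_off = 1152
fully_paid = 3436

# Loans that have reached one of the two end states.
finished = charged_off + fully_paid

# Charged Off as a share of finished loans versus as a share of all loans.
share_of_finished = charged_off / finished
share_of_all = charged_off / total_loans

print(f"finished loans: {finished}")
print(f"charged off, share of finished loans: {share_of_finished:.1%}")
print(f"charged off, share of all loans: {share_of_all:.1%}")
```

Neither figure is a fair default estimate by itself: the first ignores every loan still in repayment, and the second treats all of those loans as if they will never be charged off.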

A better way is to look at how the loans evolve over time.  Take a loan that is Current in February.  We want to know the probability that it will be Current in March.  Using snapshots of the loan data over time, we can construct a Transition Matrix.
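As a rough sketch of the idea, the following Python counts how often loans move from one status to another between two monthly snapshots and turns each row of counts into probabilities. The `transition_matrix` function and the `february` and `march` snapshots are hypothetical stand-ins invented for illustration; this is not the R code or the data behind the matrix in this post.

```python
from collections import Counter, defaultdict

def transition_matrix(before, after):
    """Estimate P(next status | current status) from two snapshots.

    before, after: dicts mapping a loan id to its status in consecutive months.
    Loans missing from the later snapshot are skipped.
    """
    counts = defaultdict(Counter)
    for loan_id, status in before.items():
        if loan_id in after:
            counts[status][after[loan_id]] += 1
    # Normalize each row of counts so that it sums to 1.
    return {
        status: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for status, row in counts.items()
    }

# Hypothetical snapshots, invented for illustration.
february = {1: "Current", 2: "Current", 3: "Current", 4: "Current",
            5: "In Grace Period", 6: "In Grace Period"}
march = {1: "Current", 2: "Current", 3: "Current", 4: "Fully Paid",
         5: "Current", 6: "Late (31-120 days)"}

matrix = transition_matrix(february, march)
for status, row in matrix.items():
    print(status, row)
```

With more than two snapshots, one option is to accumulate the counts over every consecutive pair of months before normalizing, so that each row reflects all observed transitions.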

Transition Matrix as of April 2011: All Loans

Transition Matrix for Apr 2011 - All Loans

How to interpret this matrix?  Well, if we select a loan that has a status of Current (left side), then it has a 94.84% chance of remaining current in the next month, and a 2.79% chance of being Fully Paid in the next month.  A loan with a status of In Grace Period has a 42.62% chance of returning to a Current status in the next month, and a 30.35% chance of moving to Late (31-120 days).  

Why such a low probability for moving from In Grace Period to Late (16-30 days)?  Because we are taking a snapshot only once per month.  So a loan that is In Grace Period in February will most likely either have a payment made within 30 days or become late by another 30 days.  Adding those 30 days to the days already spent in the grace period puts the loan more than 30 days late by the next snapshot, so it skips the Late (16-30 days) category.  

Notice how late loans degrade quickly in future performance.  A loan In Grace Period has a 42.62% chance of going back to Current status in the next month.  But a Late (31-120 days) loan has only a 5.80% chance of going back to Current status in the next month.  Late loans tend to default.  Loans in Default either remain in Default or are Charged Off in the next month. 

Charged Off and Fully Paid are end states, represented by the 100% probability.  Once a loan enters one of those two states, it cannot return to any other state.  

This Transition Matrix allows us to create estimates of future loan performance for any given loan.  The example here is admittedly simple, and there are a lot of modifications that we might consider to make our estimate more sophisticated.  For example, we might want to make separate matrices for each Credit Letter: A loans through G loans.  We would expect to see a different set of probabilities for loans in different credit letter categories. 
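To show how a matrix like this yields an estimate for a single loan, here is a minimal Python sketch that applies a transition matrix month after month until nearly all of the probability has settled in the two end states. It uses a simplified four-state chain, and every probability in `P` is hypothetical: the values are invented for illustration and are not the figures from the matrix above. It also ignores the fixed loan term, which a more careful model would account for.

```python
STATES = ["Current", "Late", "Charged Off", "Fully Paid"]

# Hypothetical monthly transition probabilities (each row sums to 1).
# Invented for illustration; these are NOT the values from the matrix above.
P = {
    "Current":     {"Current": 0.95, "Late": 0.02, "Fully Paid": 0.03},
    "Late":        {"Current": 0.10, "Late": 0.60, "Charged Off": 0.30},
    "Charged Off": {"Charged Off": 1.0},   # end state
    "Fully Paid":  {"Fully Paid": 1.0},    # end state
}

def step(dist):
    """Advance a probability distribution over statuses by one month."""
    nxt = {s: 0.0 for s in STATES}
    for state, mass in dist.items():
        for target, p in P[state].items():
            nxt[target] += mass * p
    return nxt

def end_state_probabilities(start, months=600):
    """Apply the matrix repeatedly, starting with all probability on one status."""
    dist = {s: 0.0 for s in STATES}
    dist[start] = 1.0
    for _ in range(months):
        dist = step(dist)
    return dist

from_current = end_state_probabilities("Current")
from_late = end_state_probabilities("Late")
print("starting Current:", {s: round(p, 4) for s, p in from_current.items()})
print("starting Late:   ", {s: round(p, 4) for s, p in from_late.items()})
```

With these made-up numbers, a Current loan ends up Charged Off about one time in three and a Late loan about five times in six, which illustrates how much a loan's present status shifts its expected outcome.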

Future posts will explore that possibility, and also go into more detail about how one constructs a transition matrix.  

Posted in Modeling, Modeling Loan Transitions | 2 Comments

April 2011: All Loans Density Plots

Let’s start off by looking at data for all existing loans at Lending Club. Specifically, the plots below are based on loan data downloaded on April 1, 2011 from Lending Club’s website. I will leave it to future posts to describe how I clean up and process the loan data that I download.

For our purposes, we can split existing loans by their status: Current, In Grace Period, Late (16-30 days), Late (31-120 days), Default, Charged Off, and Fully Paid.  Most of these categories are self-explanatory. “In Grace Period” means that the latest loan payment was not made on time, but the borrower is still within the grace period for payment. Loans enter “Default” before being “Charged Off.” There is a very, very small chance that loans in default return to current status, but the overwhelming majority will be charged off. “Charged Off” means that no further payments will occur, and the lender must consider any principal invested to be a total loss (minus, of course, any payments already received.)

I use past loan data to estimate the probability that a loan will transition between status categories. For example, there might be a 95% chance that a current loan remains current in the next month. These probabilities let me predict the rate of return for any given loan in Lending Club’s database of existing loans. Charged Off or Fully Paid loans, of course, require no prediction because I can determine the total payments made on those loans. More on loan probabilities in later posts.

April 2011: All Loans

Plot of estimated percentage loan gains for all loans

This first plot estimates the percentage gain if one randomly chose a selection of loans from all the loans in Lending Club’s history.  The y-axis (Density) represents the number of loans, as a percentage of the total number of loans.  The x-axis (Percentage Gain), represents the estimated percentage gain or loss for each loan.  N is the total number of loans analyzed (23,007).

If you invested in every loan, the probability model estimates that you would have a median return of 5.12%.  The long tail on the left represents the small percentage of loans that eventually default.  Depending on the number of payments made before a particular loan defaults, one may lose 0% to 100% of their principal.

April 2011: Late Loans

Multiple plots for various late loans

Now let’s zoom in on the various late loans.  As one might expect, the model estimates increasing losses as a loan moves from Grace Period all the way to Default and Charged Off.  Take, for example, the Grace Period loans, which represent loans that, as of April 2011 data, are between 1 and 15 days late on payment.  The probability model estimates that these loans have a median percentage loss of approximately 20% (the peak of the curve).  This is because some of these loans will make additional payments, but overall, the probability model suggests that these loans will, on average, eventually default: thus, the total return is a negative percentage.  Of those loans that are currently registered as Default, the total return is approximately -75%, on average.  So, not surprisingly, we expect loans in Default to be much worse investments than loans currently in Grace Period.

April 2011: Finished Loans

Plot of Charged Off and Fully Paid loans

Finally, let’s look at loans that are “finished.”  That is, loans that have a status of either “Charged Off” (bad) or “Fully Paid” (good).  We have 4,588 loans with that status.  The left side of the density curve represents the loans that were charged off, where the investor loses some percentage of their total principal (anywhere from 0% to 100% loss).  On the right side, we see the influence of paid off loans.  The double peak on the right likely represents the following scenario: some loans are paid off quickly, so the investor gets back their principal, but not much interest.  Then a second peak occurs, representing the larger median gains from loans that are not paid off extremely early.  

Note that no probability model is used in this last graph, because we do not have to estimate future payments.  The loans are finished, either charged off or fully paid.

The median percentage gain if you had invested in every loan that is now charged off or fully paid? 6.68%.  Not too bad, and if you exercised some discretion in loan selection, you might even do better.

Posted in All Loans, Density Plots | 1 Comment



This blog details my ongoing efforts to use computer modeling to assist me with loan investment choices at Lending Club.

For those who don’t know, Lending Club is a peer-to-peer lending site.  Individual investors can offer to partially fund loans, and individual borrowers can advertise their loan needs on the site.  Say you need $10,000 to install solar panels on your roof.  You fill out an application at Lending Club as a borrower, and Lending Club gives a rating for your loan application (A loans are lowest risk, G loans are highest risk).  Your loan is assigned an interest rate based on the credit letter rating.  Then investors choose to invest a minimum of $25 in your loan and are paid proportionally as you make loan payments.  Loans are either 3 or 5 years, fixed rate.

I encourage you to check out Lending Club’s website, as well as Prosper, another P2P lending site.

Since no bank acts as the lender, Lending Club can offer borrowers lower rates, while offering investors a higher rate of return.  (Currently, Lending Club is advertising an average of 9.65% returns.  I will explain why that is likely optimistic in a later post.)

From a modeling perspective, Lending Club offers an unparalleled set of lending data statistics.  Most importantly, their website contains links to download their entire loan data set as .csv or .xml files.  You can download past loan history as well as prospective loans available for investing.  This data is what I use to develop my models to hopefully assist with loan investing choices.

Future posts will discuss various aspects of the models and the loan data.  I am learning as I go, and I don’t claim any special insights into the proper way to model financial loan data.  I don’t have an economics degree and you should probably do your own research if you plan to invest in P2P lending.

All of the modeling is done using R, a free statistical software program.

Header image NASA/courtesy of nasaimages.org


Posted in Uncategorized | 2 Comments