Monday, October 31, 2011

Prosper Loan Types vs. Returns

Thought I'd look at ROI by loan type on Prosper, as well. Once again, these numbers are from Lendstats.com.


And, for comparison, the Lending Club data I posted a few days ago.


Sunday, October 30, 2011

Story Telling

A very interesting article about the effects of story telling on loan rates during the Prosper 1.0 time period out of the University of Delaware and Rice University.

They analyzed each description in terms of the identities it portrayed, including being trustworthy, successful, hardworking, moral, religious or having economic hardship.

They found that the more identities portrayed, the more likely a loan was to default:
The more identities the borrowers constructed, the more likely lenders were to fund the loan and reduce the interest rate but the less likely the borrowers were to repay the loan – 29 percent of borrowers with four identities defaulted, where 24 percent with two identities and 12 percent with no identities defaulted.

This seems in line with the data I've seen in my Needs Series. It really does seem like we might be able to pull out useable information for determining future defaults from a Loan's Title and Description.

Thanks to lovinglifestyle on the lendstats forum for publishing the link.

Thursday, October 27, 2011

So many new things...

Feels like Christmas morning at Prosper... I see:


  • Automated Quick Invest
  • A view of their annualized return calculation
  • Pretty, personalized charts
  • Lender promotion percentages
Now we spend the rest of the day playing. :)

Tuesday, October 25, 2011

More Loan Types vs. Returns

Expanding on my charts from yesterday:

These are the ROIs for Lending Club, split by loan type, for loans starting between 2008 and 2010. It looks to me like Credit Card and Debt Consolidation types were the only types consistently above average. Everything else looks to have bounced around:



Monday, October 24, 2011

Loan Type vs. Returns

One of the things that struck me about Ken's blog post that I referenced yesterday when I looked at Inquiries vs. Returns was that some loan types performed so much more poorly than others. I know when I first read it I thought it seemed unlikely. Indeed, even now my investing criteria don't filter by any particular loan type.

In his post, he saw that credit card loans had the highest ROI, just over 5%. Business loans had a low ROI just under -3% and education loans had the worst return on investment at under -4%.

I looked at Lending Club loans issued from 2008 through today to see if this trend continued or was just random noise and found these results:


It's worth noting that there are no new Educational Loans for 2011. But, indeed, year over year we see educational (blue) and business (red) loans performing more poorly than the average loan (grey) and Credit Card loans performing better than average.

At the moment it looks like the ROI is converging, but this could be due to the recency of many 2011 loans. It will be interesting to see a similar graph a year from now and compare.

Saturday, October 22, 2011

Inquiries vs. Returns

Just over a year ago Ken from lendstats.com made a graph showing Returns vs. Credit grade (among other interesting graphs.) I found this fascinating and since I read it I have started investing in listings with only small numbers of inquiries.

I wanted to see if this trend is continuing so, using lendstats' complete performance breakdown I put together the following graphs for both Prosper and Lending Club:






To me it looks like, in general, the trend is holding true. Loans with more inquiries appear to have a lower ROI than loans with fewer inquiries. There may be some bobbling at 5+ inquiries but there seems to be much more variation. Indeed, looking at these four graphs the deviation for the 0-inquiry loans is always the smallest and, as we move to more inquiries, the deviation seems to get larger and larger.

It's also nice to see, in Lending Club loans, how the 0-inquiry loans appear to gain a 1%-2% ROI each year that passes for the past 4 years. Looking at the spread between all loans and D's and below, it looks like much of that gain has come from the lower-rated loans.

Wednesday, October 5, 2011

AI Series: The Wrong Way?

Up until now I've been imagining a computer learning algorithm to determine whether or not I should invest in any given loan. I'm starting to wonder if that's the best question I can be asking.

I'm a small lender, without a lot of money invested into either Prosper or Lending Club. As my loans are paid off, I reinvest that money in more loans. I wonder if a better question for an AI for a lender like me would be:

Given that I have $200 to invest and these are the open loan requests, which requests will likely yield the highest returns on my investment?

Of course, if I look at the problem this way, there still needs to be a way for the program to determine that my money is better left uninvested to wait for another set of listings.
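
Here is a minimal sketch of that question as code. It assumes each open listing already carries an estimated return from some model; the `expected_roi` field, the `min_roi` cutoff and the $25 note size are all hypothetical and not taken from either site. Listings are ranked by the estimate, and money stays uninvested when nothing clears the cutoff.

```python
def allocate(budget, listings, min_roi=0.05, note_size=25):
    """Spread `budget` across the best listings, one note per listing.

    `listings` is a list of dicts with hypothetical keys "id" and
    "expected_roi" (a model's estimate, as a fraction).
    Returns (picks, leftover); picks maps listing id -> dollars invested.
    """
    picks = {}
    remaining = budget
    # Best estimated return first.
    ranked = sorted(listings, key=lambda l: l["expected_roi"], reverse=True)
    for listing in ranked:
        if remaining < note_size:
            break  # not enough left for another note
        if listing["expected_roi"] < min_roi:
            break  # everything after this is worse, so keep the cash
        picks[listing["id"]] = note_size
        remaining -= note_size
    return picks, remaining


open_listings = [
    {"id": "A", "expected_roi": 0.09},
    {"id": "B", "expected_roi": 0.02},
    {"id": "C", "expected_roi": 0.07},
]
picks, leftover = allocate(200, open_listings)
```

With these made-up listings, A and C get a note each and the remaining $150 waits for the next batch.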


Incidentally, for those interested in Machine Lending, Smart Peer Lending is doing a series on their own efforts to make a Recommendation Engine. Very interesting stuff.

Monday, September 26, 2011

AI Series: A Classification Problem

In a previous post on Machine Lending I mentioned that I'd be taking the free Machine Learning course offered by a Stanford professor. The first lectures are now available online and I continue to think about how one would write a program to determine which loans to invest in and which to avoid.

From the lecture:
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
If my understanding is correct, Machine Lending software would have:

  • Task T = Advising the user whether or not to invest in a listing
  • Experience E = The backlog of published data about listings, their Payoff rates, etc.
  • Performance Measure P = The total return from investing in a loan (more detail in my Performance Measure post)

It would fall into the category of Supervised Learning since there is a "correct" answer on whether or not one should have invested in a listing (as measured by Performance Measure P.) And it would fall into the "Classification Problem" subset of supervised learning problems.

(Now, you might be able to change this into a regression problem if you changed the question from "Should I invest in this listing or shouldn't I?" to "How much should I invest in this listing?" But that's a topic for another post.)

Sunday, September 25, 2011

Needs Series: What We Fund

Continuing on with my Needs Series, and with ideas from the graphs of my look at Loan Description Length, let's take a look at what lenders are actually funding:

First let's look at all of the 2009 listings on Prosper (the blue line below.) We can see that a bit over 25% of listings had the word "need" in the title or description for most Prosper Scores. (AAs and As were a bit lower, but still over 20%.)

Of the listings that went on to be funded (the red line), the percentage containing "need" was close to that of all listings for AA-C scores. Funded D and E loans had a much higher percentage with the word "need" than the listings did, and then the percentage tumbles off to almost nothing for the HRs.



We now look at the data for Prosper's 2010 listings:


To me they look neck-and-neck here. It looks to me like, in general, lenders are not paying attention to whether or not a listing has "need" in the title or description when they are choosing to fund the loan.

This may be unfortunate for lenders because it looks to me like use of the word "need" is correlated with loans which are not paid back.



All Articles in the Needs Series
An Introduction
Initial Findings
Correlation Matrix
Comparing to Lending Club
What We Fund

Sunday, September 18, 2011

Prosper's Top Tips for Borrowers

On Friday Prosper blogged their Top-Tips for Borrower Success. Among the highlights is the following tip:
Write a thorough loan purpose and description.
This one strikes true, especially after my post the prior Tuesday. Allow me to post the graph of my 2010 findings for loan description length again:


Borrowers: it looks like you are already writing longer descriptions when you have a lower Prosper rating (the blue line) which, in my opinion, is a good thing. When we take a look, though, we see that Lenders are funding loans which have even higher character counts than average (the red line.) Indeed, at a B grade and lower it appears that loans that are funded have about 50 more characters than the average request. And HR Grade loans are funded, on average, with even more characters than that. (The aforementioned post from Tuesday shows similar results for 2009 and Pre-2008 requests.)

The data is correlational, not causal, so I can't say that having a longer loan description will make it more likely to get funded -- but it certainly doesn't hurt. Plus, from personal experience, I'd rather fund a loan when I have a clear idea of how my money will be used (and repaid.)

Thursday, September 15, 2011

Bad And Good Words, Another Perspective

Nickel Steamroller followed up with Isepankur (I believe from his initial comment on my post Needs Series: Comparing To Lending Club) and found similar results to what we've been seeing.

Among their findings: loans mentioning payday have a very low ROI. Loans mentioning a steady job or long employment perform much better. Go take a look. Their table is even sortable, which is awesome.

Tuesday, September 13, 2011

Loan Description Length: More Recent Loans

On Sunday I showed findings that, at least for Prosper loans before 2008, lower rated borrowers tend to write longer descriptions for their listings.


From this graph I saw three things:

  1. The lower the credit rating the longer the loan description a borrower writes. (The blue line.) 
  2. Lenders tend to fund loans with higher than average character counts. (The red line.)
  3. Of those loans that are funded, the ones that went on to become Paid tended to have fewer characters than the average funded loan. (The grey line.)

Let's see if these trends hold up for more recent listings. First, we'll look at the number of characters in the listing description, by credit grade, for Listings in 2009:



And now for 2010:


Let's revisit the three conclusions:

  • The lower the credit rating the longer the loan description a borrower writes. (The blue line.) 

This seems even more true now than it did before. Before 2008 there was a peak in characters at around the D rating and then a decline. In 2009 and 2010 we see some very small drops from one credit grade to the next, but for the most part borrowers with a worse credit grade write longer descriptions than borrowers with better credit grades.
  •  Lenders tend to fund loans with higher than average character counts. (The red line.)
This continues to hold true. We still don't know if lenders are funding loans because they have longer descriptions or because long descriptions correlate with some other factors that affect default rate beyond the Prosper rating (low delinquencies, few inquiries, etc) but longer descriptions definitely get funded more.
  • Of those loans that are funded, the ones that went on to become Paid tended to have fewer characters than the average funded loan. (The grey line.)
This does not seem to remain true. It sure looks to me like the Paid/Current descriptions have roughly the same number of characters as the average funded listing. At this point I would posit that this conclusion from my post on Sunday is false.

Prosper Adds New Verification Stage To Listings

Prosper was unavailable for a long time last night, and now we see why. They've added a new verification stage progress indicator to their listings (at the far right.)


Along with a tooltip describing what the stage means.


Way to go, Prosper! More information is always a good thing.

Edit: Prosper has a page up detailing the new feature here.

Sunday, September 11, 2011

Loan Description Length: By Credit Grade

Last week I took a look at how the length of a loan description affects payoff rates with posts about both Lending Club and Prosper.

Today I wanted to dig a bit deeper into the indicated trend: loans with longer descriptions are less likely to be Paid or Current. To begin, let's look at the trend we saw from Prosper Loans initiated before 2008:



The trend is pretty clear cut. Let's step back for a minute, though, and show the number of characters in each description for each credit grade:


From this graph we see three things:
  1. The lower the credit rating the longer the loan description a borrower writes. (The blue line.) 
  2. Lenders tend to fund loans with higher than average character counts. (The red line.)
  3. Of those loans that are funded, the ones that went on to become Paid tended to have fewer characters than the average funded loan. (The grey line.)
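
The three lines in a graph like this could be computed with something like the sketch below. The field names (`grade`, `description`, `funded`, `status`) and the sample listings are made up for illustration; they are not the layout of the Prosper data.

```python
from collections import defaultdict


def mean_length_by_grade(listings, keep=lambda l: True):
    """Average description length per credit grade for listings passing `keep`."""
    totals = defaultdict(lambda: [0, 0])  # grade -> [character sum, listing count]
    for l in listings:
        if keep(l):
            totals[l["grade"]][0] += len(l["description"])
            totals[l["grade"]][1] += 1
    return {grade: chars / count for grade, (chars, count) in totals.items()}


sample = [
    {"grade": "A", "description": "x" * 400, "funded": True, "status": "Paid"},
    {"grade": "A", "description": "x" * 200, "funded": False, "status": None},
    {"grade": "D", "description": "x" * 900, "funded": True, "status": "Defaulted"},
    {"grade": "D", "description": "x" * 700, "funded": True, "status": "Paid"},
    {"grade": "D", "description": "x" * 500, "funded": False, "status": None},
]

all_listings = mean_length_by_grade(sample)                          # blue line
funded = mean_length_by_grade(sample, lambda l: l["funded"])         # red line
paid = mean_length_by_grade(sample, lambda l: l["status"] == "Paid")  # grey line
```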

Looking at these trends it would seem that, when a credit score is lower, lenders choose to fund loans with longer descriptions. This, very likely, explains my findings from last week. I'll continue to explore on Tuesday with a post examining Prosper's 2009 and 2010 loans.

Thursday, September 8, 2011

Loan Description Length: Prosper

Inspired by a post by Smart Peer Lending, on Tuesday I looked at how the length of a loan's description compared to whether or not the loan was currently Current or had ended with a Paid status for Lending Club loans.

What I found was the opposite of what I expected: loans with shorter descriptions appeared to be Paid or Current more often than loans with longer descriptions. In this post I'll take a look at loans from Prosper and see if the numbers agree.

Pre-2008 Loans (Raw data below.)

Amazingly we see almost a smooth decline between loan groups and percent of loans Paid. I looked, further, at 2008 and Later loans (many of which are still under way) and saw the following results:

Loans from 2008 and Later (Paid and Current) (Raw data below.)

Loans from 2008 and Later (Defaulted and Charged Off) (Raw data below.)

It's easiest to compare the Default and Charge-Off data, which is, again, smaller for less lengthy descriptions and larger for more lengthy descriptions. However, looking at the Paid and Current chart we see that shorter descriptions have fewer Paid loans and more Current loans. This could mean that newer loans have shorter descriptions and older loans have longer descriptions--at least in loans since 2008.


I wanted to go a little farther with these sets of loans, so I further divided the loans between credit grades. I looked at AA, A and B as a set of loans and C, D and E as a set of loans (once again using completed loans made before 2008) and found the following results:

Pre-2008 Loans By Credit Grade (Raw data below.)



For both groups we see the same thing: the fewer characters in a listing, the more likely it was to finish off having Paid.

Well this is certainly an unexpected result. It's worthwhile to keep in mind that these are all loans that funded. It could be that this is a characteristic of Lenders choosing loans with short descriptions only if all other characteristics look good.

But the more I look at this data, and the results of words like "need" and "help", the more it would seem that writing a description is more of a detriment to a borrower than not writing one.

Update: There seems to be a very strong correlation between credit grade and the count of characters in the description. Details are available in my post Loan Description Length By Credit Grade.
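
For anyone who wants to reproduce the grouping, here is a minimal sketch of how loans could be bucketed by description length and scored by percent Paid. The field names (`description`, `status`) are assumptions for illustration, and the sample loans are made up.

```python
def percent_paid_by_length(loans, edges=(500, 1000, 1500, 2000, 2500, 3000)):
    """Percent of loans with status "Paid" in each description-length bucket.

    `edges` are inclusive upper bounds matching the pre-2008 buckets
    (0-500, 501-1000, ...); anything longer lands in a final bucket.
    Empty buckets are reported as None.
    """
    counts = [[0, 0] for _ in range(len(edges) + 1)]  # [paid, total] per bucket
    for loan in loans:
        length = len(loan["description"])
        # Index of the first bucket whose upper bound fits this length.
        bucket = next((i for i, e in enumerate(edges) if length <= e), len(edges))
        counts[bucket][1] += 1
        if loan["status"] == "Paid":
            counts[bucket][0] += 1
    return [100 * p / t if t else None for p, t in counts]


sample_loans = [
    {"description": "a" * 100, "status": "Paid"},
    {"description": "a" * 450, "status": "Paid"},
    {"description": "a" * 800, "status": "Paid"},
    {"description": "a" * 900, "status": "Defaulted"},
    {"description": "a" * 3500, "status": "Defaulted"},
]
buckets = percent_paid_by_length(sample_loans)
```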



Pre-2008 Loans
Description | Total Loans | Paid | Current | Recovered | Never Recovered
Pre 2008 Loans, 0-500 character description | 1948 | 70% | 0% | 70.8% | 29.1%
Pre 2008 Loans, 501-1000 character description | 3788 | 64% | 0% | 65.4% | 34.6%
Pre 2008 Loans, 1001-1500 character description | 3431 | 62.7% | 0% | 64% | 35.9%
Pre 2008 Loans, 1501-2000 character description | 2359 | 60.5% | 0% | 61.8% | 38.2%
Pre 2008 Loans, 2001-2500 character description | 1782 | 55.3% | 0% | 56.9% | 43.1%
Pre 2008 Loans, 2501-3000 character description | 1279 | 54.2% | 0% | 56.5% | 43.4%
Pre 2008 Loans, 3001 and greater character description | 2703 | 51.5% | 0% | 52.5% | 47.4%

Loans from 2008 and Later
Description | Total Loans | Paid | Current | Recovered | Never Recovered
2008 And Later Loans, Description 0-749 Characters | 8077 | 24.7% | 62.3% | 24.9% | 10.5%
2008 And Later Loans, Description 750-1250 Characters | 7764 | 34.5% | 45.5% | 34.9% | 17.1%
2008 And Later Loans, Description 1250 Characters or More | 7186 | 44.3% | 28.4% | 44.9% | 24.5%

Pre-2008 Loans By Credit Grade
Description | Total Loans | Paid | Current | Recovered | Never Recovered
Pre 2008 Loans, AA, A, B All | 6078 | 75.4% | 0% | 76.7% | 23.2%
Pre 2008 Loans, AA, A, B 1249 or fewer character description | 3231 | 78.6% | 0% | 79.7% | 20.3%
Pre 2008 Loans, AA, A, B 1250 or more character description | 2788 | 71.6% | 0% | 73.3% | 26.6%
Pre 2008 Loans, C, D, E All | 8686 | 57.3% | 0% | 58.5% | 41.4%
Pre 2008 Loans, C, D, E 1249 or fewer character description | 3630 | 58.8% | 0% | 60% | 39.9%
Pre 2008 Loans, C, D, E 1250 or more character description | 4975 | 56.2% | 0% | 57.4% | 42.6%

Tuesday, September 6, 2011

Loan Description Length: Lending Club

In a post introducing their new Loan Analyzer, Smart Peer Lending writes that they've added a new Loan Description Length search filter:
Loan Desc Length : One possible feature useful for selecting loans is the length of the description field entered by the borrower. Justification being that borrowers who don't take the time to enter in anything may prove to be a higher risk.
I agree. It makes sense that borrowers who write longer descriptions are ones who care more about their loans and, therefore, are more likely to pay them back. Today I'll look at the data for loan description length on Lending Club and in a few days I'll look at the loan description length on Prosper.


Data Set: Lending Club loans made before the start of 2009

Let's begin by looking at the ROI results using Smart Peer Lending's Loan Analyzer:

(Data is presented in Table Form at the end of the post under the title Lending Club ROI.)

What's interesting is that we see ROI swell in the 101-500 character description range and then drop off significantly after 500 characters--the opposite of what I'd expect. Now I take data from my personal analysis tool and find:

(Data is presented in Table Form at the end of the post under the title Lending Club Percent Paid.)


Wow! The longer the description, the less likely a loan is to be Good (defined as Status = Paid or Current.) This is exactly the opposite of what I was expecting. I suppose that it could be that the longer a loan request is, the more the borrower feels the lender needs to be talked into funding a risky loan request.

On Thursday I'll post an analysis of data from Prosper to see if the same thing holds true with loans made over there. I'll be looking at a broader range of loans and breaking them down into higher-rated and lower-rated loans to see if those categories make a difference.


Update: There seems to be a very strong correlation between credit grade and the count of characters in the description. Details are available in my post Loan Description Length By Credit Grade.



Lending Club ROI
Description | Total Loans | Smart Peer Lending ROI
Pre 2009, All | 2998 | 0.72%
Pre 2009, 0-100 Character Description | 1016 | 0.83%
Pre 2009, 101-500 Character Description | 1353 | 1.04%
Pre 2009, 501 Characters And Longer | 629 | -0.16%

Lending Club Percent Paid
Description | Total Loans | Percent Good | Percent Bad | Fully Paid | Current | Charged Off | Default
Pre 2009, All | 2996 | 77.2% | 21.8% | 66.2% | 11.1% | 21.1% | .2%
Pre 2009, 0-100 Character Description | 1001 | 78.6% | 20.6% | 70.2% | 8.4% | 20% | .1%
Pre 2009, 101-500 Character Description | 1350 | 77.1% | 21.9% | 64.6% | 12.5% | 21% | .2%
Pre 2009, 501 Characters And Longer | 630 | 75.2% | 23.7% | 62.7% | 12.5% | 23% | .2%

Saturday, September 3, 2011

Bad and Good Words Revisited

In a post on the prospers.org forums, user havastat recommended looking at the listings for good and bad words, randomly dividing them, and seeing if they come out similarly. If they don't, there's a good chance that the findings were random. If they do, the findings are more likely to be relevant.

Percent Paid (By Loan): Is the percent of loans, containing the indicated word at least once, which finished with a status Paid.
Percent Paid (By Word): Is the percent of time that a loan ended with the status paid, weighted by the frequency of the word in the listing. (For example, a loan with a title "Help, help, help, help!" which did not pay would count four times more than a loan with "Help" listed only once.)
Word Count: The number of listings containing the word at least once. (Notably not the total number of times the word was used--the maximum here is once per listing.)

Like in the original posts, these are words from Prosper loans that were created before 2008. My methodology is at the bottom of the post, but loans were assigned to groups randomly and there were 8728 loans in each group.

Group 1 Worst Performing Words
Word | Percent Paid (By Loan) | Percent Paid (By Word) | Word Count
[average Paid] | 61.1% | |
payday | 38.9% | 38.6% | 596
behind | 42.9% | 43.6% | 592
mother | 43.5% | 44.8% | 566
chance | 44.5% | 42.1% | 631
track | 46.8% | 45.6% | 581
son | 47.1% | 44.8% | 597
daughter | 48.1% | 46.3% | 516
child | 48.7% | 47.9% | 520
husband | 49% | 51.3% | 896
single | 49.5% | 49.7% | 707
single49.5%49.7%707

Group 2 Worst Performing Words
Word | Percent Paid (By Loan) | Percent Paid (By Word) | Word Count
[average Paid] | 59.6% | |
payday | 37.5% | 39.2% | 595
behind | 42.4% | 41.3% | 566
chance | 43.5% | 41.5% | 575
son | 45.7% | 44.2% | 514
mother | 46.6% | 46.7% | 601
children | 47% | 45.6% | 854
daughter | 47.7% | 44.6% | 539
DELETED | 47.7% | 46.6% | 507
child | 47.8% | 46.3% | 552
30000 | 48.3% | 47.6% | 532

So, as with the original Words of Loss post, we see the word 'payday' at the bottom, followed by 'behind', 'chance' and family words like 'mother', 'child', etc., near the bottom for both groups.


Now let's take a look at the best performing words:


Group 1 Best Performing Words
Word | Percent Paid (By Loan) | Percent Paid (By Word) | Word Count
[average Paid] | 61.1% | |
tax | 67.1% | 66.7% | 504
early | 67.2% | 67.7% | 534
rate | 67.6% | 68.6% | 1952
term | 67.6% | 66.6% | 509
risk | 67.8% | 70.3% | 565
fund | 68.2% | 70.2% | 666
rates | 68.3% | 68.8% | 609
lender | 68.3% | 70.2% | 707
minimum | 68.4% | 68.4% | 583
investment | 69.1% | 69.3% | 679

Group 2 Best Performing Words
Word | Percent Paid (By Loan) | Percent Paid (By Word) | Word Count
[average Paid] | 59.6% | |
risk | 64.2% | 66.1% | 592
card | 64.2% | 65.9% | 2910
higher | 64.4% | 63.2% | 765
style | 64.5% | 58.8% | 968
span | 64.9% | 59.5% | 1069
don't | 65% | 64.3% | 861
rate | 65.2% | 65.3% | 1938
student | 66% | 66.3% | 1078
lender | 66.2% | 67.5% | 754
I've | 66.2% | 63.6% | 888

It's interesting to see that lending words appear on both of these lists -- but there are fewer matches than the worst performing words. It looks like we've got 'risk', 'rate(s)' and 'lender' as matches but all of these are still much closer to the average paid than the worst performing words.

It could be that we will find that we can only tell if a loan is more likely to fail from the words that it uses, not that a loan is more likely to succeed.

Methodology:

Similar to the methodology I used in the previous two studies, I began with all Pre-2008 Prosper Loans.

I then placed all the loans in a random order and assigned them to Group 1 or Group 2 sequentially. From there I built a list of all the words in the title and body of the listing for those loans, tallying the number of times the word was used in each loan.

To come up with the Percent Paid (By Loan) I divided the number of loans with that word that finished with a status Paid by the number of loans with that word in total.
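
A minimal sketch of the split and the Percent Paid (By Loan) figure might look like this. The loan fields (`title`, `description`, `status`), the fixed seed and the simple regex tokenizer are assumptions for illustration; the original analysis may have split words differently.

```python
import random
import re
from collections import defaultdict


def split_groups(loans, seed=0):
    """Put the loans in a random order, then deal them alternately into two groups."""
    shuffled = loans[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[0::2], shuffled[1::2]


def percent_paid_by_loan(loans):
    """For each word: percent of the loans containing it that ended Paid."""
    paid = defaultdict(int)
    total = defaultdict(int)
    for loan in loans:
        text = (loan["title"] + " " + loan["description"]).lower()
        # set() so a word counts at most once per loan
        for word in set(re.findall(r"[a-z0-9']+", text)):
            total[word] += 1
            if loan["status"] == "Paid":
                paid[word] += 1
    return {word: 100 * paid[word] / total[word] for word in total}


example_loans = [
    {"title": "Help, help!", "description": "payday loan", "status": "Defaulted"},
    {"title": "Help", "description": "lower my rate", "status": "Paid"},
]
group1, group2 = split_groups(example_loans)
by_loan = percent_paid_by_loan(example_loans)  # "help" appears in 2 loans, 1 Paid
```

Running `percent_paid_by_loan` on each group separately and comparing the two sorted lists is the check havastat suggested.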

Tuesday, August 30, 2011

The Methodology Behind Words of Loss and Words of Win

Last month I posted a list of words on Prosper which, when used in a listing which successfully became a loan, were more successful than average and those that were less successful than average.

A comment from havastat on the prospers.org forum made me realize that I had neglected to talk about the methodology used to find these words. It is as follows:


For every loan created successfully on Prosper before the end of 2007 I created a list of all of the words used in the Title and Description of the loan. For every instance of a word in a loan that was Paid I added 1 to a running total of PaidInstances. For every instance of a word in a loan that had any other status I added 1 to a running total of UnpaidInstances.

I then calculated the percentage for the word with the formula:
PaidInstances / (PaidInstances + UnpaidInstances)
(Which is to say: PaidInstances / TotalWordUsage)

I reduced the list to words which had been used at least 1000 times in the loan set, sorted it from words that were most often in Paid loans to words that were least often in Paid loans, and compared that list to the overall likelihood of any loan to be paid back.

I found the word 'lender' at the top, with loans containing the word having been Paid 68.96% of the time. I found the word 'payday' at the bottom, with loans containing the word having been Paid only 38.89% of the time. (This compared to an average Paid percentage, across all loans, of about 61% for this time period.)

Now I think that there is an argument to be made that it would have been better to count each word a maximum of once for each listing -- what I did measures use of the word itself, more than it measures the use of the word in the listing ("help, help, help, help!" in one listing counts 4 times, instead of just once), but I think that the best choice really depends on what you're trying to do with the information.
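
The counting described above could be sketched roughly as follows. The loan fields and the regex tokenizer are assumptions for illustration, not the code actually used; the `PaidInstances` and `UnpaidInstances` tallies and the 1000-use cutoff come from the description.

```python
import re
from collections import Counter


def percent_paid_by_word(loans, min_uses=1000):
    """Return (word, PaidInstances / TotalWordUsage) pairs, best words first."""
    paid_instances = Counter()
    unpaid_instances = Counter()
    for loan in loans:
        text = (loan["title"] + " " + loan["description"]).lower()
        words = re.findall(r"[a-z0-9']+", text)
        # Every instance counts, so a repeated word is tallied repeatedly.
        if loan["status"] == "Paid":
            paid_instances.update(words)
        else:
            unpaid_instances.update(words)
    results = {}
    for word in set(paid_instances) | set(unpaid_instances):
        uses = paid_instances[word] + unpaid_instances[word]
        if uses >= min_uses:
            results[word] = paid_instances[word] / uses
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)


example_loans = [
    {"title": "Help, help, help, help!", "description": "", "status": "Defaulted"},
    {"title": "Help", "description": "", "status": "Paid"},
]
ranked = percent_paid_by_word(example_loans, min_uses=1)
```

Here 'help' scores 1 paid instance out of 5, where counting once per listing would have given 1 out of 2.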

Sunday, August 28, 2011

An initial investment in Lending Club

I've been a Prosper lender for years but I've been thinking about branching out to lending on Lending Club, as well. Since I've been looking at words which have performed poorly on Prosper I wanted to extend that search to Lending Club, as well.

What follows is Lending Club data taken from about two months ago. For D, E, F and G loans we can see that the words used seem to correlate with the loans similarly on both sites:

Description | Total Loans | Percent Good | Percent Bad | Fully Paid | Current | Charged Off | Default
D,E,F,G All | 8840 | 81.1% | 9.4% | 15.5% | 65.6% | 7.5% | .1%
D,E,F,G With 'need' | 1455 | 77.9% | 16.2% | 21.6% | 56.3% | 13.8% | .1%
D,E,F,G With 'help' | 1422 | 80.3% | 11.3% | 18.9% | 61.4% | 9.3% | .4%
D,E,F,G With 'chance' | 84 | 83.3% | 7.1% | 22.6% | 60.7% | 6% | 0%
D,E,F,G With 'behind' | 65 | 78.5% | 16.9% | 18.5% | 60% | 15.4% | 0%
D,E,F,G With 'payday' | 12 | 41.7% | 58.3% | 25% | 16.7% | 58.3% | 0%

'Need', 'help', 'behind', and 'payday' all have fewer Good loans (defined as 'Fully Paid' and 'Current') and more Bad loans (defined as 'Charged Off', 'Default', and 'Late (31-120 days)'). The only exception in the words I searched here is for the word 'Chance' which actually performed better than the average loan. (This could be true for Prosper, as well, and is worth further investigation.)
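
As a small sketch of those definitions, assuming each loan's status is available as a plain string (the exact status strings in the Lending Club export may differ, and "Other" is a made-up placeholder):

```python
GOOD = {"Fully Paid", "Current"}
BAD = {"Charged Off", "Default", "Late (31-120 days)"}


def good_bad_percent(statuses):
    """Return (percent good, percent bad) for a list of loan status strings."""
    total = len(statuses)
    good = sum(s in GOOD for s in statuses)
    bad = sum(s in BAD for s in statuses)
    return 100 * good / total, 100 * bad / total


# Made-up statuses; "Other" stands for any status in neither set.
good, bad = good_bad_percent(
    ["Fully Paid", "Current", "Current", "Charged Off", "Other"]
)
```

A status that falls in neither set counts toward neither percentage, so Good and Bad need not add up to 100%.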

When taken as a group, the four bad words yield the following results:

Description | Total Loans | Percent Good | Percent Bad | Fully Paid | Current | Charged Off | Default
D,E,F,G All | 8840 | 81.1% | 9.4% | 15.5% | 65.6% | 7.5% | .1%
D,E,F,G without 'help', 'behind', 'need', 'payday' | 6436 | 81.8% | 7.8% | 13.8% | 68% | 6% | .1%

So, as I start to invest, I expect that I'll only be investing in loans where the title and description do not contain any of these words.

Saturday, August 27, 2011

AI Series: A Performance Measure

As mentioned previously, I'm planning on following along with Stanford's Introduction to Artificial Intelligence class this coming semester. I've just received the book, Artificial Intelligence: A Modern Approach, and am starting to work my way through it.

As I go through the book, I'll be thinking about what I would do if I were building my own program to analyze loans and writing up my analysis here.

Chapter 2 discusses the idea of rationality and how we determine whether an agent (program, in this case) has done well. Specifically we would create a performance measure. In the case of Peer to Peer lending, I think the following performance measure would be in order:

  1. The total return from investing in a loan
  2. Subtracting a small percentage of the investment for the time the money is invested but the loan has not yet started
  3. Subtracting some amount for a loan that goes over 30 days late
The first rule speaks for itself. Since our goal is to maximize return we want our agent to choose to invest in loans which will give the most return on investment. Let's say that we are investing $100 in every loan and the total return from the loan is $110. We know that this loan has done well for us but it hasn't done as well as a loan that returns $115. And it has done much better than a loan that returns $40.

The second rule is a rule to encourage the agent to pick loans which are closest to closing. Given two loans that are exactly equal in every other way, we'd rather invest in the one that is two days from being funded than the one that is 10 days from being funded.

The third rule is more of a personal preference. Even if the agent were able to pick out borrowers who would pay over 30 days late and still end up paying off all of the loan value (and perhaps even more, with penalties) I don't want these loans. They would drive me crazy every time I would look at my portfolio. So rule three biases these loans downwards to make the agent value these loans less.
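
Put together, the three rules could be sketched as a single scoring function. The two penalty sizes (`idle_rate_per_day` and `late_penalty`) are placeholder values picked only so the example runs; how large they should be is an open question.

```python
def performance(invested, returned, days_unfunded, went_30_days_late,
                idle_rate_per_day=0.0001, late_penalty=5.0):
    """Score one loan in dollars; higher is better.

    idle_rate_per_day and late_penalty are hypothetical placeholder values.
    """
    score = returned - invested                             # rule 1: total return
    score -= invested * idle_rate_per_day * days_unfunded   # rule 2: idle money
    if went_30_days_late:
        score -= late_penalty                               # rule 3: late loans
    return score


# $100 invested, $110 returned, funded two days after investing, never late.
base = performance(100, 110, 2, False)
```

Under this sketch a loan returning $115 outscores one returning $110, a loan ten days from funding scores slightly below an identical loan two days from funding, and a loan that went over 30 days late scores below an otherwise identical loan that didn't.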

Sunday, August 21, 2011

A look at "family" words

It's nice to see all the traffic we've been getting for the analysis of words from Prosper loans. A couple of sites latched onto the fact that in Pre-2008 loans we saw certain family words appearing in loans that performed poorly.

Since that topic appeared to be of interest to people I wanted to explore it in more detail:

Description | Total Loans | Paid | Recovered | Never Recovered
Pre 2008, D, All | 3117 | 59.3% | 60.2% | 39.8%
Pre 2008, D, Contain a "family" word | 924 | 52.2% | 53% | 47%
2008-2009, D, All | 2379 | 46.7% | 47.4% | 31.6%
2008-2009, D, Contain a "family" word | 487 | 43.3% | 43.9% | 35.5%
2010, D, all | 1314 | 13.9% | 13.9% | 4.9%
2010, D, Contain a "family" word | 155 | 14.8% | 14.8% | 2.6%
2011, D, All | 1193 | 2.8% | 2.8% | 0%
2011, D, Contain a "family" word | 103 | 1.9% | 1.9% | 0%

It looks like the case is true through the 2009 loans so far. The 2010 and 2011 loans are too young to draw conclusions, but given the initial numbers I'm not certain that I would be comfortable saying that loans with a "family" word are a bad investment.

"Family" words are defined as: husband, child, children, mother, daughter, son in either the title or description of the loan.

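A check for "family" words could be sketched like this. Matching whole words only (so "person" does not count as "son") is an assumption; the original search may have been a plain substring match.

```python
import re

FAMILY_WORDS = {"husband", "child", "children", "mother", "daughter", "son"}


def has_family_word(title, description):
    """True if any family word appears as a whole word in the title or description."""
    # Split on anything that is not a letter, so "daughter's" yields "daughter".
    words = re.findall(r"[a-z]+", (title + " " + description).lower())
    return any(word in FAMILY_WORDS for word in words)


flagged = has_family_word("", "My daughter's tuition")
not_flagged = has_family_word("Personal loan", "for a person with good credit")
```
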
Saturday, August 20, 2011

A look at Prosper's 2008 Loans

Previously I published a list of words which did poorly in pre-2008 loans and words that did well in pre-2008 loans. I wanted to see if those lists could predict what would happen in 2008 loans on Prosper.

I created a new value, WordValue, which will become negative if a loan has more words which had previously failed in it and become positive if a loan has more words which had previously succeeded in it. (Additional description of the value is below.)

Suffice it to say, I expected that lower WordValues would repay less often than higher WordValues. It turns out that this was not the case for 2008 loans:

Description | Paid | Recovered | Never Recovered
2008, D, All | 49.8% | 50.7% | 34.7%
2008, D, WordValue < -1 | 50.9% | 51.7% | 34.2%
2008, D, WordValue >= -1 | 47.5% | 48.3% | 35.8%
2008, E, WordValue < -1 | 44.7% | 45.6% | 39.1%
2008, E, WordValue >= -1 | 42.2% | 42.7% | 46.4%

What I found is that lower WordValues actually repaid at a higher rate than higher WordValues--exactly the opposite of what I was expecting. This means that some of the low performing words in loans before 2008 performed better than average in 2008 loans.

Since my original conjecture is that the word "need" performs less well than average I tested that on this same set of loans and found the following:

Description | Paid | Recovered | Never Recovered
2008, D, All | 49.8% | 50.7% | 34.7%
2008, D, Title or Description contain "need" | 49.7% | 50.7% | 36.2%
2008, D, Title or Description do not contain "need" | 49.9% | 50.7% | 33.8%
2008, E, Title or Description contain "need" | 43.5% | 44.6% | 39.3%
2008, E, Title or Description do not contain "need" | 44.6% | 45.4% | 41.6%
2009, D, All | 27.9% | 27.9% | 13.4%
2009, D, need in title or desc | 25.5% | 25.5% | 14.7%

We get mixed messages here, too. 2008 D loans with "need" were about 2.4 percentage points more likely to never recover (meaning they were confirmed to Default or Charge Off) but roughly as likely to have ended with a "Paid" status. 2008 E loans with "need" are, to date, about 1.1 percentage points less likely to have finished their loan with a "Paid" status but 2.3 percentage points less likely to never recover (about 5.5% in relative terms). 2009 D loans with "need" are less likely to have ended as Paid and more likely to never recover.

Now obviously not all 2008 and 2009 3-year loans have reached the end of their term. We'll be able to draw better conclusions in the coming months, but it's entirely possible that there is no correlation between "need" and loans which aren't repaid.

In future posts I'll whittle down my list of words that fail and see if I can find a set of words that consistently has results which are worse than the average.



About the WordValue number:
I created the WordValue by taking the difference between the Paid percentage of loans containing each word and the average repayment rate for loans before 2008. I only used words whose Paid percentage was more than .5% above or below that average.

The WordValue number is the sum of each of those differences from the average taken only once per word.
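
For readers who want to experiment, the WordValue described above could be sketched in Python roughly as follows. This is only an illustration under assumptions, not the code behind the tables: the function names, the tokenizing rule, and the sample rates are all hypothetical.

```python
import re

def build_word_deltas(word_paid_rates, average_paid_rate, threshold=0.5):
    """Keep words whose Paid percentage is more than `threshold` points from the average."""
    deltas = {}
    for word, paid_rate in word_paid_rates.items():
        delta = paid_rate - average_paid_rate
        if abs(delta) > threshold:
            deltas[word] = delta
    return deltas

def word_value(text, word_deltas):
    """Sum the deltas of the listed words found in the text, counting each word once."""
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    words_in_text = set(re.findall(r"[a-z0-9']+", text.lower()))
    return sum(delta for word, delta in word_deltas.items() if word in words_in_text)

# Illustrative numbers only, not taken from the real data set.
average = 60.0
rates = {"payday": 38.0, "lender": 68.0, "car": 60.2}
deltas = build_word_deltas(rates, average)
score = word_value("Payday loan payoff, need to clear payday debt with a fair lender", deltas)
print(score)
```

In this sketch "car" is dropped because it sits within .5 of the average, and "payday" counts only once even though it appears twice in the text.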

Wednesday, August 17, 2011

Machine Lending

For those of you not already in the know, Stanford is offering a few free AI classes online during the fall semester. Introduction to Artificial Intelligence and Machine Learning seem like they'd both offer interesting advice on creating a program which could pick loans with a better chance of repayment than I would pick by hand.

I'm looking forward to the start of the classes. It isn't as cool as designing a self driving car but it's a start.

Monday, August 15, 2011

What's Coming

With the belief that the way people write their requests for loans reflects their attitude towards lending and, therefore, their probability of repayment, I set out on a journey to find words which are associated with failed loans.

So I looked at the word "need" in a few different ways. I saw that there is variation on Prosper across the years and across the credit grades, and I've seen that something similar is happening on Lending Club.

I also looked at words that are associated with failed loans and words that are associated with loans that get paid off.

In the next few weeks, I'll start testing the words that have failed before and see if they fail on newer loans with the idea of building a list of words that, if used by a borrower, indicate that a loan is more likely to fail. I'll show how my first attempt failed -- words associated with failure in Prosper loans before 2008 weren't all associated with failure in 2008 loans -- and I'll try different sets of words to see if I can find some consistent pattern.

Sunday, August 7, 2011

Needs Series: Comparing to Lending Club

Of course, all of this "need" data does us absolutely no good if it isn't generalizable beyond the loans that it came from. Since the initial data was taken from Prosper, it seemed that Lending Club loans would make a good comparison and continue to tell us if we're finding something relevant or just random noise.

Data Set: All Lending Club Loans made in 2007 and 2008

| 2007-2008 Loans | Total Loans | Percent Charged Off or Defaulted |
| --- | --- | --- |
| All Loans | 2996 | 21.2% |
| "Need" in title or description | 814 | 26.8% |
| "Need" not in title nor description | 2182 | 19.2% |
| "Payday" in title or description | 7 | 57.1% |
| "Payday" not in title nor description | 2989 | 21.1% |

So there you have it, the word "Need" appears to correlate more often with a failure to repay a loan in Lending Club as well. (I included "Payday" data just for fun -- with only 7 loans the data isn't likely to be relevant, but it does fall in line with what we'd expect.)
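
A comparison like the one above can be reproduced with a few lines of Python. The sketch below is an assumption-laden illustration: the field names (`title`, `description`, `status`), the status labels, and the choice of a case-insensitive substring match (so "needed" also counts) are all hypothetical, and the sample loans are made up.

```python
# Assumed labels for a failed loan; a real export may spell these differently.
FAILED_STATUSES = {"Charged Off", "Default"}

def split_by_keyword(loans, keyword):
    """Split loans into (with_keyword, without_keyword) using title and description."""
    keyword = keyword.lower()
    with_kw, without_kw = [], []
    for loan in loans:
        text = (loan["title"] + " " + loan["description"]).lower()
        (with_kw if keyword in text else without_kw).append(loan)
    return with_kw, without_kw

def failure_rate(loans):
    """Percent of loans with a failed status, or None for an empty group."""
    if not loans:
        return None
    failed = sum(1 for loan in loans if loan["status"] in FAILED_STATUSES)
    return 100.0 * failed / len(loans)

# Made-up sample rows purely for illustration.
sample = [
    {"title": "Need money fast", "description": "", "status": "Charged Off"},
    {"title": "Consolidation", "description": "I needed a lower rate", "status": "Fully Paid"},
    {"title": "Car loan", "description": "Buying a used car", "status": "Fully Paid"},
    {"title": "Wedding", "description": "Paying for a wedding", "status": "Current"},
]
with_need, without_need = split_by_keyword(sample, "need")
print(len(with_need), failure_rate(with_need))
print(len(without_need), failure_rate(without_need))
```

With very small groups, such as the 7 "payday" loans, the percentage swings wildly with each loan, which is why the group size is worth printing next to the rate.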

Since so many of the 2009 loans won't be paid off until 2012, I included the >30 days late category with the already Charged Off and Defaulted loans and found the following stats:

| 2009 Loans | Total Loans | Percent Charged Off, Defaulted or >30 days late |
| --- | --- | --- |
| All Loans | 5281 | 10.7% |
| "Need" in title or description | 1263 | 13.6% |
| "Need" not in title nor description | 4018 | 9.8% |
| "Payday" in title or description | 6 | 83.3% |

The trend continues...


All Articles in the Needs Series
An Introduction
Initial Findings
Correlation Matrix
Comparing to Lending Club

Saturday, July 30, 2011

Words of Win

Let's take a look at the 10 most likely words to have been used in a loan that repaid:

lender
risk
card
rate
fund
investment
rates
0
minimum
ratio

These words range from lender, at 8% more likely to have Paid than average, to ratio, at 4% more likely to pay than average.

Interesting here is the discrepancy between the words of loss (maxing out at 23% less likely to pay than average) and the words of win (maxing out at only 8% more likely to pay than average). It could be that the words a borrower chooses only help in identifying borrowers who are less likely than average to repay, and do not reveal borrowers who are more likely to repay.

Update Aug 30: I have added a post on the methodology used to find these words.

Words of Loss

I took a look at all loans that were made on Prosper up through the end of 2007 to see what words people who failed to repay their loans used most often. Looking only at words that were used more than 1000 times, the bottom 10 list is quite interesting:

payday
chance
behind
son
daughter
mother
children
child
track
deleted

Requests where the word 'payday' was used were 22% less likely than average to repay their loans. Requests containing the word 'deleted' were 13% less likely than average to repay their loans.
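
A ranking like this one might be built along the following lines. This is a rough sketch and not the methodology from the follow-up post: it assumes each word is counted once per loan, that "less likely than average" means a difference in Paid percentage points, and that loans arrive as hypothetical `(text, paid)` pairs.

```python
import re
from collections import Counter

def rank_words(loans, min_uses=1000):
    """loans: list of (text, paid) pairs, where paid is True or False.

    Returns (word, difference) pairs sorted from worst to best, where
    difference is the word's Paid percentage minus the overall Paid percentage.
    """
    overall = 100.0 * sum(1 for _, paid in loans if paid) / len(loans)
    uses = Counter()
    paid_uses = Counter()
    for text, paid in loans:
        # Count each word once per loan (an assumption about the method).
        for word in set(re.findall(r"[a-z0-9']+", text.lower())):
            uses[word] += 1
            if paid:
                paid_uses[word] += 1
    ranked = [
        (word, 100.0 * paid_uses[word] / count - overall)
        for word, count in uses.items()
        if count > min_uses  # "used more than" the cutoff
    ]
    return sorted(ranked, key=lambda pair: pair[1])

# Tiny made-up sample, with the cutoff lowered so that something survives it.
sample = [
    ("payday help", False),
    ("payday loan", False),
    ("lender rate", True),
    ("lender loan", True),
]
print(rank_words(sample, min_uses=1))
```

The first ten entries of the returned list would be the bottom 10, and the last ten would be the most likely to have repaid.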

Most interesting, in my mind, is all of the words that invoke family. Son, daughter, mother, children, child -- half of the bottom 10 words are family members. Husband, at 10% less likely than average to pay, is 24th from the bottom.

In the next post I'll take a look at the words which were associated with a positive loan outcome.

Update Aug 30: I have added a post on the methodology used to find these words.

Monday, July 4, 2011

Need Series: Correlation Matrix

A correlation matrix tells us how strongly changes in various factors relate to one another. We're looking here to see if using the word "need" is strongly associated with some other variable, such as credit grade, Debt-To-Income Ratio (DTI), or any other factor. If there is a strong correlation with some other factor then we know that we're not on to anything new here and can move along.

Generally it is my understanding that a correlation (or negative correlation) over .5 is strong and over .1 is weak. A correlation of 0 would mean that there is no relationship between the values at all.

Data set: Loans started between 2005-2008 which were not Cancelled
Credit Grade was scored as: AA=10, A=9, ..., HR = 4
Loan Outcome was scored as: Paid = 3, PaidInFull and RecoveredInFull = 2, Everything Else = 1
Title has need was scored as: Yes = 1, No = 0
Description has need was scored as: Yes = 1, No = 0
DTI and Amount Requested were as published by Prosper

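| | Credit Grade | Description Has Need | Debt To Income | Amount Requested | Title Has Need | Loan Outcome |
| --- | --- | --- | --- | --- | --- | --- |
| Credit Grade | 1 | -.17 | .03 | .41 | -.11 | .30 |
| Description Has Need | -.17 | 1 | .01 | -.04 | .16 | -.11 |
| Debt To Income | .03 | .01 | 1 | .09 | 0 | .04 |
| Amount Requested | .41 | -.04 | .09 | 1 | -.05 | -.06 |
| Title Has Need | -.11 | .16 | 0 | -.05 | 1 | -.06 |
| Loan Outcome | .30 | -.11 | .04 | -.06 | -.06 | 1 |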

As we would expect, we see some stronger correlations, like the .3 between credit grade and loan outcome. We also see a decent correlation between the description and the title having the word "need" in them.

It is interesting to see a -.11 correlation between loan outcome and the word "need" in the description of the loan -- this is stronger than the -.06 correlation when we look at the title. In a future post we'll do a t-test to see if these results are statistically significant.
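
For anyone curious how a matrix like this is put together, here is a minimal Python sketch using the Pearson correlation. It assumes the scoring described above (the grade values between A and HR are inferred from the "..." in the list), and the sample rows are invented for illustration, so its output will not match the table.

```python
from math import sqrt

# Grade scoring from the post; B through E are inferred from "AA=10, A=9, ..., HR = 4".
GRADE_SCORE = {"AA": 10, "A": 9, "B": 8, "C": 7, "D": 6, "E": 5, "HR": 4}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """columns: dict of name -> list of scores. Returns a nested dict of correlations."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Made-up rows purely for illustration.
grades = ["AA", "A", "B", "C", "HR"]
columns = {
    "Credit Grade": [GRADE_SCORE[g] for g in grades],
    "Title Has Need": [0, 0, 1, 0, 1],
    "Loan Outcome": [3, 3, 1, 2, 1],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Note that this sketch would divide by zero if a column had no variation at all (for example, if no loan in the sample used "need"), so real data should be checked for that first.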


All Articles in the Needs Series
An Introduction
Initial Findings
Correlation Matrix
Comparing to Lending Club
What We Fund

Sunday, July 3, 2011

Need Series: Initial Findings

Let's look at my initial findings:

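| 2005-2006 Loans | Total Loans | Never Recovered | Recovered | Paid |
| --- | --- | --- | --- | --- |
| Title contains "need" | 652 | 46.9% | 52.8% | 51.2% |
| Title does not contain "need" | 5321 | 37.0% | 62.9% | 61.4% |

| 2007 Loans | Total Loans | Never Recovered | Recovered | Paid |
| --- | --- | --- | --- | --- |
| Title contains "need" | 1092 | 47.7% | 52.3% | 50.6% |
| Title does not contain "need" | 10387 | 37.3% | 62.7% | 61.5% |

| 2008 Loans | Total Loans | Never Recovered | Recovered | Paid |
| --- | --- | --- | --- | --- |
| Title contains "need" | 861 | 34.8% | 48.9% | 48.6% |
| Title does not contain "need" | 10704 | 30.3% | 55.0% | 54.3% |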

It's worth pointing out that not all 2008 loans will have completed yet, so we expect the percent Paid to increase--for both "need" and not "need" loans--as the year goes on. Still, for loans which have reached their conclusion (2005-2007 loans) we see roughly a 10% difference in number Paid between the groups.


Now 312lender and frinxor pointed out on the prospers.org forum that use of the word "need" could be directly correlated with credit score and, hence, non-payment rate. Let's take a look at those numbers for all loans which originated between 2005-2007:

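| Credit Score | Paid With "Need" | Paid Without "Need" | Difference |
| --- | --- | --- | --- |
| AA | 79.6% | 87.0% | 7.4% |
| A | 76.3% | 76.2% | -0.1% |
| B | 60.5% | 69.3% | 8.8% |
| C | 57.1% | 62.3% | 5.2% |
| D | 53.0% | 60.0% | 7.0% |
| E | 45.4% | 49.3% | 3.9% |
| HR | 32.3% | 37.0% | 4.7% |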

So it would seem that almost every credit grade has a difference between the groups. Now it's interesting to note that only 5% of A loans used the word "need" in the title and nearly 15% of HR loans used the word "need" in the title. This may, in fact, be relevant when we do more in-depth statistical analysis later on.

Initially, however, it still looks like we're on to something.



Need Series: An Introduction

I look at the titles of many Prosper loans, and they sadden me:
Need money to pay off the high interest credit card bills !!
Fresh Start Needed!
need to build credit
HELP! Need money til I refi

I see borrowers needing money and I want to avoid those loans like the plague. It bothers me. I don't like the idea of needing things from others--and I don't like the idea of others needing things from me. It just seems to me that if a borrower really needs the money then they're not trying hard enough or thinking widely enough about the problem. And I associate that attitude with a failure to pay back loans.

So I set out to explore the question: "Do loans with the word 'need' in the title pay back less often than loans without the word 'need' in the title?"

My initial results are interesting. As you can see from the first batch of 2008 loans I tested, there was a 5.7% difference in the number of loans that were paid out as agreed:
| Description | Number of Loans | Paid |
| --- | --- | --- |
| 2008 Loans, with "need" in the title | 861 | 48.6% |
| 2008 Loans, without "need" in the title | 10704 | 54.3% |

Five percent isn't huge, but it's big enough to keep my interest for a while. In future posts I'll break open the statistics book to start to explore these findings. I'll explore whether the numbers are even statistically significant and I'll see if there is a better explanation for my findings, such as credit rating and current delinquencies. Maybe I'm on to something new. Maybe I'm just tilting at windmills.
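
One common way to ask whether a gap like this is statistically significant is a two-proportion z-test. The Python sketch below is offered only as an illustration of that idea, not as the test this series ends up using. The Paid counts are reconstructed by rounding the published percentages, so they are approximations.

```python
from math import sqrt, erf

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Returns (z, p_value) using the pooled standard error.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Standard normal CDF written with erf, then doubled for a two-sided test.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value

# Approximate Paid counts rebuilt from the table's percentages (an assumption).
need_paid = round(0.486 * 861)
other_paid = round(0.543 * 10704)
z, p = two_proportion_z_test(need_paid, 861, other_paid, 10704)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value would only say the gap is unlikely to be chance. It would not rule out other explanations, such as credit rating, which is why the breakdown by grade still matters.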



Background

Chasing good returns, I've been lending on Prosper since summer of 2006. For me, and so many other early Prosper lenders, those returns never materialized and I lost money on my initial investments.

Here it is, some 5 years later, the housing and stock markets have tanked, Prosper has gone through an SEC Registration and LendingClub has emerged. To date my only peer-to-peer lending remains with Prosper, but it's time to explore money a bit more wisely.

In this blog I'll be exploring some ideas I have about lending, re-learning long-forgotten statistics, and ultimately chasing a healthy return for my investment.

My first series will explore an idea that I have that Prosper loans with the word "need" in the title end up with less of a payout than loans without the word need. I've got a statistics book on my desk and a stats professor ready to tell me my assumptions are wrong and 7 years of Prosper loan numbers to analyze.