As a rule of thumb, my understanding is that a correlation (positive or negative) with a magnitude over .5 is strong, and one with a magnitude between .1 and .5 is weak. A correlation of 0 would mean that there is no linear relationship between the values at all.
Data set: Loans started between 2005 and 2008 that were not Cancelled
Credit Grade was scored as: AA=10, A=9, ..., HR = 4
Loan Outcome was scored as: Paid = 3, PaidInFull and RecoveredInFull = 2, Everything Else = 1
Title has need was scored as: Yes = 1, No = 0
Description has need was scored as: Yes = 1, No = 0
DTI and Amount Requested were as published by Prosper
Variable | Credit Grade | Description Has Need | Debt To Income | Amount Requested | Title Has Need | Loan Outcome |
Credit Grade | 1 | -.17 | .03 | .41 | -.11 | .30 |
Description Has Need | -.17 | 1 | .01 | -.04 | .16 | -.11 |
Debt To Income | .03 | .01 | 1 | .09 | 0 | .04 |
Amount Requested | .41 | -.04 | .09 | 1 | -.05 | -.06 |
Title Has Need | -.11 | .16 | 0 | -.05 | 1 | -.06 |
Loan Outcome | .30 | -.11 | .04 | -.06 | -.06 | 1 |
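For readers who want to build a matrix like this themselves, here is a minimal sketch of the Pearson calculation in plain Python. It assumes each variable has already been scored numerically as described above. The helper names (`pearson`, `correlation_matrix`) and the toy rows are made up for illustration; they are not the code or the data behind the table.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a column with no variation (all values equal) would divide by zero.
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up rows for illustration only; these are NOT Prosper data.
toy = {
    "credit_grade": [10, 9, 7, 4, 6, 8],
    "description_has_need": [0, 0, 1, 1, 1, 0],
    "loan_outcome": [3, 3, 1, 1, 2, 3],
}

matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, "|", " | ".join(f"{row[other]:.2f}" for other in matrix))
```

The diagonal is always 1 and the matrix is symmetric, which is a quick sanity check on any table of this kind.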
As we would expect, some correlations are stronger than others, like the .30 between credit grade and loan outcome. There is also a .16 correlation between the description and the title containing the word "need".
It is interesting to see a -.11 correlation between loan outcome and the word "need" in the description of the loan -- this is stronger than the -.06 correlation when we look at the title. In a future post we'll do a t-test to see if these results are statistically significant.
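As a rough preview of what such a test involves, the sketch below turns a correlation r and a sample size n into a t statistic using the standard formula t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom. The sample size used here is hypothetical, since the number of loans is not given above, and the p-value uses a normal approximation that is only reasonable for large samples. The function names are made up for this illustration.

```python
from math import erfc, sqrt


def correlation_t_statistic(r, n):
    """t statistic for the null hypothesis of zero correlation (n - 2 degrees of freedom)."""
    return r * sqrt((n - 2) / (1 - r * r))


def approx_two_sided_p(t):
    """Two-sided p-value from the normal approximation; only reasonable for large n."""
    return erfc(abs(t) / sqrt(2))


n_loans = 1000  # hypothetical sample size; the real count is not stated in the post
for label, r in [("description has need", -0.11), ("title has need", -0.06)]:
    t = correlation_t_statistic(r, n_loans)
    print(f"{label}: t = {t:.2f}, approximate p = {approx_two_sided_p(t):.4f}")
```

With a large data set even a small correlation can come out statistically significant, so the result depends heavily on the real number of loans.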
All Articles in the Needs Series
An Introduction
Initial Findings
Correlation Matrix
Comparing to Lending Club
What We Fund
Correlation as you calculate it measures the linear relationship between two variables. However, these values are very hard to compare to standards like 0.5 being strongish, because one of the variables is binary here (the loan is good or bad). This is called a point-biserial correlation.