Monday, July 4, 2011

Need Series: Correlation Matrix

A correlation matrix shows how strongly changes in various factors relate to one another. We're looking here to see whether using the word "need" is strongly associated with some other variable, such as credit grade, debt-to-income ratio (DTI), or any other factor. If there is a strong correlation with some other factor, then we know that we're not onto anything new here and can move along.

My general understanding is that a correlation with an absolute value over .5 (whether positive or negative) is strong, and one over .1 is weak. A correlation of 0 would mean that there is no linear relationship between the values at all.
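To make the rule of thumb above concrete, here is a minimal Python sketch of the Pearson correlation coefficient. It assumes Pearson is the correlation used for the matrix below; the post does not say which one was computed, although the comment at the end suggests it. The last example shows why 0 only means "no linear relationship": y = x * x depends completely on x, yet r comes out as 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


xs = [-2, -1, 0, 1, 2]
print(pearson_r(xs, [2 * x + 1 for x in xs]))  # perfectly linear, increasing: 1.0
print(pearson_r(xs, [-x for x in xs]))         # perfectly linear, decreasing: -1.0
print(pearson_r(xs, [x * x for x in xs]))      # exact but non-linear relationship: 0.0
```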

Data set: loans started between 2005 and 2008 that were not Cancelled
Credit Grade was scored as: AA = 10, A = 9, ..., HR = 4
Loan Outcome was scored as: Paid = 3, PaidInFull and RecoveredInFull = 2, everything else = 1
Title Has Need was scored as: Yes = 1, No = 0
Description Has Need was scored as: Yes = 1, No = 0
DTI and Amount Requested were used as published by Prosper
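The scoring scheme above can be sketched in Python as follows. This assumes each loan is a dict with the hypothetical keys "status", "title" and "description", that the status strings match the labels listed above exactly, and that "has need" means a case-insensitive substring match (the post does not say how matching was done, for example whether "needed" counts). Credit grade is left out because the intermediate grades are elided in the list above; the "Defaulted" status in the example is likewise hypothetical.

```python
# Loan Outcome scores as listed in the post; any other status scores 1.
OUTCOME_SCORE = {"Paid": 3, "PaidInFull": 2, "RecoveredInFull": 2}


def score_outcome(status):
    """Paid = 3, PaidInFull / RecoveredInFull = 2, everything else = 1."""
    return OUTCOME_SCORE.get(status, 1)


def has_need(text):
    """1 if the text contains "need" (assumed: case-insensitive substring), else 0."""
    return 1 if "need" in (text or "").lower() else 0


def score_loan(loan):
    """Score one loan record (hypothetical dict layout) on the three derived fields."""
    return {
        "loan_outcome": score_outcome(loan["status"]),
        "title_has_need": has_need(loan["title"]),
        "description_has_need": has_need(loan["description"]),
    }


example = {"status": "Defaulted", "title": "Need help with bills", "description": "Consolidating cards"}
print(score_loan(example))
# {'loan_outcome': 1, 'title_has_need': 1, 'description_has_need': 0}
```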

                       Credit      Description Debt To     Amount      Title       Loan
                       Grade       Has Need    Income      Requested   Has Need    Outcome
Credit Grade           1          -.17         .03         .41        -.11         .30
Description Has Need  -.17         1           .01        -.04         .16        -.11
Debt To Income         .03         .01         1           .09         0           .04
Amount Requested       .41        -.04         .09         1          -.05        -.06
Title Has Need        -.11         .16         0          -.05         1          -.06
Loan Outcome           .30        -.11         .04        -.06        -.06         1
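The matrix itself is just the pairwise correlation of every scored column with every other one. A minimal sketch, again assuming Pearson correlation and assuming each loan has already been scored as described earlier; the column names and toy values are hypothetical and are not the Prosper data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return (names, matrix) where matrix[i][j] correlates columns i and j."""
    names = list(columns)
    if len({len(values) for values in columns.values()}) != 1:
        raise ValueError("every column needs one value per loan")
    matrix = [[pearson_r(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix


# Toy scored loans for illustration only; these are NOT the Prosper numbers.
columns = {
    "credit_grade": [10, 9, 7, 5, 4, 8],
    "title_has_need": [0, 0, 1, 1, 0, 0],
    "loan_outcome": [3, 3, 2, 1, 1, 2],
}

names, matrix = correlation_matrix(columns)
for name, row in zip(names, matrix):
    print(f"{name:16}" + "".join(f"{r:7.2f}" for r in row))
```

As in the table above, the result is symmetric with 1 on the diagonal, so only the upper (or lower) triangle carries information.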

As we would expect, we see some stronger correlations, like the .30 between credit grade and loan outcome. We also see a decent correlation (.16) between the description and the title containing the word "need".
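Reading the table against the rule of thumb from the start of the post can be done mechanically. The sketch below copies the upper triangle of the published matrix (the lower triangle mirrors it) and labels each pair. The "negligible" label for values at or below .1 is an assumed name, since the post only defines strong and weak. Under the literal rule no pair reaches "strong", and the .41 and .30 correlations count as weak.

```python
# Upper-triangle values copied from the correlation table in this post.
PUBLISHED = {
    ("Credit Grade", "Description Has Need"): -0.17,
    ("Credit Grade", "Debt To Income"): 0.03,
    ("Credit Grade", "Amount Requested"): 0.41,
    ("Credit Grade", "Title Has Need"): -0.11,
    ("Credit Grade", "Loan Outcome"): 0.30,
    ("Description Has Need", "Debt To Income"): 0.01,
    ("Description Has Need", "Amount Requested"): -0.04,
    ("Description Has Need", "Title Has Need"): 0.16,
    ("Description Has Need", "Loan Outcome"): -0.11,
    ("Debt To Income", "Amount Requested"): 0.09,
    ("Debt To Income", "Title Has Need"): 0.0,
    ("Debt To Income", "Loan Outcome"): 0.04,
    ("Amount Requested", "Title Has Need"): -0.05,
    ("Amount Requested", "Loan Outcome"): -0.06,
    ("Title Has Need", "Loan Outcome"): -0.06,
}


def strength(r):
    """The post's rule of thumb: |r| > .5 strong, |r| > .1 weak, else negligible (assumed name)."""
    if abs(r) > 0.5:
        return "strong"
    if abs(r) > 0.1:
        return "weak"
    return "negligible"


# List pairs from the largest to the smallest absolute correlation.
for (a, b), r in sorted(PUBLISHED.items(), key=lambda item: -abs(item[1])):
    print(f"{r:+.2f}  {strength(r):10}  {a} / {b}")
```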

It is interesting to see a -.11 correlation between loan outcome and the word "need" in the loan description; this is stronger than the -.06 correlation we see for the title. In a future post we'll run a t-test to see whether these results are statistically significant.
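For the significance check mentioned above, one standard test of whether a Pearson correlation differs from zero uses t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. For a 0/1 variable such as Description Has Need, this is equivalent to a pooled-variance two-sample t-test of loan outcome between loans with and without "need". Whether that is the test the future post will use is an assumption. The p-value below uses a normal approximation (math.erfc) to avoid dependencies, which is only reasonable for large n, and the sample sizes are hypothetical because the post does not report how many loans were included.

```python
import math


def r_t_statistic(r, n):
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value; reasonable only for large n."""
    return math.erfc(abs(t) / math.sqrt(2))


# The same r = -.11 with several hypothetical sample sizes: significance depends on n.
for n in (100, 1000, 10000):
    t = r_t_statistic(-0.11, n)
    print(n, round(t, 2), round(approx_two_sided_p(t), 4))
```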

All Articles in the Needs Series
An Introduction
Initial Findings
Correlation Matrix
Comparing to Lending Club
What We Fund

1 comment:

  1. Correlation as you calculate it measures the linear relationship between two variables. However, these values are very hard to compare to standards like 0.5 being strongish, because one of the variables here is binary (the loan is good or bad). This is called a point-biserial correlation.
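The point-biserial correlation mentioned in the comment is the Pearson correlation with one variable coded 0/1, and it can be written in terms of the two group means. One reason the usual thresholds fit poorly: a 0/1 variable can only correlate perfectly with a variable that also takes just two values, so against a three-level outcome score |r| is always below 1. A minimal sketch; the flags and outcome scores in the usage example are toy values, not Prosper data.

```python
import math


def point_biserial(flags, values):
    """Point-biserial correlation between a 0/1 variable and a numeric one.

    r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the group means,
    s is the population (divide-by-n) standard deviation of all values, and
    p and q are the proportions of 1s and 0s.
    """
    if len(flags) != len(values):
        raise ValueError("sequences must have the same length")
    ones = [v for f, v in zip(flags, values) if f == 1]
    zeros = [v for f, v in zip(flags, values) if f == 0]
    if not ones or not zeros or len(ones) + len(zeros) != len(values):
        raise ValueError("flags must be 0/1 and contain both values")
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    p, q = len(ones) / n, len(zeros) / n
    return (sum(ones) / len(ones) - sum(zeros) / len(zeros)) / s * math.sqrt(p * q)


# Toy example: "need" flags against outcome scores (3 = best, 1 = worst).
flags = [1, 0, 1, 0, 0, 1, 0, 0]
outcomes = [1, 3, 2, 3, 2, 1, 3, 2]
print(round(point_biserial(flags, outcomes), 3))
```

Computing the ordinary Pearson correlation on the same 0/1 flags gives the same number, which is why the matrix above already contains point-biserial values for the "need" columns.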