Saturday, September 3, 2011

Bad and Good Words Revisited

In a post on the prospers.org forums, user havastat recommended taking the listings behind the good and bad words, dividing them randomly into two groups, and checking whether the two groups produce similar results. If they don't, there's a good chance the original findings were due to chance. If they do, the findings are more likely to be meaningful.

Percent Paid (By Loan): The percentage of loans containing the word at least once that finished with a status of Paid.
Percent Paid (By Word): The same percentage, but with each loan weighted by the number of times the word appears in its listing. (For example, an unpaid loan titled "Help, help, help, help!" would count four times as much as a loan that uses "Help" only once.)
Word Count: The number of listings containing the word at least once. (Notably, this is not the total number of times the word was used; each listing counts at most once.)
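To make the difference between the two percentages concrete, here is a minimal Python sketch of all three measures. It assumes each loan has already been reduced to a hypothetical `Loan` record holding per-word counts and a paid flag; the names are illustrative, not taken from the original analysis. The example pairs the unpaid "Help, help, help, help!" listing with a hypothetical paid listing that uses "help" once.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    # Hypothetical record: how often each word appears in the listing,
    # and whether the loan finished with status Paid.
    word_counts: dict[str, int]
    paid: bool


def word_stats(loans: list[Loan], word: str) -> tuple[float, float, int]:
    """Return (percent paid by loan, percent paid by word, word count) for one word."""
    with_word = [loan for loan in loans if loan.word_counts.get(word, 0) > 0]
    # Word Count: listings containing the word at least once.
    listing_count = len(with_word)
    if listing_count == 0:
        return 0.0, 0.0, 0  # word never used; report zeros
    # By loan: every listing containing the word counts once.
    by_loan = 100.0 * sum(loan.paid for loan in with_word) / listing_count
    # By word: every listing is weighted by how often the word appears in it.
    occurrences = sum(loan.word_counts[word] for loan in with_word)
    paid_occurrences = sum(loan.word_counts[word] for loan in with_word if loan.paid)
    by_word = 100.0 * paid_occurrences / occurrences
    return by_loan, by_word, listing_count


# Unpaid "Help, help, help, help!" plus a hypothetical paid listing with one "help".
loans = [Loan({"help": 4}, paid=False), Loan({"help": 1}, paid=True)]
print(word_stats(loans, "help"))  # (50.0, 20.0, 2)
```

By loan, the two listings split 50/50; by word, the unpaid listing contributes four of the five occurrences, so only 20% of the occurrences belong to a paid loan.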

As in the original posts, these are words from Prosper loans created before 2008. My methodology is described at the bottom of the post; in short, loans were assigned to the two groups at random, with 8,728 loans in each group.

Group 1 Worst Performing Words
Group average Percent Paid: 61.1%

Word | Percent Paid (By Loan) | Percent Paid (By Word) | Word Count
payday | 38.9% | 38.6% | 596
behind | 42.9% | 43.6% | 592
mother | 43.5% | 44.8% | 566
chance | 44.5% | 42.1% | 631
track | 46.8% | 45.6% | 581
son | 47.1% | 44.8% | 597
daughter | 48.1% | 46.3% | 516
child | 48.7% | 47.9% | 520
husband | 49% | 51.3% | 896
single | 49.5% | 49.7% | 707

Group 2 Worst Performing Words
Group average Percent Paid: 59.6%

Word | Percent Paid (By Loan) | Percent Paid (By Word) | Word Count
payday | 37.5% | 39.2% | 595
behind | 42.4% | 41.3% | 566
chance | 43.5% | 41.5% | 575
son | 45.7% | 44.2% | 514
mother | 46.6% | 46.7% | 601
children | 47% | 45.6% | 854
daughter | 47.7% | 44.6% | 539
DELETED | 47.7% | 46.6% | 507
child | 47.8% | 46.3% | 552
30000 | 48.3% | 47.6% | 532

So, as in the original Words of Loss post, 'payday' is the worst-performing word in both groups, and 'behind', 'chance' and family words such as 'mother', 'son', 'daughter' and 'child' also land near the bottom of both lists.


Now let's take a look at the best performing words:


Group 1 Best Performing Words
Group average Percent Paid: 61.1%

Word | Percent Paid (By Loan) | Percent Paid (By Word) | Word Count
tax | 67.1% | 66.7% | 504
early | 67.2% | 67.7% | 534
rate | 67.6% | 68.6% | 1952
term | 67.6% | 66.6% | 509
risk | 67.8% | 70.3% | 565
fund | 68.2% | 70.2% | 666
rates | 68.3% | 68.8% | 609
lender | 68.3% | 70.2% | 707
minimum | 68.4% | 68.4% | 583
investment | 69.1% | 69.3% | 679

Group 2 Best Performing Words
Group average Percent Paid: 59.6%

Word | Percent Paid (By Loan) | Percent Paid (By Word) | Word Count
risk | 64.2% | 66.1% | 592
card | 64.2% | 65.9% | 2910
higher | 64.4% | 63.2% | 765
style | 64.5% | 58.8% | 968
span | 64.9% | 59.5% | 1069
don't | 65% | 64.3% | 861
rate | 65.2% | 65.3% | 1938
student | 66% | 66.3% | 1078
lender | 66.2% | 67.5% | 754
I've | 66.2% | 63.6% | 888

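Interestingly, lending words appear on both of these lists, but there are fewer matches than among the worst-performing words. Only 'risk', 'rate' (plus 'rates' in Group 1) and 'lender' appear in both groups, and all of them sit much closer to their group's average than the worst-performing words do: they beat the average by roughly 4.6 to 7.2 percentage points (Percent Paid By Loan), while 'payday' trails it by about 22 points in both groups.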
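The replication check can be sketched in a few lines of Python. The sketch below intersects the two groups' lists using the Percent Paid (By Loan) figures copied from the tables, and reports each shared word's gap from its group's average. Only the numbers come from this post; the helper name `shared` is hypothetical.

```python
# Word -> Percent Paid (By Loan), copied from the tables in this post.
group1_best = {"tax": 67.1, "early": 67.2, "rate": 67.6, "term": 67.6, "risk": 67.8,
               "fund": 68.2, "rates": 68.3, "lender": 68.3, "minimum": 68.4, "investment": 69.1}
group2_best = {"risk": 64.2, "card": 64.2, "higher": 64.4, "style": 64.5, "span": 64.9,
               "don't": 65.0, "rate": 65.2, "student": 66.0, "lender": 66.2, "I've": 66.2}
group1_worst = {"payday": 38.9, "behind": 42.9, "mother": 43.5, "chance": 44.5, "track": 46.8,
                "son": 47.1, "daughter": 48.1, "child": 48.7, "husband": 49.0, "single": 49.5}
group2_worst = {"payday": 37.5, "behind": 42.4, "chance": 43.5, "son": 45.7, "mother": 46.6,
                "children": 47.0, "daughter": 47.7, "DELETED": 47.7, "child": 47.8, "30000": 48.3}
AVG1, AVG2 = 61.1, 59.6  # group average Percent Paid


def shared(a, b):
    """Words on both groups' lists, with each group's gap from its average (points)."""
    return {w: (round(a[w] - AVG1, 1), round(b[w] - AVG2, 1))
            for w in sorted(a.keys() & b.keys())}


print(shared(group1_best, group2_best))
# {'lender': (7.2, 6.6), 'rate': (6.5, 5.6), 'risk': (6.7, 4.6)}
print(len(shared(group1_worst, group2_worst)))  # 7
```

Seven of the ten worst words replicate across groups, against three of the ten best, and the replicated bad words sit roughly three times as far from the average.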

It may turn out that a listing's words can tell us when a loan is more likely to fail, but not when it is more likely to succeed.

Methodology:

As in the previous two studies, I began with all pre-2008 Prosper loans.

I then placed all the loans in a random order and assigned them to Group 1 or Group 2 sequentially. From there I built a list of all the words in the title and body of each loan's listing, tallying the number of times each word was used in each loan.

To calculate Percent Paid (By Loan), I divided the number of loans containing the word that finished with a status of Paid by the total number of loans containing the word.
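The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code actually used: the loan records (dicts with `title`, `body` and `status` keys) are hypothetical, "sequentially" is read as dealing the shuffled loans alternately into the two groups, and the tokenizer (lowercased runs of letters, digits and apostrophes) is an assumption, chosen so that "Help" and "help" count as the same word, as the "Help, help, help, help!" example implies.

```python
import random
import re
from collections import Counter

# Hypothetical tokenizer: the post does not say exactly how words were split.
WORD_RE = re.compile(r"[a-z0-9']+")


def split_groups(loans, seed=None):
    """Shuffle the loans, then deal them alternately into Group 1 and Group 2."""
    shuffled = list(loans)
    random.Random(seed).shuffle(shuffled)
    return shuffled[0::2], shuffled[1::2]


def tally(loan):
    """Count how many times each (lowercased) word appears in the title and body."""
    return Counter(WORD_RE.findall((loan["title"] + " " + loan["body"]).lower()))


def percent_paid_by_loan(group):
    """Word -> percent of loans containing that word that finished with status Paid."""
    with_word = Counter()
    paid_with_word = Counter()
    for loan in group:
        for word in tally(loan):  # each loan counts once per distinct word
            with_word[word] += 1
            if loan["status"] == "Paid":
                paid_with_word[word] += 1
    return {w: 100.0 * paid_with_word[w] / n for w, n in with_word.items()}


# Hypothetical sample listings; "Defaulted" stands in for any non-Paid status.
sample = [
    {"title": "Payday loan", "body": "Need help", "status": "Paid"},
    {"title": "Help, help", "body": "help, help!", "status": "Defaulted"},
]
print(percent_paid_by_loan(sample)["help"])  # 50.0
```

In the real analysis, `percent_paid_by_loan` would be run separately on each group returned by `split_groups`, which is what makes the two groups' word lists comparable.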
