Toward Effective yet Impartial Altruism: A Fairness-aware Loan Recommender System for Microfinance Services


Eric L. Lee¹, Jing-Kai Lou², Wei-Ming Chen¹, Yen-Chi Chen¹, Shou-De Lin¹, Yen-Sheng Chiang³, and Kuan-Ta Chen²

¹Department of Computer Science and Information Engineering, National Taiwan University

²Institute of Information Science, Academia Sinica

³Department of Sociology, The Chinese University of Hong Kong

ABSTRACT

To date, more than 15 billion US dollars have been invested in microfinance, benefiting more than 160 million people in developing countries. The Kiva organization is one of the successful examples that use a decentralized matching process to match lenders and borrowers. Interested lenders from around the world can browse thousands of applicants and lend money to the cases they find promising. But how can loan borrowers and lenders be successfully matched up on a microfinance platform like Kiva? We argue that there is a strong demand for a sophisticated recommender that not only pairs up loan lenders and borrowers in accordance with their preferences, but also helps to diversify the distribution of donations to reduce the inequality of loans, since altruism, like any resource, can be congestible.

In this paper, we confirm the existence of strong diversity in lenders’ individual preferences when they select the recipients of loans and the monetary amount to contribute to each individual loan application. Therefore, we propose a fairness-aware recommendation system based on one-class collaborative-filtering techniques for charity and micro-loan platforms such as Kiva.org. Our experiments on a real dataset indicate that the proposed method can largely improve the fairness of the loan distribution while retaining the accuracy of recommendations. To our knowledge, this is the first collaborative filtering algorithm that takes fairness into consideration, which we believe is essential in designing a recommender system for social welfare.

1. INTRODUCTION

Since the pioneering endeavor by Yunus and Yusus [27], microfinance has received intense attention and has been widely adopted around the world. To date, more than 15 billion US dollars have been invested in microfinance, benefiting more than 160 million people in developing countries.


As a successful financial and philanthropic model, microfinance has attracted researchers from different disciplines to investigate what mechanisms underlie its success. Microfinance successfully overcomes two hurdles that prevent poor people from getting loans from traditional financial institutions. First and foremost is the risk problem: how can we decrease the delinquency rates of loans provided to the economically disadvantaged? Second, how can we find willing loan providers? Practitioners of microfinance solve the first problem with reference to social theories, using the mechanism of collective monitoring and punishment to minimize incidents of delinquency [5, 9, 11, 12]. Advocates of microfinance solve the second problem by linking microfinance to philanthropy [26], motivating loan providers to treat lending money to the poor as an act of kindness. With these two major problems solved, the question remains of how to screen and match potential loan borrowers and lenders. While the business of microfinance started with a more traditional model that relies on professional agencies to conduct screening and matching, thanks to technical advancement and the emergence of social media we have increasingly seen successful models that adopt decentralized matching mechanisms, which allow loan applicants and providers to get in contact directly with no (or very little) involvement of intermediaries.

The Kiva organization, started up by two young social entrepreneurs in 2005, is one of the successful stories that use a decentralized matching process. Kiva features a platform where the representative agencies of loan applicants can post their information on the web, such as applicants’ photos, description of their financial needs, and purpose of the loans.

Interested lenders from around the world can then look for cases among the applicants that they find promising and lend money to them. No interest is charged on the loans, and Kiva has maintained an incredibly high repayment rate (99%) since its launch.

Altruism is Congestible. But how can loan applicants and lenders be successfully matched up in Kiva? We argue that a successful matching program should not only pair up loan applicants and lenders as closely in accordance with their preferences as possible, but also help to diversify the distribution of donations to alleviate the concern of inequality. Experimental findings from behavioral sciences, such as Andreoni [3] and Chiang [7], have shown that altruism is congestible in the sense that people on average receive less money when there are more potential recipients competing for limited donations, suggesting that concentrating the money on a limited set of recipients would crowd out the opportunities of others to receive financial help.

Crowding-out Phenomenon. How loan applicants and providers are matched up is not only critical to the quality of the service that online microfinance providers offer, but is also crucial to alleviating global inequality and poverty, one of the original goals microfinance aims to achieve. Experimental research from behavioral sciences has indicated the congestible nature of altruism, meaning that charity donation, for example, would be crowded out when there are both more donors and more recipients available. Economists attribute the crowding-out phenomenon on the donors’ side to the free-rider problem: when reducing inequality is treated as a “public good” or obligation, the more helpers are available, the more likely one is to save his/her own cost by free riding on the contributions (help) offered by others [2, 4, 15]. Social psychologists share a similar view on this problem, employing the concept of the “bystander effect” and arguing that donors feel less honored to help when they realize that their help does not count for much among the many other helpers available [24]. The crowding-out phenomenon occurs on the donee’s side too. Experimental findings show that the average donation received by a donee decreases even though the total amount donated by a donor may in fact increase, suggesting that although people become more generous when seeing more needs, the increase in their generosity does not keep up with the growth of needs [3].

Because of the crowding-out effect of altruism, it is important that loan borrowers and lenders be properly matched up, not only to maximize the motivation of providers, but also to prevent the loans from being over-concentrated in the hands of a very few. Yet matching is by no means a trivial task [19], and it is difficult for loan applicants and providers to do the matching themselves. This points to the importance of a well-designed recommendation system to accomplish the task in online microfinance.

Proposal. In this paper, we propose a matching algorithm for microfinance whose goal is not only to maximize the opportunity of successful matching but also to diversify the allocation of loan providers’ resources. There are some technical difficulties to overcome to achieve such a goal. First, relevant data has to be retrieved to train a model. APIs are usually available for crawling data from the relevant websites. For instance, Kiva.org provides an API for researchers to collect information about lending transactions. However, due to privacy issues, some sensitive data might not be available for download. For example, on Kiva.org the amount of money that each lender contributes to a loan is not publicly available. Such a record is very important for designing a matching system, as it corresponds to the ‘ratings’ information in a recommendation engine (i.e., the more money a lender decides to lend, the more this lender likes the proposal), which is essential for creating a matching engine based on collaborative filtering methods. Fortunately, in some situations partial binary decision values (whether a contribution was made) of each lender for each loan can be extracted. Here we propose to utilize such information as “implicit feedback” [6] to design the matching algorithm. In such a scenario, known as the one-class collaborative filtering (OCCF) problem [16], negative (i.e., not interested) and unlabeled (i.e., not seen) examples are mixed together. Along this line, this paper designs an OCCF-based algorithm to perform matching in the microfinance domain.

Besides dealing with implicit information, we also want to take the concept of fairness into consideration while designing the recommendation system. In other words, the matching algorithm not only has to predict the preference of a lender for a loan accurately, but also needs to balance the chance that each loan is recommended to lenders. The concepts of fairness and recommendation are to some extent contradictory. If one cares about pure fairness, then there is no need to perform recommendation, as we can simply divide the resources equally among everyone in need. On the other hand, the goal of performing recommendation is precisely to break this equal allocation so that specific loans are recommended to certain lenders. Thus, a successful fairness-aware matching system needs to take this trade-off into consideration.

To solve the one-class collaborative filtering problem, we adapt the Bayesian Personalized Ranking idea to a Matrix Factorization engine. To account for the fairness of recommendation, we propose two methods: 1) an item-based regularization method and 2) the fairness-aware BPR-MF method. The first model exploits a regularization term to shape the distribution of ratings so as to avoid skewed recommendation, but suffers from the drawback of high computational cost. The second approach dynamically adjusts the learning step in the stochastic gradient descent process to balance recommendations; it is relatively efficient and yields good results.

Contribution. Our contribution in this paper is three-fold:

1. Based on an empirical dataset from Kiva.org, we confirm that strong diversity exists in lenders’ individual preferences. In particular, we observe a crowding-out phenomenon in which lenders tend to “crowd out” loans from the most prevalent sectors/countries; meanwhile, the least prevalent sectors/countries are less supported, probably due to low visibility. Another observation is that lenders tend to contribute more to favored loans, and the difference can be significant. These findings highlight the importance of a recommendation mechanism that can consider both contribution maximization and fair allocation among loans.

2. We propose a fairness-aware recommendation system for charity and micro-loan platforms such as Kiva.org, and conduct experiments to verify the effectiveness of the system. To our knowledge, this is the first loan recommendation algorithm that takes fairness into consideration.

3. We believe this paper belongs to the emerging category of the Industry and Government track because it identifies the diversity in lenders’ behaviors toward loan selection and loan contribution and further confirms the crowding-out effect on the Kiva microfinance platform. The proposed fairness-aware recommender algorithm can be integrated into any microfinance platform, enabling a less congestible, more equal, and more efficient microfinance mechanism on an even larger scale to help the world fight poverty.

The remainder of this paper is organized as follows. Section 2 describes related work on recommendation systems. We present an overview of the Kiva ecosystem in Section 3. In Section 4, we propose a fairness-aware loan recommender system for microloan services, and we evaluate its performance in Section 5. Following that, in Section 6 we take a closer look at lenders’ behavioral diversity in order to explore more opportunities for enhancing the proposed recommender system. Finally, Section 7 presents our conclusions and future work.

2. RELATED WORK OF RECOMMENDATION SYSTEMS

Collaborative filtering (CF) techniques have long been proposed to model explicit feedback from users. Compared to content-based techniques [17, 21], CF methods are more general, require less information, and in many situations produce superior results. One of the most straightforward CF models is k-nearest-neighbor-based CF (kNN) [10, 13, 20, 23], which relies on user-wise or item-wise similarity for mapping. In general, the similarity measurement (e.g., the Pearson correlation) of kNN is chosen through a trial-and-error process on validation datasets. Recently, matrix factorization (MF) based methods have become popular and are widely accepted as the state-of-the-art single model for CF, as researchers have found that, given sufficient rating data, MF methods outperform many other methods in competitions such as the Netflix Challenge and KDD Cup. Compared to kNN-based methods, MF methods are usually more efficient and effective, as they are able to discover the latent features that are usually hidden behind the interactions between users and items. However, MF tends to overfit training data, so there have been extensions to address this issue, such as regularized least-squares optimization with case weights (WR-MF) [14, 16] and max-margin MF [22, 25].

Despite well-developed techniques to model explicit feedback from users, in many practical scenarios such as the one we are facing, only implicit feedback such as click-through history is available. This kind of problem is known as one-class collaborative filtering (OCCF) [16]. In OCCF, the magnitude of a user’s preference is usually invisible, and it is therefore hard to distinguish negative examples from unlabeled examples. OCCF can be regarded as a ranking problem in which we need to rank the positive instances above the others. Exploiting the idea of ranking optimization, Rendle et al. [18] proposed the Bayesian Personalized Ranking (BPR) framework to directly optimize the area under the ROC curve (AUC) for ranking problems. Pan et al. [16] also provided several novel approaches to deal with these problems, such as the weighted Alternating Least Squares (wALS) method, which gives missing examples different weights to avoid the drawbacks of the two extreme cases (treating all missing entries as unknown or all as negative), and the sampling ALS Ensemble (sALS-ENS) method, which applies bootstrap aggregating (bagging) techniques.

Our main focus is on fairness-aware recommendation. So far, the closest works we have seen are those that emphasize the diversification of recommendations [1, 28, 29]. The core concept of diversification is to recommend different kinds of items to improve users’ satisfaction. However, this is very different from the concept of fairness, where we want to diversify the matching between lenders and loan applicants.

Choo et al. [8] have focused on building a personalized loan recommendation system based on content-based filtering techniques, using specialized feature integration techniques and gradient boosting trees (GBtree). The goal is similar to ours except that fairness has not yet been taken into consideration in their design.

Table 1: Loan summary in the full and reduced datasets

| Status | Full dataset | Reduced dataset |
| --- | --- | --- |
| Fundraising | 5,672 (0.91%) | 0 (0%) |
| Fully funded | 280 (0.04%) | 57 (0.02%) |
| Expired | 10,594 (1.69%) | 0 (0%) |
| Paying back | 113,472 (18.14%) | 99,085 (38.32%) |
| Paid | 484,267 (77.43%) | 157,407 (60.87%) |
| Ended with loss | 11,143 (1.78%) | 2,049 (0.79%) |
| Total | 625,428 (100%) | 258,598 (100%) |

3. THE KIVA ECOSYSTEM

In this section, we describe the Kiva dataset, post-processing steps, and the user behavior of loan applicants and providers.

3.1 Data Description

On its own website, the Kiva organization provides three public datasets, which contain information about the loan applicants (i.e., borrowers), the lenders, and the connections between them. From Kiva’s launch in April 2005 until December 2013, there have been 643,495 loans coming from 80 countries, and 1,196,283 lenders registered on the website. Since each loan can receive contributions from a number of lenders (each contribution can range from $25 to $500 USD), the lenders have made a total of 15,355,805 contributions to loans worldwide through the Kiva platform.

The loans on Kiva contain descriptive information such as the personal biography of the borrowers (normally including their gender and marital status), the purpose of the loan, and the amount of money needed. The timestamps of when a loan is posted and funded, as well as its repayment schedule, are also provided. The repayment schedule can be any of three types: monthly (66.5%), irregular (25.4%), and at end of term (7.9%). A loan can be classified into one of the following six statuses based on its posted and funded timestamps:

• Fundraising: A loan that is currently accepting support from potential lenders. A loan enters the fundraising status when it is first posted on Kiva and stays in this status until it is fully funded or expires (30 days after the posted date).

• Fully funded: A loan that has been fully funded (i.e., it received the desired amount of loan contributions before expiring) but whose repayment has not yet started.

• Expired: A loan that was not fully funded within 30 days after being posted online.

• Paying back: A loan that is fully funded and whose borrower has started to repay the lenders (according to the repayment schedule specified in the loan description).

• Paid: A loan that was fully funded and that the borrower has fully repaid to the lenders.

• Ended with loss: A loan that was fully funded but whose borrower failed to repay it according to the pre-planned schedule.
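The six statuses above can be read as a small decision procedure. The following is a minimal sketch of that reading, not Kiva's actual logic: the function name `loan_status` and the flags `repayment_started`, `fully_repaid`, and `defaulted` are hypothetical inputs introduced for illustration, and only the 30-day window comes from the text.

```python
from datetime import datetime, timedelta
from typing import Optional

FUNDRAISING_WINDOW = timedelta(days=30)  # expiry window stated in the text


def loan_status(posted: datetime,
                funded: Optional[datetime],
                now: datetime,
                repayment_started: bool = False,
                fully_repaid: bool = False,
                defaulted: bool = False) -> str:
    """Map hypothetical loan fields to one of the six statuses."""
    if funded is None:
        # Not funded yet: open until the 30-day window closes, then expired.
        if now - posted <= FUNDRAISING_WINDOW:
            return "fundraising"
        return "expired"
    # From here on the loan was fully funded.
    if defaulted:
        return "ended with loss"
    if fully_repaid:
        return "paid"
    if repayment_started:
        return "paying back"
    return "fully funded"


posted = datetime(2013, 1, 1)
print(loan_status(posted, None, datetime(2013, 1, 15)))
print(loan_status(posted, datetime(2013, 1, 10), datetime(2013, 6, 1),
                  repayment_started=True))
```

The order of the checks after funding (loss, then paid, then paying back) is an assumption made so that each loan falls into exactly one status.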

We summarize the loans on Kiva in Table 1 according to their status as of December 11, 2013. The figures show that Kiva is indeed a successful microfinance platform, as the proportions of expired and defaulted loans are both lower than 2%. Because we focus on the funding behavior of lenders, we create a smaller, reduced dataset from the whole 5-year dataset with the following reductions: 1) we retain only the funding records from November 1, 2011 to October 30, 2013, during which the number of loans and the funding activity are relatively stable, and 2) we retain only the loans that have been fully funded, so loans in the fundraising and expired statuses are removed. This results in a reduced dataset of 258,598 fully funded loans, which we analyze in more detail below.

[Figure 1: Distributions of major loan attributes. Each panel plots proportion against ranking. (a) Borrower gender (labels: Female, Male). (b) Loan sector (labels: Food, Retail, Agriculture, Services, Clothing, Housing, Transportation, Arts, Education, Construction, Personal Use, Manufacturing, Health, Entertainment, Wholesale). (c) Borrower country (labels: Philippines, Kenya, Peru, El Salvador, Cambodia, Uganda, Ecuador, Nicaragua, Tajikistan, Ghana, Rwanda, Paraguay, Colombia, Jordan, Bolivia, Mexico, Honduras, Senegal, Sierra Leone, Vietnam).]

3.2 Loan Overview

On Kiva, a loan can be requested by an individual or by a group of people. Our dataset indicates that there are 541,937 (86.7%) individual loans and 83,491 (13.3%) group loans, where a group loan can be associated with up to 50 borrowers. To help lenders find the loans they are interested in, Kiva requires each loan to be associated with one of 15 pre-defined sectors based on the expected use of the loan.

In Figure 1, we show the proportions of borrowers’ gender¹, country of origin, and the sector for which the loans will be used, where the red staircase lines denote the cumulative sums of the proportions. We make a few interesting observations. First, approximately 75% of the loans are requested by female borrowers. Second, the loans are requested for various purposes; among them, food (26.3%), retail (22.7%), and agriculture (22.1%) are the three main sectors. For example, a borrower may plan to use the funds to purchase flour and baking equipment to run his own bakery or cafe (food sector); in another example, a borrower plans to buy oxen, piglets, and forage for feeding livestock (agriculture sector). Other uses of the funds include arts, entertainment, and housing, which are much less common but do not necessarily involve lower desired loan amounts. In addition, the loan applicants are from a diverse range of countries on different continents: the Philippines (21.4%), Kenya (11.0%), and Peru (8.8%) together contribute 41.4% of all loans. The figures indicate that Kiva reaches global needs without geographical boundaries and supports a variety of uses to improve people’s lives.

3.3 A First Look at Funding Activity

¹Only individual loans are considered, as a group loan may comprise borrowers of both genders.

To observe how funding activities occur on Kiva, we first plot the cumulative distribution functions (CDFs) of the desired loan amount for male and female borrowers, respectively (Figure 2(a)). We can see that most loans are of small amounts (i.e., less than $2,000 USD), with a median around $500 USD. Interestingly, the loan amounts requested by males are significantly larger than those requested by females (p-value < 2.2e-16, Wilcoxon rank-sum test). We suspect that this might be associated with the purposes of the loans, because males and females tend to request loans for very different purposes. Some purposes, such as housing and wholesale, may incur larger funding needs and are requested mostly by men. We plan to look into this issue in further studies.

We analyze the average turnaround time for a loan to be funded after it is posted on Kiva. For this purpose, we define two terms: the fundraising days and the fundraising rate. The fundraising days is the number of days it takes before a loan is fully funded, whereas the fundraising rate is the average daily amount the loan receives. Thus, the fundraising rate can be calculated by

\[ \text{fundraising rate} = \frac{\text{loan amount}}{\text{fundraising days}}, \]

where the fundraising days is obtained by subtracting the posted date from the funded date. Intuitively, the fundraising rate of a loan can be seen as an indicator of how attractive the loan is to potential lenders.
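As a small worked example of this definition, the sketch below computes the fundraising rate from a posted date and a funded date. The function names are illustrative, and treating a loan funded on its posting day as taking one day is an assumption made here to avoid division by zero; the text does not say how such loans are handled.

```python
from datetime import date


def fundraising_days(posted: date, funded: date) -> int:
    """Days between posting and full funding (funded date minus posted date)."""
    return (funded - posted).days


def fundraising_rate(loan_amount: float, posted: date, funded: date) -> float:
    """Average amount (USD) received per day until the loan is fully funded."""
    days = fundraising_days(posted, funded)
    if days <= 0:
        # Assumption: count same-day funding as one day.
        days = 1
    return loan_amount / days


# A $500 loan posted on January 1 and fully funded on January 11 (10 days).
rate = fundraising_rate(500.0, date(2013, 1, 1), date(2013, 1, 11))
print(rate)  # 50.0 USD/day
```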

Figure 2(b) plots the average fundraising rates of female-requested and male-requested loans with 95% confidence intervals. The graph reveals that, interestingly, female-requested loans on average attract contributions from lenders at a significantly higher rate. This may be because people are generally more sympathetic to females’ needs, or because female borrowers’ smaller loan amounts and loan purposes are more acceptable to lenders; this remains to be confirmed with more in-depth analysis. Figure 2(c), in turn, shows the medians of the fundraising rates and loan amounts of the 15 sectors. This graph implies that lenders also have strong preferences for loans in certain sectors. For example, loans in the education and health sectors can attract loan contributions four times faster than those for personal use, such as the housing, retail, and transportation sectors. However, there seems to be no obvious explanation of why one sector is more attractive than another. Meanwhile, the loans in different sectors also vary in terms of loan amount, even though the difference is not as large as that in fundraising rate. Overall, the loans in high-demand sectors, such as housing and wholesale, request nearly twice the amount requested in low-demand sectors such as food and retail.

[Figure 2: Summary of loan amount and fundraising rate. (a) Loan amount distribution: CDF against loan amount (USD) for female and male borrowers. (b) Fundraising rate for female- and male-loans: fundraising rate (USD/day) against loan amount (USD). (c) Fundraising rate for loans in different sectors: fundraising rate (USD/day) against loan amount (USD), one point per sector.]

4. FAIRNESS-AWARE RECOMMENDATION

So far we have introduced the Kiva ecosystem and made some preliminary observations on the loans and how they are funded. In this section, we move on to discuss the design of our fairness-aware recommendation system for matching lenders and loan applications.

The matching problem for microfinance can be modeled as a recommendation task. We want to recommend loans to lenders so as to maximize the chance that those lenders will fund the loans. In this sense, we can use the existing data to create a large matrix that represents the connections between lenders and loans. Such a matrix is usually sparse, as a lender is not likely to look into most of the loans.

As described before, in many microfinance services such as Kiva.org, due to privacy concerns we are only given the information about which lender has endorsed a loan, but not how much this lender contributed to the loan. Furthermore, if a lender does not endorse a loan, it is not possible to know whether this is because the lender has not yet reviewed the loan or simply does not like it. Researchers have proposed the one-class collaborative filtering (OCCF) framework to design recommenders for such scenarios. It is called one-class since we are only sure about the positive endorsements, whereas the negative and unseen behaviors of users are indistinguishable. In this section, we first introduce a popular model for OCCF, the Bayesian Personalized Ranking (BPR) optimization criterion, which can be coupled with a prediction model such as matrix factorization (MF) to achieve satisfactory results. Then we describe a regularization on the BPR-MF factorization method that balances the recommendations for fairness. Finally, to address the efficiency concern of that method, we introduce an efficient model that dynamically balances the contribution of each lender to the loans.
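To make the one-class setting concrete, the sketch below stores the sparse lender-loan matrix as one set of funded loans per lender. It is an illustration under assumed inputs: the `(lender_id, loan_id)` pair format and all function names are hypothetical and do not reflect the Kiva API or the authors' code.

```python
from collections import defaultdict


def build_positive_sets(contributions):
    """Collect, per lender, the set of loans the lender has funded.

    `contributions` is an iterable of (lender_id, loan_id) pairs; repeated
    pairs collapse to a single positive entry because feedback is binary.
    """
    positives = defaultdict(set)
    for lender, loan in contributions:
        positives[lender].add(loan)
    return dict(positives)


def unlabeled_loans(positives, all_loans, lender):
    """Loans with no recorded contribution from `lender`.

    In OCCF these mix 'not interested' and 'not seen' cases, which
    cannot be told apart from the data alone.
    """
    return set(all_loans) - positives.get(lender, set())


def density(positives, n_lenders, n_loans):
    """Fraction of the lender-loan matrix holding a positive entry."""
    n_pos = sum(len(loans) for loans in positives.values())
    return n_pos / (n_lenders * n_loans)


pairs = [("u1", "a"), ("u1", "b"), ("u2", "b"), ("u1", "a")]
pos = build_positive_sets(pairs)
print(pos)
print(unlabeled_loans(pos, ["a", "b", "c"], "u2"))
print(density(pos, n_lenders=2, n_loans=3))
```

Storing only the positive entries keeps memory proportional to the number of contributions rather than to the number of lenders times the number of loans.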

4.1 Bayesian Personalized Ranking Model

The BPR model aims at optimizing the ranking of items during recommendation. It is suitable for one-class recommendation problems where only binary implicit feedback is available, because with binary feedback, what a model has to do is rank the ones higher than the zeroes.

The formulation is as follows. For any given lender $u$, let the set of loans that $u$ has funded be $I_u^+$ and the set of loans that $u$ has not funded be $I_u^-$. The goal of BPR is to optimize the pairwise ranking of items for each user. That is, for any given $u$, BPR maximizes the chance that the items in $I_u^+$ are ranked higher than those in $I_u^-$. Let $\theta$ be the parameters of an arbitrary model $m$ coupled with BPR, and let $\hat{y}_{uij}(\theta)$ represent the rating difference between a loan $i$ and a loan $j$ for user $u$ under model $m$. The objective of BPR is defined as

\[ \min_{\theta} \sum_{(u,i,j) \in D_s} \ln\left(1 + e^{-\hat{y}_{uij}(\theta)}\right) + \lambda |\theta|^2, \]

where $D_s := \{(u, i, j) \mid i \in I_u^+ \text{ and } j \in I_u^-\}$. This says that the goal is to maximize the pairwise differences between the positive and negative sets for all users. Note that the BPR framework can be coupled with any rating model to produce $\hat{y}_{uij}(\theta)$; here we exploit the widely successful MF model for rating prediction. Assuming there are some latent factors that describe users and items, the MF technique first decomposes the incomplete rating matrix into a fully observed user-latent matrix and an item-latent matrix, and then multiplies them to estimate the missing ratings from users to items. Let $P_{uk}$ be the $k$-th latent feature of a lender $u$, and let $Q_{ik}$ be the $k$-th latent feature of a loan $i$. We can then adapt BPR to MF (denoted as BPR-MF) and obtain the following objective function:

\[ \sum_{(u,i,j) \in D_s} \ln\left(1 + \exp\left(-\sum_{k=1}^{K} P_{uk}\,(Q_{ik} - Q_{jk})\right)\right) + \lambda\left(|P|^2 + |Q|^2\right). \]

It can be learned using any optimization method such as gradient descent, after deriving the partial derivatives with respect to $P_{uk}$, $Q_{ik}$, and $Q_{jk}$ as

\[
\begin{aligned}
\frac{\partial \mathrm{Error}}{\partial P_{uk}} &= \frac{-e^{-\sum_{k'=1}^{K} P_{uk'}(Q_{ik'} - Q_{jk'})}}{1 + e^{-\sum_{k'=1}^{K} P_{uk'}(Q_{ik'} - Q_{jk'})}} \cdot (Q_{ik} - Q_{jk}) + \lambda P_{uk}, \\
\frac{\partial \mathrm{Error}}{\partial Q_{ik}} &= \frac{-e^{-\sum_{k'=1}^{K} P_{uk'}(Q_{ik'} - Q_{jk'})}}{1 + e^{-\sum_{k'=1}^{K} P_{uk'}(Q_{ik'} - Q_{jk'})}} \cdot P_{uk} + \lambda Q_{ik}, \\
\frac{\partial \mathrm{Error}}{\partial Q_{jk}} &= \frac{-e^{-\sum_{k'=1}^{K} P_{uk'}(Q_{ik'} - Q_{jk'})}}{1 + e^{-\sum_{k'=1}^{K} P_{uk'}(Q_{ik'} - Q_{jk'})}} \cdot (-P_{uk}) + \lambda Q_{jk}.
\end{aligned}
\]

For a BPR-based model to optimize the ranking, one needs to perform all-pair updating in Stochastic Gradient Descent (SGD), in which the complexity is $|I_u^-| \times |I_u^+|$. This can cause a significant computational burden because $I_u^-$ is usually a very large set (i.e., across all users, the number of negative entries is roughly the number of users times the number of items, given sparse ratings). To overcome this limitation, in practice one usually resorts to the SGD method and performs down-sampling on the negative set $I_u^-$. For each instance $(u, i, j)$, where $u$ represents the user, $i$ the positive item, and $j$ the negative item, SGD updates the parameters and moves on to the next instance. Next, we discuss our design for building recommendation systems that take fairness into account.
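The sampled SGD procedure described above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the function names, the learning rate `lr`, the regularization weight `lam`, the Gaussian initialization, and the one-negative-per-positive sampling scheme are all assumptions. The shared gradient factor uses the identity $-e^{-x}/(1+e^{-x}) = -1/(1+e^{x})$.

```python
import math
import random


def bpr_mf_step(P, Q, u, i, j, lr=0.05, lam=0.01):
    """Apply one SGD update for a single (u, i, j) triple, in place."""
    K = len(P[u])
    x = sum(P[u][k] * (Q[i][k] - Q[j][k]) for k in range(K))
    # Shared factor of all three gradients: -e^{-x} / (1 + e^{-x}).
    g = -1.0 / (1.0 + math.exp(x))
    for k in range(K):
        p, qi, qj = P[u][k], Q[i][k], Q[j][k]  # read old values first
        P[u][k] = p - lr * (g * (qi - qj) + lam * p)
        Q[i][k] = qi - lr * (g * p + lam * qi)
        Q[j][k] = qj - lr * (g * (-p) + lam * qj)


def train_bpr_mf(positives, n_users, n_items, K=8, epochs=50, seed=0):
    """`positives` maps a user index to the set of item indices it funded."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(n_items)]
    # Skip users with no positives or with no item left for the negative role.
    users = [u for u in range(n_users)
             if 0 < len(positives.get(u, ())) < n_items]
    for _ in range(epochs):
        for u in users:
            i = rng.choice(sorted(positives[u]))
            # Down-sample one unlabeled item instead of visiting all pairs.
            j = rng.randrange(n_items)
            while j in positives[u]:
                j = rng.randrange(n_items)
            bpr_mf_step(P, Q, u, i, j)
    return P, Q


P, Q = train_bpr_mf({0: {0}, 1: {1}}, n_users=2, n_items=3, K=4, epochs=5)
print(len(P), len(Q), len(P[0]))
```

Each update touches only the three latent vectors of the sampled triple, so its cost is O(K) regardless of the size of the unlabeled set.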


4.2 Item-based Regularization for BPR-MF

Here we have two joint goals to satisfy. The first is to recommend suitable loans to lenders. In order to do so, our BPR-based method has to predict the unknown ratings of lenders to loans accurately. The second goal is to make sure the loans have as equal a chance of being recommended as possible. As mentioned previously, these two goals trade off against each other, and ideally we hope our recommender system can achieve high fairness without sacrificing too much accuracy. Note that in the matrix factorization process, the ratings are generated from the multiplication of a user-latent matrix $P$ and an item-latent matrix $Q$. An intuitive method to achieve fairness is to make sure each item receives equal ‘attention’ during recommendation. We model this by adding an item-based regularization term to ensure that the sum of the rating values for each item is equal. This constraint can be written as

\[ \sum_{u}^{\mathrm{User}} \sum_{k=1}^{K} P_{uk} Q_{1k} = \sum_{u}^{\mathrm{User}} \sum_{k=1}^{K} P_{uk} Q_{2k} = \cdots = \sum_{u}^{\mathrm{User}} \sum_{k=1}^{K} P_{uk} Q_{\mathrm{Item}\,k} = C. \]

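Since $C$ is a constant and the elements of $P$ and $Q$ can be either positive or negative (a negative value means dislike), we can simply set $C = 0$ and obtain the following Lagrange multiplier as the regularization term:

\[ \sum_{i}^{\mathrm{Item}} \left( \sum_{u}^{\mathrm{User}} P_u Q_i^T \right)^2. \]

Thus the objective function becomes

\[ \sum_{(u,i,j) \in D_s} \ln\left(1 + e^{-P_u (Q_i^T - Q_j^T)}\right) + \lambda \left(|P|^2 + |Q|^2\right) + \sum_{i}^{\mathrm{Item}} \left( \sum_{u}^{\mathrm{User}} P_u Q_i^T \right)^2. \]

Then the update rules of $P_{uk}$, $Q_{ik}$, and $Q_{jk}$ become: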

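\[
\begin{aligned}
P_{uk} &\leftarrow P_{uk} - \eta \left\{ \frac{\partial \mathrm{Error}}{\partial P_{uk}} + 2 \cdot \mathrm{cost} \cdot \sum_{i'}^{\mathrm{Item}} \left[ \left( \sum_{u'}^{\mathrm{User}} P_{u'} Q_{i'}^T \right) \cdot Q_{i'k} \right] \right\}, \\
Q_{ik} &\leftarrow Q_{ik} - \eta \left\{ \frac{\partial \mathrm{Error}}{\partial Q_{ik}} + 2 \cdot \mathrm{cost} \cdot \left( \sum_{u'}^{\mathrm{User}} P_{u'} Q_{i}^T \right) \cdot \sum_{u'}^{\mathrm{User}} P_{u'k} \right\}, \\
Q_{jk} &\leftarrow Q_{jk} - \eta \left\{ \frac{\partial \mathrm{Error}}{\partial Q_{jk}} + 2 \cdot \mathrm{cost} \cdot \left( \sum_{u'}^{\mathrm{User}} P_{u'} Q_{j}^T \right) \cdot \sum_{u'}^{\mathrm{User}} P_{u'k} \right\}.
\end{aligned}
\]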

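We can estimate the gradient by sampling $|s_u|$ lenders and $|s_i|$ loans in each update:

\[
\begin{aligned}
P_{uk} &\leftarrow P_{uk} - \eta \left\{ \frac{\partial \mathrm{Error}}{\partial P_{uk}} + 2 \cdot \mathrm{cost} \cdot \sum_{i' \in s_i} \left[ \left( \sum_{u' \in s_u} P_{u'} Q_{i'}^T \right) \cdot Q_{i'k} \right] \right\}, \\
Q_{ik} &\leftarrow Q_{ik} - \eta \left\{ \frac{\partial \mathrm{Error}}{\partial Q_{ik}} + 2 \cdot \mathrm{cost} \cdot \left( \sum_{u' \in s_u} P_{u'} Q_{i}^T \right) \cdot \sum_{u' \in s_u} P_{u'k} \right\}, \\
Q_{jk} &\leftarrow Q_{jk} - \eta \left\{ \frac{\partial \mathrm{Error}}{\partial Q_{jk}} + 2 \cdot \mathrm{cost} \cdot \left( \sum_{u' \in s_u} P_{u'} Q_{j}^T \right) \cdot \sum_{u' \in s_u} P_{u'k} \right\}.
\end{aligned}
\]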

This method seems to be a reasonable way to ensure fairness in recommendation. However, it introduces a serious computational burden, especially for updating

$$\sum_{i' \in s_i} \left[ \left( \sum_{u' \in s_u} P_{u'} Q_{i'}^T \right) \cdot Q_{i'k} \right],$$

which costs $O(|s_u| \times |s_i|)$ times as much as the corresponding update in the regular MF model.
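To make the cost of the regularizer concrete, the following sketch computes the penalty $\sum_i (\sum_u P_u Q_i^T)^2$ and the sampled regularization terms that appear in the update rules for $P_{uk}$ and $Q_{ik}$. It is an illustration only: the function names and the default `cost=1.0` are assumptions, and the nested loops are written to mirror the formulas rather than to be efficient.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fairness_penalty(P, Q):
    """Item-based penalty: sum over items of (sum over users of P_u . Q_i)^2."""
    return sum(sum(dot(p_u, q_i) for p_u in P) ** 2 for q_i in Q)


def sampled_penalty_gradients(P, Q, i, s_u, s_i, cost=1.0):
    """Regularization terms of the P_uk and Q_ik updates, estimated on samples.

    s_u and s_i are lists of sampled lender and loan indices.
    The P term is the same for every lender u, so it is returned once.
    """
    K = len(Q[i])
    grad_p = [0.0] * K
    for i2 in s_i:  # |s_i| outer iterations ...
        inner = sum(dot(P[u2], Q[i2]) for u2 in s_u)  # ... each with |s_u| dot products
        for k in range(K):
            grad_p[k] += 2.0 * cost * inner * Q[i2][k]
    inner_i = sum(dot(P[u2], Q[i]) for u2 in s_u)
    grad_q = [2.0 * cost * inner_i * sum(P[u2][k] for u2 in s_u) for k in range(K)]
    return grad_p, grad_q
```

The double loop over `s_i` and `s_u` in the P term is the source of the extra $|s_u| \times |s_i|$ factor noted above.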

4.3 Fairness-aware BPR-MF

Acknowledging the efficiency concern of the previous method, here we would like to propose not only a more efficient but also more general model for fairness-aware recommendation.

Recall that we need the tuples (u, i, j) for training the model in SGD. For a given lender u and a positive loan i, we draw a negative sample j from $I_u^-$ and perform the update. Normally each (u, i, j) tuple is treated as equally important during updating. Our idea is that, to achieve fairness, we should treat tuples with a “popular” j more seriously than those with a less popular j. The intuition is that if j is a popular loan liked by many lenders, it is preferable to update our model further in the direction in which j is disliked. Therefore, during the SGD process, we do not assign an equal step size to each instance tuple, but a larger step to situations where a popular loan has been assigned the “negative” role as loan j. Given this idea, the next question is how to evaluate the “popularity” of a loan j during training. Our idea is to use the model learned so far (i.e., the most recent P and Q) to predict the ratings of both i and j for all users, and the popularity of a loan j with respect to an update (u, i, j) is defined as the probability that a user likes j more than i. The popularity then becomes a weight that adjusts the step size of SGD.

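The detailed process is as follows: first we randomly sample a negative example j, and then sample N reference lenders $u_1, u_2, \ldots, u_N$, based on which we compute the popularity of j. The step size of SGD during updating is then set proportional to this popularity:

$$\text{popularity}(j) := \frac{2 \sum_{n=1}^{N} [\![ P_{u_n} Q_j^T > P_{u_n} Q_i^T ]\!]}{N},$$

where $[\![ \cdot ]\!]$ equals 1 if the condition holds and 0 otherwise. The update rules are

$$P_{uk} := P_{uk} + \alpha \left[ C \cdot P_{uk} + \text{popularity}(j) \cdot \frac{e^{-(P_u Q_i^T - P_u Q_j^T)}}{1 + e^{-(P_u Q_i^T - P_u Q_j^T)}} \cdot (Q_{ik} - Q_{jk}) \right]$$

$$Q_{ik} := Q_{ik} + \alpha \left[ C \cdot Q_{ik} + \text{popularity}(j) \cdot \frac{e^{-(P_u Q_i^T - P_u Q_j^T)}}{1 + e^{-(P_u Q_i^T - P_u Q_j^T)}} \cdot (P_{uk}) \right]$$

$$Q_{jk} := Q_{jk} + \alpha \left[ C \cdot Q_{jk} + \text{popularity}(j) \cdot \frac{e^{-(P_u Q_i^T - P_u Q_j^T)}}{1 + e^{-(P_u Q_i^T - P_u Q_j^T)}} \cdot (-P_{uk}) \right].$$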

Note that our modification mainly concerns the learning rate of SGD, which means it can be applied not only to BPR-MF but also to other BPR models that use SGD for updating.
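The popularity-weighted update of this section can be sketched as below. This is an illustrative reading of the update rules, not the authors' code: the function names are hypothetical, and the default `C=-0.01` assumes that C acts as a negative regularization coefficient that shrinks the parameters, which the text does not state explicitly.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def popularity(P, Q, i, j, ref_users):
    """Twice the fraction of reference lenders whose predicted rating of j exceeds that of i."""
    wins = sum(1 for u in ref_users if dot(P[u], Q[j]) > dot(P[u], Q[i]))
    return 2.0 * wins / len(ref_users)


def fairness_aware_step(P, Q, u, i, j, ref_users, alpha=0.05, C=-0.01):
    """One SGD step whose pairwise gradient is scaled by popularity(j).

    C = -0.01 is an assumed value; alpha = 0.05 is the learning rate reported in Section 5.1.
    """
    x = dot(P[u], Q[i]) - dot(P[u], Q[j])
    g = math.exp(-x) / (1.0 + math.exp(-x))
    w = popularity(P, Q, i, j, ref_users)  # computed with the current P and Q
    for k in range(len(P[u])):
        p, qi, qj = P[u][k], Q[i][k], Q[j][k]
        P[u][k] = p + alpha * (C * p + w * g * (qi - qj))
        Q[i][k] = qi + alpha * (C * qi + w * g * p)
        Q[j][k] = qj + alpha * (C * qj + w * g * (-p))
    return w
```

Under this reading, a negative loan that no reference lender prefers over i receives weight 0 and only the C term is applied, while a loan preferred by every reference lender receives weight 2, doubling the pairwise step.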

5. EVALUATING THE FAIRNESS-AWARE RECOMMENDATION MODELS

To date, we have not seen any recommendation model that considers fairness as a key factor. Thus, in the evaluation we focus on comparing the proposed item-based regularized BPRMF method and the fairness-aware BPRMF approach against the original BPRMF as a baseline. Note that the goal here is not to beat competitors in prediction accuracy; rather, we want to test whether fairness can be achieved without sacrificing too much accuracy in rating prediction.

Table 2: The best AUC and the Std under such AUC for each model

| Method | Best AUC | Std |
|---|---|---|
| BPRMF | 0.667 | 2083.0 |
| Item-based Regularized Method | 0.663 | 2039.2 |
| Fairness-Aware BPRMF | 0.650 | 1225.6 |

5.1 Data Preparation

We use the Kiva.org dataset described previously for evaluation. Note that we only have positive values in the dataset (meaning the lender has agreed to provide funding to the loan), and they are divided into training, validation, and testing sets.

A realistic way to divide the data is to set up three ordered time periods P1, P2, and P3, and treat all data occurring in P1 as training, in P2 as validation, and in P3 as testing. Here we choose P1 = {2005/4/15 − 2014/1/19}, P2 = {2014/1/20 only}, and P3 = {2014/1/21 only}. The three datasets are summarized below:

| Data set | Training | Validation | Testing |
|---|---|---|---|
| # lending actions | 4,208,410 | 13,085 | 16,916 |
| # lenders | 41,875 | 41,875 | 41,875 |
| # loans | 587,901 | 2,143 | 2,778 |
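The time-based split can be expressed as a small helper. This is only a sketch: the function name and the convention of returning `None` for dates outside the three periods are assumptions, while the period boundaries are the ones stated above.

```python
from datetime import date

TRAIN_START = date(2005, 4, 15)    # first day of P1
VALIDATION_DAY = date(2014, 1, 20)  # P2
TEST_DAY = date(2014, 1, 21)        # P3


def assign_split(action_date):
    """Map the date of a lending action to 'train', 'validation', 'test', or None."""
    if TRAIN_START <= action_date < VALIDATION_DAY:
        return "train"
    if action_date == VALIDATION_DAY:
        return "validation"
    if action_date == TEST_DAY:
        return "test"
    return None  # outside the three periods used in the evaluation
```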

Our model is trained following typical procedures: we first train on the training set, and then tune the parameters based on the AUC metric on the validation set. We chose K = 4 and learning rate α = 0.05. We then fix the parameters, retrain the model on the union of the training and validation sets, and perform prediction on the testing set.

5.2 Performance Metrics

5.2.1 AUC Metrics

We choose the area under the ROC curve (AUC) as the evaluation metric for ranking accuracy, as it is one of the most popular metrics for evaluating a ranking problem such as OCCF.

$$\text{AUC} := \frac{1}{|U|} \sum_{u \in U} \frac{1}{|E(u)|} \sum_{(i,j) \in E(u)} \delta(\hat{y}_{ui} > \hat{y}_{uj}), \tag{1}$$

where the set of evaluation pairs E(u) for each user is defined as $\{(i, j) \mid (u, i) \in S_{\text{Validation}},\ (u, j) \notin S_{\text{Training}} \cup S_{\text{Validation}}\}$.
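A direct, unoptimized reading of Equation (1) is sketched below. The data layout (a dictionary of score lists and dictionaries of item sets) and the decision to skip users with no evaluation pairs are assumptions made for the example, not details given in the paper.

```python
def auc(scores, eval_pos, known_pos, n_items):
    """Average per-user AUC as in Equation (1).

    scores[u][i]  : predicted rating of user u for item i
    eval_pos[u]   : items of user u in the evaluation set
    known_pos[u]  : items of user u in the training or evaluation set
    Users without any (i, j) pair are skipped (an assumption of this sketch).
    """
    total, n_users = 0.0, 0
    for u, positives in eval_pos.items():
        negatives = [j for j in range(n_items) if j not in known_pos[u]]
        pairs = [(i, j) for i in positives for j in negatives]
        if not pairs:
            continue
        hits = sum(1 for i, j in pairs if scores[u][i] > scores[u][j])
        total += hits / len(pairs)
        n_users += 1
    return total / n_users
```

A value of 0.5 corresponds to ranking held-out positives no better than chance, and 1.0 to ranking every held-out positive above every unobserved item.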

5.2.2 Standard Deviation of Ratings

Here we consider whether each loan can be fairly recommended to all of the lenders. Assume our recommendation system suggests a constant number K of loans to each lender (here K denotes the length of the recommendation list, not the number of latent factors), which can be done easily in our model by choosing the loans with the top-K predicted ratings for each lender. Then we can gather the “recommendation statistics” of loans by counting how many times each of them has been suggested to the lenders. Plotting the histogram of such statistics, a horizontal line is preferable since it implies that all loans have an equal chance of being recommended. We would not like to see a skewed distribution, since it means some loans have received much more attention, which jeopardizes the chance of other loans being funded. Here we simply use the standard deviation (Std) of the number of times each loan is recommended to evaluate whether fairness has been realized; the lower, the better. Note that examining this measure by itself is meaningless, as one can always ‘enforce’ fair recommendation without considering the quality of prediction. Our goal is to do so without affecting the original accuracy of a recommender.

Figure 3: The AUC and Std through learning iterations. (Two panels: AUC versus iteration and Top30 Std versus iteration, each comparing BPRMF, IReg Method, and Fairness-aware BPRMF.)

Figure 4: The AUC and Std of the models under different sample sizes and number of reference lenders. (Two panels: AUC versus iteration and Top30 Std versus iteration, each comparing IReg Method, IReg Method j10, IReg Method j20, Fairness-aware BPRMF, Fairness-aware BPRMF j10, and Fairness-aware BPRMF j20.)

Our validation results show that the relative performance of the three methods is almost the same for different values of K. Therefore, we report the results with K = 30.
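The fairness measure of this subsection can be sketched as follows. The function names are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether the population or sample form is used.

```python
import statistics


def recommendation_counts(scores, n_items, top_k):
    """Count how many lenders receive each loan among their top_k recommendations.

    scores is a list with one list of predicted ratings per lender.
    """
    counts = [0] * n_items
    for user_scores in scores:
        ranked = sorted(range(n_items), key=lambda i: user_scores[i], reverse=True)
        for i in ranked[:top_k]:
            counts[i] += 1
    return counts


def fairness_std(scores, n_items, top_k):
    """Standard deviation of the per-loan recommendation counts (lower is fairer)."""
    return statistics.pstdev(recommendation_counts(scores, n_items, top_k))
```

For instance, two lenders with identical rankings concentrate all recommendations on the same loans and give a positive Std, whereas two lenders with opposite rankings spread them evenly and give 0.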

5.3 Evaluation Results

We train the models for 1,000 iterations, and record the best AUC and best Std during the process. Table 2 presents the best AUC for each model, and the corresponding Std.

It shows that the fairness-aware BPRMF sacrifices AUC slightly (0.650 versus 0.667 for BPRMF) to gain a significant improvement in Std (1225.6 versus 2083.0). The item-based regularized method performs roughly the same as BPRMF. We believe the reason is that, although the item regularization seems reasonable, its high computational cost makes it too expensive to identify an optimal parameter setting, as it took much longer to train than the other methods.

Figure 3 plots the AUC (left) improvement as well as the decrease of Std (right) through iterations. Within 100 iterations, all methods reach the top in AUC and also saturate on the decrease of Std. Figure 4 reports the sensitivity analysis on two parameters: the negative sampling size and the number of reference lenders N in fairness-aware BPRMF.

The results show that the outcomes are not very sensitive to those values, suggesting that our SGD (Stochastic Gradient Descent) method can work well even with a small sampling size, which further improves the efficiency.

To summarize, the results show that the fairness-aware BPRMF effectively improves recommendation fairness while largely retaining prediction accuracy.

Figure 5: Lenders’ choices over gender and sector of candidate loans. (Two panels: borrower gender (F, M, group) by lender gender (group, M, F), and loan sector by lender gender (group, M, F). The sectors shown are Agriculture, Arts, Clothing, Construction, Education, Entertainment, Food, Health, Housing, Manufacturing, Personal Use, Retail, Services, Transportation, and Wholesale.)

6. FURTHER LOOK ON LENDER DIVERSITY

Having shown that our recommendation algorithm, which adopts a collaborative filtering approach, can suggest appropriate loans to a lender with fairness taken into account, we now look further at the diversity in lenders’ behaviors in order to 1) back up the need for sophisticated recommenders, and 2) explore further opportunities to enhance our proposed recommender system.

6.1 Gender-Gender Interaction

Kiva provides detailed information for each borrower, but does not do so for lenders. From the dataset, we know only the names of lenders; their gender information is not available. Therefore, to understand what role gender plays in lenders’ selection behavior, we have to infer the lenders’ genders from their first names. Based on the Baby Name Statistics listed on the United States Social Security Administration website², we infer the lenders’ genders with the following rule: if a first name is used only by women (or men), we consider the lender to be a woman (or man). If the first name is more popular among women by an order of magnitude (i.e., ten times more popular), we consider the lender to be a woman, and vice versa; otherwise, we give up inferring the gender of this particular lender in order to ensure the quality of inference. This results in 406,414 female and 453,392 male lenders out of 1,160,739 individual lenders, where the first names of 300,933 (25.9%) of them cannot be confidently determined as belonging to men or women.
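The inference rule above can be sketched as follows. The function name, the dictionary layout, and the treatment of a name with exactly ten times the count as passing the threshold are assumptions made for illustration; the name counts used with it would come from the Social Security Administration statistics.

```python
def infer_gender(first_name, female_counts, male_counts, ratio=10):
    """Return 'F', 'M', or None when the name is unknown or too ambiguous.

    female_counts and male_counts map a first name to how often it was given
    to girls and boys, respectively (hypothetical input format).
    """
    f = female_counts.get(first_name, 0)
    m = male_counts.get(first_name, 0)
    if f == 0 and m == 0:
        return None  # name not found in the statistics
    if f >= ratio * m:
        return "F"   # used only by women, or at least ten times more popular among women
    if m >= ratio * f:
        return "M"   # used only by men, or at least ten times more popular among men
    return None      # give up to preserve the quality of inference
```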

The gender preferences of lenders are by no means uniformly distributed, as shown in Figure 5. With 1 referring to a perfectly indifferent preference, i.e., lenders contribute to a loan regardless of the borrower’s gender, the values in the figure suggest a gender-homophily preference: male lenders are more likely to choose male borrowers (1.15 > 1), and likewise female lenders are more likely to choose female borrowers (1.13 > 1). Moreover, group lenders also tend to select group borrowers.

The biased preferences of lenders are also found in their choices of loan sectors. While female borrowers are more easily funded when they are engaged in arts-related business (1.38 > 1), health (1.24), and housing (1.17), male borrowers are more successful in being funded when they propose to conduct manufacturing business (1.21), construction (1.15), and transportation (1.10). The difference between males and females in the loan sectors in which they are more likely to be funded suggests that lenders may possess a stereotype of the kinds of business they think males and females would be good at, respectively.

² http://www.ssa.gov/oact/babynames/limits.html

Evidence from the two plots shows that lenders possess their own preferences regarding borrowers and that their sponsorship choices are by no means uniformly distributed. Considering this fact, it is important that future recommendation systems take gender into consideration in order to make correct suggestions as well as to diversify the choices of loans.

6.2 The Crowding-Out Effect in Field

We begin the analysis by observing how lenders differ in their support for female- and male-loans. Below we restrict the analysis to repeat lenders who have supported at least 10 loans, which leaves us 57,202 (4.7%) of all the lenders in the reduced dataset. We have shown that male and female lenders favor male and female borrowers, respectively; however, how individual lenders make their funding decisions remains unanswered. To address this, we plot Figure 6(a) to observe how supportive the lenders are toward female borrowers. In the graph, the histogram shows the ratio of female-loans for each lender, the red curve represents the cumulative sum of the proportions, and the vertical dashed line stands for the proportion of female-loans out of all available loans. For example, if a lender contributes to 20 loans of which 10 are from male borrowers, the ratio of female-loans for this lender is 0.5. While female-loans account for approximately 75% of the loans, the graph reveals that more than 25% of lenders contribute only to female-loans and more than half of the lenders (determined by the intersection of the red curve and the vertical dashed line) show a funding preference toward female borrowers, with a ratio of female-loans higher than 0.75.

We apply the same analysis to lenders’ preferences toward the more prevalent sectors and countries. We find that, contrary to the gender aspect, the sector and country aspects exhibit a different phenomenon. The middle plot in Figure 6(a) shows the histogram of lenders’ ratio of loans from the top-1 sector (i.e., food) in terms of number of loans. The food-related loans account for approximately 25% of loans; thus, if all the lenders distributed their contributions equally across all types of loans, the ratios would be exactly 0.25 for all the lenders. However, this is not the case. According to the graph, nearly 60% of lenders show less interest (compared with an equal distribution) in food-related loans. The same observation applies to the country aspect (the rightmost plot in Figure 6(a)), as nearly 75% of lenders never contributed to loans from the top-1 country, the Philippines, and more than 80% of lenders show less interest in these loans.
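The comparison described above, in which each lender's ratio of loans from a category is set against the category's overall share, can be sketched as follows. The function names and the data layout are hypothetical, and the toy data in the usage note is made up purely for illustration.

```python
def category_ratio_per_lender(lender_loans, in_category):
    """For each lender, the fraction of their funded loans that fall in the category.

    lender_loans maps a lender id to a list of loan labels (hypothetical format);
    in_category is a predicate on a loan label.
    """
    return {
        lender: sum(1 for loan in loans if in_category(loan)) / len(loans)
        for lender, loans in lender_loans.items()
        if loans
    }


def share_below_baseline(ratios, baseline):
    """Fraction of lenders whose ratio is lower than the category's overall share."""
    return sum(1 for r in ratios.values() if r < baseline) / len(ratios)
```

As a usage note, with three made-up lenders whose food-loan ratios are 0.5, 0.0, and 0.2 and a baseline of 0.25, `share_below_baseline` returns 2/3, meaning that two of the three lenders fund food loans less often than an equal distribution would imply.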

We further extend our observations to the top-N sectors and countries and record the proportion of lenders who show less interest in loans from the top-N sectors/countries as the ratio of crowding-out-lenders. The results are shown in Figure 6(b) and suggest that the lenders exhibit the so-called crowding-out phenomenon [2,4,15,24]. In other words, lenders tend to avoid high-profile loans and support (relatively) less visible loans, as lenders would feel less honored to help when realizing that their help does not count