Developing recommender systems with the consideration of product profitability for sellers


Long-Sheng Chen a, Fei-Hao Hsu b, Mu-Chen Chen c,*, Yuan-Chia Hsu d

a Department of Information Management, Chaoyang University of Technology, 168 Jifong E. Road, Wufong Township Taichung County, 41349 Taiwan, ROC

b Institute of Commerce Automation and Management, National Taipei University of Technology, 1, Section 3, Chung-Hsiao E. Road, Taipei 106 Taiwan, ROC

c Institute of Traffic and Transportation, National Chiao Tung University, 4F, 118, Section 1, Chung-Hsiao W. Road, Taipei 10012 Taiwan, ROC

d CIM Development Section, MIT Department, Inotera Memories, Inc., Hwa Ya Technology Park, 667, Fu Hsing 3rd Road, Kueishan, Taoyuan, Taiwan, ROC

Received 19 April 2007; received in revised form 13 August 2007; accepted 30 September 2007

Abstract

In electronic commerce web sites, recommender systems are widely employed to help customers select suitable products that meet their personal needs. These systems learn about user preferences over time and automatically suggest products that fit the learned model of user preferences. Traditionally, recommendations are provided to customers depending on purchase probability and customers’ preferences, without considering the profitability factor for sellers. This study attempts to integrate the profitability factor into traditional recommender systems. Based on this consideration, we propose two profitability-based recommender systems called CPPRS (Convenience plus Profitability Perspective Recommender System) and HPRS (Hybrid Perspective Recommender System). Moreover, comparisons between our proposed systems (considering both purchase probability and profitability) and traditional systems (emphasizing an individual’s preference) are made to clarify the advantages and disadvantages of these systems in terms of recommendation accuracy and/or profit from selling. The experimental results show that the proposed HPRS can increase profit from cross-selling without losing recommendation accuracy.

Keywords: Electronic commerce; Recommender systems; Product profitability; Collaborative filtering; Personalization; Cross-selling

1. Introduction

In recent years, the recommender system (RS) has rapidly become a core tool for accelerating cross-selling and strengthening customer loyalty [37,38], owing to the prosperity of electronic commerce. Enterprises have been developing new business portals and providing large amounts of product information to create more business opportunities and expand their markets [10,22,31]. However, this results in an information overload problem, which burdens customers when they make a purchase decision among a huge variety of products [21,43]. Researchers have developed various techniques to solve this problem. An RS, which can learn about user preferences and automatically suggest products fitting customers’ needs, is one of the possible solutions.

* Corresponding author. Tel.: +886 2 23494967; fax: +886 2 23494953.

E-mail addresses: lschen@cyut.edu.tw (L.-S. Chen), ittchen@mail.nctu.edu.tw (M.-C. Chen).

Information Sciences 178 (2008) 1032–1048

Electronic commerce encounters the challenge of how to properly utilize personalization systems based on users’ preferences to attract more customers [20]. RSs have been widely used in many web sites, such as Amazon.com, CDNOW.com, GroupLens, MovieLens, etc. [36]. Most RSs adopt one of two types of techniques, the content-based filtering (CBF) and collaborative filtering (CF) approaches [39]. With the CBF approach, one tries to recommend items similar to those a certain user has liked in the past [29,36]. For developing RSs, CF may be the most successful and popular approach [12,25]. For retail transaction datasets, Mild and Reutterer [34] developed an improved CF algorithm for binary market basket data; their CF approach is capable of predicting multiple item choices at the individual user level.

The CF models can be constructed based on users or items [17]. Recommendations by CF models can be based on the ratings of items and the behaviors of users [36]. In the CF approach, one identifies users whose tastes are similar to those of a certain user and recommends items that they have liked [41]. CF-based RSs have been very successful in both the information filtering and electronic commerce domains [35]. Consequently, this study utilizes the CF approach to build RSs, and they are applied to the retailing sector.

In addition, most recommendations are traditionally made based merely on purchase probability and customers’ preferences. However, this is not enough for an enterprise, because probability and preferences should not be its only concerns; profit margin is another crucial factor for sellers. Therefore, we attempt to integrate the profitability factor into traditional systems. The proposed systems are not primarily intended to replace the traditional ones. Considering both the profitability of sellers and the purchase probability of users, this study intends to balance the views of customers and sellers more properly. We therefore propose two RSs for retailing called the “Convenience plus Profitability Perspective Recommender System (CPPRS)” and the “Hybrid Perspective Recommender System (HPRS).” Two indexes, “product profitability” and “profit from cross-selling”, are also used to evaluate the proposed systems. Moreover, comparisons between the proposed systems (considering both purchase probability and profitability) and the traditional systems (emphasizing an individual’s preference), namely the “Convenience Perspective Recommender System (CPRS)” and the “Collaborative Filtering Perspective Recommender System (CFRS)”, are made to clarify the advantages and disadvantages of these systems in terms of recommendation accuracy and/or profit from cross-selling. The experimental results show that the proposed HPRS can increase profit from cross-selling without losing recommendation accuracy.

2. Recommender systems

Recently, data mining has increasingly become an important research area in information science and technology (e.g., [32,51–54]). The recommender system (RS) is one of the important techniques in data mining. An RS utilizes the opinions of users in a community to help individuals in the same community more efficiently identify content of interest from a potentially overwhelming set of choices [8,39]. RSs can enhance electronic commerce sales in three ways [43]: converting browsers into buyers, intensifying the cross-selling effect, and increasing customer loyalty. Generally speaking, an RS can be defined as a system that helps users find the product items they want by making recommendations based on either the content of the recommended items (with CBF) or the ratings of similar customers on the recommended items (with CF) [15,36]. In CBF, both product contents and customer preferences must be analyzed to give recommendations [6,7]. The CBF approach encounters a difficulty in introducing new items to users [29,36], which is the so-called overspecialization problem in Balabanovic and Shoham [3]. Applications of CBF can be found in related works such as Infofinder [26] and NewsWeeder [28]. The range of products that can be recommended by CBF is much narrower than that of CF [50]. The CF approach can recommend products with any type of content since the contents of products do not have to be analyzed [2,6,15]. This feature makes the CF model more practical than the CBF model in the retailing sector.

Goldberg et al. [14] initially used the term “collaborative filtering” when developing the Tapestry recommender system, which is employed to solve the problem of e-mail overload. The original CF simply refers to a system where people help each other filter their e-mails by recording their reactions to the documents they have read. Additionally, Kohrs and Merialdo [24], Kuo and Chen [27], and Lee et al. [30] developed CF-based RSs. Tapestry [14], GroupLens [25], Siteseer [40], Ringo [45], and Phoaks [46] are some RSs using CF. A comprehensive taxonomy of various RSs on the Internet can be found in Montaner et al. [36], who also analyzed 37 systems in terms of 8 dimensions to collect the elements of RSs. More recently, Vozalis and Margaritis [48] proposed an improved CF approach integrating Singular Value Decomposition, and Xie et al. [49] developed a distributed CF algorithm with higher scalability.

The shortcomings of CBF and CF systems were discussed by Montaner et al. [36], and these limitations motivated the hybrid systems. For example, Balabanovic and Shoham [3] and Lawrence et al. [29] developed hybrid systems by integrating CBF and CF. The association rule algorithm [1] is often applied in market basket analysis, in which one analyzes how the items purchased by customers are associated. In some RSs, association rule mining is employed to recommend products [6,41]. This is especially true for collaborative filtering domains; association rule algorithms are therefore also employed to build RSs. Cornelis et al. [11] adopted fuzzy logic to propose a conceptual framework for recommending one-and-only items, i.e., items of which only a single instance exists. A special case of a trade exhibition in e-government was used to illustrate this conceptual framework of RS.

In RS, sparsity generally indicates that the item-to-user ratio is extremely high (i.e., most users do not rate most items) [33]. The sparsity problem is particularly difficult to deal with at the very beginning of system operation. The cold-start problem (namely first-rater in [33]) is another difficult issue that arises when introducing new items or making recommendations to new users [44]. As indicated in Schein et al. [44], the problem of new users is symmetric to that of new items if users’ profiles can be accessed.

Melville et al. [33] addressed the sparsity and cold-start problems in CF by developing a hybrid system which applies content-based forecasts to transform a sparse user ratings matrix into a full ratings matrix for recommendations. In experiments on movie recommendations, the hybrid system outperforms the CF and CBF systems. To address the cold-start problem, Schein et al. [44] also integrated content and collaborative information to develop a hybrid system which utilizes expectation maximization learning to approximate a model for a movie dataset. Breese et al. [5] evaluated some RSs with three datasets: MS Web, Neilsen Television and EachMovie. Their results indicated that the availability of votes may influence which RS is preferable. Kim et al. [23] reported that RSs can benefit from combining CF and CBF. They also proposed a hybrid system in which a clustering technique is adopted to obtain users’ profiles.

Tso and Schmidt-Thieme [47] developed three RSs, the first two of which hybridize CBF and CF, namely Sequential CBF and CF and Joint Weighting of CF and CBF. The Joint Weighting of CF and CBF method predicts the extent to which a user likes the item attributes (content) rather than the class or rating of an item based on its attributes. The third is Attribute-Aware Item-Based CF, which combines the user ratings and item attributes to calculate the similarities between items. In the experimental results of Tso and Schmidt-Thieme [47], Joint Weighting and Attribute-Aware outperform Sequential CBF and CF. Breese et al. [5] used the concept of inverse user frequency to determine the item weights: commonly voted items receive smaller weights than less frequently voted ones, since the latter are more representative of users’ preferences. Bradley and Smyth [4] simultaneously took similarity and diversity into account through a weighting mechanism. Sarwar et al. [42] reported, based on their experimental results, that item-based CF could outperform user-based CF. For CF, Jin et al. [18] proposed an automatic approach to determine the weights for different items by learning them from the item ratings given by users. Their approach primarily learns item weights that bring similar users closer together while making dissimilar users more isolated. Ghani and Fano [13] developed an RS for apparel which can learn customers’ tastes from the semantic attributes of products. These product attributes are collected from retailer websites.

Besides, in practice, profitability may be a primary concern for sellers, but it is not considered in traditional RSs. Therefore, this study discusses the feasibility and benefits of integrating the profitability factor into traditional CF-based systems. More discussion of the different concerns in building RSs is provided in the next section.


3. Different perspectives on recommender systems

In this study, we propose two RSs called ‘‘Convenience plus Profitability Perspective Recommender System (CPPRS)’’ and ‘‘Hybrid Perspective Recommender System (HPRS)’’ based on both purchase probability and product profitability factors. In this section we will compare the proposed systems and the traditional RSs in order to demonstrate the effectiveness of our proposed methods.

3.1. Different perspectives on recommendations

Recommendations can be made from more than one perspective. The existing RSs make recommendations based mainly on purchase probability and assume that items with high purchase probabilities are likely to satisfy the customers’ needs. These systems can therefore help sellers to easily identify frequently purchased items. In other words, recommendations based on the overall purchase probability are made from “the convenience perspective of the seller”. On the other hand, if the purchase probability is computed from similar customers and thus reflects the preferences of the individual customer, then the recommendation is made from “the buyer’s perspective”. In addition, “the seller’s profit perspective” emphasizes that when an enterprise employs an RS, it should not only satisfy the different needs of customers but also provide a higher profit margin.

In summary, one of the objectives of this paper is to study the effect of RSs from the different perspectives shown in Table 1. As the table indicates, the present paper discusses four RSs for retailing and their considerations. These systems are listed as follows:

1. Convenience Perspective Recommender System (CPRS) is a convenient approach for sellers to recommend frequently purchased products. It only considers purchase probability.

2. Collaborative Filtering Perspective Recommender System (CFRS) recommends products depending only on the purchase probability by similar customers.

3. Convenience Plus Profitability Perspective Recommender System (CPPRS) recommends products based on both purchase probability and the product profitability.

4. Hybrid Perspective Recommender System (HPRS) makes recommendations by pondering both the purchase probability of similar customers and the product profitability.

3.2. Traditional recommender systems

This section describes the procedures of two traditional RSs, CPRS and CFRS [8,14]. The basic process of the CF approach is shown in Fig. 1. First, the similarity among customers is measured. Then the CF approach computes purchase probabilities based on the transaction records of similar customers. Finally, recommendations are made according to these probabilities. In other words, both the CPRS and the CFRS first calculate and sort purchase probabilities; the N product items with the highest purchase probabilities are then determined and recommended. The detailed procedures are discussed below.

Table 1

Summary of four recommender systems

RS      Perspectives           Considerations          Degree of personalization
CPRS    Seller’s perspective   Purchase probability    Non-personalization
CPPRS   Buyer’s perspective    Purchase probability    Non-personalization
        Seller’s perspective   Product profitability
CFRS    Buyer’s perspective    Purchase probability    Relatively higher degree of personalization
HPRS    Buyer’s perspective    Purchase probability    Relatively lower degree of personalization


3.2.1. Computation of purchase probability and similarity

Before introducing the procedures of the RSs, we should discuss the computation of purchase probability and similarity. For binary basket data, the Jaccard coefficient is usually adopted as the similarity measure [19,34]. The Jaccard coefficient is defined in Eq. (1).

$$x(t, j) = \frac{n(c_t \cap c_j)}{n(c_t \cup c_j)} = \frac{n(c_t \cap c_j)}{n(c_t) + n(c_j) - n(c_t \cap c_j)} \qquad (1)$$

where $x(t, j)$ is the similarity between target customer $t$ and customer $j$, $t \neq j$, $j = 1, 2, \ldots, N_c$; $n(c_t \cap c_j)$ is the number of items purchased by both target customer $t$ and customer $j$; $n(c_t \cup c_j)$ is the number of items purchased by target customer $t$ or customer $j$; $n(c_t)$ is the number of items purchased by target customer $t$; and $n(c_j)$ is the number of items purchased by customer $j$.
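As a minimal sketch of Eq. (1), the Jaccard coefficient can be computed directly on sets of item identifiers. The function name `jaccard` and the convention of returning 0 when both purchase histories are empty are assumptions made for illustration; the two item sets are the samples listed for customers C001 and C003 in Table 3.

```python
def jaccard(items_t: set, items_j: set) -> float:
    """Jaccard coefficient between two customers' purchased-item sets, Eq. (1)."""
    union = len(items_t | items_j)
    if union == 0:
        # Assumption: two empty purchase histories are treated as dissimilar.
        return 0.0
    return len(items_t & items_j) / union


# Sample purchase histories from Table 3.
c001 = {"I001", "I003", "I005", "I007", "I009"}
c003 = {"I001", "I002", "I003", "I004", "I005"}

# 3 shared items (I001, I003, I005) out of 5 + 5 - 3 = 7 distinct items.
print(jaccard(c001, c003))
```

Both forms of Eq. (1) give the same value here, since the union size equals n(c_t) + n(c_j) - n(c_t ∩ c_j).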

There are two kinds of purchase probabilities: frequency-based probability and similarity-based probability. The frequency-based purchase probability employed in CPRS and CPPRS is defined in Eq. (2).

$$P_i = \frac{R_i}{NB} \qquad (2)$$

where $P_i$ is the purchase probability of product item $i$, $i = 1, 2, \ldots, N_I$; $R_i$ is the purchase frequency of product item $i$; and $NB$ is the total number of market baskets.
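Eq. (2) can be illustrated with a short sketch. It assumes that the purchase frequency of an item is the number of market baskets containing it (each basket counted once), which the text does not state explicitly; the function name `frequency_probabilities` is hypothetical, and the baskets are the sample transactions of Table 4.

```python
from collections import Counter


def frequency_probabilities(baskets):
    """Return {item: P_i} with P_i = R_i / NB, as in Eq. (2)."""
    nb = len(baskets)  # NB: total number of market baskets
    # R_i: number of baskets containing item i (assumed reading of
    # "purchase frequency"; each basket counts at most once per item).
    counts = Counter(item for basket in baskets for item in set(basket))
    return {item: count / nb for item, count in counts.items()}


# Sample baskets taken from Table 4.
baskets = [{"I001", "I002"}, {"I001", "I004"}, {"I002", "I003"}, {"I005", "I006"}]
probs = frequency_probabilities(baskets)
print(probs["I001"], probs["I003"])  # 0.5 0.25
```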

The similarity-based purchase probability [5,34] employed in CFRS and HPRS is defined in Eq. (3).

$$P_{t,i} = \kappa \sum_{j} x(t, j)\, c_{j,i} \qquad (3)$$

where $P_{t,i}$ is the probability that target customer $t$ purchases item $i$; $\kappa$ is a normalizing factor that ensures the absolute values of the weights sum to unity; $x(t, j)$ is the similarity between target customer $t$ and customer $j$; and $c_{j,i}$ is the binary choice indicating whether customer $j$ purchased item $i$, as defined in Eq. (4). More detailed information can be found in [5].

$$c_{j,i} = \begin{cases} 1 & \text{if customer } j \text{ purchased item } i, \\ 0 & \text{otherwise.} \end{cases} \qquad (4)$$
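The following sketch evaluates Eqs. (3) and (4) for a single item. It takes the normalizing factor to be the reciprocal of the sum of absolute similarities, following the description of that factor in the text. The similarity values for the two neighbours are hypothetical, chosen only to make the arithmetic easy to follow; the neighbours' item sets are those of customers C002 and C003 in Table 3.

```python
def similarity_probability(similarities, histories, item):
    """P_{t,i} = kappa * sum_j x(t, j) * c_{j,i} for one item, Eq. (3)."""
    total = sum(abs(x) for x in similarities.values())
    if total == 0:
        return 0.0  # assumption: no informative neighbour means probability 0
    kappa = 1.0 / total  # makes the absolute weights sum to one
    return kappa * sum(
        x * (1 if item in histories[j] else 0)  # c_{j,i} from Eq. (4)
        for j, x in similarities.items()
    )


# Purchase histories of two neighbours (Table 3).
histories = {
    "C002": {"I002", "I004", "I006", "I008", "I010"},
    "C003": {"I001", "I002", "I003", "I004", "I005"},
}
# Hypothetical similarities x(t, j) between a target customer and each neighbour.
similarities = {"C002": 0.2, "C003": 0.6}

for item in ("I001", "I002", "I006"):
    # I001 is bought only by C003, I002 by both, I006 only by C002.
    print(item, similarity_probability(similarities, histories, item))
```

With these inputs, an item bought by every neighbour obtains the largest value, and an item bought only by the less similar neighbour the smallest.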

3.2.2. CPRS

The CPRS follows the procedure below, as illustrated in Fig. 2.

Step 1: Compute the frequency-based product purchase probabilities (Pi) with respect to Eq. (2).

Step 2: Sort the frequency-based purchase probabilities in descending order.

Step 3: Find the largest N product purchase probabilities.

Step 4: Find the N items to recommend with the largest N product purchase probabilities.

Step 5: Return the N recommended items.
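Steps 2 to 5 of the CPRS amount to a top-N selection, sketched below. The probabilities are those obtained from the four sample baskets of Table 4 under Eq. (2); the function name `cprs_recommend` and the rule of breaking ties by item ID are assumptions, since the paper does not specify how ties are handled.

```python
def cprs_recommend(purchase_prob, n):
    """Rank items by frequency-based probability and return the top n."""
    # Step 2: sort in descending order of P_i (ties broken by item ID, an assumption).
    ranked = sorted(purchase_prob, key=lambda item: (-purchase_prob[item], item))
    # Steps 3-5: keep the items holding the n largest probabilities.
    return ranked[:n]


# Step 1 result for the sample baskets of Table 4 (P_i = R_i / NB, NB = 4).
purchase_prob = {
    "I001": 0.5, "I002": 0.5, "I003": 0.25,
    "I004": 0.25, "I005": 0.25, "I006": 0.25,
}
print(cprs_recommend(purchase_prob, 3))  # ['I001', 'I002', 'I003']
```

Because the ranking does not depend on the customer, every customer receives the same list.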

3.2.3. CFRS

Fig. 1. Basic process of the CF approach (flowchart): transaction database; measure similarities between customers; find similar customers; compute purchase probability of products; recommend products according to purchase probability.

The procedure of the CFRS is shown in Fig. 3. The only difference from the CPRS is the computation of purchase probability: the CFRS utilizes a similarity-based probability instead of a frequency-based probability. The CFRS is made up of the following five steps.

Step 1: Compute the similarity-based product purchase probabilities (Pt,i) with respect to Eq. (3).

Step 2: Sort the similarity-based purchase probabilities in descending order.

Step 3: Find the N highest similarity-based product purchase probabilities.

Step 4: Find the N items to recommend with the N largest similarity-based product purchase probabilities.

Step 5: Return the N recommended items.
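The five CFRS steps can be sketched end to end as follows. This is an illustration under stated assumptions rather than the authors' implementation: the neighbourhood is taken to be the k customers with the largest Jaccard coefficients, ties are broken by identifier, and items the target customer has already bought are not filtered out because the steps do not mention such a filter. The names `jaccard` and `cfrs_recommend` are hypothetical; the histories are the samples of Table 3.

```python
def jaccard(a, b):
    """Jaccard coefficient of two item sets, Eq. (1)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cfrs_recommend(target, histories, k, n):
    """Return the n items with the largest similarity-based probabilities."""
    # Keep the k customers most similar to the target (assumed neighbourhood rule).
    neighbours = sorted(
        ((jaccard(histories[target], items), j)
         for j, items in histories.items() if j != target),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return []  # no informative neighbour
    # Step 1: P_{t,i} = kappa * sum_j x(t, j) * c_{j,i}, with kappa = 1 / total.
    scores = {}
    for sim, j in neighbours:
        for item in histories[j]:
            scores[item] = scores.get(item, 0.0) + sim / total
    # Steps 2-5: sort in descending order and return the top n items.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


histories = {
    "C001": {"I001", "I003", "I005", "I007", "I009"},
    "C002": {"I002", "I004", "I006", "I008", "I010"},
    "C003": {"I001", "I002", "I003", "I004", "I005"},
}
print(cfrs_recommend("C001", histories, k=1, n=2))
```

For target C001 and k = 1, the single neighbour is C003, so every item in C003's history receives the same probability and the tie-breaking rule decides the order.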

From the previous studies, hybrid systems combining CF and CBF may potentially resolve some of the difficulties of sparsity and cold-start. However, much additional data, such as product characteristics and customer profiles, needs to be collected and analyzed. In the retail context with a wide array of products, it is very difficult to collect and analyze these data. An RS can generate acceptable results from the beginning if the users’ data are well established [36]. Users may not be willing to spend much time describing their profiles. Although users’ profiles are important to an RS, they are not easy to initialize and maintain. According to the survey in [36], the mechanisms for gaining users’ data are diverse and vary from manual input to automatic generation.

Fig. 2. Procedure of CPRS (flowchart: compute Pi; sort Pi; find max. N Pi; recommend items by max. N Pi; return N recommended items).

Fig. 3. Procedure of CFRS (flowchart: the same steps applied to Pt,i).

Introducing new customers and new items can be treated as special cases in this study. In the case of new customers, we can recommend the most frequently bought and/or most proﬁtable items to them. In the case of new items, we can recommend them to the customers having more transactions and/or promote the new items by marketing campaigns.

Incorporating the item attributes (content) can improve the performance of RSs as well as relax some of the problems of sparsity and cold-start. As mentioned above, it is however very difficult to collect and analyze item attributes in the retail context with a wide array of products. Additionally, the preference of customers to the product profitability (computed from the product content of price and cost) differs from that of sellers. This study incorporates the product profitability with respect to the preference of sellers, and takes it as a part of performance measure when making recommendations.

4. Proposed profitability based recommender systems

This section describes the proposed profitability based RSs, CPPRS and HPRS. In addition to the factors used by the traditional systems, our proposed systems consider the profit margin (product profitability) of the seller in the recommendation. The profitability of product item i can be measured by the profit margin of product item i, and it is defined in Eq. (5).

$$M_i = R_i - C_i \qquad (5)$$

where $M_i$ is the profit margin of product item $i$, $i = 1, 2, \ldots, N_I$; $R_i$ is the unit price of product item $i$; and $C_i$ is the unit cost of product item $i$. The product of probability and profitability is computed to obtain the average margin. The average margins are then sorted, and the CPPRS and HPRS recommend items according to them.

4.1. CPPRS

Fig. 4. Procedure of CPPRS (flowchart: compute Pi and Mi; compute Pi × Mi; sort Pi × Mi; find max. N (Pi × Mi); recommend items by max. N (Pi × Mi); return N recommended items).

The CPPRS considers both frequency-based purchase probability and profitability. The steps of the CPPRS are shown in Fig. 4 and are described as follows:

Step 1: Compute the frequency-based purchase probability (Pi) and the product profitability (Mi) with respect to Eqs. (2) and (5), respectively.

Step 2: Compute the frequency-based average margin with (Pi × Mi).

Step 3: Sort the frequency-based average margins in descending order.

Step 4: Find the N largest frequency-based average margins.

Step 5: Find the recommended N items with the N largest frequency-based average margins.

Step 6: Return the N recommended items.
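To illustrate how the average margin changes the ranking, the sketch below applies the CPPRS steps to the probabilities derived from the sample baskets of Table 4 and, as an assumption, treats the profitability values of Table 6 as the margins Mi. The function name `cpprs_recommend` and the tie-breaking by item ID are hypothetical.

```python
def cpprs_recommend(purchase_prob, margin, n):
    """Rank items by frequency-based average margin P_i * M_i and return the top n."""
    # Step 2: average margin of every item that has both a probability and a margin.
    average_margin = {
        item: p * margin[item] for item, p in purchase_prob.items() if item in margin
    }
    # Steps 3-6: sort in descending order (ties by item ID) and keep the top n.
    return sorted(average_margin, key=lambda item: (-average_margin[item], item))[:n]


# Frequency-based probabilities of the sample baskets in Table 4.
purchase_prob = {
    "I001": 0.5, "I002": 0.5, "I003": 0.25,
    "I004": 0.25, "I005": 0.25, "I006": 0.25,
}
# Profitability samples from Table 6, used here as M_i (an assumption).
margin = {"I001": 1.2, "I002": 1.5, "I003": 1.8, "I004": 2.1, "I005": 2.0, "I006": 3.0}

print(cpprs_recommend(purchase_prob, margin, 3))  # ['I002', 'I006', 'I001']
```

Item I006 appears in only one of the four baskets, yet its high margin places it level with I002 and ahead of I001, which a purely frequency-based ranking would rank above it.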

4.2. HPRS

The HPRS employs the similarity-based purchase probability and profitability. The HPRS procedure is shown in Fig. 5 and is described as follows:

Step 1: Compute the similarity-based purchase probability (Pt,i) and the product profitability (Mi) with respect to Eqs. (3) and (5), respectively.

Step 2: Compute the similarity-based average margins with (Pt,i × Mi).

Step 3: Sort the similarity-based average margins in descending order.

Step 4: Find the N largest similarity-based average margins in descending order.

Step 5: Find the N recommended items with the N largest similarity-based average margins in descending order.

Step 6: Return the N recommended items.
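The HPRS differs from the CPPRS only in the probability that enters the product, so a sketch is mainly useful for contrasting it with the CFRS ranking on the same input. The similarity-based probabilities below are hypothetical values for one target customer; the margins are the Table 6 samples for the same items, again assumed to play the role of Mi. The function name `hprs_recommend` is an assumption.

```python
def hprs_recommend(similarity_prob, margin, n):
    """Rank items by similarity-based average margin P_{t,i} * M_i."""
    average_margin = {item: p * margin[item] for item, p in similarity_prob.items()}
    return sorted(average_margin, key=lambda item: (-average_margin[item], item))[:n]


# Hypothetical P_{t,i} values for one target customer.
similarity_prob = {"I001": 0.6, "I004": 0.5, "I007": 0.4}
# Profitability samples from Table 6, used as M_i (an assumption).
margin = {"I001": 1.2, "I004": 2.1, "I007": 3.2}

# CFRS ranks by probability alone; HPRS ranks by probability times margin.
cfrs_order = sorted(similarity_prob, key=lambda item: (-similarity_prob[item], item))
hprs_order = hprs_recommend(similarity_prob, margin, 3)
print(cfrs_order)  # ['I001', 'I004', 'I007']
print(hprs_order)  # ['I007', 'I004', 'I001']
```

In this constructed case the order is fully reversed, which shows why the HPRS is described as less personalized than the CFRS: a non-personalized quantity, the margin, can outweigh the customer-specific probability.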

4.3. Comparisons between different perspectives

The comparisons among the four RSs are summarized in Table 2. The CPRS and CFRS are existing RSs. The CPPRS and HPRS introduce the profitability factor for sellers. The CPRS and CPPRS are non-personalized because all customers are recommended the same products. The CFRS and HPRS are personalized because (a) for each customer, a set of similar customers (neighbors) is found based on a similarity measure; (b) the neighbors of each customer are different; (c) the products recommended to customers are different; and (d) tailored product recommendations are designed.

Fig. 5. Procedure of HPRS (flowchart: compute Pt,i and Mi; compute Pt,i × Mi; sort Pt,i × Mi; find max. N (Pt,i × Mi); return N recommended items).

The CFRS only considers individual customer’s preference data (transaction records) regardless of product proﬁtability. In addition to individual customer’s preference data (personal information), the HPRS takes product proﬁtability (non-personalized information, and overall product information) into consideration. As a result, the CFRS possesses a higher degree of personalization than the HPRS.

In addition, the parameters related to these four RSs are k and N. Parameter k is the number of customers similar to target one, and parameter N is the number of recommended items based on the customers’ market baskets. In the CPRS and CPPRS, N is the only parameter that has to be determined. In the CFRS and HPRS, both parameters k and N have to be determined.

In terms of recommendation accuracy and/or profit from cross-selling, comparisons can be made among the systems (CPRS vs. CPPRS, CPRS vs. CFRS, CFRS vs. HPRS, CPRS vs. CFRS vs. HPRS, and CPPRS vs. CFRS vs. HPRS). The measures used are also shown in Table 2.

5. Experimentation

5.1. Dataset

The sample database, FoodMart, in Microsoft SQL Server 2000 is used to evaluate these four systems. Because the dataset is obtained from a supermarket, the data sparsity problem is common in this domain. Although the dataset contains 20,522 market baskets of 5581 customers on 1559 items, the average number of items included in a single market basket is only 4, and the average number of items purchased by a single customer is only 12. The CF approaches cannot recommend products that are rarely purchased [2,6,15]. Data reduction is therefore necessary to address the data sparsity problem. Data are removed if items are purchased infrequently, market baskets contain very few items, or customers purchase very few items. The reasons for reducing these three kinds of data are listed as follows:

1. Item reduction: In the original dataset, there may be some items that appear seasonally for a short time. Infrequently purchased items are removed to discard the effects of special events (e.g., promotion campaigns and seasonal effects). Besides, infrequently purchased items have little chance of being recommended.

2. Market basket reduction: Collaborative ﬁltering is performed with extensive computation. It is necessary to reduce the market baskets containing very few items since there are 1559 items in the original dataset. This reduction can decrease the computation cost when performing recommendations.

3. Customer reduction: Before reduction, there are 5581 customers. Rejecting customers who purchase very few items removes the infrequent customers and reduces the computation cost of forming customer neighborhoods.
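The three reductions can be sketched as successive filters. The paper does not report the cut-off values or the order in which the filters were applied, so the thresholds `min_item_count`, `min_basket_size` and `min_customer_items`, the single-pass order (items, then baskets, then customers) and the function name `reduce_dataset` are all hypothetical.

```python
from collections import Counter


def reduce_dataset(baskets, min_item_count, min_basket_size, min_customer_items):
    """Filter a list of (customer_id, item_set) baskets with three hypothetical cut-offs."""
    # 1. Item reduction: drop items that appear in too few baskets.
    item_counts = Counter(item for _, items in baskets for item in items)
    frequent = {item for item, count in item_counts.items() if count >= min_item_count}
    kept = [(customer, items & frequent) for customer, items in baskets]
    # 2. Market basket reduction: drop baskets left with too few items.
    kept = [(customer, items) for customer, items in kept if len(items) >= min_basket_size]
    # 3. Customer reduction: drop customers who bought too few distinct items.
    bought = {}
    for customer, items in kept:
        bought.setdefault(customer, set()).update(items)
    active = {c for c, items in bought.items() if len(items) >= min_customer_items}
    return [(customer, items) for customer, items in kept if customer in active]


# Toy data, not taken from FoodMart.
toy = [("A", {"x", "y"}), ("A", {"x", "z"}), ("B", {"x"}), ("C", {"y", "w"})]
print(reduce_dataset(toy, min_item_count=2, min_basket_size=2, min_customer_items=2))
```

On the toy data only the first basket of customer A survives: items z and w are too rare, which shrinks the remaining baskets below the size cut-off.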

Table 2

Comparisons between/among four recommender systems

Comparison: CPRS vs. CPPRS
Measures: Recommendation accuracy
Reasons of comparison:
• Comparison between non-personalized recommendations
• The effect on recommendation accuracy when product profitability is additionally taken into consideration

Comparison: CPRS vs. CFRS
Measures: Recommendation accuracy
Reasons of comparison:
• Comparison between the seller’s convenience and the buyer’s preference perspective

Comparison: CFRS vs. HPRS
Measures: Recommendation accuracy & profit from cross-selling
Reasons of comparison:
• Comparison between personalized recommendations
• The effects on recommendation accuracy and profit from cross-selling when product profitability is additionally taken into consideration

Comparison: CPRS vs. CFRS vs. HPRS
Measures: Recommendation accuracy
Reasons of comparison:
• Comparison between personalized and non-personalized recommendations
• The effect of personalization degree on recommendation accuracy

Comparison: CPPRS vs. CFRS vs. HPRS
Measures: Recommendation accuracy
Reasons of comparison:
• Comparison between personalized and non-personalized recommendations

Since the average number of items included in a market basket and the average number of items purchased by a customer are very small (4 and 12, respectively), and the market basket size in the reduced dataset ranges from 2 to 12, the range of N is reasonably set to 2–12.

The reduced dataset contains 6845 market baskets of 895 customers with 859 product items. To provide more detailed information about the data format used in this study, five additional tables (Tables 3–7) of data samples are given, covering the historical transactions in the training dataset, the transactions in the testing dataset, the recommended items in the testing dataset, product profitability, and profit from cross-selling based on one single transaction. These data can be obtained and formatted from the FoodMart database.

Table 3

Samples of historical transactions in training dataset

Customer ID Item set

C001 {I001, I003, I005, I007, I009}

C002 {I002, I004, I006, I008, I010}

C003 {I001, I002, I003, I004, I005}

Table 4

Samples of transactions in testing dataset

Transaction ID Customer ID Item set

T101 C001 {I001, I002}

T102 C001 {I001, I004}

T103 C002 {I002, I003}

T104 C003 {I005, I006}

Table 5

Samples of recommended items in testing dataset

Transaction ID Recommended item set

T101 {I001, I002}

T102 {I002, I004}

T103 {I003, I004}

T104 {I003, I004}

Table 6

Samples of product proﬁtability

Item ID    Profitability

I001       1.2

I002       1.5

I003       1.8

I004       2.1

I005       2.0

I006       3.0

I007       3.2

I008       3.1

I009       1.8

I010       1.5

5.2. Evaluation measures

The measure F1, which combines Precision and Recall, is usually adopted as the evaluation measure of recommendation accuracy in the related literature [9,41]. Precision, Recall and F1 are mathematically defined in Eqs. (6)–(8), respectively.

$$\mathrm{Precision} = \frac{n(PI \cap RI)}{n(RI)} \qquad (6)$$

$$\mathrm{Recall} = \frac{n(PI \cap RI)}{n(PI)} \qquad (7)$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (8)$$

where PI represents all items contained in one specific market basket, and RI stands for the items recommended to one specific market basket. Precision is defined as the ratio of the number of correctly recommended items (i.e., the number of recommended items actually purchased by customers) to the total number of recommended items. Recall is defined as the ratio of the number of correctly recommended items to the total number of purchased items. F1 is adopted because Precision and Recall conflict with each other by nature: increasing the size of RI tends to increase Recall but decrease Precision. Precision, Recall and F1 are computed for each individual market basket.
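Eqs. (6)–(8) can be checked on the sample transactions with a few lines of code. The function name `precision_recall_f1` and the convention of returning 0 whenever a denominator would be zero are assumptions; the purchased and recommended item sets are those of transaction T102 in Tables 4 and 5.

```python
def precision_recall_f1(purchased, recommended):
    """Precision, Recall and F1 for one market basket, Eqs. (6)-(8)."""
    hits = len(purchased & recommended)  # n(PI ∩ RI)
    # Assumption: a measure with a zero denominator is reported as 0.
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(purchased) if purchased else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Transaction T102: purchased items (Table 4) and recommended items (Table 5).
purchased = {"I001", "I004"}
recommended = {"I002", "I004"}
print(precision_recall_f1(purchased, recommended))  # (0.5, 0.5, 0.5)
```

One of the two recommended items was purchased and one of the two purchased items was recommended, so all three measures equal 0.5 for this basket.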

Since the product profitability of sellers is also considered in the proposed RSs, the profit from cross-selling is used as an additional measure in evaluating the proposed systems. The total profit from cross-selling is defined in Eq. (9).

$$TPCS = \sum_{b} PCS_b \qquad (9)$$

where $TPCS$ is the total profit from cross-selling, and $PCS_b$ is the profit from cross-selling based on one single transaction (market basket) $b$, $b = 1, 2, \ldots, N_b$.
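The sample data in Tables 3–7 suggest one way to compute the profit from cross-selling of a single transaction: sum the profitability of the items that were recommended, were actually bought in that transaction, and had never been purchased by the customer before. This rule is inferred from the sample tables rather than stated in the text, so the sketch below should be read as an assumption; the function name `pcs` is hypothetical.

```python
# Table 3: historical transactions in the training dataset.
history = {
    "C001": {"I001", "I003", "I005", "I007", "I009"},
    "C002": {"I002", "I004", "I006", "I008", "I010"},
    "C003": {"I001", "I002", "I003", "I004", "I005"},
}
# Table 4: transactions in the testing dataset (customer, purchased items).
test = {
    "T101": ("C001", {"I001", "I002"}),
    "T102": ("C001", {"I001", "I004"}),
    "T103": ("C002", {"I002", "I003"}),
    "T104": ("C003", {"I005", "I006"}),
}
# Table 5: recommended items for each test transaction.
recommended = {
    "T101": {"I001", "I002"},
    "T102": {"I002", "I004"},
    "T103": {"I003", "I004"},
    "T104": {"I003", "I004"},
}
# Table 6: product profitability (only the items needed here).
profit = {"I001": 1.2, "I002": 1.5, "I003": 1.8, "I004": 2.1, "I005": 2.0, "I006": 3.0}


def pcs(customer, basket, recs):
    """Profit from cross-selling of one transaction (rule inferred from Table 7)."""
    new_hits = (recs & basket) - history[customer]
    return sum(profit[item] for item in new_hits)


per_basket = {t: pcs(c, basket, recommended[t]) for t, (c, basket) in test.items()}
tpcs = sum(per_basket.values())  # Eq. (9)
print(per_basket)  # T101: 1.5, T102: 2.1, T103: 1.8, T104: 0
print(tpcs)
```

Under this rule the first three transactions reproduce the values listed in Table 7, and T104 contributes nothing because none of its recommended items was purchased.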

5.3. Experimental results

5.3.1. Comparison between CPRS and CPPRS

In order to study the impact of the profitability factor on non-personalized systems, the CPRS and CPPRS are compared in terms of recommendation accuracy. Because the market basket size of the testing dataset ranges from 2 to 12, experiments with the CPRS and CPPRS are conducted with N (the number of recommended items) ranging from 2 to 12. The results of the CPRS and CPPRS are shown in Fig. 6. The figure shows that, for most numbers of recommended items, the F1 of the CPPRS is significantly lower than that of the CPRS. This indicates that the additional consideration of product profitability in the non-personalized system results in inferior recommendation accuracy.

5.3.2. Comparison between CPRS and CFRS

To compare recommendations made purely from the seller’s convenience perspective with those made purely from the buyer’s preference perspective, we evaluate the recommendation accuracies of the CPRS and CFRS. Fig. 6 shows that recommending eight items (N = 8) in the CPRS results in the best F1 of 0.00642. Therefore, we fix N to 8 and then compare the CFRS and the CPRS for a set of values of k (the number of customers similar to the target one). Table 8 summarizes the computational results of the CFRS. According to this table, all the recommendation accuracies F1 of the CFRS (from k = 10 to 100) are higher than the best result of the CPRS (0.00642). Therefore, the buyers’ preference perspective has a positive impact on RSs in terms of recommendation accuracy.

Table 7

Samples of profit from cross-selling based on one single transaction

Transaction ID    Recommended and never purchased item set    Profit from cross-selling (PCSb)

T101              {I002}                                      1.5

T102              {I004}                                      2.1

T103              {I003}                                      1.8

5.3.3. Comparison between CFRS and HPRS

To investigate the impact of the additional profitability factor on the personalized systems, CFRS and HPRS are compared in terms of recommendation accuracy (F1) and profit from cross-selling (TPCS). In both the CFRS and HPRS, the parameter settings of k and N are determined by a set of pilot runs. Parameter k is varied from 10 to 100 (in steps of 10) and N from 2 to 12. For each k, only the combination (k, N) with the best accuracy for CFRS and for HPRS is listed in Tables 9 and 10. Table 9 shows that in most cases the recommendation accuracy of HPRS is higher than that of CFRS. The maximum F1 of HPRS (0.0143) is better than that of CFRS (0.0131).

The total profits from cross-selling (TPCS) of CFRS and HPRS are summarized in Table 10. This table shows that the total profits of the HPRS are substantially larger than those of the CFRS. Even the worst TPCS of HPRS (83.6022) is larger than the best one of CFRS (76.1147). On average, the profit from cross-selling of HPRS is about 432% of that of CFRS, i.e., roughly 4.3 times as large. Meanwhile, as Fig. 7 shows, the average F1 of HPRS (0.0125) is slightly better than that of CFRS (0.0099). Therefore, when product profitability is additionally considered, the profit from cross-selling can be improved remarkably without decreasing the recommendation accuracy.
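The averages quoted above can be re-derived from the values listed in Tables 9 and 10. The sketch below is only an arithmetic check on those published figures; the variable names are illustrative.

```python
from statistics import mean

# TPCS for k = 10, 20, ..., 100, copied from Table 10.
tpcs_cfrs = [76.1147, 66.1306, 34.9702, 18.8049, 21.6847,
             23.6967, 8.6924, 13.7868, 8.6924, 10.8431]
tpcs_hprs = [155.1510, 153.5530, 171.4960, 106.2540, 125.0430,
             116.4090, 126.6410, 91.1905, 95.7923, 83.6022]

# Best F1 for each k, copied from Table 9.
f1_cfrs = [0.0010, 0.0099, 0.0086, 0.0089, 0.0101,
           0.0112, 0.0105, 0.0131, 0.0126, 0.0131]
f1_hprs = [0.0139, 0.0119, 0.0128, 0.0098, 0.0116,
           0.0143, 0.0136, 0.0125, 0.0125, 0.0124]

# Ratio of the average TPCS of HPRS to that of CFRS.
ratio = mean(tpcs_hprs) / mean(tpcs_cfrs)
print(f"average TPCS ratio HPRS/CFRS: {ratio:.2f}")
print(f"average F1: CFRS {mean(f1_cfrs):.4f}, HPRS {mean(f1_hprs):.4f}")
print("worst HPRS TPCS > best CFRS TPCS:", min(tpcs_hprs) > max(tpcs_cfrs))
```

The ratio of the two averages comes out at about 4.32, so the average TPCS of HPRS is about 432% of that of CFRS, and the average F1 values round to 0.0099 and 0.0125.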

[Figure: Avg. F1 (y-axis) of CPRS and CPPRS versus N, the number of recommended items (x-axis, 2 to 12)]

Fig. 6. Experimental results of CPRS and CPPRS on N.

Table 8

Experimental results of CFRS (with N fixed to 8)

k     N   F1       Precision   Recall
10    8   0.0067   0.0113      0.0057
20    8   0.0083   0.0145      0.0068
30    8   0.0073   0.0139      0.0054
40    8   0.0067   0.0123      0.0051
50    8   0.0083   0.0150      0.0062
60    8   0.0093   0.0172      0.0071
70    8   0.0102   0.0188      0.0077
80    8   0.0085   0.0161      0.0063
90    8   0.0086   0.0161      0.0065
100   8   0.0091   0.0166      0.0068


5.3.4. Comparison among CPRS, CFRS and HPRS

The recommendation accuracies of CPRS, CFRS and HPRS are compared to determine the impact of the degree of personalization on RSs. For comparison with the CPRS, the CFRS and the HPRS are run on a set of values of k with N fixed to 8. Fig. 7 illustrates the accuracies of both the CFRS and the HPRS.

Table 9

Recommendation accuracies of CFRS and HPRS

CFRS                                       HPRS
(k, N)     F1       Precision   Recall     (k, N)     F1       Precision   Recall
(10,11)    0.0010   0.0140      0.0091     (10,4)     0.0139   0.0139      0.0035
(20,12)    0.0099   0.0139      0.0090     (20,11)    0.0119   0.0168      0.0109
(30,11)    0.0086   0.0133      0.0070     (30,12)    0.0128   0.0186      0.0113
(40,12)    0.0089   0.0125      0.0077     (40,12)    0.0098   0.0139      0.0091
(50,12)    0.0101   0.0147      0.0086     (50,12)    0.0116   0.0165      0.0107
(60,12)    0.0112   0.0161      0.0097     (60,11)    0.0143   0.0195      0.0138
(70,12)    0.0105   0.0150      0.0091     (70,12)    0.0136   0.0179      0.0134
(80,12)    0.0131   0.0182      0.0118     (80,12)    0.0125   0.0168      0.0121
(90,12)    0.0126   0.0175      0.0113     (90,12)    0.0125   0.0168      0.0120
(100,12)   0.0131   0.0179      0.0121     (100,12)   0.0124   0.0165      0.0123

Table 10

Profits from cross-selling of CFRS and HPRS

CFRS                   HPRS
(k, N)     TPCS        (k, N)     TPCS
(10,11)    76.1147     (10,4)     155.1510
(20,12)    66.1306     (20,11)    153.5530
(30,11)    34.9702     (30,12)    171.4960
(40,12)    18.8049     (40,12)    106.2540
(50,12)    21.6847     (50,12)    125.0430
(60,12)    23.6967     (60,11)    116.4090
(70,12)    8.6924      (70,12)    126.6410
(80,12)    13.7868     (80,12)    91.1905
(90,12)    8.6924      (90,12)    95.7923
(100,12)   10.8431     (100,12)   83.6022

[Figure: Avg. F1 (y-axis) of CFRS and HPRS versus k (x-axis, 10 to 100)]


The maximum F1 values of CPRS, CFRS, and HPRS are 0.0064, 0.0131, and 0.0143, respectively. The recommendation accuracies of CFRS (relatively higher degree of personalization) and HPRS (relatively lower degree of personalization) are significantly higher than that of the non-personalized CPRS. According to Fig. 7, the HPRS outperforms the CFRS in terms of recommendation accuracy for relatively small k.

5.3.5. Comparison among CPPRS, CFRS and HPRS

The recommendation accuracies of CPPRS, CFRS and HPRS are further compared to study the impact of the degree of personalization on RSs. Fig. 6 shows that the CPPRS generates its maximum F1 in the case of N = 12. Therefore, CFRS and HPRS are further run on a set of values of k with N fixed to 12. Fig. 8 illustrates the accuracies of the CFRS and HPRS in this experiment.

The maximum F1 values of CPPRS, CFRS, and HPRS are 0.0046, 0.0131, and 0.0143, respectively. The F1 values of CFRS (relatively higher degree of personalization) and HPRS (relatively lower degree of personalization) are significantly higher than that of the non-personalized CPPRS. At the same time, the HPRS outperforms the CFRS in terms of recommendation accuracy. Although personalization can significantly improve recommendation accuracy, we cannot conclude that a higher degree of personalization produces higher recommendation accuracy. However, the above results do indicate that the HPRS, which considers product profitability, can be very profitable to sellers.

The approaches are evaluated on the FoodMart database, which comes from a supermarket. In such a case, thousands of products are offered for sale, and it is nearly impossible to find a set of elements that can properly describe the content of all products. The attempt to analyze the content of products is therefore abandoned. The proposed approaches can be adopted in retail industries with a wide array of products, but data reduction may be required if the data are sparse.

Parameters k and N represent, respectively, the number of customers similar to the target one (customer neighborhood size) and the number of recommended items (top-N items) with respect to customers’ market baskets. Herlocker et al. [16] indicated that the customer neighborhood size (k) can influence the performance of CF according to their experimental results, and that the best-k-neighbors method is most suited to selecting users for the neighborhood. Sarwar et al. [41] further demonstrated that the optimal neighborhood size is dataset dependent. In [41], the number of items recommended (N) is fixed to 10 in the experimentation.

In CPRS and CPPRS, N is the only parameter that has to be determined; in CFRS and HPRS, both k and N need to be determined by analyzing the metadata of the database and by performing some pilot experiments. However, these tasks can be done off-line. In this paper, N is first determined by considering the average number of items included in a market basket and the average number of items purchased by a customer. After N has been determined, k is selected by additional pilot tests.
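The off-line pilot runs can be pictured as a small grid search over (k, N), in the spirit of Tables 9 and 10, which list the best (k, N) pair for each k. The sketch below assumes a user-supplied scoring function `evaluate_f1(k, n)` that returns the average F1 on held-out baskets; that function, the toy scorer used in the usage lines, and the name `best_n_per_k` are hypothetical and do not reproduce the exact procedure used in this paper.

```python
def best_n_per_k(evaluate_f1, k_values, n_values):
    """For each k, return the N with the highest score and that score."""
    best = {}
    for k in k_values:
        scores = {n: evaluate_f1(k, n) for n in n_values}
        n_star = max(scores, key=scores.get)
        best[k] = (n_star, scores[n_star])
    return best


# Parameter ranges as described in Section 5.3.3.
k_values = range(10, 101, 10)
n_values = range(2, 13)


def toy_score(k, n):
    # Hypothetical stand-in for a real evaluation run; peaks at k=60, n=11.
    return -((k - 60) ** 2) / 1e4 - ((n - 11) ** 2) / 1e2


grid = best_n_per_k(toy_score, k_values, n_values)
best_k = max(grid, key=lambda k: grid[k][1])
print(best_k, grid[best_k][0])
```

Because every (k, N) pair is scored independently, such a search can be run off-line and its result cached before recommendations are served.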

[Figure: Avg. F1 (y-axis) of CFRS and HPRS versus k (x-axis, 10 to 100)]

6. Conclusions

An RS is an effective means of solving the information overload problem in personalized retailing systems. Traditional CF algorithms can provide reliable and accurate recommendations by using purchase probability. However, they do not consider the profit margin of sellers. RSs should be designed not only to satisfy the diverse needs of customers, but also to obtain a better profit margin. In this paper, the additional factor of profitability for sellers has been taken into consideration. Two systems, CPPRS and HPRS, which consider both product profitability and purchase probability, were developed in this study. The experimental results show that the proposed HPRS can significantly improve the profit from cross-selling without a reduction in recommendation accuracy.

Furthermore, this paper analyzed four RSs based on different perspectives. The four RSs were compared in terms of recommendation accuracy and/or profit from cross-selling. From the above experimental results, the following conclusions can be drawn.

(1) In the non-personalized recommendation, the recommendation accuracy decreases if product profitability is also taken into consideration.

(2) In the case of personalized recommendation, the profit from cross-selling significantly increases if product profitability is also taken into consideration. Moreover, the recommendation accuracy does not drop when product profitability is additionally taken into consideration.

To sum up, RSs that always make recommendations that fit customers’ needs can earn their trust and assure the good reputation of the seller. Enterprises should do their best to develop an RS with high recommendation accuracy. In addition to satisfying customers’ needs, profitability must be an essential concern to enterprises. Therefore, the profitability for sellers should also be taken into consideration when developing an RS, especially for mass personalization.

There is not yet enough evidence to establish a relationship between recommendation accuracy (F1) and total profit from cross-selling; doing so requires extensive experimentation. Our experimental results nevertheless favor the proposed HPRS over the traditional approaches. Although the results have shown the superiority of the proposed HPRS method, considering the product attributes from sellers’ perspectives is still a tough task. According to the survey in [36], many existing systems use numerical ratings. The proposed systems, which are based on binary data, can be extended to numerical ratings for wider applications. These issues are noteworthy for future work.

Acknowledgements

The authors would like to thank the National Science Council of Taiwan, ROC for financially supporting this research under Contract No. NSC 95-2416-H-009-034-MY3.

References

[1] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington DC, USA, 1993, pp. 207–216.

[2] A. Ansari, S. Essegaier, R. Kohli, Internet recommendation systems, Journal of Marketing Research 37 (3) (2000) 363–375.

[3] M. Balabanovic, Y. Shoham, Combining content-based and collaborative recommendation, Communications of the ACM 40 (3) (1997) 66–72.

[4] K. Bradley, B. Smyth, Improving recommendation diversity, in: Proceedings of the 12th Irish Conference on Artificial Intelligence and Cognitive Science, Maynooth, Ireland, 2001.

[5] J. Breese, D. Heckerman, C. Kadie, Empirical analysis of predictive algorithms for collaborative filtering, in: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 1998, pp. 43–52.

[6] S.W. Changchien, C.-F. Lee, Y.-J. Hsu, On-line personalized sales promotion in electronic commerce, Expert Systems with Applications 27 (1) (2004) 35–52.


[8] K.-W. Cheung, J.T. Kwok, M.H. Law, K.-C. Tsui, Mining customer product ratings for personalized marketing, Decision Support Systems 35 (2) (2003) 231–243.

[9] Y.H. Cho, J.K. Kim, Application of Web usage mining and product taxonomy to collaborative recommendations in e-commerce, Expert Systems with Applications 26 (2) (2004) 233–246.

[10] Y.H. Cho, J.K. Kim, S.H. Kim, A personalized recommender system based on web usage mining and decision tree induction, Expert Systems with Applications 23 (3) (2002) 329–342.

[11] C. Cornelis, J. Lu, X. Guo, G. Zhang, One-and-only item recommendation with fuzzy logic techniques, Information Sciences 177 (22) (2007) 4906–4921.

[12] M. Deshpande, G. Karypis, Item-based top-N recommendation algorithms, ACM Transactions on Information Systems 22 (1) (2004) 143–177.

[13] R. Ghani, A. Fano, Building recommender systems using a knowledge base of product semantics, in: Proceedings of the Workshop on Recommendation and Personalization in E-Commerce, at the 2nd International Conference on Adaptive Hypermedia and Adaptive Web Based Systems, 2002.

[14] D. Goldberg, D. Nichols, B.M. Oki, D. Terry, Using collaborative filtering to weave an information tapestry, Communications of the ACM 35 (12) (1992) 61–70.

[15] P. Han, B. Xie, F. Yang, R. Shen, A scalable P2P recommender system based on distributed collaborative filtering, Expert Systems with Applications 27 (2) (2004) 203–210.

[16] J. Herlocker, J. Konstan, A. Borchers, J. Riedl, An algorithmic framework for performing collaborative filtering, in: Proceedings of the 1999 Conference on Research and Development in Information Retrieval, August 1999.

[17] C.-S. Hwang, P.-J. Tsai, A collaborative recommender system based on user association clusters, Lecture Notes in Computer Science 3806 (2005) 463–469.

[18] R. Jin, J.Y. Chai, L. Si, An automatic weighting scheme for collaborative filtering, in: Proceedings of the 27th Annual International Conference on Research and Development in Information Retrieval, ACM Press, Sheffield, United Kingdom, 2004, pp. 337–344.

[19] L. Kaufman, P.J. Rousseeuw, Finding Groups in Data, An Introduction to Cluster Analysis, Wiley, New York, 1990.

[20] P. Kazienko, M. Adamski, AdROSA-adaptive personalization of web advertising, Information Sciences 177 (11) (2007) 2269–2295.

[21] J.K. Kim, Y.H. Cho, W.J. Kim, J.R. Kim, J.H. Suh, A personalized recommendation procedure for Internet shopping support, Electronic Commerce Research and Applications 1 (3–4) (2002) 301–313.

[22] J. Kim, E. Lee, Semantic web recommender systems based personalization service for user XQuery pattern, Lecture Notes in Computer Science 3828 (2005) 848–857.

[23] B.M. Kim, Q. Li, C.S. Park, S. Kim, A new approach for combining content-based and collaborative filters, Journal of Intelligent Information Systems 27 (2006) 79–91.

[24] A. Kohrs, B. Merialdo, Creating user-adapted websites by the use of collaborative filtering, Interacting with Computers 13 (6) (2001) 695–716.

[25] J.A. Konstan, B.N. Miller, D. Maltz, J.L. Herlocker, L.R. Gordon, J. Riedl, GroupLens: applying collaborative filtering to usenet news, Communications of the ACM 40 (3) (1997) 77–87.

[26] B. Krulwich, C. Burkey, Learning user information interests through extraction of semantically significant phrases, in: Proceedings of the AAAI Spring Symposium on Machine Learning in Information Access, 1996.

[27] Y.-F. Kuo, L.-S. Chen, Personalization technology application to internet content provider, Expert Systems with Applications 21 (4) (2001) 203–215.

[28] K. Lang, NewsWeeder: learning to filter netnews, in: Proceedings of the Twelfth International Conference on Machine Learning, 1995, pp. 331–339.

[29] R.D. Lawrence, G.S. Almasi, V. Kotlyar, M.S. Viveros, S. Duri, Personalization of supermarket product recommendations, Data Mining and Knowledge Discovery 5 (1/2) (2001) 11–32.

[30] D.-S. Lee, G.-Y. Kim, H.-I. Choi, A web-based collaborative filtering system, Pattern Recognition 36 (2) (2003) 519–526.

[31] W.-P. Lee, C.-H. Liu, C.-C. Lu, Intelligent agent-based systems for personalized recommendations in internet commerce, Expert Systems with Applications 22 (2002) 275–284.

[32] P. Lingras, M. Hogo, M. Snorek, C. West, Temporal analysis of clusters of supermarket customers: conventional versus interval set approach, Information Sciences 172 (1–2) (2005) 215–240.

[33] P. Melville, R.J. Mooney, R. Nagarajan, Content-boosted collaborative filtering for improved recommendations, in: Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI-2002), Edmonton, Canada, 2002, pp. 187–192.

[34] A. Mild, T. Reutterer, An improved collaborative filtering approach for predicting cross-category purchases based on binary market basket data, Journal of Retailing and Consumer Services 10 (3) (2003) 123–133.

[35] S.-H. Min, I. Han, Detection of the customer time-variant pattern for improving recommender systems, Expert Systems with Applications 28 (2) (2005) 189–199.

[36] M. Montaner, B. López, J.L. de la Rosa, A taxonomy of recommender agents on the Internet, Artificial Intelligence Review 19 (4) (2003) 285–330.

[37] F.F. Reichheld, Loyalty-based management, Harvard Business Review 71 (2) (1993) 64–73.

[38] F.F. Reichheld, W.E. Sasser, Zero defections: quality comes to services, Harvard Business Review 68 (5) (1990) 105–111.

[39] P. Resnick, H.R. Varian, Recommender systems, Communications of the ACM 40 (3) (1997) 56–58.

[40] J. Rucker, M.J. Polanco, Siteseer: personalized navigation for the web, Communications of the ACM 40 (3) (1997) 73–76.

[41] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Analysis of recommendation algorithms for e-commerce, in: Proceedings of the 2nd


[42] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, in: Proceedings of the 10th international conference on World Wide Web, 2001, pp. 285–295.

[43] J.B. Schafer, J. Konstan, J. Riedl, Recommender systems in e-commerce, in: Proceedings of the 1st ACM Conference on Electronic Commerce, 1999, pp. 158–166.

[44] A.I. Schein, A. Popescul, L.H. Ungar, D.M. Pennock, Methods and metrics for cold-start recommendations, in: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Tampere, Finland, 2002.

[45] U. Shardanand, P. Maes, Social information filtering: algorithm for automating word of mouth, in: Proceedings of International Conference on Human Factors in Computing Systems, 1995, pp. 210–217.

[46] L. Terveen, W. Hill, B. Armento, D. McDonald, J. Creter, Phoaks: a system for sharing recommendations, Communications of the ACM 40 (3) (1997) 59–62.

[47] K. Tso, L. Schmidt-Thieme, Attribute-aware collaborative filtering, in: Proceedings of 29th Annual Conference of the German Classification Society, Magdeburg, Germany, 2005.

[48] M.G. Vozalis, K.G. Margaritis, Using SVD and demographic data for the enhancement of generalized Collaborative Filtering, Information Sciences 177 (15) (2007) 3017–3037.

[49] B. Xie, P. Han, F. Yang, R.-M. Shen, H.-J. Zeng, Z. Chen, DCFLA: A distributed collaborative-filtering neighbor-locating algorithm, Information Sciences 177 (6) (2007) 1349–1363.

[50] P.S. Yu, Data mining and personalization technologies, in: Proceedings of the Sixth International Conference on Database Systems for Advanced Applications, 1999, pp. 6–13.

[51] J.X. Yu, Z. Chong, H. Lu, Z. Zhang, A. Zhou, A false negative approach to mining frequent item sets from high speed transactional data streams, Information Sciences 176 (14) (2006) 1986–2015.

[52] U. Yun, Efficient mining of weighted interesting patterns with a strong weight and/or support affinity, Information Sciences 177 (17) (2007) 3477–3499.

[53] S. Zhang, J. Zhang, C. Zhang, EDUA: An efficient algorithm for dynamic database mining, Information Sciences 177 (13) (2007) 2756–2767.