4. Experimental Evaluation
4.2. Application Setup
The application requires J2SE 5.0 or above to run. The bundled Java web archive (WAR) file requires a container that supports Servlet 2.3 or above, such as Apache Tomcat. Copy the WAR file to Tomcat’s webapps directory, and then start Tomcat by executing the startup.bat command in a terminal window. Recommendations are retrieved automatically based on three controlling attributes: “userID”, “movieID” and “howMany”. The userID and movieID attributes are determined automatically by the system, whereas the howMany attribute is preset globally as a web-context parameter. The “userID” denotes the user for whom recommendations are sought, and “howMany” denotes how many recommendations the application should return from the computation. The movieID associated with the web page is then passed, along with the userID and howMany attributes, to the BPEL engine. The BPEL engine checks the rating count for that movie and, depending on this count, chooses the corresponding CF scheme illustrated in Figure 11 to generate recommendations. Upon receiving the recommendations, the web server renders them in the lower part of the target page.
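The rating-count branch performed by the BPEL engine can be pictured with a short Java sketch. It is a minimal illustration under stated assumptions, not the deployed BPEL process: the `CfScheme` interface, the `SchemeSelector` class and the single threshold are hypothetical, and which concrete scheme fills the few-ratings and many-ratings slots is whatever Figure 11 prescribes.

```java
import java.util.List;

/** Hypothetical common interface for the CF Web Services (item-based, user-based, SlopeOne). */
interface CfScheme {
    List<Integer> recommend(int userId, int movieId, int howMany);
}

/**
 * Mimics the BPEL branch: the movie's rating count decides which scheme runs.
 * The threshold and the two scheme slots are assumptions; Figure 11 gives the real mapping.
 */
final class SchemeSelector {
    private final int ratingThreshold;        // hypothetical cut-off on the rating count
    private final CfScheme fewRatingsScheme;  // scheme used for sparsely rated movies
    private final CfScheme manyRatingsScheme; // scheme used for heavily rated movies

    SchemeSelector(int ratingThreshold, CfScheme fewRatingsScheme, CfScheme manyRatingsScheme) {
        this.ratingThreshold = ratingThreshold;
        this.fewRatingsScheme = fewRatingsScheme;
        this.manyRatingsScheme = manyRatingsScheme;
    }

    /** Picks the scheme for a movie with the given number of ratings. */
    CfScheme choose(int ratingCount) {
        return ratingCount < ratingThreshold ? fewRatingsScheme : manyRatingsScheme;
    }

    /** userId and movieId come from the request; howMany is the global web-context setting. */
    List<Integer> recommend(int userId, int movieId, int howMany, int ratingCount) {
        return choose(ratingCount).recommend(userId, movieId, howMany);
    }
}
```

A single threshold keeps the sketch short; an orchestration with more schemes could use one rating-count band per scheme in the same way.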
4.3. Experimental Results
The data were randomly divided into a training set (90%) and a test set (10%).
The correlation scores generated from the training set were used to predict the ratings in the test set. Each actual rating is compared with the estimated rating generated by the recommendation engine, and the Mean Absolute Error (MAE) is then calculated as the average of the absolute differences between the actual and estimated ratings. Experiments were run with the different collaborative filtering schemes described in Section 3.3. We divided the evaluation into three parts: (1) time consumed by the initial run, (2) time consumed by subsequent runs, and (3) MAE. Time consumption is divided into the initial run and subsequent runs to illustrate that Item-Based Collaborative Filtering consumes more than an order of magnitude more time than the other schemes at startup (see Chart 1), but is more efficient in subsequent runs (see Chart 2). This is because, during the startup phase, Item-Based CF scans the entire database and computes the correlation score of each item pair. However, since relationships between item pairs are relatively static, this calculation can be pre-computed in a separate offline batch process.
The computed similarity scores can then be cached so that the online Item-Based CF can look them up quickly. The initial-run times of the compared collaborative filtering schemes, in milliseconds, are shown in Chart 1, and the associated data sheet is shown in Table 2.
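The offline precomputation and cached lookup described above can be sketched as follows. This is an illustrative Java sketch, not the system's code: the `ItemSimilarityCache` class, its in-memory `HashMap` cache and the user-to-ratings map layout are assumptions. It computes the Pearson correlation of every item pair over the users who rated both items, which is why the initial run is expensive (every pair is visited once), while each later lookup is a single hash-map access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/**
 * Offline batch step (sketch): compute a Pearson correlation for every item pair once,
 * then answer online Item-Based CF lookups from an in-memory cache.
 */
final class ItemSimilarityCache {
    private final Map<Long, Double> cache = new HashMap<>();

    /** ratingsByUser maps userId -> (movieId -> rating); visits every item pair once. */
    static ItemSimilarityCache precompute(Map<Integer, Map<Integer, Double>> ratingsByUser) {
        TreeSet<Integer> items = new TreeSet<>();
        for (Map<Integer, Double> userRatings : ratingsByUser.values()) {
            items.addAll(userRatings.keySet());
        }
        Integer[] ids = items.toArray(new Integer[0]); // ascending, so ids[a] < ids[b] below
        ItemSimilarityCache result = new ItemSimilarityCache();
        for (int a = 0; a < ids.length; a++) {
            for (int b = a + 1; b < ids.length; b++) {
                result.cache.put(key(ids[a], ids[b]), pearson(ratingsByUser, ids[a], ids[b]));
            }
        }
        return result;
    }

    /** Online lookup; identical items score 1, unknown pairs fall back to 0. */
    double similarity(int i, int j) {
        if (i == j) return 1.0;
        Double s = cache.get(key(Math.min(i, j), Math.max(i, j)));
        return s == null ? 0.0 : s;
    }

    int size() {
        return cache.size();
    }

    /** Packs an ordered item pair into a single map key. */
    private static long key(int lo, int hi) {
        return ((long) lo << 32) | (hi & 0xFFFFFFFFL);
    }

    /** Pearson correlation over users who rated both items; 0 when it is undefined. */
    static double pearson(Map<Integer, Map<Integer, Double>> ratingsByUser, int i, int j) {
        int n = 0;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (Map<Integer, Double> userRatings : ratingsByUser.values()) {
            Double x = userRatings.get(i);
            Double y = userRatings.get(j);
            if (x == null || y == null) continue; // only co-rating users contribute
            n++;
            sx += x;
            sy += y;
            sxx += x * x;
            syy += y * y;
            sxy += x * y;
        }
        if (n < 2) return 0.0;
        double cov = sxy - sx * sy / n;
        double varX = sxx - sx * sx / n;
        double varY = syy - sy * sy / n;
        if (varX <= 0 || varY <= 0) return 0.0; // constant ratings: correlation undefined
        return cov / Math.sqrt(varX * varY);
    }
}
```

Storing each unordered pair once under a (smaller id, larger id) key halves the cache size; a production system might persist the table instead of keeping it in memory.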
Chart 1: Initial Run Time Consumed (time in msecs for Item-Based, User-Based and SlopeOne)
Table 2: Initial Run Time Consumed Data Sheet
Collaborative Filtering Scheme | Initial Run Time (ms)
ItemCorrelation(Pearson) | 380328
UserCorrelation(Cosine) | 27453
SlopeOne | 14610
The subsequent-run times of the compared collaborative filtering schemes, in milliseconds, are shown in Chart 2, and the associated data sheet is shown in Table 3.
Chart 2: Subsequent Runs Time Consumed (time in msecs for Runs 1 to 5; series: Item-Based, User-Based, SlopeOne)
Table 3: Subsequent Runs Time Consumed Data Sheet (all times in ms)
Collaborative Filtering Scheme | Run 1 | Run 2 | Run 3 | Run 4 | Run 5
ItemCorrelation(Pearson) | 12250 | 9031 | 10750 | 8157 | 8609
UserCorrelation(Cosine) | 14468 | 14579 | 18906 | 13484 | 14656
SlopeOne | 8953 | 10297 | 10500 | 8609 | 10156
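Because the ranking in Table 3 changes from run to run, a short calculation helps to summarize it. The sketch below is not part of the original evaluation; it only reuses the Table 3 figures (in ms) to compute each scheme's mean time and to count the runs in which each scheme was fastest. The `SubsequentRunSummary` class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Summarizes the subsequent-run times of Table 3 (all values in ms). */
final class SubsequentRunSummary {
    /** Runs 1 to 5 exactly as listed in Table 3. */
    static Map<String, long[]> table3() {
        Map<String, long[]> t = new LinkedHashMap<>();
        t.put("ItemCorrelation(Pearson)", new long[] {12250, 9031, 10750, 8157, 8609});
        t.put("UserCorrelation(Cosine)", new long[] {14468, 14579, 18906, 13484, 14656});
        t.put("SlopeOne", new long[] {8953, 10297, 10500, 8609, 10156});
        return t;
    }

    /** Arithmetic mean of one scheme's run times. */
    static double mean(long[] times) {
        long sum = 0;
        for (long x : times) sum += x;
        return (double) sum / times.length;
    }

    /** For each scheme, the number of runs in which it had the lowest time (ties count for all). */
    static Map<String, Integer> fastestCounts(Map<String, long[]> table) {
        Map<String, Integer> wins = new LinkedHashMap<>();
        for (String scheme : table.keySet()) wins.put(scheme, 0);
        int runs = table.values().iterator().next().length;
        for (int r = 0; r < runs; r++) {
            long best = Long.MAX_VALUE;
            for (long[] times : table.values()) best = Math.min(best, times[r]);
            for (Map.Entry<String, long[]> e : table.entrySet()) {
                if (e.getValue()[r] == best) wins.put(e.getKey(), wins.get(e.getKey()) + 1);
            }
        }
        return wins;
    }
}
```

On the Table 3 figures, the means are 9759.4 ms for ItemCorrelation(Pearson), 15218.6 ms for UserCorrelation(Cosine) and 9703.0 ms for SlopeOne; ItemCorrelation(Pearson) is fastest in Runs 2, 4 and 5, and SlopeOne in Runs 1 and 3.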
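The MAE of each compared collaborative filtering scheme over five runs is shown in Chart 3, and the associated data sheet is shown in Table 4.
Chart 3: Various Runs MAE (Mean Absolute Error)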
Table 4: Various Runs MAE (Mean Absolute Error) Data Sheet
Collaborative Filtering Scheme | Run 1 | Run 2 | Run 3 | Run 4 | Run 5
ItemCorrelation(Pearson) | 0.8300 | 0.8023 | 0.8100 | 0.8081 | 0.8134
UserCorrelation(Cosine) | 0.9393 | 0.9393 | 1.0132 | 0.9821 | 0.9789
SlopeOne | 0.7332 | 0.7284 | 0.7248 | 0.7402 | 0.7166
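The 90%/10% protocol described at the start of this section and the MAE reported in Table 4 can be sketched as below, where MAE = (1/N) Σ |actual − estimated| over the N test ratings. This is an illustrative Java sketch under assumptions: the `Rating` and `Predictor` types and the `MaeEvaluation` helper are hypothetical, and `Predictor` stands in for whichever CF scheme has been trained on the 90% split.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** One (user, movie, rating) observation. */
final class Rating {
    final int user;
    final int item;
    final double value;

    Rating(int user, int item, double value) {
        this.user = user;
        this.item = item;
        this.value = value;
    }
}

/** Hypothetical stand-in for a trained CF scheme. */
interface Predictor {
    double predict(int userId, int movieId);
}

final class MaeEvaluation {
    /** Shuffles a copy of the data; returns [train, test] with testFraction of it in test. */
    static List<List<Rating>> split(List<Rating> all, double testFraction, long seed) {
        List<Rating> copy = new ArrayList<>(all);
        Collections.shuffle(copy, new Random(seed));
        int testSize = (int) Math.round(copy.size() * testFraction);
        List<List<Rating>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(copy.subList(testSize, copy.size()))); // training set
        parts.add(new ArrayList<>(copy.subList(0, testSize)));           // test set
        return parts;
    }

    /** Mean of |actual - estimated| over the test set. */
    static double mae(List<Rating> test, Predictor p) {
        if (test.isEmpty()) throw new IllegalArgumentException("empty test set");
        double sum = 0;
        for (Rating r : test) sum += Math.abs(r.value - p.predict(r.user, r.item));
        return sum / test.size();
    }
}
```

Fixing the seed makes a split reproducible; repeating the split with different seeds is one way to obtain several runs such as Runs 1 to 5, although how the runs above were produced is not stated here.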
For the clickstream tree evaluation, since our current application does not give us access to user navigation logs, we made use of the msweb data, courtesy of Microsoft.com, which cover the web pages each user navigated during a one-week time frame in February 1998. We evaluated the clickstream tree by first generating it from the frequently visited navigation paths. Once the clickstream tree is complete, its elements (e.g., frequent navigation paths) are scored via the lightweight collaborative filtering case scoring scheme, in which an apriori score is calculated for each case. The top cases are then checked against a purposely hidden path ID (e.g., a web page ID) to verify whether it is among them. If any of the top cases matches, the path recommendation is considered accurate. The observed accuracy scores are listed in Table 5.
Table 5: Clickstream Tree Accuracy
Recommendation Length | 3 | 5 | 8
Accuracy | 0.3333 | 0.4833 | 0.5883
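The evaluation protocol behind Table 5 (hide the last page of a path, then check whether it appears among the top recommendations) can be sketched with a prefix tree of sessions. This is a simplified illustration, not the paper's method: candidates are ranked by how often they followed the same prefix, which stands in for the lightweight collaborative filtering case scoring with apriori scores, and the `ClickstreamTree` class and its `minSupport` parameter are hypothetical. Here k plays the role of the recommendation length.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Prefix tree of navigation sessions; each node counts the sessions passing through it. */
final class ClickstreamTree {
    private static final class Node {
        final Map<Integer, Node> children = new HashMap<>();
        int count;
    }

    private final Node root = new Node();
    private final int minSupport; // hypothetical: children seen fewer times are not "frequent"

    ClickstreamTree(int minSupport) {
        this.minSupport = minSupport;
    }

    /** Inserts one session (ordered page IDs) into the tree. */
    void addSession(List<Integer> pages) {
        Node node = root;
        for (int page : pages) {
            node = node.children.computeIfAbsent(page, p -> new Node());
            node.count++;
        }
    }

    /** Top-k next pages after the given path, ranked by count (ties broken by page ID). */
    List<Integer> recommend(List<Integer> path, int k) {
        Node node = root;
        for (int page : path) {
            node = node.children.get(page);
            if (node == null) return new ArrayList<>(); // unseen path: nothing to recommend
        }
        List<Map.Entry<Integer, Node>> candidates = new ArrayList<>();
        for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
            if (e.getValue().count >= minSupport) candidates.add(e);
        }
        candidates.sort((a, b) -> a.getValue().count != b.getValue().count
                ? Integer.compare(b.getValue().count, a.getValue().count) // higher count first
                : Integer.compare(a.getKey(), b.getKey()));               // deterministic ties
        List<Integer> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, candidates.size()); i++) {
            top.add(candidates.get(i).getKey());
        }
        return top;
    }

    /** Hides the last page of each test session; a hit if it is among the top-k pages. */
    double accuracy(List<List<Integer>> testSessions, int k) {
        int trials = 0;
        int hits = 0;
        for (List<Integer> s : testSessions) {
            if (s.size() < 2) continue; // need a visible prefix plus the hidden page
            int hidden = s.get(s.size() - 1);
            trials++;
            if (recommend(s.subList(0, s.size() - 1), k).contains(hidden)) hits++;
        }
        return trials == 0 ? 0.0 : (double) hits / trials;
    }
}
```

As in Table 5, a longer recommendation list can only keep or raise the hit rate, since the top-k list for a larger k contains the one for a smaller k.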
4.4. Experimental Analysis
As Table 4 shows, the MAE values of all the collaborative filtering schemes are comparable, ranging from 0.7166 to 1.0132. Time consumption was measured in two phases: the initial run and subsequent runs. As expected, ItemCorrelation takes by far the longest time in the initial run (Table 2), as it has to scan the entire database to calculate the item-item correlation scores for all items; in subsequent runs, however, it was the fastest scheme in three of the five runs (Table 3). SlopeOne achieved the lowest MAE in every run and is thus observed to be the most accurate scheme. UserCorrelation ranked last both in MAE and in subsequent-run time. Interestingly, accuracy actually decreases as a greater count of data is processed, which is likely the result of over-fitting. For this reason, our Dynamic Collaborative Filtering model uses the BPEL engine to dynamically choose a scheme that is more accurate but requires more processing time for smaller data counts, and to switch to a more scalable scheme that cuts the processing time for larger data counts, thereby balancing prediction accuracy and processing time.
5. Conclusions and Future Work
With the proposed dynamic model, we used the clickstream tree to predict the next page (movie title) of potential interest with higher confidence. We observed that ItemCorrelation is the fastest recommendation scheme in most subsequent runs, and that the SlopeOne predictor is the most accurate scheme. Our SOA-based dynamic recommendation system, orchestrated by BPEL, dynamically switches among the schemes to generate more accurate recommendations in a timely and scalable manner. We expect that committed customers will rent even more movies through the recommendations computed by the dynamically bound collaborative filtering. The ultimate goal of this research is to turn traditional video rental stores into e-commerce-capable businesses through Knowledge Discovery in Databases (KDD) techniques, such as product recommendation via collaborative filtering.
Because the framework is built on a service-oriented architecture (SOA), it leaves room for improvement on a highly scalable yet adaptable infrastructure. In summary, this research turns a traditional business into an e-Business by applying KDD techniques to mine the useful knowledge buried in legacy data, in the hope that data can someday be formalized into information, information turned into knowledge, and knowledge eventually transformed into intelligence that not only increases customer loyalty but also maximizes net profit. Although the MovieLens data were useful for proving our concept, the design will be much more practical once it is tailored to capture the data of a real video store. So far, we have used SOA only to orchestrate our locally implemented collaborative filtering Web Services; the service could be greatly enhanced by integrating and orchestrating external collaborative filtering or data mining schemes.