Weblogs have emerged as a new communication and publication medium on the Internet for diffusing the latest useful information. Blog articles represent the opinions of the population and react to current events (e.g., news) on the Internet [17]. Most people read blogs because they are a new source of news [24]. Identifying the latest popular issues that blogs discuss and that attract readers’ attention is therefore an interesting subject. Another issue is that users lack a channel for receiving blog information passively. This is a major disadvantage compared with traditional media, which broadcast content directly to their audience over readily available channels (e.g., TV and newspapers). Moreover, providing value-added mobile services such as blog articles is increasingly important for attracting mobile users to mobile commerce, since mobile devices make it convenient to receive information anytime and anywhere. There are, however, a tremendous number of blog articles, and mobile users generally have difficulty browsing weblogs owing to the limitations of mobile devices, such as small screens, short usage time, and poor input mechanisms. Accordingly, providing mobile users with blog articles that suit their interests is an important issue. Very little research, however, focuses on this issue.
There are mainly three types of research regarding blogs. The first type focuses on analyzing the link structure between blogs to form a community [21, 22]. Through the hyperlinks between blogs, people can communicate across blogs by publishing content relating to other blogs. Nakajima et al. [31] proposed a method to identify the important bloggers in conversations based on their roles in preceding blog threads, and to identify “hot” conversations.
The second type of research focuses on content analysis to derive the propagation of topics and trends in the blogosphere. Gruhl et al. [15, 16] modeled the information propagation of topics among blogs based on blog text. Mei et al. [27] proposed a method to discover the distributions and evolution patterns over time and space. By tracking topic and user drift, Hayes et al. [17] examined the relationship between blogs over time. Most studies have not considered how to predict the popularity degree of blog topics. The last type of research focuses on how to model bloggers and derive their interests for personal recommendation [20, 40]. A variety of methods have been proposed to model a blogger’s interests, such as classifying articles into predefined categories to identify the author’s preferences [26]. Bloggers can then receive recommended content that is similar to their earlier experiences.
However, the majority of previous studies on blogs ignore the hot topics and popular articles discussed by the mass of readers, who take browsing actions on the blog articles. Moreover, existing studies do not consider recommending blog articles to mobile readers in mobile environments. With more and more blog articles published on the Internet, the scale and complexity of blog contents are growing rapidly, resulting in information overload for blog readers. Mobile readers can browse only very few blog articles because of the restrictions of mobile devices. Accordingly, traditional recommendation methods, such as the collaborative filtering approach, may suffer from the sparsity problem when finding similar users or items, because mobile readers leave insufficient historical records of browsing blog articles. To address the sparsity issue and blog information overload, it is essential to design an appropriate mechanism for recommending blog articles in mobile environments. Blog readers are often interested in browsing emerging and popular blog topics, and the popularity of blogs can be inferred from the accumulated click times on blogs. Popularity based solely on click times, however, cannot truly reflect the trend of popularity. For example, a new event may trigger emerging discussions such that the number of related blog articles and browsing actions is small at the beginning and increases rapidly as time goes on. Thus, it is important to analyze the trend of the time-sensitive popularity of blogs in order to predict emerging blog topics. In addition, blog readers may have different interests in the emerging popular blog topics. Nevertheless, very few studies have addressed such issues.
In this work, we propose a Customized Content Service on a mobile device (m-CCS) to recommend blog articles to mobile users. The m-CCS can predict the trend of time-sensitive popularity of blogs. First, we analyze blog contents retrieved by co-RSS to derive topic clusters, i.e., blog topics. We define a topic as a set of significant terms that are clustered together based on similarity. By examining the clusters, we can extract the features of topics from the viewpoints of the authors. Moreover, we analyze the click times the readers give to articles. For each topic cluster, from the variation in trends of click times we can predict the popularity degree of the topics from the readers’ perspectives.
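To illustrate why the trend of click times can matter more than the accumulated total, the following minimal sketch fits a least-squares slope to per-period click counts of a topic cluster. The slope is only a stand-in chosen for illustration; this section does not specify the forecasting model that m-CCS uses, and the function name `click_trend` and the sample counts are hypothetical.

```python
def click_trend(clicks_per_period):
    """Least-squares slope of click counts over equally spaced periods.

    Assumption: a linear slope is used here only as a simple proxy for
    the "trend" of popularity; it is not the paper's forecasting method.
    """
    n = len(clicks_per_period)
    if n < 2:
        return 0.0
    mean_t = (n - 1) / 2
    mean_c = sum(clicks_per_period) / n
    num = sum((t - mean_t) * (c - mean_c)
              for t, c in enumerate(clicks_per_period))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


# Hypothetical click counts for two topic clusters over four periods.
established = [900, 880, 910, 890]  # many accumulated clicks, flat trend
emerging = [5, 20, 80, 300]         # few accumulated clicks, rising fast
```

Under these made-up numbers, the established cluster has far more accumulated clicks, yet the emerging cluster has the larger slope, which is the situation described above for a newly triggered discussion.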
Second, mobile users may have different interests in the latest popular blog topics. Thus, the m-CCS further analyzes mobile users’ browsing logs to derive their interests, which are then used to infer their preferred popular blog topics and articles. We examine the browsing behaviors and interests of the mobile users, and then modify the ranking of topic clusters according to their preferences. Moreover, the m-CCS recommends blog articles by integrating the personalized popularity of topic clusters, item-based CF, and the attention degree (click times) of blog articles. The filtered articles are then sent to the individual’s mobile device immediately via a WAP Push service. This allows the user to receive personalized and relevant articles and satisfies the demand for instant information. Finally, our implementation of m-CCS demonstrates that the system can effectively recommend desirable blog articles that satisfy both popularity and the personal interests of mobile users.
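One simple way to picture the integration of the three signals is a weighted sum over normalized scores, sketched below. This is an assumption made purely for illustration: the actual integration rule is not given in this section, and the weights, the function names `recommendation_score` and `top_n`, and the candidate values are all hypothetical.

```python
def recommendation_score(topic_popularity, cf_score, attention,
                         weights=(0.4, 0.4, 0.2)):
    """Combine three signals, each assumed to be normalized to [0, 1].

    Assumption: a linear combination with hypothetical weights; the
    paper's own integration method may differ.
    """
    w_pop, w_cf, w_att = weights
    return w_pop * topic_popularity + w_cf * cf_score + w_att * attention


def top_n(candidates, n=3):
    """Rank article ids by combined score, highest first."""
    ranked = sorted(candidates,
                    key=lambda a: recommendation_score(*candidates[a]),
                    reverse=True)
    return ranked[:n]


# Hypothetical candidates: id -> (personalized topic popularity,
#                                 item-based CF score, attention degree)
candidates = {
    "a1": (0.9, 0.2, 0.5),
    "a2": (0.4, 0.9, 0.3),
    "a3": (0.1, 0.1, 0.9),
}
```

With these illustrative values, `top_n(candidates, 2)` ranks `a2` ahead of `a1`, since a strong CF score can outweigh a higher topic popularity under the assumed weights.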
We summarize the contributions of this paper as follows:
We propose a value-added mobile service that provides customized blog articles for mobile users. The basic idea is to combine the estimated popularity of articles in a topic cluster with the predicted interest of the user in those articles. Without requiring users to rate articles, a customer’s implicit interest in an article is inferred by comparing the time spent reading the article with the average time spent on articles of the same size.
The proposed recommendation process integrates content analysis and collaborative filtering to mitigate the shortcomings of pure collaborative filtering, such as the sparsity and cold-start problems. It combines three aspects: (1) the prediction of popular topic clusters that concern bloggers and readers on the Internet, (2) the prediction of users’ preference scores by item-based collaborative filtering, and (3) the attention degree (click times) of blog articles obtained from Internet users.
In general, the effectiveness of the CF recommendation approach depends mostly on the set of historical data, so it remains subject to limitations such as the sparsity and cold-start problems. It may deliver low-quality recommendations when the system has only a few user rating records with which to measure the similarity between users. For new items or new users, for which no viewing records exist, the system will perform poorly in recommendation.
In our research, we focus on mobile users and blog articles. For dimensionality reduction, we first apply clustering techniques to group the data set and then form neighborhoods from the partitions, which can reduce sparsity and improve the scalability of recommender systems.
Previous studies [4, 41] have also indicated the benefits of applying clustering in recommender systems.
Additionally, many blog articles have not been viewed by any mobile user in our system because of the limitations of mobile devices. As a result, most articles that are popular on the Internet and that attract the attention of the mass of Internet users may be ignored in the recommendation process. Thus, we consider the activities of bloggers and Internet readers as different viewpoints for identifying the popularity of each article, so as to improve recommendation performance.
This study implements m-CCS, which is suitable for thousands of real users in practice. To recommend the latest and most popular blog articles instantly, the system needs to process the article contents and analyze the browsing behavior of thousands of online mobile users within two hours. Therefore, it must overcome the issues of efficiency and scalability. We not only adopt a load-balancing architecture but also carefully choose the algorithms and caching technology, so that the system can be applied in a real business environment.
In the experiment, we compare two strategies: a unified push of articles selected by experts and a personalized push of articles selected by the m-CCS recommendation service. The experimental results show that m-CCS can increase the click rate of blog articles and thus enhance customer satisfaction.
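The implicit-interest idea in the first contribution above, comparing the time a user spends on an article with the average time spent on articles of the same size, can be sketched as follows. This is a minimal illustration under stated assumptions: "same size" is approximated by hypothetical fixed-width length buckets, the score is taken as a simple ratio, and the log format and the name `implicit_interest` are invented for the example.

```python
from collections import defaultdict


def implicit_interest(reading_log, bucket_chars=500):
    """Score each (user, article) pair by reading time relative to peers.

    reading_log: list of (user, article, length_chars, seconds) tuples.
    Assumption: articles whose lengths fall in the same bucket of
    `bucket_chars` characters count as "the same size"; the score is the
    time spent divided by the mean time within that bucket.
    """
    times_by_bucket = defaultdict(list)
    for _, _, length, seconds in reading_log:
        times_by_bucket[length // bucket_chars].append(seconds)

    scores = {}
    for user, article, length, seconds in reading_log:
        times = times_by_bucket[length // bucket_chars]
        mean = sum(times) / len(times)
        scores[(user, article)] = seconds / mean if mean > 0 else 0.0
    return scores


# Hypothetical browsing log.
log = [
    ("u1", "a", 400, 30),
    ("u2", "a", 400, 10),
    ("u1", "b", 1200, 60),
]
scores = implicit_interest(log)
```

A score above 1 suggests that the user spent longer than is typical for articles of that size, which can be read as above-average interest without asking the user for an explicit rating.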
The remainder of this paper is organized as follows. Section 2 introduces related work on blogs, forecasting, and recommendation. A brief introduction to our system is given in Section 3. Detailed descriptions of the processing modules of our system are presented in Sections 4 and 5. Section 6 illustrates how the different modules of our system are integrated to develop recommendation methods. The system architecture is illustrated in Section 7. Section 8 presents an empirical and practical evaluation of the usefulness of m-CCS. Finally, Section 9 presents conclusions and future work.