Literature review - Personalized Recommendation Service of Popular Blogs for Mobile Applications

2.1 Blog

Blogging is a new method of publishing articles on the Internet. Due to its ease of use, anyone can publish and maintain a blog article on the Internet via a publishing software tool.

The blog has emerged as an important type of web page consisting of a set of dated entries, and is usually associated with a profile of its writer. In general, bloggers can freely voice their views on any subject of interest and share their knowledge with others. Therefore, blog content represents the opinions of the population and reacts to current events (e.g., news) on the Internet [17].

Under the notion of Web 2.0, in which everyone is encouraged to participate in public discussion, gathering many different voices draws public attention to events outside the mainstream. By definition and in practice, blogs differ from traditional media in offering broader diversity, as opposed to the relatively narrow view of the media, which is dictated by a handful of journalists and editors. Blogs have become such a force that mainstream media cannot help but take notice [12]. Moreover, one of the reasons people read blogs is that they are a new source of news [24].

The blog fever is accompanied by increasing interest from the research and industrial communities in harnessing this important information source. There are three streams of research regarding blogs. The first focuses on analyzing the link structure between blogs to form communities. The leading academic research [21, 22] on the weblog community proposed a way to discover the bursty evolution of blogspace by using the hyperlinks among blogs to cluster blogs, form communities, and inspect how those communities change.

Subsequent research examined the distribution of blogs over locations and how communities form. Through the hyperlinks between blogs, people can communicate across blogs by publishing content relating to other blogs. Nakajima et al. [31] proposed a method to identify the important bloggers in conversations based on their roles in preceding blog threads, and to identify "hot" conversations. Moreover, Herring and others have argued that most blogs are sparsely connected, as few bridging hyperlinks are available on the Internet [17]. For this reason, blog analysis may not perform well if blogs are treated as typical web pages by page rank algorithms.

The second type of research focuses on analyzing blog content. Gruhl et al. [15, 16] modeled the propagation of topics among blogs based on blog text. The patterns they proposed for topic propagation were useful for predicting sales ranks. In addition, a growing number of studies have recently turned to blog content. Blog text analysis focuses on eliciting useful information from collections of blog entries and determining trends in the blogosphere. Natural Language Processing (NLP) algorithms have been used to determine the most important keywords within a given time period and to discover trends across blogs automatically [12]. Nevertheless, the above studies assign each blog article to only one topic, while blogs in fact contain many topics. Mei et al. [27] focus on a mixture of subtopics and recognize the spatiotemporal topic patterns within blog documents. They proposed a probabilistic method to model the most salient topics in a text collection and to discover their distributions and evolution patterns over time and space. For tracking topic and user drift, Hayes et al. [17] examine the relationship between blogs over time. However, most studies have not considered how to predict the popularity of blog topics.

The last type concerns user modeling and personal recommendation in blog space. A variety of methods [20, 40] have been proposed to model bloggers' interests, such as classifying articles into predefined categories to identify the author's preference [26], and then automatically recommending the blog articles that suit those interests by analyzing the content the bloggers have acted on. Bloggers can thus receive recommended content similar to their earlier experiences, but these methods ignore the hot topics and popular articles discussed by the mass of readers, which can attract mobile users' interest.

Nevertheless, the preceding studies, whatever their type, all took the viewpoint of bloggers. They mainly examined the interests of bloggers and identified which topics were widely discussed by them.

2.2 Forecasting

Forecasting [8] is the estimation of what is going to happen, mainly by using history to project how a situation will develop in the future. There are three main types of forecasting approach. The first is subjective judgment by experts, which depends on professional knowledge of a specific domain. The second is to construct relational models, which mostly use explanatory variables to explain the predicted variable through inductive reasoning and hypothesis testing. The third is time series prediction. A time series is a set of observed values ordered by time, and time series prediction builds a suitable model to forecast the future trend from the past observed values.

Among the variety of methods, the exponential smoothing method [7] is easy to understand and highly reliable, and it can make a short-term prediction from little data. The method assumes that the trend of a time series has a stability and regularity that can reasonably be extended into the future. Since the latest trend will persist to some degree into the near future, the latest information is given a higher weight.

The exponential smoothing method has been widely used in production forecasting, and also in forecasting short-term and medium-term economic development trends. It gives historical data a dynamic weight that fades over time and converges to zero.

2.2.1 Simple exponential smoothing method

In the simple exponential smoothing method, the current prediction is obtained by weighting the past values, namely the prediction value and the actual value of the preceding time period. In Eq. (1), $x(t)$ is the actual value at time $t$ and $\hat{x}(t)$ is the prediction value at time $t$. The forecast for time $t+1$, $\hat{x}(t+1)$, is the weighted average of the two parameters $x(t)$ and $\hat{x}(t)$, weighted by the smoothing constant $\alpha$:

$\hat{x}(t+1) = \alpha\, x(t) + (1-\alpha)\, \hat{x}(t)$ (1)

Therefore, the value of the smoothing constant determines which of the two parameters has more influence on the prediction value.
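As a minimal sketch of the weighted average described above, the following Python function applies the recursion x̂(t+1) = α·x(t) + (1 − α)·x̂(t). The function name and the seed x̂(1) = x(1) are assumptions made for illustration; the text does not prescribe an initial value for the simple method.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the predictions x_hat(1), ..., x_hat(n+1) for x(1), ..., x(n).

    Assumption: the first prediction is seeded with the first observation,
    x_hat(1) = x(1). Other seeds are possible.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    predictions = [series[0]]  # assumed seed: x_hat(1) = x(1)
    for actual in series:
        previous = predictions[-1]
        # Weighted average of the latest actual value and the latest prediction.
        predictions.append(alpha * actual + (1.0 - alpha) * previous)
    return predictions
```

For the series 10, 12, 11, 13 with α = 0.5, this sketch yields the predictions 10, 10, 11, 11, 12, where the last value is the forecast for the next, unobserved period. With α = 1 each forecast simply repeats the latest actual value, and with α = 0 it never leaves the seed.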

It follows from the formula that each prediction value is a weighted combination of the series values of past periods: the more recent an observation is, the larger its weight in the prediction.
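To make the weighting explicit, the recursion can be unrolled. The expansion below is a worked sketch that assumes the standard form x̂(t+1) = α·x(t) + (1 − α)·x̂(t) and an arbitrary seed x̂(1); the summation index k is introduced here for illustration only.

```latex
\hat{x}(t+1) \;=\; \alpha \sum_{k=0}^{t-1} (1-\alpha)^{k}\, x(t-k) \;+\; (1-\alpha)^{t}\, \hat{x}(1)
```

For 0 < α < 1 the weight α(1 − α)^k on the observation k periods back shrinks geometrically toward zero. With α = 0.5, for instance, the three most recent observations receive weights 0.5, 0.25, and 0.125.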

Here the smoothing constant can be decided subjectively or by minimizing the ESS in Eq. (2) [9]. Usually, a smaller smoothing constant is suitable for time series data that changes more violently, that is, data that is obviously irregular. In contrast, a larger smoothing constant is suitable for stable time series data.
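Choosing the constant by minimization can be sketched as a simple grid search. This sketch assumes that ESS denotes the sum of squared one-step-ahead prediction errors, which is a reading of the abbreviation rather than something the text spells out; the function names, the candidate grid, and the seed x̂(1) = x(1) are likewise illustrative assumptions.

```python
def sum_squared_errors(series, alpha):
    """Sum of squared one-step-ahead errors (assumed meaning of ESS).

    Assumption: the first prediction is seeded with x_hat(1) = x(1).
    """
    prediction = series[0]
    total = 0.0
    for actual in series:
        total += (actual - prediction) ** 2
        # Simple exponential smoothing update for the next period.
        prediction = alpha * actual + (1.0 - alpha) * prediction
    return total


def best_alpha(series, candidates=None):
    """Return the candidate smoothing constant with the smallest error sum."""
    if candidates is None:
        # Hypothetical default grid: 0.01, 0.02, ..., 0.99.
        candidates = [i / 100 for i in range(1, 100)]
    return min(candidates, key=lambda a: sum_squared_errors(series, a))
```

A grid search is only one option; any one-dimensional minimizer over the interval from 0 to 1 would serve the same purpose.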

Simple exponential smoothing is suitable for stationary time series, which have no trend effect. If the time series does have a trend, it can be predicted with the double exponential smoothing method.

2.2.2 Exponential smoothing method with trend effect

In this section, we introduce the double exponential smoothing approach, which handles time series data with a trend effect and makes its prediction as follows [10].

$\hat{x}(t) = \alpha\, x(t) + (1-\alpha)\left[\hat{x}(t-1) + b(t-1)\right]$ (3)

The basic concept is similar to that of the simple exponential smoothing method. The main distinction is that it considers the value of $b(t)$, which represents the trend effect at time $t$ and is calculated as in Eq. (4): $\beta$ weights the difference between the two prediction values of adjacent days, $\hat{x}(t)$ and $\hat{x}(t-1)$, and $(1-\beta)$ weights the preceding trend effect, $b(t-1)$.

When the double exponential smoothing method is used for prediction, the initial values of $\hat{x}(t)$ and $b(t)$ have to be assigned. The simplest way is to assume $\hat{x}(2) = x(1)$ and $b(1) = 0$ [13]. Some research has also suggested that the selection of the initial values is not important for a stationary series [10], since it does not have a significant effect on the prediction result.
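The trend-adjusted procedure can be sketched in Python as follows. The sketch assumes the standard Holt form for the level update, x̂(t) = α·x(t) + (1 − α)·[x̂(t−1) + b(t−1)], together with the trend update of Eq. (4). It seeds the level with the first observation and sets b(1) = 0, which is a simplified reading of the initialization described above. The function names and the h-step forecast x̂(t) + h·b(t) are illustrative assumptions, not taken from the text.

```python
def double_exponential_smoothing(series, alpha, beta):
    """Return (levels, trends) for Holt-style double exponential smoothing.

    Assumptions: the level starts at the first observation and the
    trend starts at zero, i.e. b(1) = 0.
    """
    if not series:
        raise ValueError("series must not be empty")

    levels = [series[0]]
    trends = [0.0]
    for actual in series[1:]:
        prev_level, prev_trend = levels[-1], trends[-1]
        # Level: blend the new observation with the trend-adjusted previous level.
        level = alpha * actual + (1.0 - alpha) * (prev_level + prev_trend)
        # Trend: blend the latest change in level with the previous trend (Eq. (4)).
        trend = beta * (level - prev_level) + (1.0 - beta) * prev_trend
        levels.append(level)
        trends.append(trend)
    return levels, trends


def forecast(levels, trends, steps_ahead=1):
    """Extrapolate the last level along the last trend (assumed forecast rule)."""
    return levels[-1] + steps_ahead * trends[-1]
```

For the two-point series 2, 4 with α = β = 0.5, the sketch gives a level of 3 and a trend of 0.5, so the one-step forecast is 3.5.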

The trend effect $b(t)$ of the double exponential smoothing method is calculated as:

$b(t) = \beta\left[\hat{x}(t) - \hat{x}(t-1)\right] + (1-\beta)\, b(t-1)$ (4)

2.3 Recommendation

Owing to the flourishing development of the Internet, information grows and circulates very fast. To solve the problem of information overload, a recommender system is needed to provide suitable personalized information to users according to their needs and preferences [28, 30, 37]. Recommender systems have been widely used in many different areas [38], such as news [25], movies [32], books [14], and music [39]. They not only give each customer a personalized recommendation service but also benefit business marketing strategies.

Generally, recommender systems fall into two main types: content-based filtering and collaborative filtering [1, 2].

2.3.1 Content-based recommendation

The content-based recommendation approach analyzes customers' preferences over item attribute features to build a personal feature profile, and then predicts which items the customer will like [18, 42]. In other words, this approach recommends items whose attribute features are similar to those the customer preferred in the past. It is most often used for document recommendation, and has also been used to recommend web pages and news articles. However, this method still has limitations: the features of items are not always easy to analyze, and users can only receive recommendations that are similar to items from their past [23].
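The profile-and-match idea can be sketched as follows. This is a minimal illustration, not the method of any cited system: it assumes each item is already described by a dictionary of feature weights (for example term weights of a document), builds the profile as the average of the liked items' features, and ranks candidates by cosine similarity. All names and the sample features are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_profile(liked_items):
    """Average the feature weights of the items the user liked in the past."""
    if not liked_items:
        return {}
    profile = {}
    for features in liked_items:
        for term, weight in features.items():
            profile[term] = profile.get(term, 0.0) + weight
    return {term: total / len(liked_items) for term, total in profile.items()}


def recommend(profile, candidates, top_n=3):
    """Return the names of the candidates most similar to the profile."""
    ranked = sorted(candidates,
                    key=lambda name: cosine(profile, candidates[name]),
                    reverse=True)
    return ranked[:top_n]


# Hypothetical usage: a reader who liked travel-heavy articles.
profile = build_profile([{"travel": 1.0, "food": 0.5}, {"travel": 0.8}])
articles = {"a": {"travel": 1.0}, "b": {"sports": 1.0}, "c": {"food": 1.0}}
top_two = recommend(profile, articles, top_n=2)
```

The sketch also shows the limitation noted above: an article about sports shares no features with the profile, so it scores zero and is never ranked ahead of familiar topics.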

2.3.2 Collaborative filtering recommendation

The collaborative filtering (CF) approach is one of the most popular recommendation approaches, and it has been successfully applied in many areas [3, 32]. This method can solve some of the problems of the content-based method mentioned above. There is no need to analyze the content of items; the items recommended to a target user are identified solely from the similarity of the user's historical profile to those of other users. Furthermore, it can recommend items whose content is dissimilar to those seen in the past.
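Recommending from the similarity between users' historical profiles can be sketched as below. The sketch assumes explicit numeric ratings stored per user, cosine similarity over the full rating vectors, and a similarity-weighted average as the prediction rule. These are common choices but are assumptions here, as are all function names and the sample data.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users' rating dictionaries."""
    dot = sum(r * ratings_b[item] for item, r in ratings_a.items()
              if item in ratings_b)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(target, item, ratings):
    """Predict the target user's rating of an item from similar users.

    Returns None when no other user with a nonzero similarity rated the item.
    """
    numerator = 0.0
    denominator = 0.0
    for user, user_ratings in ratings.items():
        if user == target or item not in user_ratings:
            continue
        similarity = cosine_similarity(ratings[target], user_ratings)
        numerator += similarity * user_ratings[item]
        denominator += abs(similarity)
    return numerator / denominator if denominator else None


# Hypothetical ratings: ann has not yet rated item "c".
ratings = {
    "ann": {"a": 5, "b": 4},
    "bob": {"a": 5, "b": 4, "c": 2},
    "cat": {"a": 1, "b": 2, "c": 5},
}
predicted = predict_rating("ann", "c", ratings)
```

Because ann's past ratings resemble bob's far more than cat's, the prediction for item "c" lands closer to bob's rating of 2 than to cat's rating of 5, without any analysis of what the item contains.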

Based on the relationships between items or between users, CF methods can be classified into two types [37]: user-based CF and item-based CF. User-based CF calculates the similarity between users and predicts the target user's preference for different items. GroupLens is an example of such a system [32]. As the numbers of users and items explode, how to produce high-quality recommendations quickly and how to search a large number of potential neighbors in real time become important issues, especially for commercial systems. From the perspective of items, the item-based CF method has been proposed to identify the relationships between the different items that users have already rated, and then to rank the recommended items that each user has not viewed before. This approach has been applied on the Amazon platform [14] with good performance.
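The item-based variant can be sketched in the same style. The sketch assumes that two items are related when the same users rate them similarly (cosine similarity over item rating vectors), and that an unseen item is scored by the similarity-weighted average of the target user's own ratings. It is an illustration under these assumptions, not a description of the Amazon system, and all names and data are hypothetical.

```python
import math
from collections import defaultdict


def item_vectors(ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    vectors = defaultdict(dict)
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            vectors[item][user] = rating
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(r * v[key] for key, r in u.items() if key in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_items(target, ratings, top_n=3):
    """Rank items the target user has not rated by item-to-item similarity."""
    vectors = item_vectors(ratings)
    seen = ratings[target]
    scores = {}
    for candidate, candidate_vector in vectors.items():
        if candidate in seen:
            continue
        numerator = 0.0
        denominator = 0.0
        for item, rating in seen.items():
            similarity = cosine(candidate_vector, vectors[item])
            numerator += similarity * rating
            denominator += abs(similarity)
        if denominator > 0.0:
            scores[candidate] = numerator / denominator
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical ratings: u1 liked item "a" and disliked item "b".
ratings = {
    "u1": {"a": 5, "b": 1},
    "u2": {"a": 5, "c": 5},
    "u3": {"b": 5, "d": 5},
    "u4": {"a": 4, "c": 4},
    "u5": {"b": 4, "d": 5},
}
ranked = recommend_items("u1", ratings)
```

In this toy data, item "c" is rated by the same users who rated "a", which u1 liked, so it is ranked ahead of item "d", which is tied to the disliked item "b". Since item-to-item similarities do not depend on the target user, they can be computed ahead of time.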
