Research in progress sharing. A community question answering (CQA) service lets users post questions and obtain the right answers. Its vitality therefore relies heavily on active users who possess valuable expertise in the domains in which questions are asked. In practice, the system assigns unsolved questions to appropriate users; this process, known as question recommendation, is crucial to providing a CQA service. Previous research mainly focused on techniques that match each question against the entire user population to find potential answerers, which leads to a prohibitive system cost. Moreover, as a CQA expands, user management becomes difficult. In this work, we complement previous research by proposing an additional step, user segmentation, that precedes question recommendation and supports both efficient question recommendation and user management. User segmentation makes it possible to reduce the system cost of question recommendation. It also helps community organizers develop and maintain relationships between the community and groups of users with different characteristics. By applying fuzzy c-means clustering, active users may be classified into groups representing different preferences in providing answers. A dataset collected from Stack Overflow will be used to conduct the user segmentation and to inform the design of future practical question recommendation mechanisms and user management.

Keywords: User segmentation, question recommendation, fuzzy c-means clustering.

1. INTRODUCTION

Community question answering (CQA) services, such as Yahoo! Answers, Stack Overflow, Quora, and TurboTax, are a type of online social network (OSN) and contribute greatly to problem solving and knowledge sharing. In these CQAs, people can post questions that cannot be solved by searching web content and obtain answers from other members. CQA also provides a platform for all Internet users to exchange and share their knowledge. Therefore, the vitality and activity of a CQA rely heavily on the value of the questions and answers it publishes. Although CQA is prevalent, a question remains: how can this service solve users' questions efficiently?

In practice, active users are the major drivers of a community's development, and prior studies usually define active users as those who actively respond to questions and give answers of high credibility (Pal et al. 2011). Questions are volatile, and new questions arrive every second in popular CQAs. Thus, it is time-consuming for users to find the questions relevant to them. As a result, although there are many active users, a number of questions remain unsolved or without accepted answers. Question recommendation has emerged to improve user management in CQA portals and to expedite the answering of new questions. Question recommendation is a mechanism that exposes the right question to the right users (Dror et al. 2011).

Another fact is that active users who receive recommended questions might not answer them in a timely manner. People provide answers in different patterns. For example, some are willing to answer questions at any time of day, while others only want to respond in the evening. Some active users even have a question selection bias and prefer to answer questions for which they have a higher chance of making a valuable contribution (Pal et al. 2010). A better understanding of users' behaviour can help the CQA portal manage users efficiently. Therefore, modelling users' answering behaviours is of great importance. To the best of our knowledge, existing research on question recommendation runs the model over all active users for each question, which leads to a prohibitive system cost. Besides, to integrate users' social behaviour (e.g. voting, making comments) with their questions and answers, researchers tend to combine topic models and link analysis. However, this task is non-trivial when dealing with volatile questions and answers.

Moreover, the rapid expansion of CQA inhibits efficient management of the community. Organizers need to nurture and promote relationships between users and the community for the sake of the community's sustained development. At such a large scale, a community may lose focus. These considerations make the classification of CQA users an appealing idea from both an academic and a practitioner perspective (Alarcon-del-Amo et al. 2011).

Introducing user segmentation into the CQA area can bridge the aforementioned gaps. User segmentation, or customer segmentation, is a strategy for enhancing sales in direct marketing (Seret et al. 2012). By dividing users into different groups, a company can devise and tweak its policies to attract users with different preferences (Ozer 2001). We claim that, for practical recommendation and efficient user management, the benefit of introducing user segmentation in CQA is two-fold. Firstly, question recommendation would be much more efficient if we only needed to match a question against a limited number of user cliques represented by group centres; the question would then be routed to users in the most appropriate group. Secondly, maintaining relationships with active users would be easier and more appropriately fulfilled if we knew users' preferences in terms of interests, answering preferences, cultural background, geographical location, etc.

2. METHODOLOGY

To perform user segmentation, we design a methodology that follows the process model shown in Figure 1.

Figure 1. A Process Model of User Segmentation

Attributes Representation

For each active user, we describe him or her from four perspectives: demographic information, personal interests, answering patterns, and authority. The detailed attributes for each perspective are listed in Table 1.

| Perspective | Attributes |
|---|---|
| Demographic information | Gender, age, location. |
| Interests | Topics extracted from the collection of questions and answers with topic models. |
| Answering patterns | Preferred time period, median length of words in answers, percentage of answers with code, percentage of answers with links, total # of answers, median # of prior answers, median length of words in questions answered. |
| Authority | Reputation score, percentage of answers accepted, # of upvotes, # of downvotes, # of favorites, # of pageviews. |

Table 1. Attributes Representation

Users' interests are hidden patterns that can be revealed from their questions and answers. We regard the collection of questions and answers from all users as a corpus of documents, from which topics can be explored with the Latent Dirichlet Allocation (LDA) model. LDA is a latent topic model that assumes each document is a mixture of several topics with different weights (Blei et al. 2003). Therefore, each active user's passion for different topics is quantified with weights. Please note that there are some attributes that we have not leveraged yet and plan to consider in the future.

Factor Analysis & Feature Generation

High correlation might exist among the above attributes, and factor analysis is an efficient approach for eliminating redundant variables. In the context of CQA, we refine the generated attributes (see Table 1) with factor analysis and find higher-level representative features for further user segmentation.

User Clustering

Fuzzy c-means (FCM) is a widely used clustering method that allows one piece of data to belong to two or more clusters, and it remains one of the general-purpose fuzzy clustering techniques (Bezdek 1984). In the context of CQA, the membership function can represent the similarity that one user shares with each cluster. This method (Dunn 1973) is based on minimization of the following objective function:

center of the cluster j, and ||*|| is any norm expressing the similarity between any measured data and the center.
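For orientation, the standard FCM formulation (Bezdek 1984) can be written as follows. This is a sketch in the notation used here (membership u_ji of user i in cluster j, cluster center c_j); the symbols N (number of users), C (number of clusters), m (fuzzifier, m > 1), and x_i (feature vector of user i) are assumptions taken from the standard formulation.

```latex
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ji}^{m} \, \lVert x_i - c_j \rVert^{2},
\qquad 1 < m < \infty

u_{ji} = \frac{1}{\sum_{k=1}^{C}
  \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}},
\qquad
c_j = \frac{\sum_{i=1}^{N} u_{ji}^{m} \, x_i}{\sum_{i=1}^{N} u_{ji}^{m}}
```

In this standard form, iteration typically stops when the largest change in any membership between two successive iterations falls below a small threshold.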

Fuzzy clustering is solved by an iterative optimization of the objective function shown above, with the update of membership u_ji and the cluster centers c_j by: [...]

The algorithm is composed of the following steps:

1. Initialize U = [u_ji] matrix, U(0) [...]

[...] reflecting active users and examine the utility of these attributes in identifying interesting user segments. We downloaded the data dump from its official website (footnote 8) and parsed the data set with regard to users' personal information, questions, and answers. Since we conduct the user segmentation with active users, who are more involved in the community, we select users with reputation scores above the sample median for our future experiment. Experiments on this data set will be conducted in the next phase of our research.
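The clustering step described under User Clustering can be sketched in plain Python as follows. This is a minimal illustration of standard fuzzy c-means, not the authors' implementation; the function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the stopping threshold `eps`, and the toy two-feature user profiles are all assumptions made for the example.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Cluster `points` (equal-length lists of floats) with standard FCM.

    Returns (centers, memberships); memberships[i][j] is the degree to which
    point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Step 1: initialise the membership matrix with random rows that sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Step 2: cluster centers are membership-weighted means of the points.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dim)]
            )

        # Step 3: update memberships from distances to the new centers.
        power = 2.0 / (m - 1.0)
        new_u = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # The point coincides with a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                new_u.append([h / total for h in hits])
                continue
            new_u.append(
                [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]
            )

        # Step 4: stop once the largest membership change falls below eps.
        change = max(
            abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb)
        )
        u = new_u
        if change < eps:
            break

    return centers, u


# Hypothetical user profiles with two features each (illustration only).
users = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, memberships = fuzzy_c_means(users, n_clusters=2)
# Dominant segment per user; the soft memberships themselves are kept as well.
labels = [max(range(2), key=lambda j: row[j]) for row in memberships]
```

Because every user keeps a membership in each segment, a question could in principle be routed to the users with the highest membership in the segment whose center best matches it, rather than being compared against every user.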

4. CONCLUSION

In this work, we tap the potential of user segmentation in CQA for efficient question recommendation and user management. To build user profiles, we describe each user from four perspectives: demographics, interests, answering patterns, and authority. By applying the FCM technique, we intend to fuzzily subdivide users into segments with different preferences. Similar to FCM's application in direct marketing, CQA organizers can match a question against each segment and thereby reduce the system cost of question recommendation. Most importantly, organizers can adapt the community's policies for users in each segment. We select a dataset from a popular CQA, Stack Overflow, for future analysis.

8 http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump/

REFERENCES

Alarcon-del-Amo, M., Lorenzo-Romero, C., and Gomez-Borja, M. A. 2011. "Classifying and Profiling Social Networking Site Users: A Latent Segmentation Approach," Cyberpsychology, Behavior, and Social Networking (14:9), pp 547-553.

Blei, D. M., Ng, A. Y., and Jordan, M. I. 2003. "Latent Dirichlet allocation," Journal of Machine Learning Research (3), pp 993-1022.

Dror, G., Koren, Y., Maarek, Y., and Szpektor, I. 2011. "I want to answer; who has a question?: Yahoo! answers recommender system," Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 1109-1117.

Ozer, M. 2001. "User segmentation of online music services using fuzzy clustering," Omega (29:2), pp 193-206.

Pal, A., Farzan, R., Konstan, J. A., and Kraut, R. E. 2011. "Early detection of potential experts in question answering communities," in User Modeling, Adaption and Personalization, Springer, pp 231-242.

Pal, A., and Konstan, J. A. 2010. "Expert identification in community question answering: exploring question selection bias," Proceedings of the 19th ACM international conference on Information and knowledge management, ACM, pp 1505-1508.

Seret, A., Verbraken, T., Versailles, S., and Baesens, B. 2012. "A new SOM-based method for profile generation: Theory and an application in direct marketing," European Journal of Operational Research (220:1), pp 199-209.

Investigation on the relationship between China’s social media and the purchases made on the e-Commerce platforms for better promotion strategies

Terence Chun-Ho Cheung

Department of Information Systems, City University of Hong Kong, Hong Kong SAR

is.tc@cityu.edu.hk

ABSTRACT

This paper summarizes some observations on the temporal trends of discussion volume in China's social media against the actual purchase transactions made on China's e-Commerce platforms such as Taobao and Tmall. Such temporal trends could be used as a reference for better promotion strategies in any industry (the beauty industry is chosen in this study) and for campaigns around specific festivals. The temporal trends in Q1 of both 2012 and 2013 help demonstrate that informal communication among peer customers, i.e. electronic word-of-mouth, acts as a kind of social media footprint that supports the actual purchases made around Chinese New Year and 3.8 Women's Day in 2012 and 2013. Lastly, we set and test the hypothesis that there is a significant positive relationship between the number of posts relevant to skin care topics in social media and the actual purchases made on an e-Commerce platform in China.

1. INTRODUCTION

The rapidly growing volume of social media discussions and posts shows that "big data" can affect various aspects of our daily activities and presents many opportunities for online sellers, and for scientists, to address fundamental questions about the complex world we inhabit [1-2]. Collective human trading behavior in financial markets can be predicted and illustrated by massive behavioral data from Google Trends [3]. Electronic word-of-mouth (eWOM), i.e. information exchange on Facebook, has been found to be significantly and positively related to the revenue of a movie [4]. It is both the precursor and the outcome of retailing [5], and the total box office revenue of a movie can be predicted by, for example, average frequency, peak frequency, total number of posts, and number of positive and negative posts, all of which are found to be positively related to each other [6]. Brand-related and product-related comments on public display on social networking sites are found to help increase sales, improve brand image and encourage customer referrals [7]. Ultimately, they help generate a large volume of sales in e-Commerce.

In China's social media, up to 40% of Sina Weibo (a Twitter-like social media platform) users use Alibaba's Taobao, a famous B2C e-Commerce platform in China. About 2% of Weibo's traffic goes to Taobao.com, and about 3.5% of Weibo users help spread and share their shopping experience from Taobao [8]. Thus, social media data, such as those obtained from forums, blogs and Weibo, are expected to be interactively and promptly beneficial to the shopping experience on Taobao. However, how should online sellers plan ahead for their promotion strategies? How many days in advance should they be ready for seasonal sales? And how many days do the sales last during, for example, Chinese New Year (CNY), 3.8 Women's Day, 10.1 China National Day and 11.11 Singles Day?

An investigation of the relationship between China's social media and the actual purchases made on e-Commerce platforms is carried out, and several phenomena observed in Q1 2013 are shared in this study. These can provide references for better promotion strategies during similar seasonal or Chinese festivals. The objective of this study is to investigate whether the number of posts regarding a particular industry, for example the beauty industry, in China's social media can affect the actual transactions made on China's e-Commerce platforms, such as Taobao. This paper first gives a literature review and describes our research methodology, followed by the data collection, analysis and results. Lastly, a conclusion, limitations and future research directions are given.

2. LITERATURE REVIEW

Recent studies on the relationship between online consumer reviews and sales in the game industry indicate that online consumer reviews influence game sales, especially for less popular products [9]. Through text-mining analysis, positive relationships between blogging activities and the box office revenue of movies have been found [10]. The amount of blogging activity is suggested as a means to forecast sales revenue, measure the effectiveness of traditional marketing, conduct consumer evaluation and review the adoption of new products.

However, an empirical investigation of eBay shows that online community participation had mixed effects on customers' likelihood of participating in buying and selling behaviors [11]. Participation does not affect the number of bids placed or the revenue earned, but it has a negative impact on the number of listings and the amount spent. Thus, the effect on product sales depends on the balance between positive and negative word-of-mouth. For example, pre-consumption word-of-mouth (tweets) has been found to have a larger effect than post-consumption word-of-mouth, because it increases both the awareness effect and the persuasive effect on recipients [12]. A negative relationship between advertising and online word-of-mouth among consumers has also been found [13]. Managers are advised to budget for both strategically, because both affect final sales.

Therefore, it is important to estimate the trend of demand from text-mined social media in order to capture the right audiences at the right time in the rapidly expanding e-Commerce market. This paper is a preliminary investigation of the relationship between discussion posts in China's social media and actual purchases on China's e-Commerce platforms, followed by some temporal phenomena observed between the trend of discussion volume in social media and the trend of actual e-commerce transaction volume. It also reveals how discussion in social media relates to the actual searching and buying behavior of consumers, and thus can be used to predict the consumer decision process.

3. RESEARCH METHODOLOGY AND HYPOTHESES

3.1 Experiment design and data collection

A statistical analysis is used to test whether there is a relationship between the number of posts and transactions in the first quarter of 2013. Discussion posts and transaction trends of 62 international brands (except Maybelline) were monitored and retrieved from three main types of China's social media channels, namely Weibo, forums, and blogs Q&A. These are the most important discussion-based media in China, with very active participation from local users every day. The online transaction index is retrieved from the biggest B2C e-commerce platform in mainland China, namely Taobao, which climbed to 51.3% of online sales in 2013 Q1 [14].
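One way such a relationship test could be sketched is a Pearson correlation between daily post counts and the daily transaction index, with a t statistic for the null hypothesis of no linear relationship. The text does not name the statistical test used, so the choice of Pearson correlation, the function names, and the daily values below are assumptions made purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for H0: no linear relationship (n - 2 degrees of freedom).

    Assumes |r| < 1; a perfect correlation would divide by zero.
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical daily values, for illustration only (not data from the study).
daily_posts = [420, 510, 380, 610, 700, 650, 300]
daily_transaction_index = [30, 41, 28, 52, 60, 55, 22]

r = pearson_r(daily_posts, daily_transaction_index)
t = t_statistic(r, len(daily_posts))
print(f"r = {r:.3f}, t = {t:.2f}")
```

The t statistic would then be compared against the critical value for n - 2 degrees of freedom at the chosen significance level. Daily series around festivals are strongly seasonal, so a correlation of this kind describes co-movement and does not by itself show that posts drive purchases.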

3.2 Analysis and results

Descriptive statistics

To first compare the trends of discussion volume and transactions in the beauty industry in Q1 of 2012 and 2013, data are collected using hot-term extraction technology. The total numbers of relevant topics, posts and authors in the three channels are shown in Table 1. There is an obvious shift from postings in forums/blogs Q&A to Weibo, and about a 4.24-fold increase in Taobao transactions. In 2013 Q1, there were 66,397 posts in total from Weibo, forums and blogs Q&A, contributed by 35,235 unique authors. In 2012 Q1, there were 100,722 posts in total from Weibo, forums and blogs Q&A, contributed by 34,378 unique authors. From Table 2, there were on average 675.62 posts per day (standard deviation 450.02) in 2013 Q1, compared with 1,097.01 posts per day (standard deviation 351.19) in 2012 Q1.

Skin care [護膚]

| Q1 | Forums and Blogs Q&A: websites | Forums and Blogs Q&A: topics | Forums and Blogs Q&A: posts | Forums and Blogs Q&A: authors | Forums and Blogs Q&A: # of people viewed | Weibo: topics | Weibo: posts | Weibo: authors | Transaction index (Taobao) [min, max] |
|---|---|---|---|---|---|---|---|---|---|
| 2013 | 159 | 12,329 | 36,664 | 11,440 | 25,062,767 | 11,800 | 29,733 | 23,795 | [7, 390] |
| 2012 | 197 | 36,159 | 95,462 | 30,679 | 307,272,196 | 4,108 | 5,260 | 3,699 | [5, 92] |

Table 1. China’s social media activities in Q1 of 2012 and 2013

Skin care [護膚]

| Year Q1 | Statistic | Forums and Blogs Q&A: Av. posts | Forums and Blogs Q&A: Av. authors* | Weibo: Av. posts | Weibo: Av. authors* | ALL (Forums and Blogs Q&A and Weibo): Av. posts | ALL (Forums and Blogs Q&A and Weibo): Av. authors* | Transaction index |
|---|---|---|---|---|---|---|---|---|
| 2013 (159 websites) | Mean | | | | | | | |
| 2013 (159 websites) | Standard deviation | | | | | | | |
| 2012 (197 websites) | Mean | 1039.21 | 1049.03 | 57.80 | 48.81 | 1097.01 | 1097.85 | 47.49 |
| 2012 (197 websites) | Standard deviation | 356.40 | 351.19 | 65.90 | 55.65 | 351.19 | 352.75 | 19.91 |

*The same author can post on different days in Q1 of 2012 or 2013.

Table 2. Descriptive statistics of China’s social media activities in Q1 of 2012 and 2013

China's e-Commerce reached RMB 352.1 billion in 2013 Q1, a year-over-year increase of 36.6% [14]. The Taobao transaction index increased by more than 200% on average in 2013 Q1 compared with 2012 Q1. A significant drop occurred before the CNY holidays in both years. In 2012 Q1, the lowest transaction index and discussion volume were found on the last day before CNY (年廿九), i.e. Jan 22, when the transaction index fell to 5 points and there were only 417 discussion posts in total. Similarly, in 2013, the quarter's lowest transaction index and discussion volume were on Feb 9 (年廿九), when the transaction index fell to 7 points with 133 discussion posts. Both discussion and transaction volumes resumed after CNY in both years and reached a climax on 3.8 Women's Day. Vivid seasonal patterns around CNY and 3.8 Women's Day in both discussion and transaction volume in Q1 of 2012 and 2013 can be seen in Fig. 1a and 1b.

Both trends dropped below the normal level an average of 4 days before the start of the CNY holidays. The transactions and discussion volume resumed after the 4th day of CNY (年初四). In 2012, it took 7 and 9 days, respectively, for the transactions and discussion volume to return to normal levels, while a faster recovery of about 6 days was noted for both trends in 2013.

Types of discussion on different categories of cosmetic products

In 2013 Q1, the numbers of relevant posts regarding “advices seeking” and “reviews sharing” were 5,331 and 10,648, extracted from 107 websites and 129 websites respectively.

| Type | Channel | topics | posts | authors | # of people viewed |
|---|---|---|---|---|---|
| Advices seeking (2013 Q1) | Forums and Blogs Q&A (107 websites) | 3,321 | 5,331 | 3,056 | 4,250,178 |
| Reviews sharing (2013 Q1) | Forums and Blogs Q&A (129 websites) | 6,825 | 10,648 | 6,702 | 5,659,906 |

Table 3. Types of discussion extracted from China’s social media activities in 2013 Q1

| Rank | Weibo: Advices seeking | Weibo: Reviews sharing | Forums and blogs (107 websites): Advices seeking | Forums and blogs (129 websites): Reviews sharing |
|---|---|---|---|---|
| Top 1st | Cosmetic Tools (32%) | Cosmetic Tools (38%) | Lotion & Cream | Facial Care Set |
| Top 2nd | Facial Care Set (20%) | Facial Care Set (16%) | Facial Care Set | Facial Mask |
| Top 3rd | Facial Mask (12%) | Facial Mask (12%) | Facial Mask | Lotion & Cream |
| Top 4th | Lotion & Cream (6%) | Lotion & Cream (8%) | Eye Care | Cosmetic Tools |
| Top 5th | BB Cream (6%) | Sun Block (7%) | Sun Block | Facial Cleansing |