National Chengchi University

Chapter 1 INTRODUCTION

1.1 Background and motivation

Owing to the rapid growth of computational power and data analysis over recent decades, it has become easy to find information from Internet-based resources. Search engines such as Google, with their optimized search algorithms, have cultivated the habit of “googling” the moment people are confused about something. However, not everyone can come up with precise queries that efficiently and effectively narrow down a diverse range of results. Such users often add more details to the search in the hope of finding closer answers. In practice, the search results become more ambiguous and noisier when items in the search pool do not share the exact structure of the query. Free of this constraint, online Q&A forums (where users can post questions and respond to others’ questions) let people write in full sentences when they find it hard to summarize their problems. Furthermore, some online Q&A forums are community-oriented and focus on a particular domain to maintain the quality of their content. Over time, the text data collected from a platform’s users accumulate, and with suitable processing it becomes feasible to optimize the whole asking process. This natural-language-based environment encourages researchers and forum operators to make the asking process on Q&A websites more automatic and customized.

We summarize the asking process of users on Q&A websites in four steps: (a) the intention to look for a solution to a problem, (b) the action of searching for relevant questions that have already been asked, (c) the action of formulating a new post if existing threads cannot resolve the doubt, and finally (d) receiving solutions from other forum users. After observing several Q&A websites, we found that the most direct form of support in the asking process is the use of various recommendation systems. Yahoo! Answers, one of the earlier and better-known Q&A websites, asks users to categorize their questions under chosen topics before submitting their posts. Quora, a famous general Q&A website that announced in September 2018 that it had reached 300 million monthly users, incorporates many mechanisms of modern websites. It combines traditional forums with social mechanisms (i.e. tagging) to provide recommendations and narrow down the scope of existing questions. Interestingly, most recommendation systems appear in steps (b) and (d). Even though a forum like Quora combines (b) and (c), its system seldom considers how to help a new post express itself well and state a direct objective (rather than only pairing it with threads of similar questions). This is probably because the analysis of posts, that is, user-generated content (UGC), is very complicated.

In the past, most studies focused on analyzing transaction logs to trace users’ behavior, but the results were often insufficient to reveal their information needs (McCray, Loane, Browne, & Bangalore, 1999; Rose & Levinson, 2004). UGC carries useful information such as contextual factors, including the linguistic features users formulate and their motivations and timing for asking questions. Understanding UGC, the cognitive representation of the problem, can improve the design of information systems (Zhang, 2010). However, as the volume of UGC rises, its lexicon and syntax have also become more complex, and supportive methods (recommendation systems) sometimes turn out to be time-consuming and ineffective. For example, recommenders return irrelevant threads when a user’s post is complicated, either because it was not written by a native speaker or because it contains too many professional details to categorize well. To create a great user experience, why not develop a semi-automatic system that encourages users to reconsider and reformulate their posts (UGC) before submitting them to forums? The goal of this design is to address the difficulty of interpreting UGC by converging the ambiguous concepts in posts on the user side.

Another feature of Q&A forums is that they can be domain-specific. Healthcare and its social groups have long been an attractive domain for users and researchers. A Pew Research study (Fox & Fallows, 2003) showed that more than 63% of Internet users in the U.S.A. had searched for health information online. More than 54% of Internet users had visited websites that share people’s experiences of medical conditions and personal situations. A 2013 Pew survey (Fox, 2013) reported that 35% of adults had gone online to figure out what medical condition they or someone else might have. These online search results may guide users in deciding on further medical appointments. Although clinicians’ care and conversations about serious health episodes take place mostly offline, 53% of people who have worked out a medical condition online by themselves habitually share what they found with a clinician. In addition, when facing an unfamiliar problem, especially in highly specialized areas like healthcare and medicine, people try to ask someone who has more expertise or can share experiences (Wildemuth et al., 1994). It is not surprising that Pew Research concluded in 2009¹ that online information significantly affects users’ decisions on how they treat their own illnesses or care for the health of their significant others.

1 https://www.pewinternet.org/2009/06/11/the-social-life-of-health-information/

Healthcare Q&A forums not only let users describe their conditions in more detail but also serve as a good source of contextual factors, which reveal the knowledge behind users’ information needs and are fundamental to the design of information systems. Domain-specific Q&A forums were once limited by low computing power and sparse theoretical coverage; however, current techniques are able to retrieve rich resources, comprehend more of natural language and provide deeper reasoning (Arora, Li, Liang, Ma, & Risteski, 2016; Gittens, Achlioptas, & Mahoney, 2017; Mimno & Thompson, 2017). We believe that online behaviours in the healthcare field will remain a steady presence in people’s lives, so optimizing the asking process on healthcare forums is important for helping users deal with health-related problems. In addition, if recommendation methods derived from data analysis become more reliable in supporting the asking process, online users may be more willing to discuss their medical conditions on Q&A websites.

1.2 Research purpose and questions

Even though search engines are the most popular channel for everyone, search results are sometimes too general to yield solutions directly. A search on Google.com usually returns well-known preselected hosts and Wikipedia results at the top of the first listing (Höchstötter & Lewandowski, 2006). Such results are hard to follow for users who want to solve unfamiliar problems, such as healthcare and medical conditions, because the answers are usually neither customized nor oriented toward professional information. In addition, a search engine returns results efficiently only when queries are kept succinct, whereas on a Q&A forum users can type questions as long as they want. When a problem needs more detail to be understood, the asker has to explain the whole situation. Online Q&A websites have therefore gradually attracted more users; however, if a forum’s topics are too broad and it has no mechanism to control the quality of answers, finding professional and reliable answers may be tough. On Yahoo! Answers, for example, some replies in a question thread are merely advertisements or irrelevant descriptions aimed at attracting specific users. So, does participating in a domain-specific Q&A forum solve all of these difficulties?

As mentioned, not all users are familiar with specific domains. Take healthcare, the domain of our study, as an example: people without medical training often find it hard to formulate their requests. First, different people may have different mental representations (e.g. various descriptions of the pain level for the same illness) (Zeng-Treitler, Kogan, Ash, & Greenes, 2002). Secondly, most users can only come up with simple words to describe their diseases and medical conditions, and the vocabulary in their queries does not match medical terminology. Sometimes even the intention behind a question is not concrete, not to mention lexical barriers such as partial misspellings and the use of abbreviations. Therefore, simply changing the place where questions are asked is not enough. This encourages us to propose a design that suggests ideas (e.g. topics and features) users have not thought of and helps them formulate posts with more reasonable details, rather than just presenting existing questions from the search pool as most Q&A websites do. According to Baltadzhieva (2015), improving the quality of the input content does affect whether users receive useful answers. If a system can interpret the main idea of the input content and recommend directions for modification to make queries better, people may eventually get good feedback within a short time. We want to help participants engage in the process of formulating queries via a posting recommender. This design could be quite useful when general users are not familiar with a domain.

We plan to build two recommendation systems, a word embedding method and a semantic method, based on the concept of a posting recommender. Word embedding is known as one of the better tools for mapping words to vectors in a vector space, which improves machine understanding of human language. This recommender, implemented with a Word2Vec model (Mikolov, Chen, Corrado, & Dean, 2013), is trained on 5,319 questions and 500 publication abstracts crawled from health-related websites. The second recommender adopts WordNet², a lexical database for English that provides sets of manually tagged synonyms. Because it groups words by meaning, WordNet is a useful tool for computational linguistics and natural language processing (NLP). Both the Word2Vec model and the WordNet model are meant to recommend ideas and features that users may need in their current post but have not yet considered. These feature-based recommendations push users’ posts to become more subject-specific.

2 https://wordnet.princeton.edu/
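
As a rough illustration of the two recommendation ideas described above, the sketch below suggests terms for a word in two ways: by cosine similarity over word vectors (the role Word2Vec plays in this thesis) and by lookup in synonym groups (the role WordNet plays). The tiny vectors, the synonym groups and all function names are hypothetical stand-ins invented for this example; they are not the trained model or the actual WordNet data.

```python
import math

# Hypothetical 3-dimensional word vectors. A trained Word2Vec model would
# supply learned vectors with many more dimensions.
TOY_VECTORS = {
    "headache": [0.9, 0.1, 0.0],
    "migraine": [0.8, 0.2, 0.1],
    "nausea": [0.6, 0.5, 0.1],
    "insurance": [0.0, 0.1, 0.9],
}

# Hypothetical synonym groups standing in for WordNet synonym sets.
TOY_SYNONYM_GROUPS = [
    {"pain", "ache", "soreness"},
    {"doctor", "physician", "clinician"},
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_suggestions(word, vectors=TOY_VECTORS, top_n=2):
    """Return the top_n vocabulary words closest to `word` by cosine similarity."""
    if word not in vectors:
        return []  # out-of-vocabulary words get no suggestion
    scored = [
        (cosine(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:top_n]]


def synonym_suggestions(word, groups=TOY_SYNONYM_GROUPS):
    """Return, in alphabetical order, the other members of every group containing `word`."""
    suggestions = set()
    for group in groups:
        if word in group:
            suggestions |= group - {word}
    return sorted(suggestions)


print(embedding_suggestions("headache"))  # ['migraine', 'nausea']
print(synonym_suggestions("doctor"))      # ['clinician', 'physician']
```

The two functions reflect the difference between the approaches: the embedding lookup ranks neighbours by a learned notion of closeness, while the synonym lookup returns only words that a curated resource has grouped together.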

To implement the recommendation action, we first adopt a processor that tags parts of speech (POS) and extracts features representing the main idea of the input sentence(s). Users then decide whether to accept the new ideas. We think that intervening more in the asking process on the basis of text analytics may successfully support users in formulating UGC and ultimately enhance the completeness and distinctiveness of askers’ posts. But how can we judge whether our two posting recommenders are better than the original designs? The last step is therefore to verify whether reformulating queries organizes users’ wording better and makes desired answers easier to find. We propose a user study and a satisfaction questionnaire to understand users’ perspectives on our recommender. In addition, posts written by our study’s participants will be evaluated by experts with health-related backgrounds to see whether each post can be resolved easily at first reading.
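
The flow described above (tag parts of speech, extract features from the input, then let the user accept or reject suggested ideas) might be sketched as follows. The miniature POS lexicon, the choice of nouns as features, and every function name here are assumptions made for illustration only; the text does not specify them at this point, and a real system would rely on a trained POS tagger.

```python
import re

# Hypothetical miniature POS lexicon. Words missing from it are tagged "X".
TOY_POS_LEXICON = {
    "i": "PRON", "have": "VERB", "had": "VERB", "a": "DET", "the": "DET",
    "severe": "ADJ", "headache": "NOUN", "fever": "NOUN", "for": "ADP",
    "three": "NUM", "days": "NOUN",
}


def tag_pos(sentence, lexicon=TOY_POS_LEXICON):
    """Lowercase the sentence, split it into alphabetic tokens and tag each one."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [(token, lexicon.get(token, "X")) for token in tokens]


def extract_features(tagged, keep_tags=("NOUN",)):
    """Keep tokens whose tag is in keep_tags, without duplicates, in input order."""
    features = []
    for token, tag in tagged:
        if tag in keep_tags and token not in features:
            features.append(token)
    return features


def review_suggestions(suggestions, decide):
    """Return the suggestions the user accepts.

    `decide` is a callback standing in for the user's accept/reject choice.
    """
    return [s for s in suggestions if decide(s)]


tagged = tag_pos("I have had a severe headache for three days.")
print(extract_features(tagged))  # ['headache', 'days']
print(review_suggestions(["migraine", "nausea"], lambda s: s == "nausea"))  # ['nausea']
```

Keeping the accept/reject step as a separate callback mirrors the semi-automatic nature of the design: the system proposes, but the user makes the final decision about what enters the post.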

To sum up, two research questions are posited:

RQ1: Does the posting recommender help users formulate questions in healthcare Q&A forums?

RQ2: Do questions supported by the posting recommender attract experts to answer?

1.3 Contribution

In this work, we investigate how feature-based mechanisms make recommendations more rational and reliable when users formulate questions. At the same time, we want to strengthen the convergence of UGC in a healthcare forum, which sometimes contains ambiguous terms or lacks a clear focus. If online answerers and experts find it hard to answer a question about a condition that lacks details, they may abandon the current query and browse another one. This thesis therefore aims to build a closer link between questioners and answerers via the posting recommender. If the numeric and interview results are positive, adding the semi-automatic method to the asking process may make it feasible to construct clear posts and to acquire high-quality feedback more easily (e.g. more tagging recommendations, or suitable answers chosen by machines and written by experts).

Further, the recommendation mechanism we propose may not be limited to the healthcare area. Take e-commerce as an example: when people purchase products they are unfamiliar with, it is common for them to ask for details before and after buying. If a system helps formulate questions before posting, problems may be solved by FAQs or posts that already exist on the customer service webpage but are currently hard to find because of ambiguous wording. Thus, the unanswered rate in a Q&A forum may go down and the likelihood of getting solutions may go up. We think that any industry that needs to deal with queries and interpret UGC is suitable for building a posting recommender into the users’ asking process.

1.4 Content organization

In Chapter 1, we elaborate on the obstacles of searching for information with a search engine alone. The asking process and the analysis of UGC are discussed in the second and third paragraphs. Next, we state that Q&A forums, especially domain-specific platforms such as those for healthcare and medicine, encourage users to solve problems online. This chapter gives an overview of our research background and motivation and defines the research purpose and questions. In Chapter 2, the literature review, we discuss the theoretical support that helps us define our research objectives and organize what other researchers have done in the fields of recommendation mechanisms and healthcare Q&A forums. In Chapter 3, we detail how we carry out our research by building a visualization system with the recommendation mechanism. Because we have to validate the feasibility of posting recommenders, that chapter also lays out the design of our user study, including the methods we use to evaluate this artefact. In Chapter 4, we apply statistical methods to analyze the data collected from participants’ behaviors on our system in the user study. In addition, we compare the experts’ ratings of participants’ posts with the statistical results and briefly present participants’ average satisfaction scores. In Chapter 5, we discuss the experimental results from Chapter 4. In Chapter 6, we conclude our research with respect to the two research questions and discuss its limitations, contributions and future work.
