Chapter 3 Methodology
3.2 Regression Model
The general purpose of a regression model is to understand the relationship between several independent variables and a dependent variable. The dependent variable is the variable being tested or measured, while the independent variables are changed or controlled to observe their effect on the dependent variable. The model is specified as follows:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_n x_{in} + \epsilon_i$$
where $y_i$ is the dependent variable for observation $i$, $x_{ij}$ is the $j$-th independent variable for observation $i$, $\beta_0$ is the intercept, $\beta_j$ is the coefficient of the $j$-th independent variable, $\epsilon_i$ is the error term, and $n$ is the number of independent variables.
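To make the least-squares idea behind this model concrete, the following is a minimal sketch in Python. It is an illustration only: the thesis performs its fitting with R, and the function name `fit_ols` and the toy data are hypothetical. The sketch solves the normal equations directly and is not meant to reproduce R's implementation.

```python
def fit_ols(X, y):
    """Return least-squares coefficients [b0, b1, ..., bn] for y ~ X.

    X is a list of rows, each holding the n independent variables of one
    observation; y is the list of dependent-variable values.
    """
    # Prepend a constant 1.0 to every row so that b0 acts as the intercept.
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])

    # Build the normal equations (A^T A) b = A^T y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]

    # Solve by Gaussian elimination with partial pivoting.
    m = [ata[i] + [aty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]

    # Back-substitution.
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = m[i][k] - sum(m[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = s / m[i][i]
    return beta


# Hypothetical toy data generated from y = 1 + 2*x1 + 3*x2 without noise.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
beta = fit_ols(X, y)
print([round(b, 6) for b in beta])
```

Because the toy data contain no noise, the fitted coefficients should be close to 1, 2, and 3. With real data the error term is non-zero and the coefficients are estimates.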
Since our research goal is to identify the potential impact of different keyword attributes on advertising performance, we consider a regression model well suited to our data analysis. As the dependent variable, we choose a measurement commonly used in keyword advertising evaluation, such as clicks or conversions, and we design a set of independent variables that represent the keyword attributes. The variable selection is described in detail in the next chapter.
We use the R programming language’s linear and logistic regression functions to perform the calculation. For linear regression, the lm() function in R uses the linear least squares design from (Chambers, 1992). The glm() function used for logistic regressions
After the coefficients are fitted to the research data, we look for those with small p-values, which indicate a statistically significant relationship between the corresponding independent variable and the dependent variable. These relationships are then carefully examined against common sense and known business best practices to uncover insights into the special nature of NPO keyword advertising.
Chapter 4 Empirical Analysis
In this chapter, we first introduce our case, the Open Culture Foundation, covering the background of the organization, the website structure, and the donation project. Second, we present the research data in detail, including the data source, data format, data timeframe, and a data overview, that is, the overall online marketing performance in the given timeframe. Next, we analyze the empirical data by conducting a regression analysis of keyword attributes within the AARRR model, which represents five phases of the customer lifecycle and conversion behavior, to discover each attribute’s potential influence at each stage.
We select 12 important attributes in five categories as our independent variables: 1) Keyword Essence: foundation, open source, technology, government; 2) Event Type: computer science-related event, student summer/winter camp; 3) Action Type: call for donation, call for newsletter subscription; 4) Device Category: mobile traffic, tablet traffic [7]; 5) Language: Chinese, mixed language [8]. For each stage of the AARRR model, we designate a commonly used advertising performance measurement as the dependent variable: 1) Acquisition stage: Clicks; 2) Activation stage: Bounce rate; 3) Retention stage: Percentage of new sessions; 4) Revenue stage: Conversions. The referral stage is excluded from the research because we do not have any referral data related to Google AdWords keyword advertising. After examining the relationships between the independent variables (keyword attributes) and the dependent variable (advertising effectiveness measurement), we put forward our suggestions based on the findings and implications.
[7] For device, we choose “desktop traffic” as the reference level and treat “mobile traffic” and “tablet traffic” as indicator variables. Details of the variables are elaborated in Section 4.3.1.
[8] “Chinese” and “mixed language” are indicator variables, while “English” is the reference level.
4.1 Open Culture Foundation
4.1.1 Organization Introduction
Open Culture Foundation (OCF) is a nonprofit organization that promotes the awareness and use of open source in a broad sense. It was founded in 2014 by several members of Taiwan’s open source communities.
The concept of open source has influenced many ideologies and movements with its ethos of access to the source, free remix and redistribution, an end to predatory vendor lock-in, and a higher degree of cooperation (Socailsquare, 2014). The term “source” originally referred to source code in computing; however, the idea of an open, transparent, accessible, and participatory source has been adopted in many fields and is not limited to software and hardware engineering. Examples include the free sharing of skills and knowledge, Creative Commons-licensed works, and open government documents.
Taiwan’s open source communities are highly active and frequently hold events and conferences, for which they usually also sell tickets. However, handling ticket transactions and accounting is burdensome for each community because of a lack of expertise in government regulations and each organization’s own limitations. The idea of establishing a registered foundation [9] thus emerged, and OCF was founded accordingly.
The original intention of OCF was to assist the local communities in handling administrative issues such as ticket sales transactions and receipts. In addition, it helps with online advertising campaigns and sometimes provides on-site volunteers.
The main goal of OCF has now shifted to advocating the use of open source software/hardware and open data by supporting the open source communities.
To sustain itself, OCF also launches donation campaigns to cover its own expenses, most of which are personnel costs.
Donations from individuals and organizations are OCF’s main source of income.
4.1.2 Website Structure
There are five main sections on the OCF website: About, People, Projects, Journal, and Media kit. A “Join Us” button at the top of the homepage leads to the donation page (explained in more detail in Section 4.1.3). An English version is available, although not all of its content is complete. The homepage screenshot and the website structure are shown in Figure 1 and Figure 2, respectively.
[9] According to Taiwan’s Civil Code, legitimate and registered non-profit-seeking public groups can be categorized as charitable corporations; a foundation is one of the various forms.
Note. From http://ocf.tw/ Retrieved September 10, 2017.
Figure 1: Screenshot of OCF homepage
Figure 2: OCF website structure
4.1.3 Donation Project
OCF initiated its Google AdWords campaign and later joined the Google Ad Grants program in March 2015. On February 3, 2016, OCF launched a long-term small-donation project named “OCF 300 Warriors (OCF 開源 300 壯士)” [10]. The project calls on donors to donate NTD 300 per month on a continuing basis. When a user reaches the donation page, he or she reads the project description; after the user fills out the online form and finishes the donation process, one conversion is recorded in the Google Analytics report. Donors are free to terminate their donations, but terminations are not reflected in the report.
The steps of the conversion path are as follows:
Step 1: Understand how the donation project works. OCF asks donors to donate NTD 300 per month, and the amount is paid automatically by credit card. Donors receive an email each month after the transfer is done, and they can terminate the donations by contacting OCF. The default monthly payment is NTD 300, but donors are free to adjust the amount.
Step 2: Fill out the donation form. Donors fill out the online form with the payment setting (monthly amount) and personal information (name, email address, receipt information, ID number, etc.).
Step 3: Confirm the donation information.
Step 4: Complete the donation. Donors provide and submit their credit card information and finish the donation process. Donors are then redirected to ECPay [11], the third-party payment webpage (https://payment.ecpay.com.tw/Cashier/AioCheckOut), and the conversion is recorded in Google Analytics.
Figure 3: Conversion path
[10] Starting from May 2016, there is another project called “OCF x g0v Joint Donation (OCF 開源 300 壯士 x g0v 大松認養人)”. The joint project asks donors to donate NTD 600 per month; half of the amount is donated to OCF and half to g0v. In this thesis, the term “donation” denotes the “OCF 300 Warriors” project only.
[11] The payment service is provided by Green World FinTech Service Co. (綠界科技). For more information on the service, see https://www.ecpay.com.tw/ Retrieved September 16, 2017.
4.2 Research Data
4.2.1 Data Scope
OCF granted us access to its Google Analytics account with the “viewing and analysis” permission.
The basic reports in Google Analytics contain two data types: dimensions and metrics. Dimensions are attributes of the data, while metrics are quantitative measurements. The dimensions and metrics used in our analysis are listed as follows:
Dimensions
1. Channels: There are five default channel categories in Google Analytics, indicating where the traffic comes from.
1) Paid Search: Traffic that arrives through a paid search campaign such as Google AdWords advertisements.
2) Direct: Traffic that arrives directly, for example by typing the URL or clicking on a bookmark.
3) Organic Search: Traffic that arrives through unpaid search, such as a non-paid Google Search result.
4) Social: Traffic that arrives through social media or social networks such as Facebook, Twitter, and LinkedIn.
5) Referral: Traffic that arrives after the user clicks a link on a website other than a search engine.
2. Keywords: OCF’s Google AdWords and Google Analytics accounts are linked; hence, the keywords bought in Google AdWords that received at least one tracked click (that is, keywords that users used to reach the website) are shown in this dimension.
3. Device Category: There are three default categories, which are desktop, mobile, and tablet.
4. User Type: The two types are new (first-time) visitors and returning visitors.
Metrics
1. Clicks: The number of times users click on the advertisement.
2. Sessions: The total number of sessions within the date range. A session is a group of interactions that one user takes on the website within a given timeframe (30 minutes by default).
3. % New Sessions: An estimated percentage of sessions created by first-time visitors.
4. New Users: The total number of first-time visitors.
5. Bounce Rate: The percentage of sessions in which the user leaves the website after viewing only one page, without any other interaction.
6. Avg. Session Duration: The average length of a session, measured in seconds.
7. Conversions (Goal Completions): The total number of conversions to the goal. In OCF’s case, the goal is for a user to complete the donation process.
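The metric definitions above can be illustrated with a short Python sketch. The session records and the function name `summarize` are hypothetical, and a bounce is simplified here to a session with exactly one page view; Google Analytics computes these metrics internally, so this is only a reading aid.

```python
# Hypothetical session records:
# (is_new_visitor, pages_viewed, duration_seconds, converted)
sessions = [
    (True, 1, 0, False),
    (True, 3, 120, False),
    (False, 2, 90, True),
    (True, 1, 0, False),
]


def summarize(records):
    """Aggregate session records into the metrics described in the text."""
    total = len(records)
    new = sum(1 for r in records if r[0])
    # Simplifying assumption: a bounce is a session with a single page view.
    bounces = sum(1 for r in records if r[1] == 1)
    return {
        "sessions": total,
        "pct_new_sessions": 100.0 * new / total,
        "bounce_rate": 100.0 * bounces / total,
        "avg_session_duration": sum(r[2] for r in records) / total,
        "conversions": sum(1 for r in records if r[3]),
    }


summary = summarize(sessions)
print(summary)
```

For the four hypothetical sessions, three come from new visitors and two are single-page sessions, so the sketch reports 75% new sessions and a 50% bounce rate.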
As for the timeframe, we wish to evaluate the performance of OCF’s keyword advertising in 2017. The study was conducted in September, so the data we analyze cover January to August 2017.
4.2.2 Data Overview
From January to August 2017, OCF accrued 32,268 sessions, acquired 25,199 new users, and reached 144 conversions. The average bounce rate is 75.15% and the average session duration is 79 seconds. The ratio of new users to returning users is about 8:2. In the following sections, we present an overview of marketing performance from the perspectives of cross-year comparison, cross-channel comparison, and seasonality analysis.
4.2.2.1 Cross-Year Comparison
Compared with the same timeframe in the previous year, both acquisition and conversions decreased, while user behavior (bounce rate and average session duration) improved. The retention rate remained steady. The changes in marketing performance from 2016 to 2017 are shown in Table 2.
Table 2: Changes from 2016 to 2017 (January – August) of marketing performance
Metric                             2016          2017          Change   Indicator
Session                            58,411        32,268        -45%     Decreasing
New User                           45,765        25,199        -45%     Decreasing
Conversion                         211           144           -32%     Decreasing
Bounce Rate                        80.24%        75.15%        -6%      Improving
Session Duration (sec.)            55            79            44%      Improving
Ratio of New vs. Returning User    7.84 : 2.16   7.81 : 2.19   -0.4%    Stabilizing
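The Change column of Table 2 appears to be the relative change from 2016 to 2017. A minimal Python sketch of that arithmetic, using the figures from the table, is given below; the function name `pct_change` is hypothetical, and the rounding to whole percentages is an assumption about how the table was prepared.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# (2016 value, 2017 value) taken from Table 2.
table2 = {
    "Session": (58411, 32268),
    "New User": (45765, 25199),
    "Conversion": (211, 144),
    "Bounce Rate": (80.24, 75.15),
    "Session Duration (sec.)": (55, 79),
}

changes = {name: round(pct_change(old, new)) for name, (old, new) in table2.items()}
for name, value in changes.items():
    print(f"{name}: {value}%")
```

Note that the bounce rate change computed this way is a relative change of the rate itself (about -6%), not the difference in percentage points.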
4.2.2.2 Cross-Channel Comparison
For acquisition metrics, the paid search channel contributes the most traffic and the referral channel contributes the least. For adjusted [12] conversions, the direct channel contributes the most conversions, while the paid search channel contributes the least.
[12] The original conversion number is misleading because of some technical problems. The clarification and detailed adjustment are explained in Section 4.3.5.
For user behavior metrics, paid search has the worst performance in both average bounce rate (the highest) and average session duration (the shortest). For retention data, the paid search channel has the highest new-to-returning user ratio, while the referral channel has the lowest.
An overview of the best and worst performing channels for each measurement is presented in Table 3. The underlying numbers are shown in the following tables and figures.
Table 3: Marketing performance overview by channel (January – August, 2017)
Metric                             Goal     Best Performance   Worst Performance
Session                            Higher   Paid Search        Referral
New User                           Higher   Paid Search        Referral
Conversion                         Higher   Direct             Paid Search
Bounce Rate                        Lower    Organic            Paid Search
Session Duration                   Longer   Referral           Paid Search
Ratio of New vs. Returning User    Lower    Referral           Paid Search
Note. In this discussion, we assume the lower the new-to-returning ratio the better.
Table 4: Sessions in different channels (January – August, 2017)
Channel Sessions Percentage
Paid Search 13,398 42%
Direct 8,653 27%
Organic Search 4,769 15%
Social 4,003 12%
Referral 1,415 4%
Total 32,238 100%
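The Percentage column of Table 4 can be reproduced from the session counts. The following Python sketch shows the calculation; the variable names are hypothetical, and rounding to whole percentages is an assumption about how the table was prepared.

```python
# Session counts per channel, taken from Table 4.
sessions_by_channel = {
    "Paid Search": 13398,
    "Direct": 8653,
    "Organic Search": 4769,
    "Social": 4003,
    "Referral": 1415,
}

total = sum(sessions_by_channel.values())

# Share of each channel in percent, rounded to the nearest integer.
shares = {ch: round(100.0 * n / total) for ch, n in sessions_by_channel.items()}

print(total)
for ch, share in shares.items():
    print(f"{ch}: {share}%")
```

The same calculation applies to the new-user and adjusted-conversion tables that follow.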
Figure 4: Percentage of sessions by channel (January – August, 2017)
Table 5: New users in different channels (January – August, 2017)
Channel New Users Percentage
Paid Search 11,452 45%
Direct 7,106 28%
Organic Search 3,247 13%
Social 2,735 11%
Referral 677 3%
Total 25,217 100%
Figure 5: Percentage of new users by channel (January – August, 2017)
Table 6: Adjusted conversions in different channels (January – August, 2017)
Channel Adjusted Conversions Percentage
Direct 65 45%
Social 42 29%
Referral 18 13%
Organic Search 18 13%
Paid Search 1 1%
Total 144 100%
Figure 6: Percentage of adjusted conversions by channel (January – August, 2017)
Figure 7: Bounce rate in different channels (January – August, 2017)
Figure 8: Session duration (seconds) in different channels (January – August, 2017)
Table 7: Sessions generated by new and returning visitors (January – August, 2017)
Channel   New Visitors   Returning Visitors   Ratio of New vs. Returning
In terms of seasonality, May and June accumulated the most sessions, and conversions also peaked in these two months.
4.3 Regression Analysis of Keyword Attributes in the AARRR Model
Our goal is to identify the significant factors or the specific attributes of the keywords that can affect advertising effectiveness in each stage of the AARRR model.
However, we do not have any referral data related to Google AdWords keyword advertising; hence the referral stage will be excluded from our following discussion.
In each phase, we first present an overview of OCF’s keyword advertising. Then, we examine the relationships between the keyword attributes and advertising performance using regression analysis. Lastly, we make suggestions based on the findings and implications.
4.3.1 Sample and Variable
OCF bought 278 keywords in the timeframe of January to August 2017. Each keyword can be reached by users via three kinds of devices: desktop, mobile, and tablet. Taking the device difference into account, we have a total sample size of 580 keywords.
For the distinct perspectives of evaluation – acquisition, activation, retention, and revenue – we choose a different dependent variable for each to better understand advertising effectiveness. However, to keep the comparison of keyword attributes consistent across stages, we use the same set of independent variables in each stage. We group the independent variables into five main categories: “Essence”, “Event type”, “Action type”, “Device category”, and “Language”. The variable IDs are shown in parentheses.
I. Essence of the keyword
1. Foundation (foundation)
2. Open Source in a Broad Sense (opensource)
3. Technology (tech)
4. Government (gov)
OCF is a nonprofit foundation whose ultimate goal is to advocate the use of open source. Therefore, the majority of its keywords are related to foundations and open source. Many keywords also fall into the technology category, that is, keywords about specific programming languages, source code, hardware and software, or online collaborative editors and platforms. Another type of keyword is government related, covering the free and accessible documents of the public sector, the citizen participation movement, communities dedicated to government open data, and so on.
II. Event type of the keyword
5. Computer Science Related (event_cs)
6. Student Summer/Winter Camp (event_camp)
Since one of the main functions of OCF is to promote and assist events held by the local communities, there are many event keywords. Because of the nature of open source events, most of them are computer science related, for instance, hackathons, COSCUP (Conference for Open Source Coders, Users and Promoters), and PyCon (Python Conference). There is also a considerable number of keywords about children’s and students’ summer/winter camps; perhaps SITCON (Students’ Information Technology Conference) gave OCF the idea of buying student camp related keywords.
III. Action type of the keyword
7. Donation (action_donation)
8. Newsletter Subscription (action_newsletter)
OCF’s call-to-action advertisements are of two types: donation and newsletter subscription. Intuitively, keywords of this kind should differ noticeably from the others, for example, by having a higher conversion rate.
IV. Device category of the keyword
9. Mobile (device_mobile)
10. Tablet (device_tablet)
There are three device categories tracked in Google Analytics: desktop, mobile, and tablet. We choose desktop as the reference level and treat mobile and tablet as indicator variables. Given the high smartphone penetration in Taiwan, we wish to see whether desktop and mobile user behavior differ and, if so, how.
V. Language of the keyword
11. Chinese (language_ch)
12. Mixture of English and Chinese (language_mix)
There are three types of language: all Chinese characters (e.g. “開放政府”), all English characters (e.g. “g0v”), or a mixture of both (e.g. “政府 open data”). We choose English as the reference level, and we wish to find out whether the language of a keyword is an important factor.
We labeled each keyword with the attributes above; an example is shown in Table 8. The total count of each attribute is shown in Table 9.
Table 8: Example of keywords attributes labeling
                     基金會   COSCUP   政府open data   免費電子報
foundation           1        0        0               0
opensource           0        1        1               0
tech                 0        1        0               0
gov                  0        0        1               0
event_cs             0        1        0               0
event_camp           0        0        0               0
action_donation      0        0        0               0
action_newsletter    0        0        0               1
device_mobile        0        1        1               0
device_tablet        0        0        0               0
language_ch          1        0        0               1
language_mix         0        0        1               0
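The device and language columns in Table 8 follow indicator (dummy) coding with a reference level: the reference level is represented by all indicators being zero. The Python sketch below illustrates this coding. The function name `encode` and the raw level codes (“desktop”, “en”, “ch”, “mix”) are hypothetical; only the resulting variable IDs come from the thesis.

```python
# Levels that receive their own indicator variable.
DEVICE_LEVELS = ("mobile", "tablet")   # "desktop" is the reference level
LANGUAGE_LEVELS = ("ch", "mix")        # English ("en") is the reference level


def encode(device, language):
    """Return the indicator variables for one keyword observation."""
    row = {f"device_{lvl}": int(device == lvl) for lvl in DEVICE_LEVELS}
    row.update({f"language_{lvl}": int(language == lvl) for lvl in LANGUAGE_LEVELS})
    return row


# A mixed-language keyword reached from a mobile device.
print(encode("mobile", "mix"))
# An English keyword reached from a desktop: every indicator is zero.
print(encode("desktop", "en"))
```

With this coding, each fitted coefficient for a device or language indicator is interpreted relative to the reference level (desktop or English).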
Table 9: Count and example of each independent variable
Variable Count Example
foundation 88 基金會, 基金會 徵才, 財團法人基金會, 基金會 贊助, open culture foundation
opensource 215 自由軟體, 開放資料, COSCUP, firefox, g0v
Note. The total count exceeds the sample size because a keyword can have more than one attribute.
To clarify, keywords with the same word string can be reached by users from different devices. They are labeled differently and are therefore regarded as different keywords.
Table 10: Example of labeling for keywords with same word strings
基金會 基金會 基金會
foundation 1 1 1
opensource 0 0 0
tech 0 0 0
gov 0 0 0
event_cs 0 0 0
event_camp 0 0 0
action_donation 0 0 0
action_newsletter 0 0 0
device_mobile 1 0 0
device_tablet 0 1 0
language_ch 1 1 1
language_mix 0 0 0
Also, keywords with exactly the same meaning can be presented in Chinese, in English, or in a mixture of both languages. They are labeled differently and are therefore regarded as different keywords.
Table 11: Example of labeling for keywords with same meaning
                     政府開放資料   government open data   政府open data
foundation           0              0                      0
opensource           1              1                      1
tech                 0              0                      0
gov                  1              1                      1
event_cs             0              0                      0
event_camp           0              0                      0
action_donation      0              0                      0
action_newsletter    0              0                      0
device_mobile        1              1                      1
device_tablet        0              0                      0
language_ch          1              0                      0
language_mix         0              0                      1
Variance inflation factors (VIF) are commonly used to identify collinearity among independent variables. A recommended maximum VIF value is 5 (Rogerson, 2001) or even 4 (Pan & Jackson, 2008). We use the vif() function in R to perform the calculation, and the results show that all VIF values are below these thresholds.
Table 12: Variance inflation factors values
VIF
foundation 1.839480
opensource 2.988788
tech 2.572220
gov 1.517867
event_cs 1.048061
event_camp 2.365573
action_donation 1.098440
action_newsletter 1.312481
device_mobile 1.134302
device_tablet 1.161969
language_ch 1.860706
language_mix 1.113754
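For reference, the textbook definition of the variance inflation factor for the j-th independent variable is given below, where $R_j^2$ is the coefficient of determination obtained by regressing that variable on all the other independent variables. The formula is the standard definition and is included only as a reading aid for Table 12.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

Under this definition, the largest value in Table 12 (2.988788 for opensource) corresponds to $R_j^2 = 1 - 1/2.988788 \approx 0.67$, meaning that roughly two thirds of the variation in that indicator is explained by the other independent variables.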
4.3.2 Acquisition
4.3.2.1 Descriptive Statistics
How do visitors find OCF and arrive at the website? There are many ways: a user can find the website through a Google AdWords text ad, by scanning a QR code, or by clicking a shared link in a friend’s tweet. According to Table 5, paid search (Google AdWords advertising) is the largest channel, contributing 11,452 new users, or 45% of the total.
In acquisition, we care about how users arrive at the OCF website. When a user clicks on an AdWords advertisement, he or she is redirected to the website. Therefore, we choose the total number of “clicks” for each keyword as the dependent variable.
In total, 20,279 clicks are attributed to the 580 keywords, an average of 34.96 clicks per keyword. There is only one keyword, “基金會 (foundation)” in the