Impact of Performance Ticket Sales Information on Sales Speed: An Empirical Study Based on a Taiwanese Theater


Department of Information Management, College of Management

National Taiwan University

Master’s Thesis


Impact of Performance Ticket Sales Information on Sales Speed: An Empirical Study Based on a Taiwanese Theater

孫珮瑜 Pei-Yu Sun

Adviser: Ling-Chieh Kung, Ph.D.

July 2017


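Acknowledgements

During my two years in the master’s program, I had the privilege of joining the big family of the Information Economics and Decision Optimization Laboratory and of being guided by the enthusiastic Professor Ling-Chieh Kung. Throughout this time, he was both a teacher and a friend, continually giving me opportunities to reflect on myself, pointing out my shortcomings, and giving me ample room to improve and grow. Beyond the professional knowledge taught in class, such as operations research and information economics, he encouraged us to keep stepping out of our comfort zones and to challenge our limits. Under his guidance, in my first year I received the best paper award at the Decision Analysis Symposium and presented our research in English to the scholars attending the PACIS international conference. When I began writing this thesis in my second year, he guided me in developing the topic, advised me on structure and writing, and reviewed the text word by word. He also let me take part in data analysis projects with Senao International (神腦國際) and Coursera, where I practiced cleaning, analyzing, and presenting data. In addition, I am grateful that he chose me as a teaching assistant for Service Learning III. By leading undergraduates in promoting programming education, I came to understand deeply that giving back to society not only influences others but also brings happiness to oneself and recharges oneself at the right moments. I also especially thank my oral defense committee members, Professors 林真如, 莊皓鈞, and 裘家寧, for their many suggestions and their encouragement during the defense.

Of course, I thank my family, who have supported me at every moment and encouraged me to keep learning and to surpass myself.

I thank my seniors 冠宇, 何禾, 偉宏, and 騏瑋 for lighting the way on my path through graduate research. I thank the lab mates of my cohort for supporting one another and growing together along the research road. Thank you, 雪兒, 韋志, and 宸安, for the cooperation and mutual help during the Senao project. Thank you, 維哲 and 千瑜, for conquering the PACIS conference together with me in our first year. Thank you, 韋志, 柏宣, and 宸安, for leading the way and making the days both joyful and meaningful. I thank my juniors 敬傑, 佩蓉, 子翔, and 鑑霖; I learned something from every interaction with you. Special thanks go to 維哲, my soulmate along the way. Thank you for your company and encouragement over these years; with you, my world has many more colors. We will continue to help and learn from each other and to grow together.

Pei-Yu Sun

Graduate Institute of Information Management, National Taiwan University

July 2017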


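Chinese Abstract

Through the disclosure of sales information, this study aims to help decision makers maximize profit. Using regression models, we analyze the ticket sales data of a Taiwanese theater group to study how the sales status of each price band that a consumer observes at the moment of purchase affects the consumer’s willingness to buy, and we explore how a decision maker holding this information can dynamically adjust the number of seats available in each price band to maximize expected revenue. We find that a price band’s sales speed is positively correlated with its own sales but negatively correlated with the cumulated sales of other price bands. This phenomenon becomes more pronounced as the price gap between price bands widens. Our study also confirms that sales speed decreases as the selling period lengthens.

With the demand forecasting model, we can compute and adjust the optimal number of seats for each price band based on current sales. We also consider a dynamic decision setting in which the decision maker can dynamically allocate seats among price bands to optimize profit. In our numerical experiments, we find that, through the disclosure of sales information, dynamic seat adjustment that accounts for the effect of sales status on sales speed raises profit by 4.60%.

Keywords: Ticket Sales, Sales Information Disclosure, Uncertainty in Willingness to Buy, Dynamic Seat Allocation, Regression Analysis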


Thesis Abstract

The data in this study highlight the lengths to which a decision maker can go to maximize profit through sales information disclosure. By incorporating the sales information of price bands into a single regression model estimated on ticket data from a Taiwanese theater, our framework formalizes how decision makers are able to affect customers’ valuation. We verify that the cumulated sales of a price band have a positive relation to its sales speed, while the cumulated sales of other price bands have negative effects. The relationships intensify when the prices of the price bands are closer to one another. Our analyses further support that sales speed decreases as the length of the selling period increases.

The estimated demand system allows us to calculate the capacity allocation for each price band based on its current sales. We investigate a scenario in which decision makers can dynamically allocate ticket capacities among the price bands of the same performance to optimize profit. The profit of the model that accounts for sales information disclosure is 4.6% higher than that of the suboptimal model.

Keywords: Ticket Sales, Sales Information Disclosure, Valuation Uncertainty, Dynamic Seat Allocation, Regression Analysis


Chapter 1 Introduction

1.1 Background and motivation

The online environment allows merchants to employ various inventory-related selling tactics by deciding whether to divulge certain information to customers. As a top online auction platform, eBay shows on a product’s detail page the amount sold, the number of views per hour, and the quantity left when the product is selling quickly. China’s market has an even more visible mechanism. Both Taobao and JD.com, two dominant online marketplaces in China, disclose inventory information directly on advertisements and in search result lists, using a bright red bar chart that shows the percentage of products sold versus inventory left. The phenomenon is also found on hotel booking websites. Hotels.com reveals, beside the hotel names in the search result list, that hotels were booked several hours ago. Booking.com shows on a hotel’s detail page whether the room is in high demand. Group buying websites such as Groupon, GOMAJI, and LivingSocial make especially heavy use of the inventory-disclosure mechanism: people can see the amount of products sold on almost every web page. Inventory-related information disclosure tactics thus seem to be widely used by online merchants, who appear to believe that disclosing information on the popularity of a product can trigger higher sales.

Academic evidence for the above practice is provided in several streams of literature. Bikhchandani et al. (1992) define an informational cascade as occurring when an individual, having observed the actions of those ahead of him, finds it optimal to follow the behavior of the preceding individual without regard to his own information. Watts (2002) studies a network setting in which an agent is able to affect only its neighbors and finds that a limited number of initial shocks has the potential to trigger an information cascade across the global network. Herding and information cascades might also occur among professional analysts (Alevy et al., 2007; Welch, 2000). Ho et al. (2014) state that sales velocity indicates the popularity of a product. Showing high sales velocity has such a positive effect on customers’ purchase likelihood that it can even reverse participants’ preferences for a product with a higher sales rank.

Both the literature and practical applications suggest that sales outcomes affect customers’ purchase decisions; however, limited literature on performance ticket selling discusses this issue. Existing studies normally assume that people know their valuation (Courty, 2002; Leslie, 2004; Rascher et al., 2007) and that only price changes affect customer utility (Drayer et al., 2012; McAfee and Te Velde, 2006; You, 1999). On the business side, ticket-selling websites such as Ticketmaster and UDN ticket selling disclose cumulated sales outcomes. Even though some do not directly show any inventory-related number on the product detail page until customers click into the show and choose a specific performance date, the websites need to disclose the cumulated sales outcome during seat selection. Ticket websites normally do so by displaying a seat map in which different background colors indicate the price bands and seats in red are already sold. Customers can thus grasp the sales and the percentage sold of all price bands at a glance.

It is intriguing that, in performance ticket pricing, both academia and business have devoted limited attention to whether people are influenced by others when they are uncertain about their valuation. A theater performance is a product with high valuation uncertainty. The quality of a performance might be affected by the location, the seat, the cast, the actors’ emotion, and the entire crew. Compared with the uncertainty people face in general online product purchases, buying performance tickets should carry higher risk. Customers might be more influenced by other people’s opinions; therefore, the cumulated sales outcome might play a role in affecting sales.

Performance ticket purchases have some other interesting characteristics that differ from the selling of most general products. First, performance tickets are perishable and have no salvage value. Second, there are multiple price bands in a show. Customers have different valuations for each price band based on its quality and on customer preference. Lastly, there is no inventory replenishment for a performance. People might buy tickets of a higher or lower price when their most desired price band is sold out.


1.2 Research objectives

In this study, we investigate the relationship between ticket sales outcomes and customer valuation and how this relationship affects the profit of a ticket seller. We obtain a performance data set spanning 2008 to 2012 from a local theater group in Taiwan. The group had been actively producing and performing innovative and educational children’s drama for sixteen years. The ticket selling data set contains detailed information on ticket sales, from which we obtain the sales volume of every price band for each performance on each day.

The aforementioned literature shows that customers’ valuation may be affected by sales outcomes. Using the data and a regression model, we estimate how essential inventory information disclosure is to performance ticket selling. In addition to the cumulated sales outcome, we add other performance information, such as location and selling period, to the model and examine the interaction effects between this basic performance information and cumulated sales. The relationships between the target price band and the other price bands are also investigated.

In addition to establishing the relation between sales speed and cumulated sales, we consider an operational application. In a performance, each price band has its own capacity, and different price bands have different cumulated sales. We study a scenario in which seats allocated to one price band can be transferred to another price band while the potential influence on customers’ willingness to buy is taken into account. The capacity of a price band that is not selling well can be transferred to one with better sales performance. With this quantity control, we examine how the additional information about the relationship between sales outcomes and customers’ valuation may contribute to a ticket seller’s profit.

1.3 Research plan

The remainder of the thesis is organized as follows. In Chapter 2, we review related work on perishable products, inventory disclosure, and ticket pricing. In Chapter 3, we propose a regression model that describes the relationship between cumulated sales and sales speed, our proxy for customer utility. Chapter 4 presents the analysis based on the proposed models. An application based on our research findings is examined in Chapter 5. Chapter 6 concludes.


Chapter 2 Literature review

Our research discusses the relationship between a ticket’s cumulated sales and its sales speed; therefore, we focus on three streams of literature: perishable products, sales information disclosure, and ticket selling.

2.1 Perishable products

Many inventory models assume that stock items can be stored indefinitely to meet future demands. However, certain types of inventories undergo change in storage so that in time they may become partially or entirely unfit for consumption (Nahmias, 1982). Such assets are called perishable products. Examples include fresh food, fashion or seasonal goods, and travel-related products such as hotel rooms or air tickets. Theater tickets are, of course, also perishable goods.

Retailers offer discounts on perishable products nearing the end of their shelf lives to encourage consumers to buy, under the assumption that customers are less willing to purchase perishable products when the expiration date is near. The assumption is verified by Tsiros and Heilman (2005), who survey 300 customers of a grocery shop and find that customers’ willingness to pay decreases throughout the course of a product’s shelf life. For discount offering on perishable goods, Sezen (2004) uses an expected profit approach to identify the timing and size of the discount for perishable commodities. Ramanathan (2006) extends the expected profit approach of Sezen (2004) by including decisions on the quantities of the perishable products to be stocked by the retailer. Zhao and Zheng (2000) conduct numerical studies to show the revenue impact of dynamic pricing on perishable assets. Their examples show that using optimal dynamic policies achieves a 2.4–7.3% revenue improvement over the optimal single-price policy. Price changes become even more critical when the reservation price distribution shifts over time; in that case, the revenue increase can be as high as 100%. These results explain why yield management has become so essential to the fashion retailing and travel service industries.

Weatherford and Bodily (1992) propose Perishable-Asset Revenue Management (PARM) and define it as the optimal revenue management of perishable assets through price segmentation. Fourteen taxonomy elements, such as discount price classes, group reservations, and overbooking, are investigated. These elements are mostly price-related; however, theater prices are fixed in many circumstances. As Gallego and Van Ryzin (1994) state, fixed-price policies appear more appealing in reality because dynamic pricing needs constant adjustment and is undesirable in practical applications. The option of dynamic pricing is thus eliminated in our situation. We focus on whether disclosing cumulated sales affects customers’ willingness to pay for the various price bands and thereby changes sales speed.

2.2 Sales information disclosure

As mentioned above, several e-commerce platforms use sales information disclosure to stimulate customers’ desire to purchase. Academic evidence is also provided by several streams of literature. People are affected by others’ decisions, especially when entering a new product category. Word of mouth (WOM) is a classic example. WOM has been shown to influence people’s short-term (Herr et al., 1991; Bone, 1995; Anderson, 1998) and long-term judgments (Bone, 1995). WOM data collected from the Yahoo! Movies website show that WOM information offers significant explanatory power for both aggregate and weekly box office revenue, especially in the early weeks after a movie is released (Liu, 2006).

The situation is the same when sales information is disclosed. Bikhchandani et al. (1992) define an information cascade as a situation in which a person follows the behavior of the preceding individual without considering his own information. Cascades can explain not only conformity but also the rapid spread of new behaviors, which might be positive or negative (Liang et al., 2014). Good early demand realization attracts more customers; bad early demand realization works in the opposite way and depresses future demand. An experiment with 870 participants shows the same result (Tseng, 2016).

Chen et al. (2011) combine the effects of WOM and observational learning, the effect whereby customers are influenced by others’ actual purchase decisions. An intriguing finding is that, while negative WOM is more influential than positive WOM, positive observational learning information significantly increases sales but negative observational learning information has no effect. This finding conflicts with that of Liang et al. (2014). Chen et al. (2011) argue that consumers may believe that other people have different individual preferences; therefore, bad sales do not necessarily indicate low quality.

In sum, the sales outcome reveals the popularity of a product, and disclosing the cumulated sales outcome seems to influence customers’ decision making. The effect can be strong. Ho et al. (2014) show that the positive effect of disclosing high sales velocity on customers’ purchase likelihood can be so strong that it reverses participants’ preferences for a product with a higher sales rank.

2.3 Ticket selling

Ticket selling satisfies the prerequisites for yield management identified by Kimes (1989). There are six prerequisites. (1) The ability to segment markets: by separating consumers into different groups, managers can apply different marketing strategies and vary prices across groups. (2) Perishable inventory: this is a common issue faced by several facets of the tourism and hospitality industries (hotels/motels, airlines, and car rental agencies). (3) Product sold in advance: this issue deals specifically with time and the uncertainty of sales. (4) Low marginal sales costs: servicing additional customers does not cost the firm a large amount of money. (5) High marginal production costs: creating additional inventory is difficult. For example, it is practically impossible for a hotel to quickly add rooms or for an airplane to quickly add seats. Therefore, as inventory runs low, managers may have the opportunity to increase prices. (6) Fluctuating demand: the hotel industry, in particular, experiences frequent fluctuations according to season and day of the week. Tickets for airlines, hotels, sports, and performances all meet these prerequisites.

Many past studies examine the relationship between price policy and revenue potential. Using censored regression and elasticity analysis, Rascher et al. (2007) demonstrate that variable pricing would have yielded approximately $590,000 in additional ticket revenue for each major league team in 1996. Huntington (1993) indicates that theaters offering a range of ticket prices tend to earn higher box office revenue than theaters that offer tickets at a single price. Boyd and Boyd (1998) suggest that whenever secondary market sellers can resell tickets for a profit, the tickets are not priced optimally. Conversely, other events have high numbers of unsold seats, which indicates that the tickets are priced too high (Howard and Crompton, 2004). Shapiro and Drayer (2012) find that it is essential to assess demand-based pricing to see whether a dynamic ticket pricing policy is appropriate for each professional league, because each team has a different market of consumers with varying perceptions of value and willingness to pay for tickets. Using data from the San Francisco Giants, they find that ticket prices under dynamic ticket pricing are significantly higher than the fixed season ticket price. Even though these prices were still lower than secondary market prices for comparable tickets, willingness to pay for the same tickets is better captured by dynamic pricing strategies.

The effect of time on ticket pricing has also been investigated, and opinions differ on how the time variable affects pricing. Rosen and Rosenfield (1997) assume that those attending later performances are paid to wait, getting their tickets at lower prices. A policy of declining prices over time allows the waiting market to clear in all but the final period. Different preference groups are served sequentially. People who desire the service the most are willing to buy early if they expect that the price will not fall too quickly.

In the case of sports tickets, the average price of dynamically priced tickets gradually increases as the game draws closer (Shapiro and Drayer, 2012). Fans choosing to attend a game at the last minute are asked to pay a premium for that convenience. This phenomenon seems to target the customers whom Courty (2003) considers the typical price-insensitive, last-minute buyers, or what he calls the “busy professional”. Courty (2003) suggests that some people simply appreciate the flexibility in their social calendar and are willing to pay a premium for it. He also identifies another group of people, the “diehard fans”, who plan their schedules in advance and are not willing to pay anything if they cannot commit early. He suggests that diehard fans outnumber busy professionals, so it is optimal for the ticket provider to focus on diehard fans and leave busy professionals to brokers.

Even though many articles support the view that dynamic pricing helps companies gain profit, research indicates that consumers respond negatively to paying different prices for the same product (Hall and Hitch, 1939; Kung et al., 2002). Furthermore, with the proliferation of the Internet, it is easier for customers to search for the product of their choice within their desired price range (Kung et al., 2002). In some established industries, such as auto retailing and term insurance, consumers pay lower prices as a result of the Internet. Car buyers use the Internet to gather information and borrow the negotiating clout of an online buying service. The Internet has increased consumers’ sensitivity to prices and price changes (Kotler, 2009).


In the case of theater, we believe that customers’ negative response to price changes is even more significant. In reality, theater tickets are mainly sold through the Internet, where it is extremely easy for customers to check prices. In addition, under the existing sales method of most performance producers, ticket prices are set from the beginning. They maintain consistency of prices between seat locations. Theater suppliers can change prices only slightly by offering discounts.

A more closely related work is Rosen and Rosenfield (1997). The article analyzes price discrimination among ticket service classes when aggregate demand is known and individual preferences are private information. Serving customers in cheap second-class seats limits the seller’s ability to extract surplus from expensive first-class seats because some customers switch to the lower class. The article assumes two types of customers, high-type and low-type, and studies how the quantities and prices of high- and low-type tickets are decided while accounting for the substitution effect between the tickets. A buyer’s reserve prices for each type of seat are increasing functions of the intensity of demand. The demand has two parameters, α and β, that depend on the specific service, seat quality, and prices of complements. The conclusion is that marginal buyers of first-class seats switch to second class rather than make no purchase when the price rises. The marginal second-class buyers are less likely to cease purchasing. Customers with the highest general taste for the basic service never leave the market when the price of their most preferred class of seat increases. The seller can charge them more and deter substitution to the other class by increasing the interclass quality difference. The cross effects of quality on price in the socially optimal marginal benefit calculations turn out to be symmetrically negative in both equations because an increase in the quality of one service reduces consumer surplus in the other.

Streams of literature related to ticket selling investigate how dynamic pricing over time improves ticket sellers’ profit. However, with the proliferation of the Internet, customers can check prices conveniently and are more sensitive to price changes. This is especially true for theater ticket selling, where price bands are predefined and do not change throughout the sales period. Instead of focusing on pricing strategies, we turn to quantity control. Rosen and Rosenfield (1997) use a fixed-price strategy and assume two pricing segments with cross effects in the market; however, they focus on the effect of price fluctuation and do not investigate the relation between cumulated sales and customer utility. In this study, we investigate how the cumulated sales outcome affects the revenue of performance ticket retailers.


Chapter 3 Problem Description and Formulation

In this study, we investigate the relationship between the cumulated sales outcome and customer utility and how this relationship affects the profit of a ticket seller. We obtain a performance data set spanning 2008 to 2012 from a local theater group in Taiwan. The group had been actively producing and performing innovative and educational children’s drama for sixteen years. The ticket selling data set contains detailed ticket sales information for every price band, for example, ticket time, selling period, and tickets sold per day. The data set is used to verify our hypotheses.

We define a show as having one or multiple performances. Take Notre Dame de Paris as an example: its performance period runs from February 24 to March 5, and there are thirteen performances during the period. Shows carry a great extent of uncertainty in terms of cast, story, and performance location. Neglecting one of the key factors of a show might lead to a mismatch between the expected and the actual quality of the performance. We assume that a fixed portion of customers are knowledgeable and understand all the factors behind a theater’s various price bands; therefore, those who are not familiar with the performance quality will regard the cumulated sales outcome as useful information.

This thesis examines how the cumulated sales outcome affects customer valuation of different price bands. The cumulated sales outcome is expressed as the quantity sold divided by the total quantity. An example explains the logic. Suppose there are two products in the market, and products A and B sell 12 and 20 units, respectively, in the same period. Product B seems to have better sales; however, product A has 20 units in total while product B has 50, so A has sold 60% of its inventory and B only 40%. It is intuitive that people will consider product A the more popular product. People look not only at the sales number but also at the total inventory prepared, because the risk of a stockout also drives people to buy. The data provided by the theater do not contain the exact customer utility of the various tickets. Therefore, we use tickets sold each day as a proxy for customers’ willingness to buy. If the tickets of a price band are popular and people are more prone to purchase them, the quantity of tickets sold increases, and vice versa. We believe that a higher sales outcome leads to higher customer utility (sales speed), and we state Hypothesis 1.

Hypothesis 1: A price band’s cumulated sales positively affect sales speed of this band. More precisely, the higher a price band’s cumulated sales, the higher the sales speed of this band.
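The product A and product B comparison used to motivate Hypothesis 1 can be checked with a short calculation. The following is a minimal Python sketch; the function name `cumulated_sales_ratio` is hypothetical and chosen only for illustration.

```python
def cumulated_sales_ratio(sold, remaining):
    """Share of total inventory already sold: sold / (sold + remaining)."""
    capacity = sold + remaining
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return sold / capacity


# Product A: 12 of 20 units sold. Product B: 20 of 50 units sold.
ratio_a = cumulated_sales_ratio(sold=12, remaining=8)
ratio_b = cumulated_sales_ratio(sold=20, remaining=30)
print(ratio_a, ratio_b)   # 0.6 0.4
print(ratio_a > ratio_b)  # True: A looks more popular despite fewer units sold
```

The ratio, rather than the raw count, is what makes products or price bands with different total quantities comparable.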

A show usually contains multiple price bands. When a show starts to sell tickets, people evaluate the price bands, compare the utility of each, and purchase the one with maximum utility. If one of the price bands has a better sales outcome, people will be more willing to purchase it and less willing to purchase the others. It is therefore natural to expect cross effects between price bands, and the cumulated sales outcomes of the other price bands are taken into consideration when examining a specific price band. Hypothesis 2 is made based on this assumption.

Hypothesis 2: Other price bands’ cumulated sales negatively affect sales speed of this band. More precisely, the higher other price bands’ cumulated sales, the lower the sales speed of this band.

The data set contains other information, such as the sales period, time, and location of the performances. We hypothesize that when the selling period becomes longer, the average sales speed becomes slower. Hypothesis 3 is therefore formed.

Hypothesis 3: Sales period negatively affects sales speed of the performance. More precisely, the longer a performance’s sales period, the slower the sales speed of the performance.

Hsieh (2015) reports that total sales are highest for performances in the afternoon and in Taichung. As a result, we consider the time and region of a performance as factors that might influence our hypotheses and add them as control factors in our model. The data set spans 2008 to 2012; therefore, $\mathit{Year}_k$ is added as another control variable. Other control variables include $\mathit{Show}_k$, $\mathit{PerformanceId}_k$, and $\mathit{Week}_{kt}$, where $k$ stands for performance $k$ and $t$ for week $t$. $\mathit{Week}_{kt}$ is the $t$-th week since the start of ticket sales of performance $k$.

From the hypotheses, we formulate the equation

$$
\frac{y_{ikt}}{s_{ikt}+q_{ikt}} = \beta_0 + \sum_{j \in N} \beta_j^S \left(\frac{s_{jkt}}{s_{jkt}+q_{jkt}}\right) + \beta_1^P \mathit{Period}_k + \beta_2^P \mathit{Week}_{kt} + \beta_1^T \mathit{Morning}_k + \beta_2^T \mathit{Evening}_k + \beta_1^R \mathit{Northern}_k + \beta_2^R \mathit{Southern}_k + \beta_3^T \mathit{Year}_k + \beta_1^D \mathit{Show}_k + \beta_2^D \mathit{PerformanceId}_k + \epsilon_{ikt},
$$

where $N$ is the set of price bands of a performance. For example, if performance A has three price bands priced at 300, 500, and 600, then $N = \{1, 2, 3\}$. Price bands are ordered from the highest price to the lowest: the 600 band has $i = 1$, the 500 band $i = 2$, and the 300 band $i = 3$. $y_{ikt}$ is the sales speed of price band $i$ of performance $k$ during week $t$. In the superscripts of the coefficients, $S$ marks sales-related variables, $P$ period, $T$ time, $R$ region, and $D$ identity (ID) related variables. Both $\mathit{Time}$ and $\mathit{Region}$ are factor variables.

$s_{ikt}$ is the cumulated number of tickets sold for price band $i$ of performance $k$ in week $t$, and $q_{ikt}$ is the number of tickets left for that price band. We let $\mathit{Afternoon}_k$ and $\mathit{Central}_k$ be the reference levels of $\mathit{Time}_k$ and $\mathit{Region}_k$ for performance $k$. Both $y_{ikt}$ and $s_{ikt}$ are divided by the capacity $s_{ikt} + q_{ikt}$ so that performances of different capacities are on the same scale.
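To make the variable construction in the regression equation concrete, the following Python sketch builds the response and the regressors for a single observation. It is an illustration under stated assumptions, not the code used in the thesis: the function name `build_observation`, the dictionary keys, and the sample numbers are all hypothetical; Period and Week are entered as plain numbers; and the Show and PerformanceId dummies are left out for brevity.

```python
def build_observation(weekly_sold, cum_sold, left, i,
                      period, week, time_of_day, region, year):
    """Return (response, regressors) for price band i of one performance-week.

    weekly_sold[j] plays the role of y_jkt, cum_sold[j] of s_jkt, and
    left[j] of q_jkt. Bands are ordered from the highest price to the
    lowest, and i is 0-based here (the text uses 1-based indices).
    """
    capacity = [s + q for s, q in zip(cum_sold, left)]
    response = weekly_sold[i] / capacity[i]

    regressors = {"Intercept": 1.0}
    # One cumulated-sales share per price band j in N.
    for j, (s, c) in enumerate(zip(cum_sold, capacity), start=1):
        regressors[f"CumShare_{j}"] = s / c
    # Assumption: Period and Week enter as plain numbers.
    regressors["Period"] = float(period)
    regressors["Week"] = float(week)
    # Afternoon and Central are the reference levels, so they get no dummy.
    regressors["Morning"] = 1.0 if time_of_day == "Morning" else 0.0
    regressors["Evening"] = 1.0 if time_of_day == "Evening" else 0.0
    regressors["Northern"] = 1.0 if region == "Northern" else 0.0
    regressors["Southern"] = 1.0 if region == "Southern" else 0.0
    regressors["Year"] = float(year)
    return response, regressors


# Hypothetical performance with price bands 600, 500, and 300.
response, row = build_observation(
    weekly_sold=[5, 10, 4], cum_sold=[30, 50, 20], left=[70, 50, 80],
    i=0, period=60, week=3, time_of_day="Evening", region="Central", year=2010,
)
print(response)  # 0.05
print(row["CumShare_2"], row["Evening"], row["Northern"])  # 0.5 1.0 0.0
```

Rows built this way could be stacked and passed to any ordinary least squares routine; the choice of estimation software is left open here.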

As mentioned before, one of the unique characteristics of performance ticket selling is that a performance has different price bands. We examine the interaction between price bands. We hypothesize that the relationship between two price bands is stronger when their prices are closer to each other. Hypothesis 4 is added.

Hypothesis 4: The effect of one price band on another is stronger when their prices are closer to each other.

Using the historical ticket selling data from a Taiwanese theater, we use regression models to verify how the cumulated sales outcome affects customer utility and how the sales outcome of one price band, $\frac{s_{ikt}}{s_{ikt}+q_{ikt}}$, affects the sales speed of the other price bands, $\frac{y_{jkt}}{s_{jkt}+q_{jkt}}$, where $j \in N$ and $j \neq i$. Performances have between two and six price bands, so the number of variables differs across shows with different numbers of price bands. In the first part of Chapter 4, we aggregate all other price bands into a new variable so that performances with different numbers of price bands can be examined in the same model, and we examine Hypotheses 1 to 3. In the second part, we focus on verifying Hypothesis 4 and examine the relationships among price bands by grouping the other price bands into those whose prices are closer to and those whose prices are farther from a specific price band.


Chapter 4 Analysis

4.1 Data cleansing

The theater data set covers 2008 to 2012. Prior to the analysis, we adjust and clean the data set to fit the hypotheses we made.

First, there are some days with no historical sales. What we need in our research is sales speed so that we can understand customer preference. In order to measure how cumulated sales will affect the sales speed, we add the missing dates back to the historical sales and set the sales amount to be zero.
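The first cleaning step can be sketched as follows. This is a minimal illustration that assumes daily sales are held in a dictionary keyed by date; the function name `fill_missing_days` and the sample record are hypothetical.

```python
from datetime import date, timedelta


def fill_missing_days(daily_sales, first_day, last_day):
    """Return sales for every day in [first_day, last_day], with 0 for unrecorded days."""
    filled = {}
    day = first_day
    while day <= last_day:
        filled[day] = daily_sales.get(day, 0)
        day += timedelta(days=1)
    return filled


# Hypothetical record: sales exist only on 1 March and 4 March.
recorded = {date(2010, 3, 1): 6, date(2010, 3, 4): 2}
filled = fill_missing_days(recorded, date(2010, 3, 1), date(2010, 3, 5))
print([filled[d] for d in sorted(filled)])  # [6, 0, 0, 2, 0]
```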

Second, we find that sales distortion occurs when the tickets of a price band are about to sell out. In Figure 4.1, the sales speed of the price bands with higher cumulated sales slows down compared with their previous sales speed, while the sales speed of the other price bands accelerates. Instead of flocking to purchase tickets in the nearly sold-out price band, people turn to other price bands. One possible reason is that when the tickets are about to sell out, the remaining seats are scattered over the entire section; therefore, groups of people cannot purchase adjacent seats. They will either purchase tickets of other price bands or make no purchase. We assume that the distortion effect occurs once 90% of the tickets are sold and ignore the data when the cumulated sales of any price band exceed 90%. The most important reason for doing so is that once a price band is sold out, its sales speed is zero afterwards while its cumulated sales remain at 100%. Such data would without doubt lead to biased results.
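The 90% rule can be written as a simple filter. The sketch below assumes each observation stores the cumulated tickets sold and the tickets left for every price band of a performance in a given week; the function names, the dictionary layout, and the sample numbers are hypothetical.

```python
def max_cumulated_share(cum_sold, left):
    """Largest s / (s + q) across the price bands of one performance-week."""
    return max(s / (s + q) for s, q in zip(cum_sold, left))


def drop_distorted(observations, threshold=0.9):
    """Keep observations in which no price band has sold more than `threshold`."""
    return [
        obs for obs in observations
        if max_cumulated_share(obs["cum_sold"], obs["left"]) <= threshold
    ]


# Hypothetical weekly snapshots of one performance with two price bands.
observations = [
    {"week": 1, "cum_sold": [20, 40], "left": [80, 60]},
    {"week": 2, "cum_sold": [50, 95], "left": [50, 5]},  # second band at 95%
]
kept = drop_distorted(observations)
print([obs["week"] for obs in kept])  # [1]
```

Note that the whole observation is dropped as soon as any one band crosses the threshold, which matches the rule stated above.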

Third, people sometimes ask for refunds. We treat refunds as negative sales because the data set does not document every single transaction; it shows only the aggregate sales of each day. When refunds exceed tickets sold on a day, the sales of that day are recorded as negative. When the sales are positive, the figure might still include refunds that are smaller in quantity than the sales.

Fourth, we find that the time unit of the sales speed is important. Take Show 4 as an example. When we look at its sales speed data (number of tickets sold, $\mathit{NumSold}_{ikt}$), we find many zeros (Figure 4.2a). As a result, we aggregate the sales data by week to reduce the number of zeros (Figure 4.2b).
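The weekly aggregation can be illustrated with a short sketch. It assumes that week 1 starts on the first day of ticket sales, that the daily series already contains zeros for days without sales, and that a final partial week is kept as is; the function name `aggregate_weekly` and the sample series are hypothetical.

```python
def aggregate_weekly(daily_net_sales):
    """Sum daily net sales into consecutive 7-day weeks.

    Week 1 starts on the first day of ticket sales. Refund-heavy days appear
    as negative numbers and simply reduce the weekly total.
    """
    return [
        sum(daily_net_sales[start:start + 7])
        for start in range(0, len(daily_net_sales), 7)
    ]


# Hypothetical daily series with many zero-sales days and one net refund (-1).
daily = [0, 0, 3, 0, -1, 0, 2, 0, 0, 0, 0, 0, 5, 0, 1]
weekly = aggregate_weekly(daily)
print(weekly)                           # [4, 5, 1]
print(daily.count(0), weekly.count(0))  # 10 0
```

In this toy series, ten of the fifteen daily values are zero, while none of the weekly totals is.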

Lastly, this paper also investigates the interactions between the price bands of a performance. We therefore select only the performances with a sufficient number of price bands, namely those with more than three. After eliminating the unfit performances, 30 shows and 324 performances remain in the five-year data set.

The variables mentioned in the hypotheses are the cumulated sales outcomes, s_ikt / (s_ikt + q_ikt), of the investigated price band and of the other price bands, the sales period, time (morning, afternoon, and evening), and region (Northern, Central, and Southern). In addition, we add Year_k as another variable because the data set spans several years. We denote the cumulated sales outcome of price band i by CumSelf_ikt and the cumulated sales outcome of the other price bands (j ∈ N, j ≠ i) by CumOther_ikt (Table 4.1).

(a) Performance 26 (b) Performance 133

(c) Performance 277 (d) Performance 348

Figure 4.1: Weekly Cumulated Sales of Performances.

(a) Daily (b) Weekly

Figure 4.2: Aggregate Weekly Sales of Show 4.

4.2 An illustrative example of data processing

Here we use an example with three price bands to illustrate our data processing. The section is separated into two parts. The first part focuses on the relationship between the cumulated sales of a specific price band in a performance and those of the other price bands.

The second part focuses on the interaction between a specific price band and the price bands whose prices are closer to or farther from it. As a result, the regression models and the data transformations differ between the two parts.


Variable           Type           Possible Values
CumSelf_ikt        Numerical      0 – 0.9
CumOther_ikt       Numerical      0 – 0.9
Period_k           Numerical      13 – 359
Week_kt            Categorical    1 – 53
Time_k             Categorical    Morning_k, Afternoon_k, Evening_k
Region_k           Categorical    Northern_k, Central_k, Southern_k
Year_k             Numerical      2008 – 2012
Show_k             Categorical    1 – 30
PerformanceId_k    Categorical    1 – 324

Table 4.1: Type and Possible Values of Variables

4.2.1 Testing Hypothesis 1 to 3

Table 4.2 lists an example data set for the first part. There are three price bands in the performance. P_ik stands for price band i of performance k. P_1k is the price band with the highest price, P_2k is the second highest, and P_3k is the price band with the lowest price for performance k. Week_kt denotes the number of weeks counted from the start of the sales.

From the data set of the Taiwanese theater, we can also obtain the capacity of each price band (Table 4.3).

We let NumSold be the weekly sales of a price band divided by its capacity (Table 4.4):

\[ NumSold_{ikt} = \frac{\text{Sales of } P_{ik} \text{ in week } t}{Capacity_{ik}} \]

However, customers normally see only the cumulated sales of tickets rather than the sales speed. We therefore transform the sales data of P_ik into cumulated sales (Table 4.5).

Week_kt    Sales of P_1k    Sales of P_2k    Sales of P_3k
1          0                3                5
2          4                2                1
3          4                3                0
4          2                1                5
5          2                0                0

Table 4.2: Sales of the Example Data Set 1

Capacity of P_1k    Capacity of P_2k    Capacity of P_3k
12                  80                  120

Table 4.3: Capacity of the Example Data Set 1

Week_kt    NumSold for P_1k    NumSold for P_2k    NumSold for P_3k
1          0/12 = 0            3/80 = 0.0375       5/120 = 0.042
2          4/12 = 0.33         2/80 = 0.0250       1/120 = 0.008
3          4/12 = 0.33         3/80 = 0.0375       0/120 = 0
4          2/12 = 0.17         1/80 = 0.0125       5/120 = 0.042
5          2/12 = 0.17         0/80 = 0            0/120 = 0

Table 4.4: NumSold for Example Data Set 1

Week_kt    Cumulated Sales for P_1k    Cumulated Sales for P_2k    Cumulated Sales for P_3k
1          0                           3                           5
2          4                           5                           6
3          8                           8                           6
4          10                          9                           11
5          12                          9                           11

Table 4.5: Cumulated Sales of Example Data Set 1

Week_kt    CumSelf_1kt      CumOther_1kt
1          0/12 = 0         (3+5)/(80+120) = 0.04
2          4/12 = 0.33      (5+6)/(80+120) = 0.055
3          8/12 = 0.67      (8+6)/(80+120) = 0.07
4          10/12 = 0.83     (9+11)/(80+120) = 0.1
5          12/12 = 1        (9+11)/(80+120) = 0.1

Table 4.6: CumSelf_1kt and CumOther_1kt of Example Data Set 1

Suppose we are examining price band 1. Then CumSelf_1kt is the cumulated sales share of P_1k, and CumOther_1kt combines P_2k and P_3k according to

\[ CumOther_{ikt} = \frac{\sum_{j \neq i} s_{jkt}}{\sum_{j \neq i} (s_{jkt} + q_{jkt})}. \]

As mentioned above, observations with CumSelf > 0.9 are deleted. In Table 4.6, the CumSelf of the last row exceeds 0.9, so that row and any later observations are ignored.
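The transformation of Example Data Set 1 can be reproduced with a short sketch. The sales and capacities are those of Tables 4.2 and 4.3; the function and variable names are hypothetical, and the code is an illustration rather than the processing script used in the study.

```python
from itertools import accumulate

# Weekly sales (Table 4.2) and capacities (Table 4.3), keyed by price band.
sales = {1: [0, 4, 4, 2, 2], 2: [3, 2, 3, 1, 0], 3: [5, 1, 0, 5, 0]}
capacity = {1: 12, 2: 80, 3: 120}

# Running totals per price band (Table 4.5).
cumulated = {band: list(accumulate(weekly)) for band, weekly in sales.items()}


def cum_self(band, week):
    """Cumulated sales of `band` divided by its own capacity."""
    return cumulated[band][week] / capacity[band]


def cum_other(band, week):
    """Cumulated sales of all other bands divided by their total capacity."""
    others = [b for b in capacity if b != band]
    return sum(cumulated[b][week] for b in others) / sum(capacity[b] for b in others)


# One row per week for price band 1: (week number, CumSelf, CumOther).
rows = [(week + 1, cum_self(1, week), cum_other(1, week)) for week in range(5)]

# Drop the observations in which CumSelf exceeds 0.9.
kept = [row for row in rows if row[1] <= 0.9]
for week, self_share, other_share in kept:
    print(week, round(self_share, 2), round(other_share, 3))
```

Only weeks 1 to 4 survive the filter, because CumSelf_1kt reaches 1 in week 5.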


Week_kt    Sales of P_1k    Sales of P_2k    Sales of P_3k    Sales of P_4k
1          0                3                5                0
2          1                2                2                0
3          5                6                0                1
4          0                0                3                2
5          4                0                0                1

Table 4.7: Sales of the Example Data Set 2

Capacity of P_1k    Capacity of P_2k    Capacity of P_3k    Capacity of P_4k
10                  20                  30                  100

Table 4.8: Capacity of the Example Data Set 2

4.2.2 Testing Hypothesis 4

In the second part, we investigate whether the effect of one price band on another is stronger when their prices are closer to each other. Price bands need to be examined separately. We categorize the other price bands into two types: Near and Far. For P_1k, the near price band is P_2k and the rest are far. For P_2k, the near price bands are P_1k and P_3k. The cumulated sales of the Near and Far price bands are denoted CumNear_ikt and CumFar_ikt, respectively.

Both variables are calculated in the same way as CumOther_ikt.

We use another example data set (Table 4.7) and list CumSelf_2kt, CumNear_2kt, and CumFar_2kt in Table 4.9.


Week    CumSelf_2kt     CumNear_2kt               CumFar_2kt
1       3/20 = 0.15     (0+5)/(10+30) = 0.125     0/100 = 0
2       5/20 = 0.25     (1+7)/(10+30) = 0.20      0/100 = 0
3       11/20 = 0.55    (6+7)/(10+30) = 0.325     1/100 = 0.01
4       11/20 = 0.55    (6+10)/(10+30) = 0.40     3/100 = 0.03
5       11/20 = 0.55    (10+10)/(10+30) = 0.50    4/100 = 0.04

Table 4.9: CumSelf_2kt, CumNear_2kt, and CumFar_2kt of Example Data Set 2
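The Near/Far split for price band 2 can be sketched in the same way, using the sales and capacities of Tables 4.7 and 4.8. The sketch assumes that "near" means adjacent in the price ranking and that each denominator is the combined capacity of the bands in the group (10 + 30 for the two neighbours of P_2k), following the statement that both variables are calculated like CumOther_ikt. All names are hypothetical.

```python
from itertools import accumulate

# Weekly sales (Table 4.7) and capacities (Table 4.8), keyed by price band.
sales = {
    1: [0, 1, 5, 0, 4],
    2: [3, 2, 6, 0, 0],
    3: [5, 2, 0, 3, 0],
    4: [0, 0, 1, 2, 1],
}
capacity = {1: 10, 2: 20, 3: 30, 4: 100}
cumulated = {band: list(accumulate(weekly)) for band, weekly in sales.items()}


def split_near_far(band, bands):
    """Split the other bands into adjacent ranks (near) and the rest (far)."""
    near = [b for b in bands if abs(b - band) == 1]
    far = [b for b in bands if abs(b - band) > 1]
    return near, far


def cum_share(group, week):
    """Cumulated sales of a group of bands over the group's total capacity."""
    return sum(cumulated[b][week] for b in group) / sum(capacity[b] for b in group)


near, far = split_near_far(2, sorted(capacity))
cum_near = [cum_share(near, week) for week in range(5)]
cum_far = [cum_share(far, week) for week in range(5)]
print(near, far)
print(cum_near)
print(cum_far)
```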

4.3 Exploratory data analysis

Before going into detail, we describe the data set. It contains 36,283 rows of data, covering 30 shows and 324 performances over five years. The average number of performances per show is 10.8.

First, we test CumSelf_ikt and aggregate the other price bands into CumOther_ikt so that data with different numbers of price bands fit in one model. The variables used in this part are CumSelf_ikt, CumOther_ikt, Period_k, Time_k, Region_k, and Year_k.

Before looking at CumSelf_ikt × Capacity_ik and CumOther_ikt × Capacity_ik, we investigate the number of tickets sold without dividing by capacity, which we denote NumSold_ikt. The summary statistics of NumSold_ikt and of the two cumulated sales variables are listed in Table 4.10. The variation of all four variables is large, and Figures A.1a to A.1d suggest that their distributions are all roughly exponential. For Period_k, the variation is also large and the distribution is right-skewed (Figure A.1e).


Variable                      Mean       Standard Deviation    Maximum    Minimum
NumSold_ikt (y_ikt)           6.625      15.578                347        −158
CumSelf_ikt × Capacity_ik     39.050     42.111                437        0
CumOther_ikt × Capacity_ik    175.590    162.877               1390       0
Period_k                      144.843    87.734                359        13

Table 4.10: Exploratory Analysis of the Variables of the First Part

As for the control variables, we look at how many performances fall into each category or value of Time_k, Region_k, and Year_k. There are 83 performances in the morning, 122 in the afternoon, and 119 in the evening (Figure A.1f). There are 284 performances in Northern Taiwan, 28 in Central Taiwan, and 12 in Southern Taiwan (Figure A.1g). For Year_k, the number of performances shows an increasing trend: 44, 44, 70, 72, and 94 per year (Figure A.1h).

4.4 Regression analysis

There are two parts in this section. Part A mainly discusses whether cumulated sales outcome affects weekly sales speed under various circumstances. Part B focuses on the interactions between price bands and whether price bands with closer prices have larger effects on the sales speed of the targeted price band.

The main difference between the two parts is the form of the other price bands. In our data set, a performance has 4, 5, or 6 price bands. In part A, we mainly discuss the effect of cumulated sales on sales speed; therefore, instead of separating the performances by their numbers of price bands, we aggregate all the other price bands into

\[ CumOther_{ikt} = \frac{\sum_{j \neq i} s_{jkt}}{\sum_{j \neq i} (s_{jkt} + q_{jkt})}. \]

After this aggregation, we can put data with various numbers of price bands in the same model. In part B, we aggregate the price bands whose prices are close to that of the target price band into CumNear_ikt and those farther away into CumFar_ikt.

4.4.1 Testing Hypothesis 1 to 3

There are four models in part A. We first examine whether CumSelf_ikt is positively related to WeeklySales_ikt and whether CumOther_ikt is negatively related to it. Model A₁ investigates this directly. In Model A₂, we add the basic performance information that people can see when buying tickets: Period_k, Time_k, Region_k, and Year_k. In Model A₃, we further add the interactions between the performance information and CumSelf_ikt.

To control for the variation among performances, we add Show_k, PerformanceId_k, and Week_kt as control variables to Model A₃ and call the result Model A^D₃. Because the effects of CumSelf_ikt and CumOther_ikt on WeeklySales_ikt might not be linear, we then propose Model A₄ to examine the relationship. We take Afternoon_k and Central_k as the reference levels for Time_k and Region_k. Table 4.11 lists the variable names and their math notations.

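Variable Name      Math Notation
WeeklySales_ikt    y_ikt / (s_ikt + q_ikt)
CumSelf_ikt        s_ikt / (s_ikt + q_ikt)
CumOther_ikt       Σ_{j≠i} s_jkt / Σ_{j≠i} (s_jkt + q_jkt)

Table 4.11: Variable Names and Notations

Model A₁:

\[ WeeklySales_{ikt} = \beta_0 + \beta_1^{S} CumSelf_{ikt} + \beta_2^{S} CumOther_{ikt} + \varepsilon_{ikt} \]

Model A₂:

\[
\begin{aligned}
WeeklySales_{ikt} = {} & \beta_0 + \beta_1^{S} CumSelf_{ikt} + \beta_2^{S} CumOther_{ikt} + \beta_1^{P} Period_k \\
& + \beta_1^{T} Morning_k + \beta_2^{T} Evening_k + \beta_1^{R} Northern_k + \beta_2^{R} Southern_k \\
& + \beta_1^{Y} Year_k + \varepsilon_{ikt}
\end{aligned}
\]

Model A₃:

\[
\begin{aligned}
WeeklySales_{ikt} = {} & \beta_0 + \beta_1^{S} CumSelf_{ikt} + \beta_2^{S} CumOther_{ikt} + \beta_1^{P} Period_k \\
& + \beta_1^{T} Morning_k + \beta_2^{T} Evening_k + \beta_1^{R} Northern_k + \beta_2^{R} Southern_k + \beta_1^{Y} Year_k \\
& + \beta_1^{PI} Period_k \times CumSelf_{ikt} + \beta_1^{TI} Morning_k \times CumSelf_{ikt} + \beta_2^{TI} Evening_k \times CumSelf_{ikt} \\
& + \beta_1^{RI} Northern_k \times CumSelf_{ikt} + \beta_2^{RI} Southern_k \times CumSelf_{ikt} + \varepsilon_{ikt}
\end{aligned}
\]

Model A^D₃:

\[
\begin{aligned}
WeeklySales_{ikt} = {} & \beta_0 + \beta_1^{S} CumSelf_{ikt} + \beta_2^{S} CumOther_{ikt} + \beta_1^{P} Period_k + \beta_2^{P} Week_{kt} \\
& + \beta_1^{T} Morning_k + \beta_2^{T} Evening_k + \beta_1^{R} Northern_k + \beta_2^{R} Southern_k + \beta_1^{Y} Year_k \\
& + \beta_1^{D} Show_k + \beta_2^{D} PerformanceId_k + \beta_1^{PI} Period_k \times CumSelf_{ikt} \\
& + \beta_1^{TI} Morning_k \times CumSelf_{ikt} + \beta_2^{TI} Evening_k \times CumSelf_{ikt} \\
& + \beta_1^{RI} Northern_k \times CumSelf_{ikt} + \beta_2^{RI} Southern_k \times CumSelf_{ikt} + \varepsilon_{ikt}
\end{aligned}
\]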


In Models A₃ and A^D₃, the coefficients of the added interaction effects are denoted by β_j^{XI}, where X ∈ {S, P, T, R}.
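To make the structure of the regressors concrete, the sketch below builds the right-hand-side variables of Model A₃ for a single observation. It reads Afternoon_k and Central_k as the omitted reference levels, which is an interpretation of the dummy variables that appear in the model. The function name, the dictionary keys, and the sample observation are hypothetical.

```python
def model_a3_regressors(cum_self, cum_other, period, time, region, year):
    """Build the regressors of Model A3 for one observation.

    Afternoon and Central get no dummy because they serve as reference
    levels. Each interaction is the product of a term and CumSelf.
    """
    morning = int(time == "Morning")
    evening = int(time == "Evening")
    northern = int(region == "Northern")
    southern = int(region == "Southern")
    return {
        "CumSelf": cum_self,
        "CumOther": cum_other,
        "Period": period,
        "Morning": morning,
        "Evening": evening,
        "Northern": northern,
        "Southern": southern,
        "Year": year,
        "Period x CumSelf": period * cum_self,
        "Morning x CumSelf": morning * cum_self,
        "Evening x CumSelf": evening * cum_self,
        "Northern x CumSelf": northern * cum_self,
        "Southern x CumSelf": southern * cum_self,
    }


# Hypothetical observation: an evening performance in Northern Taiwan.
row = model_a3_regressors(0.5, 0.1, 100, "Evening", "Northern", 2011)
print(row["Period x CumSelf"], row["Evening x CumSelf"], row["Morning x CumSelf"])
```

An afternoon performance in Central Taiwan would have all four dummies and all four dummy interactions equal to zero, so its fitted value depends only on the intercept and the numerical terms.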

In Models A₁, A₂, and A₃, the cumulated sales of a price band is significantly and positively related to its sales speed, while the cumulated sales of the other price bands is significantly and negatively related to it (Table 4.12). Hypotheses 1 and 2 are therefore supported. As for Hypothesis 3, the results of Models A₂ and A₃ both show that the length of the sales period negatively affects sales speed, which supports Hypothesis 3; however, when the control variables Show_k, PerformanceId_k, and Week_kt are added, the coefficient of Period_k becomes positive. The coefficients of Week_kt are mostly negative relative to the reference level, Week 1, which indicates that the sales speed in Week 1 is normally the fastest.

We also consider whether the effects in Hypotheses 1 and 2 are nonlinear. We add two variables, CumSelf_ikt² and CumOther_ikt², to Model A^D₃ to form Model A₄.

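Model A₄:

\[
\begin{aligned}
WeeklySales_{ikt} = {} & \beta_0 + \beta_1^{S} CumSelf_{ikt} + \beta_2^{S} CumOther_{ikt} + \beta_1^{P} Period_k + \beta_2^{P} Week_{kt} \\
& + \beta_1^{T} Morning_k + \beta_2^{T} Evening_k + \beta_1^{R} Northern_k + \beta_2^{R} Southern_k + \beta_1^{Y} Year_k \\
& + \beta_1^{D} Show_k + \beta_2^{D} PerformanceId_k + \beta_1^{PI} Period_k \times CumSelf_{ikt} \\
& + \beta_1^{TI} Morning_k \times CumSelf_{ikt} + \beta_2^{TI} Evening_k \times CumSelf_{ikt} \\
& + \beta_1^{RI} Northern_k \times CumSelf_{ikt} + \beta_2^{RI} Southern_k \times CumSelf_{ikt} \\
& + \beta_3^{S} CumSelf_{ikt}^2 + \beta_4^{S} CumOther_{ikt}^2 + \varepsilon_{ikt}
\end{aligned}
\]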

Regression                  Model A₁           Model A₂            Model A₃            Model A^D₃
(Intercept)                 1.11 × 10⁻²***     9.84 × 10⁻¹         −4.16 × 10⁻¹        3.16 × 10¹
CumSelf_ikt                 1.49 × 10⁻¹***     1.46 × 10⁻¹***      3.02 × 10⁻¹***      3.30 × 10⁻¹***
CumOther_ikt                −5.66 × 10⁻²***    −7.41 × 10⁻²***     −6.56 × 10⁻²***     1.21 × 10⁻²*
Period_k                                       −1.85 × 10⁻⁴***     −7.39 × 10⁻⁵***     1.38 × 10⁻⁴**
Year_k                                         −4.61 × 10⁻⁴        2.19 × 10⁻⁴         −1.64 × 10⁻²
Time_k
  Morning_k                                    −2.66 × 10⁻³*       4.54 × 10⁻³*        −5.69 × 10⁻³
  Evening_k                                    −4.14 × 10⁻³***     −5.76 × 10⁻³***     −4.74 × 10⁻³
Region_k
  Northern_k                                   −6.97 × 10⁻⁴        7.92 × 10⁻³**       −1.64 × 10⁻¹***
  Southern_k                                   −2.66 × 10⁻³        −1.01 × 10⁻³        −1.44 × 10⁻¹***
CumSelf_ikt × Period_k                                             −5.08 × 10⁻⁴***     −6.41 × 10⁻⁴***
CumSelf_ikt × Time_k
  CumSelf_ikt × Morning_k                                          −2.37 × 10⁻²***     −2.91 × 10⁻²***
  CumSelf_ikt × Evening_k                                          6.45 × 10⁻³         6.51 × 10⁻³
CumSelf_ikt × Region_k
  CumSelf_ikt × Northern_k                                         −5.58 × 10⁻²***     −3.90 × 10⁻²***
  CumSelf_ikt × Southern_k                                         7.81 × 10⁻³         3.58 × 10⁻²
Other Control Variables
  Show_k                                                                               Included
  PerformanceId_k                                                                      Included
  Week_kt                                                                              Included
R²                          12.50%             18.54%              20.83%              40.68%
Adjusted R²                 12.49%             18.51%              20.78%              39.60%

***p < 0.001, ** p < 0.01, * p < 0.05

Table 4.12: The Results of Part A

The previous variables maintain the same effects in Model A₄ (Table 4.12). If we fix all other variables, the equation that considers only the effects of CumSelf_ikt and CumOther_ikt is

\[ WeeklySales_{ikt} = 0.469\,CumSelf_{ikt} - 0.157\,CumSelf_{ikt}^2 + 0.247\,CumOther_{ikt} - 0.369\,CumOther_{ikt}^2. \]

The terms CumSelf_ikt² and CumOther_ikt² are the interactions of CumSelf_ikt and CumOther_ikt with themselves.

The first derivatives of WeeklySales_ikt with respect to CumSelf_ikt and CumOther_ikt are

\[ \frac{\partial WeeklySales_{ikt}}{\partial CumSelf_{ikt}} = 0.469 - 0.314\,CumSelf_{ikt}, \]

\[ \frac{\partial WeeklySales_{ikt}}{\partial CumOther_{ikt}} = 0.247 - 0.738\,CumOther_{ikt}. \]

WeeklySales_ikt is concave in both CumSelf_ikt and CumOther_ikt. The negative coefficient of CumSelf_ikt² means that the positive effect of CumSelf_ikt keeps growing, but with a decreasing margin (Figure 4.3a). For CumOther_ikt, WeeklySales_ikt first increases with a decreasing margin; then, once the cumulated sales of the other price bands is high, customers have less incentive to purchase the price band's tickets (Figure 4.3b). This pattern results from the interaction of two effects. First, when people find that other price bands are selling well, they consider the show worth watching and purchase their most preferred price band of the show. Second, when people find that other price bands are selling well, they turn to buy the price bands with higher sales. From Figure 4.3b, we conclude that the first effect is stronger at the start of the sales period and the second effect is stronger at the end of the sales period.
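As a worked check of these shapes, the turning points follow from setting each first derivative to zero. The calculation uses only the coefficients reported for Model A₄; the starred symbols are introduced here for illustration.

\[ 0.469 - 0.314\,CumSelf^{*} = 0 \;\Rightarrow\; CumSelf^{*} = \frac{0.469}{0.314} \approx 1.49 \]

\[ 0.247 - 0.738\,CumOther^{*} = 0 \;\Rightarrow\; CumOther^{*} = \frac{0.247}{0.738} \approx 0.335 \]

Because CumSelf_ikt is restricted to values between 0 and 0.9 by the 90% cutoff applied during data cleaning, its turning point lies outside the observed range, so the fitted effect of CumSelf_ikt is increasing throughout. The fitted effect of CumOther_ikt, in contrast, peaks when roughly one third of the other price bands' tickets have been sold and declines afterwards.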

We verify the correlation coefficients between variables (Figure 4.4). We find that no two variables are highly correlated.
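A pairwise check of this kind can be sketched as follows. The thesis does not state which correlation coefficient underlies Figure 4.4; the sketch assumes the Pearson coefficient, and the function names, the sample data, and the 0.8 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.8):
    """Return the variable pairs whose absolute correlation exceeds threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) > threshold
    ]


# Hypothetical values for three regressors over five observations.
columns = {
    "CumSelf": [0.1, 0.4, 0.2, 0.8, 0.5],
    "CumOther": [0.3, 0.1, 0.4, 0.2, 0.5],
    "Period": [30, 120, 90, 60, 200],
}
flagged = highly_correlated_pairs(columns)
print(flagged)
```

An empty result means that no pair crosses the chosen threshold, which is the situation the text describes.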


(a) CumSelf_ikt (b) CumOther_ikt

Figure 4.3: Change of WeeklySales_ikt.

Figure 4.4: Correlation Coefficients of Variables


4.4.2 Testing Hypothesis 4

In part B, we investigate whether the effect of one price band on another is stronger when their prices are closer to each other. Price bands need to be examined separately. We categorize the other price bands into two types: Near and Far. For P_1k, the near price band is P_2k and the rest are far. For P_2k, the near price bands are P_1k and P_3k. The cumulated sales of the Near and Far price bands are denoted CumNear_ikt and CumFar_ikt, respectively.

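Model B:

\[
\begin{aligned}
WeeklySales_{ikt} = {} & \beta_0 + \beta_1^{S} CumSelf_{ikt} + \beta_2^{S} CumNear_{ikt} + \beta_3^{S} CumFar_{ikt} \\
& + \beta_1^{P} Period_k + \beta_2^{P} Week_{kt} + \beta_1^{T} Morning_k + \beta_2^{T} Evening_k + \beta_1^{R} Northern_k + \beta_2^{R} Southern_k \\
& + \beta_1^{Y} Year_k + \beta_1^{D} Show_k + \beta_2^{D} PerformanceId_k + \beta_1^{PI} Period_k \times CumSelf_{ikt} \\
& + \beta_1^{TI} Morning_k \times CumSelf_{ikt} + \beta_2^{TI} Evening_k \times CumSelf_{ikt} \\
& + \beta_1^{RI} Northern_k \times CumSelf_{ikt} + \beta_2^{RI} Southern_k \times CumSelf_{ikt} + \varepsilon_{ikt}
\end{aligned}
\]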

From Table 4.13, we find that the relationships of CumSelf_ikt, CumNear_ikt, and CumFar_ikt with WeeklySales_ikt follow the same pattern as in Models A^D₃ and A₄. Figure 4.5 also shows that Model B has the same pattern as Model A₄: WeeklySales_ikt is concave in CumSelf_ikt, CumNear_ikt, and CumFar_ikt. For both CumNear_ikt and CumFar_ikt, the effect first increases with a decreasing margin, then reaches a turning point and decreases with an increasing margin. Interestingly, CumFar_ikt has a larger effect than CumNear_ikt. Hypothesis 4 is therefore not supported and can be investigated further in the future.