Department of Information Management, College of Management
National Taiwan University
Master Thesis
Impact of Performance Ticket Sales Information on Sales Speed: An Empirical Study Based on a Taiwanese Theater
孫珮瑜 Pei-Yu Sun
Adviser: Ling-Chieh Kung, Ph.D.
July 2017
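Acknowledgements
During my two years in the master's program, I was fortunate to join the big family of the Information Economics and Decision Optimization Laboratory and to receive guidance from the enthusiastic Professor Ling-Chieh Kung. Throughout this time, he has been both a teacher and a friend, continually giving me opportunities to reflect on myself, pointing out my shortcomings, and providing plenty of room to improve and grow. Beyond the professional knowledge taught in class, such as operations research and information economics, he encouraged us to keep stepping out of our comfort zones and to challenge our limits. Under your guidance, in my first year I won the Best Paper Award at the Conference on Decision Analysis and presented our research in English to scholars at the PACIS international conference. In my second year I began writing this thesis; you guided me in developing the topic, advised me on its structure and writing style, and reviewed the text word by word. You also let me take part in data analysis projects with Senao International and Coursera, which gave me hands-on experience in cleaning, analyzing, and presenting data. In addition, I am very grateful that you chose me as a teaching assistant for Service Learning III. By leading undergraduates in promoting programming education, I came to understand deeply that giving back to society not only influences others but also brings me joy and a chance to recharge. I would also like to especially thank my oral examination committee members, Professor 林真如, Professor 莊皓鈞, and Professor 裘家寧, for their many suggestions and their encouragement during my defense.
Of course, I want to thank my family, who have always supported me and encouraged me to keep learning and pushing beyond my limits.
I thank my senior labmates 冠宇, 何禾, 偉宏, and 騏瑋 for lighting the way for my master's research, and my labmates of the same cohort for supporting one another and growing together along the research path. Thanks to 雪兒, 韋志, and 宸安 for their cooperation and mutual help on the Senao project; to 維哲 and 千瑜 for conquering the PACIS conference together in our first year; and to 韋志, 柏宣, and 宸安 for their leadership, which made my days both joyful and meaningful. Thanks also to my junior labmates 敬傑, 佩蓉, 子翔, and 鑑霖; I learned something from every interaction with you. Special thanks to my companion of the heart throughout this journey, 維哲. Thank you for your companionship and encouragement over these years; with you, my world has gained many colors. In the future, we will continue to help each other, learn, and grow together.
Respectfully, Pei-Yu Sun (孫珮瑜), Graduate Institute of Information Management, National Taiwan University, July 2017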
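Abstract (Chinese Version)
Through the disclosure of sales information, this study aims to help decision makers maximize profit. Using regression models, we analyze the ticket sales data of a Taiwanese theater group to study how the sales status of each price band, as observed by consumers at the moment of purchase, affects their willingness to buy, and we explore how decision makers holding this information can maximize expected revenue by dynamically adjusting the quantity available in each price band. We find that a price band's sales volume is positively correlated with its sales speed but negatively correlated with the cumulated sales of other price bands. This phenomenon is especially pronounced when the price gap between price bands is larger. Our study also confirms that sales speed decreases as the sales period lengthens.
Using the demand forecasting model, we can calculate and adjust the optimal number of seats for each price band according to current sales. We also consider a dynamic decision setting in which decision makers can dynamically allocate seats among price bands to optimize profit. In our numerical experiments, we find that, with sales information disclosure, dynamic seat adjustment that accounts for the effect of sales status on sales speed increases profit by 4.60%.
Keywords: ticket sales, sales information disclosure, valuation uncertainty, dynamic seat allocation, regression analysis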
Thesis Abstract
This study examines how a decision maker can increase profit through sales information disclosure. By incorporating the sales information of all price bands into a single regression model estimated on ticket data from a Taiwanese theater, our theoretical framework formalizes how decision makers are able to affect customers' valuations. We verify that a price band's cumulated sales are positively related to its sales speed, while the cumulated sales of other price bands have negative effects. These relationships intensify when the prices of the price bands are closer to one another. Our analyses further support that sales speed decreases as the length of the selling period increases.
The estimated demand system allows us to calculate the capacity allocation for each price band based on its current sales. We investigate a scenario in which decision makers can dynamically allocate ticket capacities among the price bands of the same performance to optimize profit. The profit of the model with sales information disclosure is 4.6% higher than that of the suboptimal model.
Keywords: Ticket Sales, Sales Information Disclosure, Valuation Uncertainty, Dynamic Seat Allocation, Regression Analysis
Contents
1 Introduction
  1.1 Background and motivation
  1.2 Research objectives
  1.3 Research plan
2 Literature review
  2.1 Perishable products
  2.2 Sales information disclosure
  2.3 Ticket selling
3 Problem Description and Formulation
4 Analysis
  4.1 Data cleansing
  4.2 An illustrative example of data processing
    4.2.1 Testing Hypothesis 1 to 3
    4.2.2 Testing Hypothesis 4
  4.3 Exploratory data analysis
  4.4 Regression analysis
    4.4.1 Testing Hypothesis 1 to 3
    4.4.2 Testing Hypothesis 4
5 Application: Dynamic Seat Allocation
  5.1 Exploratory data analysis
  5.2 Numerical experiment
  5.3 Quantity allocation based on Lagrangian relaxation
6 Conclusions and Future Works
  6.1 Conclusions
  6.2 Future works
A Figures of Exploratory Data Analysis
Bibliography
Chapter 1
Introduction
1.1 Background and motivation
Online environments allow merchants to apply various inventory-related selling tactics by deciding whether to divulge certain information to customers. On eBay, a top online auction platform, the detail page of a fast-selling product shows the quantity sold, the number of views per hour, and the quantity left. The Chinese market has an even more visible mechanism. Both Taobao and JD.com, two dominant online marketplaces in China, disclose inventory information directly on advertisements and search result lists, using a bright red bar chart that shows the percentage of products sold versus inventory left. The phenomenon is also found on hotel booking websites.
Hotels.com shows beside the hotel names on the search result list that a hotel was booked several hours ago, and Booking.com shows on a hotel's detail page whether the room is in high demand. Group buying websites such as Groupon, GOMAJI, and LivingSocial make particularly heavy use of inventory disclosure; people can see the amount of products sold on almost every web page. Inventory-related information disclosure tactics are thus widely used by existing online merchants, who seem to believe that disclosing information on a product's popularity can trigger higher sales.
Academic evidence for the above practice is provided in several streams of literature. Bikhchandani et al. (1992) define an informational cascade as occurring when an individual, having observed the actions of those ahead of him, finds it optimal to follow the behavior of the preceding individual without regard to his own information. Watts (2002) studies a network setting in which an agent can affect only its neighbors and finds that a limited number of initial shocks has the potential to trigger an information cascade across the global network. Herding and information cascades might also occur among professional analysts (Alevy et al., 2007; Welch, 2000). Ho et al. (2014) state that sales velocity indicates the popularity of a product. Showing high sales velocity has such a positive effect on customers' purchase likelihood that it even reverses participants' preferences for a product with a higher sales rank.
Both the literature and practical applications suggest that sales outcomes affect customers' purchase decisions; however, little literature on performance ticket selling discusses this issue. These studies normally assume that people know their valuations (Courty, 2002; Leslie, 2004; Rascher et al., 2007) and that only price changes affect customer utility (Drayer et al., 2012; McAfee and Te Velde, 2006; You, 1999). On the business side, ticket-selling websites such as Ticketmaster and UDN ticket selling disclose cumulated sales outcomes. Even though some do not show any inventory-related number on the product detail page until customers click into the show and choose a specific performance date, they still have to disclose cumulated sales outcomes during seat selection. Ticket websites normally do so with a seat map in which background colors indicate the price bands and sold seats are shown in red, so customers can see the sales status and the percentage sold of every price band at a glance.
It is intriguing that, in both the academic and business fields of performance ticket pricing, there is limited research on whether people are influenced by others when they are uncertain about their valuations. A theater performance is a product with high valuation uncertainty: its quality might be affected by the location, the seat, the cast, the actors' emotions, and the entire crew. Compared with general online purchases, buying a performance ticket carries higher risk. Customers might therefore be more influenced by other people's choices, so cumulated sales outcomes might play a role in driving sales.
Performance ticket purchases also have other interesting characteristics that differ from those of most general products. First, performance tickets are perishable and have no salvage value. Second, a show has multiple price bands, and customers have different valuations for each price band based on its quality and their preferences. Lastly, there is no inventory replenishment for a performance; when the tickets at their most desired price are sold out, people might buy higher- or lower-priced tickets instead.
1.2 Research objectives
In this study, we investigate the relationship between ticket sales outcomes and customer valuation and how this relationship affects a ticket seller's profit. We obtain a performance data set spanning 2008 to 2012 from a local theater group in Taiwan, which has been actively producing and performing innovative and educational children's drama for sixteen years. The data set contains detailed ticket sales records, from which we obtain the sales volume of every price band for each performance on each day.
The aforementioned literature shows that customers' valuations may be affected by sales outcomes. Using the data and a regression model, we estimate how essential inventory information disclosure is to performance ticket selling. In addition to cumulated sales outcomes, we add other performance information, such as location and selling period, into the model and examine the interaction effects between this basic performance information and cumulated sales. We also investigate the relationships between the target price band and the other price bands.
Beyond establishing the relation between sales speed and cumulated sales, we consider an operational application. In a performance, each price band has its own capacity, and different price bands have different cumulated sales. We study a scenario in which seats allocated to one price band can be transferred to another while considering the potential influence on customers' willingness to buy: the capacity of a price band that is not selling well can be transferred to one with better sales. Under this quantity control, we examine how the additional information about the relationship between sales outcomes and customers' valuations may contribute to a ticket seller's profit.
1.3 Research plan
The remainder of the thesis is organized as follows. In Chapter 2, we review related work on perishable products, inventory disclosure, and ticket pricing. In Chapter 3, we propose a regression model relating cumulated sales outcomes to customer utility, measured through sales speed. Chapter 4 presents the analysis based on the proposed models. Chapter 5 examines an application based on our findings, and Chapter 6 concludes.
Chapter 2
Literature review
Our research discusses the relationship between a ticket's cumulated sales and its sales speed; therefore, we focus on three streams of literature: perishable products, sales information disclosure, and ticket selling.
2.1 Perishable products
Many inventory models assume that stock items can be stored indefinitely to meet future demand. However, certain types of inventory change in storage, so that over time they may become partially or entirely unfit for consumption (Nahmias, 1982). Such assets are called perishable products. Examples include fresh food, fashion or seasonal goods, and travel-related products such as hotel rooms or air tickets. Theater tickets are, of course, a kind of perishable good.
Retailers offer discounts on perishable products nearing the end of their shelf lives to encourage purchases, under the assumption that customers are less willing to buy perishable products as the expiration date approaches. Tsiros and Heilman (2005) verify this assumption: in a survey of 300 customers of a grocery store, they find that customers' willingness to pay decreases over the course of a product's shelf life. For discount offering on perishable goods, Sezen (2004) uses an expected profit approach to identify the timing and size of discounts for perishable commodities. Ramanathan (2006) extends the expected profit approach of Sezen (2004) by also deciding the quantities of perishable products to be stocked by the retailer. Zhao and Zheng (2000) conduct numerical studies to show the revenue impact of dynamic pricing on perishable assets. Their examples show that optimal dynamic pricing policies achieve a 2.4% to 7.3% revenue improvement over the optimal single-price policy. Price changes become even more critical when the reservation price distribution shifts over time; in that case, the revenue increase can be as high as 100%. These results explain why yield management has become so essential to fashion retailing and the travel service industry.
Weatherford and Bodily (1992) propose Perishable-Asset Revenue Management (PARM) and define it as the optimal revenue management of perishable assets through price segmentation. They investigate fourteen taxonomies, such as discount price classes, group reservations, and overbooking. These taxonomies are mostly price-related; however, theater prices are fixed in many circumstances. As Gallego and Van Ryzin (1994) state, fixed-price policies appear more appealing in reality, because dynamic pricing requires constant adjustment and is undesirable in practical applications. We therefore exclude dynamic pricing in our setting and focus on whether disclosing cumulated sales affects customers' willingness to pay for the various price bands and thus changes sales speed.
2.2 Sales information disclosure
As mentioned above, several e-commerce platforms use sales information disclosure to stimulate customers' purchase desire, and academic evidence is provided by several streams of literature. People are affected by others' decisions, especially when entering a new product category. Word of mouth (WOM) is a classic example: WOM has been shown to influence people's short-term (Herr et al., 1991; Bone, 1995; Anderson, 1998) and long-term judgements (Bone, 1995). WOM data collected from the Yahoo! Movies website show that WOM information offers significant explanatory power for both aggregate and weekly box office revenue, especially in the early weeks after a movie is released (Liu, 2006).
The same holds when sales information is disclosed. Bikhchandani et al. (1992) define an information cascade as occurring when a person follows the behavior of the preceding individual without considering his own information. Cascades can explain not only conformity but also the rapid spread of new behaviors, which may be positive or negative (Liang et al., 2014). Good early demand realization attracts more customers, whereas bad early demand realization works in the opposite way and depresses future demand. An experiment with 870 participants shows the same result (Tseng, 2016).
Chen et al. (2011) combine the effects of WOM and observational learning, that is, the effect of others' actual purchase decisions on customers. An intriguing finding is that, while negative WOM is more influential than positive WOM, positive observational learning information significantly increases sales but negative observational learning information has no effect. This finding conflicts with that of Liang et al. (2014). Chen et al. (2011) argue that consumers may believe other people have different individual preferences, so poor sales do not necessarily indicate low quality.
In sum, sales outcomes reveal a product's popularity, and disclosing accumulated sales outcomes appears to influence customers' decision making. The effect can be strong: Ho et al. (2014) show that the positive effect of disclosing high sales velocity on customers' purchase likelihood can be so strong that it reverses participants' preferences for a product with a higher sales rank.
2.3 Ticket selling
Ticket selling meets the prerequisite conditions identified by Kimes (1989). There are six prerequisites: (1) The ability to segment markets: by separating consumers into different groups, managers can apply different marketing strategies and prices across groups. (2) Perishable inventory: this is a common issue faced by several sectors of the tourism and hospitality industries (hotels/motels, airlines, and car rental agencies). (3) Product sold in advance: this issue deals specifically with the timing and uncertainty of sales. (4) Low marginal sales costs: serving additional customers does not cost the firm much. (5) High marginal production costs: creating additional inventory is difficult. For example, it is practically impossible for a hotel to quickly add rooms or for an airplane to quickly add seats; therefore, as inventory runs low, managers may have the opportunity to raise prices. (6) Fluctuating demand: the hotel industry, in particular, experiences frequent fluctuations by season and day of the week.
Airline, hotel, sports, and performance tickets all meet these conditions.
Many past studies examine the relationship between pricing policy and revenue potential. Using censored regression and elasticity analysis, Rascher et al. (2007) demonstrate that variable pricing would have yielded approximately $590,000 in additional ticket revenue for each major league team in 1996. Huntington (1993) indicates that theaters offering a range of ticket prices tend to earn higher box office revenue than theaters offering tickets at a single price. Boyd and Boyd (1998) suggest that whenever secondary market sellers can resell tickets for a profit, the tickets are not priced optimally. Conversely, some events have high numbers of unsold seats, which indicates that the tickets are priced too high (Howard and Crompton, 2004). Shapiro and Drayer (2012) argue that demand-based pricing must be assessed for each professional league before a dynamic ticket pricing policy is adopted, because each team has a different consumer market with varying perceptions of value and willingness to pay. Using data from the San Francisco Giants, they find that ticket prices are significantly higher under dynamic ticket pricing than the fixed season ticket price. Even though these prices were still lower than secondary market prices for comparable tickets, dynamic pricing better captures the willingness to pay for the same tickets.
The effect of time on ticket pricing has also been investigated, and opinions differ on how time affects pricing. Rosen and Rosenfield (1997) assume that those attending later performances are paid to wait, getting their tickets at lower prices. A policy of declining prices over time allows the waiting market to clear in all but the final period, and different preference groups are served sequentially. People who desire the service the most are willing to buy early if they expect that the price will not fall too quickly.
For sports tickets, the average price of dynamically priced tickets gradually increases as the game draws closer (Shapiro and Drayer, 2012): fans choosing to attend a game at the last minute pay a premium for that convenience. This phenomenon seems to target the customers whom Courty (2003) considers typical price-insensitive, last-minute buyers, or what he calls the “busy professional”. Courty (2003) suggests that some people simply appreciate the flexibility of their social calendar and are willing to pay a premium for it. He categorizes another group of people as “diehard fans”: people who plan their schedules in advance and are not willing to pay anything if they cannot commit early. He suggests that diehard fans outnumber busy professionals, so it is optimal for ticket providers to focus on diehard fans and leave busy professionals to brokers.
Even though many articles support the view that dynamic pricing helps companies gain profit, research indicates that consumers respond negatively to paying different prices for the same product (Hall and Hitch, 1939; Kung et al., 2002). Furthermore, with the proliferation of the Internet, it is easier for customers to search for products of their choice within their desired price range (Kung et al., 2002). In some established industries, such as auto retailing and term insurance, consumers pay lower prices as a result of the Internet: car buyers use the Internet to gather information and borrow the negotiating clout of online buying services. The Internet has increased consumers' sensitivity to price and to changes in price (Kotler, 2009).
In the case of theater, we believe customers' negative response to price changes is even more significant. In reality, theater tickets are mainly sold through the Internet, where it is extremely easy for customers to check prices. In addition, most performance producers set ticket prices at the beginning of the sales period and keep them consistent across seat locations; they can change prices only slightly by offering discounts.
A more closely related work is Rosen and Rosenfield (1997), who analyze price discrimination among ticket service classes when aggregate demand is known and individual preferences are private information. Serving customers in cheap second-class seats limits the seller's ability to extract surplus from expensive first-class seats, because some first-class customers switch to the lower class. The article assumes two types of customers, high and low, and studies how the quantities and prices of high- and low-type tickets are decided, focusing on the substitution effect between tickets offered at the same time. A buyer's reserve price for each type of seat is an increasing function of the intensity of demand, and demand has two parameters, α and β, that depend on the specific service, seat quality, and prices of complements. The conclusion is that, when prices rise, marginal buyers of first-class seats switch to second class rather than leaving the market, while marginal second-class buyers are less likely to stop purchasing. Customers with the strongest general taste for the basic service never leave the market when the price of their most preferred class of seat increases. The seller can charge them more and deter substitution to the other class by increasing the interclass quality difference. The cross effects of quality on price in the socially optimal marginal benefit calculations turn out to be symmetrically negative in both equations, because an increase in one service quality reduces consumer surplus in the other.
Streams of literature related to ticket selling investigate how dynamic pricing over time improves ticket sellers' profit. However, with the proliferation of the Internet, customers can conveniently check prices and are more sensitive to price changes. This is especially true in theater ticket selling, where price bands are predefined and do not change throughout the sales period. Instead of focusing on pricing strategies, we therefore turn to quantity control. Rosen and Rosenfield (1997) use a fixed-price strategy and assume two price segments with cross effects in the market; however, they do not investigate the relation of cumulated sales to customer utility and instead focus on the effect of price changes. In this study, we investigate how cumulated sales outcomes can be leveraged to increase the revenue of performance ticket retailers.
Chapter 3
Problem Description and Formulation
In this study, we investigate the relationship of cumulated sales outcomes to customer utility and how this relationship affects a ticket seller's profit. We obtain a performance data set spanning 2008 to 2012 from a local theater group in Taiwan, which has been actively producing and performing innovative and educational children's drama for sixteen years. The data set contains detailed ticket sales information for every price band, such as the performance time, the selling period, and the tickets sold per day. We use this data set to verify our hypotheses.
We define a show as consisting of one or multiple performances. Take Notre Dame de Paris as an example: its performance period runs from February 24 to March 5, with thirteen performances during that period. Shows carry a great deal of uncertainty in terms of cast, story, and performance location, and neglecting one of these key factors might lead to a mismatch between the expected and the actual quality of a performance. We assume that a fixed portion of customers are knowledgeable and understand all the factors behind a theater's various price bands; customers who are not familiar with the performance quality will therefore treat the cumulated sales outcome as useful information.
This thesis examines how cumulated sales outcomes affect customer valuation of different price bands. We measure the cumulated sales outcome as the quantity sold divided by the total quantity. An example illustrates the logic. Suppose two products are on the market, and 12 units of product A and 20 units of product B are sold in the same period. Product B seems to sell better; however, product A has only 20 units in total while product B has 50, so product A has sold 60% of its stock and product B only 40%. Intuitively, people will consider product A the more popular product. People look not only at the number sold but also at the total inventory prepared, because the risk of a stockout also drives people to buy. The data provided by the theater do not contain the exact customer utility of the various tickets; therefore, we use the number of tickets sold each day as a proxy for customers' willingness to buy. If the tickets of a price band are popular and people are more prone to purchase them, the number of tickets sold that day increases, and vice versa. We believe that higher sales outcomes lead to higher customer utility (sales speed), and state Hypothesis 1.
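The arithmetic behind the product A and product B example can be sketched in a few lines of Python. The function name `cumulated_sales_ratio` is illustrative only and does not come from the thesis; it simply divides the quantity sold by the total quantity, as described above.

```python
def cumulated_sales_ratio(sold: int, capacity: int) -> float:
    """Share of the total quantity (capacity) that has already been sold."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return sold / capacity


# Products A and B from the example: A sold 12 of 20 units, B sold 20 of 50 units.
ratio_a = cumulated_sales_ratio(12, 20)
ratio_b = cumulated_sales_ratio(20, 50)
print(f"A: {ratio_a:.0%}, B: {ratio_b:.0%}")  # A: 60%, B: 40%
```

Although B has sold more units in absolute terms, A has the higher ratio, which is the quantity the thesis treats as the cumulated sales outcome.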
Hypothesis 1: A price band’s cumulated sales positively affect sales speed of this band. More precisely, the higher a price band’s cumulated sales, the higher the sales speed of this band.
A show usually contains multiple price bands. When a show starts selling tickets, people evaluate the various price bands, compare their utilities, and purchase the one with the maximum utility. If one of the price bands has better sales outcomes, people will be more willing to purchase it and less willing to purchase the others.
It is therefore reasonable to expect cross effects between price bands, so the cumulated sales outcomes of other price bands are taken into consideration when examining a specific price band. Based on this assumption, we state Hypothesis 2.
Hypothesis 2: Other price bands’ cumulated sales negatively affect sales speed of this band. More precisely, the higher other price bands’ cumulated sales, the lower the sales speed of this band.
The data set contains other information, such as the sales period, time, and location of the performances. We hypothesize that when the selling period becomes longer, the average sales speed becomes slower, and state Hypothesis 3.
Hypothesis 3: The sales period negatively affects the sales speed of a performance. More precisely, the longer a performance's sales period, the slower its sales speed.
Hsieh (2015) reports that total sales are highest for afternoon performances and for performances in Taichung. We therefore consider the time and region of a performance as factors that might influence our hypotheses and add them as control variables in our model.
Because the data set spans 2008 to 2012, $\mathit{Year}_k$ is added as another control variable. Other control variables include $\mathit{Show}_k$, $\mathit{PerformanceId}_k$, and $\mathit{Week}_{kt}$, where $k$ indexes performances and $t$ indexes weeks; $\mathit{Week}_{kt}$ is the $t$-th week since the start of ticket sales for performance $k$.
From the hypotheses, we formulate the equation

$$
\frac{y_{ikt}}{s_{ikt}+q_{ikt}} = \beta_0 + \sum_{j \in N} \beta^{S}_{j} \frac{s_{jkt}}{s_{jkt}+q_{jkt}} + \beta^{P}_{1} \mathit{Period}_k + \beta^{P}_{2} \mathit{Week}_{kt} + \beta^{T}_{1} \mathit{Morning}_k + \beta^{T}_{2} \mathit{Evening}_k + \beta^{R}_{1} \mathit{Northern}_k + \beta^{R}_{2} \mathit{Southern}_k + \beta^{Y}_{1} \mathit{Year}_k + \beta^{D}_{1} \mathit{Show}_k + \beta^{D}_{2} \mathit{PerformanceId}_k + \epsilon,
$$

where $N$ is the set of price bands of a performance and $\epsilon$ is the error term. For example, performance A has three price bands, 300, 500, and 600, indexed from the highest price to the lowest: price band 600 has index 1, 500 has index 2, and 300 has index 3, so $N = \{1, 2, 3\}$. $y_{ikt}$ is the sales speed of price band $i$ in performance $k$ during week $t$, that is, the number of its tickets sold in that week. The superscript $S$ marks the sales-related coefficients $\beta^{S}_{j}$, while $P$ denotes period, $T$ time, $R$ region, $Y$ year, and $D$ identity (ID) related variables. Both $\mathit{Time}$ and $\mathit{Region}$ are categorical (factor) variables.
$s_{ikt}$ is the cumulated number of tickets of price band $i$ of performance $k$ sold by week $t$, and $q_{ikt}$ is the number of tickets of price band $i$ still available. We use $\mathit{Afternoon}_k$ and $\mathit{Central}_k$ as the reference levels of $\mathit{Time}_k$ and $\mathit{Region}_k$ for performance $k$. Both $y_{ikt}$ and $s_{ikt}$ are divided by the capacity $s_{ikt}+q_{ikt}$ so that performances with different capacities are on the same scale.
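To make the estimation concrete, the following is a minimal sketch of fitting a reduced form of the equation by ordinary least squares with NumPy. It rests on several assumptions that are not from the thesis: the data are simulated, the "true" coefficients are arbitrary values chosen only to show that OLS recovers them, the sales regressors are collapsed into an own-band share (CumSelf) and a pooled other-band share (CumOther), and only a subset of the controls (Period, Week, and the Morning and Evening dummies with Afternoon as the reference level) is included. It illustrates the technique and does not reproduce the thesis's estimation or results.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical simulated panel: one row per (price band i, performance k, week t).
n = 500
cum_self = rng.uniform(0.0, 0.9, n)    # s_ikt / (s_ikt + q_ikt), capped below the 90% cut-off
cum_other = rng.uniform(0.0, 0.9, n)   # pooled share sold of the other price bands (assumption)
period = rng.integers(20, 120, n)      # Period_k: selling-period length in days (made up)
week = rng.integers(1, 15, n)          # Week_kt: weeks since ticket sales started
morning = rng.integers(0, 2, n)        # Morning_k dummy; Afternoon is the reference level
evening = (1 - morning) * rng.integers(0, 2, n)  # a performance cannot be both morning and evening

# Arbitrary illustrative coefficients (NOT estimates from the thesis).
true_beta = np.array([0.05, 0.30, -0.20, -0.0005, -0.002, 0.01, -0.01])

# Design matrix: intercept followed by the regressors in the order of true_beta.
X = np.column_stack([np.ones(n), cum_self, cum_other, period, week, morning, evening])
noise = rng.normal(0.0, 0.01, n)
y = X @ true_beta + noise  # y_ikt / (s_ikt + q_ikt): weekly sales as a share of capacity

# Ordinary least squares via NumPy's least-squares solver.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
names = ["const", "CumSelf", "CumOther", "Period", "Week", "Morning", "Evening"]
for name, b in zip(names, beta_hat):
    print(f"{name:>8}: {b:+.4f}")
```

Under Hypotheses 1 to 3, one would look for a positive CumSelf coefficient and negative CumOther and Period coefficients. The signs here are built into the simulated data, so the printout only demonstrates the mechanics.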
As mentioned before, one of the unique characteristics of performance ticket selling is that a performance has different price bands, so we examine the interactions between price bands. We hypothesize that the relationship between two price bands is stronger when their prices are closer to each other, and state Hypothesis 4.
Hypothesis 4: The effects of a price band to another are stronger when their prices are closer to each other.
Using the historical ticket sales data of a Taiwanese theater, we use regression models to verify how cumulated sales outcomes affect customer utility and how each price band's sales outcome ($s_{ikt}/(s_{ikt}+q_{ikt})$) affects the sales speed of the other price bands ($y_{jkt}/(s_{jkt}+q_{jkt})$, where $j \in N$ and $j \neq i$). Performances have between two and six price bands, so the number of variables differs across shows with different numbers of price bands. In the first part of Chapter 4, we aggregate all other price bands into a single new variable so that performances with different numbers of price bands can be examined in the same model, and we examine Hypotheses 1 to 3. In the second part, we focus on Hypothesis 4 and examine the relationships between price bands by categorizing the other price bands into those with closer and those with farther prices relative to a specific price band.
Chapter 4
Analysis
4.1 Data cleansing
The theater data set covers 2008 to 2012. Prior to the analysis, we adjust and clean the data set to fit our hypotheses.
First, there are days with no recorded sales. Our research requires sales speed in order to understand customer preferences, so, to measure how cumulated sales affect sales speed, we add the missing dates back to the historical sales and set their sales amounts to zero.
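A minimal sketch of this first cleaning step, assuming each price band's daily sales are held in a Python dictionary keyed by date. The function name and the sample numbers are hypothetical; the thesis does not describe its data structures.

```python
from datetime import date, timedelta


def fill_missing_days(daily_sales: dict[date, int], start: date, end: date) -> dict[date, int]:
    """Return one record per day in [start, end]; days with no recorded sales get 0."""
    filled: dict[date, int] = {}
    day = start
    while day <= end:
        filled[day] = daily_sales.get(day, 0)  # missing date -> zero sales
        day += timedelta(days=1)
    return filled


# Hypothetical sales of one price band: nothing was recorded on 2 and 3 March.
raw = {date(2012, 3, 1): 5, date(2012, 3, 4): 2}
filled = fill_missing_days(raw, date(2012, 3, 1), date(2012, 3, 4))
print(list(filled.values()))  # [5, 0, 0, 2]
```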
Second, we find that sales distortion occurs when the tickets of a price band are about to be sold out. In Figure 4.1, the sales speed of the price bands with higher cumulated sales becomes slower than before, while the sales speed of the other price bands accelerates: instead of flocking to the nearly sold-out price band, people turn to other price bands. One possible reason is that, when tickets are about to be sold out, the remaining seats are scattered across the entire section, so groups of people cannot purchase adjacent seats and either buy tickets in other price bands or make no purchase. We assume that the distortion occurs once 90% of a price band's tickets are sold and ignore the data whenever any price band's cumulated sales exceed 90%. The most important reason for ignoring these data is that, once a price band is sold out, its sales speed is zero thereafter while its cumulated sales are 100%; such data would clearly bias the results.
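The 90% rule can be sketched as a simple filter. The record layout (a week number plus a mapping from price to cumulated sales ratio) and the sample ratios are hypothetical; only the threshold and the "any price band exceeds 90%" condition come from the text.

```python
DISTORTION_THRESHOLD = 0.9  # 90% cut-off described in the text


def drop_distorted(observations: list[dict], threshold: float = DISTORTION_THRESHOLD) -> list[dict]:
    """Drop observations in which any price band's cumulated sales ratio exceeds the threshold."""
    return [obs for obs in observations if max(obs["ratios"].values()) <= threshold]


# Hypothetical weekly cumulated sales ratios of one performance with three price bands.
weeks = [
    {"week": 1, "ratios": {600: 0.35, 500: 0.20, 300: 0.10}},
    {"week": 2, "ratios": {600: 0.72, 500: 0.41, 300: 0.22}},
    {"week": 3, "ratios": {600: 0.93, 500: 0.55, 300: 0.30}},  # band 600 has passed 90%
]
kept = drop_distorted(weeks)
print([obs["week"] for obs in kept])  # [1, 2]
```

Week 3 is removed for all three bands, not only for band 600, because the text drops the data whenever any band crosses the threshold.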
Third, some customers ask for refunds, which we treat as negative sales. The data set does not record every single ticket transaction. It records only the aggregate sales of each day. When refunds exceed the tickets sold on a day, that day's sales are recorded as negative. When the recorded sales are positive, they may still include a smaller number of refunds.
Fourth, we find that the time unit of the sales speed matters. Take Show 4 as an example: its daily sales speed data (number of tickets sold, NumSold_ikt) contain many zeros (Figure 4.2a). We therefore aggregate the sales data by week to reduce the number of zeros (Figure 4.2b).
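The weekly aggregation can be sketched as below. The sketch assumes that weeks are consecutive seven-day blocks counted from the first day of sales, which matches the definition of Week_kt in Section 4.2. The daily series is made up for illustration and is not the Show 4 data.

```python
def aggregate_weekly(daily_sales):
    """Sum a daily sales series into consecutive 7-day weeks.

    Weeks are counted from the first sales day; the last week may be shorter.
    """
    return [sum(daily_sales[d:d + 7]) for d in range(0, len(daily_sales), 7)]

daily = [3, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 5]  # hypothetical
weekly = aggregate_weekly(daily)
print(weekly)                           # [4, 2, 5]
print(daily.count(0), weekly.count(0))  # 11 0
```

Summing rather than averaging keeps the weekly figures in tickets, so that they can later be divided by capacity in the same way as the daily figures.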
Lastly, this study also investigates the interactions between the price bands of a performance. We therefore select the performances that have a sufficient number of price bands, namely more than three. After eliminating the unsuitable performances, 30 shows and 324 performances remain out of the five-year data set.
The variables mentioned in the hypotheses are:
- the cumulated sales outcomes ($\frac{s_{ikt}}{s_{ikt}+q_{ikt}}$) of the investigated price band and of the other price bands;
- the sales period;
- the time of the performance (morning, afternoon, or evening);
- the region (Northern, Central, or Southern).

In addition, we add Year_k as another variable because the data set spans several years. We denote the cumulated sales outcome of price band i ($\frac{s_{ikt}}{s_{ikt}+q_{ikt}}$) as CumSelf_ikt and the cumulated sales outcome of the other price bands (j ∈ N and j ≠ i) as CumOther_ikt (Table 4.1).
(a) Performance 26, (b) Performance 133, (c) Performance 277, (d) Performance 348
Figure 4.1: Weekly Cumulated Sales of Performances.
(a) Daily, (b) Weekly
Figure 4.2: Aggregate Weekly Sales of Show 4.
4.2 An illustrative example of data processing
Here we use small examples, with three and four price bands, to illustrate our data processing. The section is divided into two parts. The first part focuses on the relationship between the cumulated sales of a specific price band in a performance and those of the other price bands. The second part focuses on the interaction between a specific price band and the price bands closer to or farther from it in price. Because of this difference, the two parts use different regression models and different data transformations.
Variable           Type          Possible Values
CumSelf_ikt        Numerical     0 – 0.9
CumOther_ikt       Numerical     0 – 0.9
Period_k           Numerical     13 – 359
Week_kt            Categorical   1 – 53
Time_k             Categorical   Morning_k, Afternoon_k, Evening_k
Region_k           Categorical   Northern_k, Central_k, Southern_k
Year_k             Numerical     2008 – 2012
Show_k             Categorical   1 – 30
PerformanceId_k    Categorical   1 – 324
Table 4.1: Type and Possible Values of Variables
4.2.1 Testing Hypothesis 1 to 3
Table 4.2 lists an example data set for the first part. The performance has three price bands. P_ik stands for price band i of performance k. P_1k is the price band with the highest price, P_2k the one with the second highest price, and P_3k the one with the lowest price for performance k. Week_kt denotes the week number counted from the start of sales.
From the data set of the local Taiwanese theater, we can also obtain the capacity of each price band (Table 4.3).
We let $\mathit{NumSold}_{ikt} = \frac{\text{Sales of } P_{ik} \text{ in week } t}{\mathit{Capacity}_{ik}}$ (Table 4.4).
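As a check on Table 4.4, the division can be reproduced for Example Data Set 1 in a few lines of Python. The dictionary layout is only an illustration.

```python
sales = {  # Table 4.2: weekly sales of P1k, P2k, P3k
    "P1": [0, 4, 4, 2, 2],
    "P2": [3, 2, 3, 1, 0],
    "P3": [5, 1, 0, 5, 0],
}
capacity = {"P1": 12, "P2": 80, "P3": 120}  # Table 4.3

# NumSold = weekly sales of the band divided by the band's capacity.
num_sold = {band: [s / capacity[band] for s in weekly]
            for band, weekly in sales.items()}
for band, values in num_sold.items():
    print(band, [round(v, 4) for v in values])
```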
However, customers can normally see only the cumulated sales of tickets, not the sales speed. We therefore transform the sales data of P_ik into cumulated sales (Table 4.5).
Week_kt   Sales of P_1k   Sales of P_2k   Sales of P_3k
1         0               3               5
2         4               2               1
3         4               3               0
4         2               1               5
5         2               0               0
Table 4.2: Sales of the Example Data Set 1
Capacity of P_1k   Capacity of P_2k   Capacity of P_3k
12                 80                 120
Table 4.3: Capacity of the Example Data Set 1
Week_kt   NumSold for P_1k   NumSold for P_2k   NumSold for P_3k
1         0/12 = 0           3/80 = 0.0375      5/120 ≈ 0.042
2         4/12 ≈ 0.33        2/80 = 0.0250      1/120 ≈ 0.008
3         4/12 ≈ 0.33        3/80 = 0.0375      0/120 = 0
4         2/12 ≈ 0.17        1/80 = 0.0125      5/120 ≈ 0.042
5         2/12 ≈ 0.17        0/80 = 0           0/120 = 0
Table 4.4: NumSold for Example Data Set 1
Week_kt   Cumulated Sales for P_1k   Cumulated Sales for P_2k   Cumulated Sales for P_3k
1         0                          3                          5
2         4                          5                          6
3         8                          8                          6
4         10                         9                          11
5         12                         9                          11
Table 4.5: Cumulated Sales of Example Data Set 1
Week_kt   CumSelf_1kt     CumOther_1kt
1         0/12 = 0        (3+5)/(80+120) = 0.04
2         4/12 ≈ 0.33     (5+6)/(80+120) = 0.055
3         8/12 ≈ 0.67     (8+6)/(80+120) = 0.07
4         10/12 ≈ 0.83    (9+11)/(80+120) = 0.1
5         12/12 = 1       (9+11)/(80+120) = 0.1
Table 4.6: CumSelf_1kt and CumOther_1kt of Example Data Set 1
Suppose we are examining price band 1. Then CumSelf_1kt is computed from P_1k, and CumOther_1kt combines P_2k and P_3k according to
$$\mathit{CumOther}_{ikt} = \frac{\sum_{j \neq i} s_{jkt}}{\sum_{j \neq i} \left(s_{jkt} + q_{jkt}\right)}.$$
As mentioned above, observations with CumSelf > 0.9 are deleted. In Table 4.6, the CumSelf of the last row (Week 5) exceeds 0.9, so that row and all later data are ignored.
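The steps from Table 4.2 to Table 4.6, together with the CumSelf > 0.9 cut-off, can be sketched as follows. The sketch assumes, as the example does, that $s_{jkt}+q_{jkt}$ equals the capacity of band j, so the denominators are the capacities in Table 4.3. The function name `cum_self_and_other` is hypothetical.

```python
from itertools import accumulate, takewhile

sales = {"P1": [0, 4, 4, 2, 2], "P2": [3, 2, 3, 1, 0], "P3": [5, 1, 0, 5, 0]}  # Table 4.2
capacity = {"P1": 12, "P2": 80, "P3": 120}  # Table 4.3, taken as s + q for each band

# Running totals per band (Table 4.5).
cumulated = {band: list(accumulate(weekly)) for band, weekly in sales.items()}

def cum_self_and_other(target):
    """Return (week, CumSelf, CumOther) of the target band for every week."""
    others = [band for band in sales if band != target]
    other_capacity = sum(capacity[band] for band in others)
    rows = []
    for t in range(len(cumulated[target])):
        cum_self = cumulated[target][t] / capacity[target]
        cum_other = sum(cumulated[band][t] for band in others) / other_capacity
        rows.append((t + 1, cum_self, cum_other))
    return rows

rows = cum_self_and_other("P1")
# Keep weeks only until CumSelf first exceeds 0.9; that week and later ones are dropped.
kept = list(takewhile(lambda row: row[1] <= 0.9, rows))
for week, cum_self, cum_other in rows:
    print(week, round(cum_self, 2), round(cum_other, 3))
print([row[0] for row in kept])  # [1, 2, 3, 4]
```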
Week_kt   Sales of P_1k   Sales of P_2k   Sales of P_3k   Sales of P_4k
1         0               3               5               0
2         1               2               2               0
3         5               6               0               1
4         0               0               3               2
5         4               0               0               1
Table 4.7: Sales of the Example Data Set 2
Capacity of P_1k   Capacity of P_2k   Capacity of P_3k   Capacity of P_4k
10                 20                 30                 100
Table 4.8: Capacity of the Example Data Set 2
4.2.2 Testing Hypothesis 4
In the second part, we investigate whether the effect of one price band on another is stronger when their prices are closer to each other. Each price band therefore needs to be examined separately. We categorize the other price bands into two types: Near and Far. For P_1k, the near price band is P_2k and the rest are far. For P_2k, the near price bands are P_1k and P_3k. The cumulated sales outcomes of the Near and Far groups are denoted CumNear_ikt and CumFar_ikt, respectively. Both variables are calculated in the same way as CumOther_ikt.
Using another example data set (Tables 4.7 and 4.8), we list CumSelf_2kt, CumNear_2kt, and CumFar_2kt in Table 4.9.
Week_kt   CumSelf_2kt     CumNear_2kt               CumFar_2kt
1         3/20 = 0.15     (0+5)/(10+30) = 0.125     0/100 = 0
2         5/20 = 0.25     (1+7)/(10+30) = 0.20      0/100 = 0
3         11/20 = 0.55    (6+7)/(10+30) = 0.325     1/100 = 0.01
4         11/20 = 0.55    (6+10)/(10+30) = 0.40     3/100 = 0.03
5         11/20 = 0.55    (10+10)/(10+30) = 0.50    4/100 = 0.04
Table 4.9: CumSelf_2kt, CumNear_2kt, and CumFar_2kt of Example Data Set 2
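The Near/Far split and the three shares can be sketched as follows. The sketch assumes that the price bands are listed from the highest to the lowest price and that the near bands are the ones directly adjacent in that order. This rule reproduces the two cases given in the text: P_2k is near P_1k, and P_1k and P_3k are near P_2k. The helper names `near_and_far` and `share` are hypothetical. The code computes CumSelf_2kt, CumNear_2kt, and CumFar_2kt directly from Tables 4.7 and 4.8, using each group's total capacity as the denominator.

```python
from itertools import accumulate

# Tables 4.7 and 4.8; bands are ordered from highest to lowest price.
bands = ["P1", "P2", "P3", "P4"]
sales = {"P1": [0, 1, 5, 0, 4], "P2": [3, 2, 6, 0, 0],
         "P3": [5, 2, 0, 3, 0], "P4": [0, 0, 1, 2, 1]}
capacity = {"P1": 10, "P2": 20, "P3": 30, "P4": 100}
cumulated = {b: list(accumulate(s)) for b, s in sales.items()}

def near_and_far(target):
    """Near bands are adjacent to the target in price order; the rest are far."""
    i = bands.index(target)
    near = [bands[j] for j in (i - 1, i + 1) if 0 <= j < len(bands)]
    far = [b for b in bands if b != target and b not in near]
    return near, far

def share(group, t):
    """Cumulated sales of a group of bands in week t divided by their total capacity."""
    return sum(cumulated[b][t] for b in group) / sum(capacity[b] for b in group)

near, far = near_and_far("P2")
for t in range(5):
    print(t + 1, round(share(["P2"], t), 3), round(share(near, t), 3),
          round(share(far, t), 3))
```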
4.3 Exploratory data analysis
Before going into detail, we describe the data set to help readers understand it. The data set contains 36,283 rows. There are 30 shows and 324 performances in total in the five-year data set, so a show has 10.8 performances on average.
First, we test CumSelf_ikt and aggregate the other price bands into CumOther_ikt so that data with different numbers of price bands fit into one model. The variables used in this part are CumSelf_ikt, CumOther_ikt, Period_k, Time_k, Region_k, and Year_k.
Before looking at CumSelf_ikt × Capacity_ik and CumOther_ikt × Capacity_ik, we investigate the number of tickets sold without dividing by capacity, denoted NumSold_ikt (y_ikt) here. The descriptive statistics of NumSold_ikt and of both cumulated sales variables are listed in Table 4.10. All four variables vary widely, and Figures A.1a to A.1d suggest that their distributions are approximately exponential. Period_k also varies widely, and its distribution is right-skewed (Figure A.1e).
Variable                      Mean      Standard Deviation   Maximum   Minimum
NumSold_ikt (y_ikt)           6.625     15.578               347       −158
CumSelf_ikt × Capacity_ik     39.050    42.111               437       0
CumOther_ikt × Capacity_ik    175.590   162.877              1390      0
Period_k                      144.843   87.734               359       13
Table 4.10: Exploratory Analysis of the Variables of the First Part
For the control variables, we count the performances in each category or value of Time_k, Region_k, and Year_k. There are 83 morning performances, 122 afternoon performances, and 119 evening performances (Figure A.1f). A total of 284 performances took place in Northern Taiwan, 28 in Central Taiwan, and 12 in Southern Taiwan (Figure A.1g). The number of performances per year (Year_k) shows an increasing trend, with 44, 44, 70, 72, and 94 performances across the five years (Figure A.1h).
4.4 Regression analysis
There are two parts in this section. Part A mainly discusses whether cumulated sales outcome affects weekly sales speed under various circumstances. Part B focuses on the interactions between price bands and whether price bands with closer prices have larger effects on the sales speed of the targeted price band.
The main difference between the two parts is how the other price bands are represented. In our data set, each performance has four, five, or six price bands. Part A mainly discusses the effect of cumulated sales on sales speed. Therefore, instead of separating the performances by number of price bands, we aggregate all the other price bands into
$$\mathit{CumOther}_{ikt} = \frac{\sum_{j \neq i} s_{jkt}}{\sum_{j \neq i} \left(s_{jkt} + q_{jkt}\right)}.$$
After this aggregation, data with various numbers of price bands can be put into the same model. In Part B, we aggregate the price bands whose prices are close to the target price band into CumNear_ikt and the ones farther away into CumFar_ikt.
4.4.1 Testing Hypothesis 1 to 3
Part A contains four models (Models A1 to A4), plus Model AD3, a variant of Model A3 with additional controls. We first examine whether CumSelf_ikt has a positive relation to WeeklySales_ikt and whether CumOther_ikt has a negative relation. Model A1 investigates this directly. Model A2 adds the basic performance information that people can see when buying tickets: Period_k, Time_k, Region_k, and Year_k. Model A3 further adds the interaction effects between this performance information and CumSelf_ikt. Adding Show_k, PerformanceId_k, and Week_kt to Model A3 as control variables yields Model AD3, which controls for the variation among performances. Because the effects of CumSelf_ikt and CumOther_ikt on WeeklySales_ikt might not be linear, we then propose Model A4 to examine these relationships. Afternoon_k and Central_k are the reference levels of Time_k and Region_k, respectively. The variable names and their mathematical notation are listed in Table 4.11.
Model A1:
$$\mathit{WeeklySales}_{ikt} = \beta_0 + \beta_1^{S}\,\mathit{CumSelf}_{ikt} + \beta_2^{S}\,\mathit{CumOther}_{ikt} + \epsilon_{ikt}$$
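Model A2:
$$\begin{aligned} \mathit{WeeklySales}_{ikt} = {} & \beta_0 + \beta_1^{S}\,\mathit{CumSelf}_{ikt} + \beta_2^{S}\,\mathit{CumOther}_{ikt} + \beta_1^{P}\,\mathit{Period}_k \\ & + \beta_1^{T}\,\mathit{Morning}_k + \beta_2^{T}\,\mathit{Evening}_k + \beta_1^{R}\,\mathit{Northern}_k + \beta_2^{R}\,\mathit{Southern}_k \\ & + \beta_1^{Y}\,\mathit{Year}_k + \epsilon_{ikt} \end{aligned}$$
Variable Name      Math Notation
WeeklySales_ikt    $\frac{y_{ikt}}{s_{ikt}+q_{ikt}}$
CumSelf_ikt        $\frac{s_{ikt}}{s_{ikt}+q_{ikt}}$
CumOther_ikt       $\frac{\sum_{j \neq i} s_{jkt}}{\sum_{j \neq i} (s_{jkt}+q_{jkt})}$
Table 4.11: Variable Names and Notations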
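Model A3:
$$\begin{aligned} \mathit{WeeklySales}_{ikt} = {} & \beta_0 + \beta_1^{S}\,\mathit{CumSelf}_{ikt} + \beta_2^{S}\,\mathit{CumOther}_{ikt} + \beta_1^{P}\,\mathit{Period}_k \\ & + \beta_1^{T}\,\mathit{Morning}_k + \beta_2^{T}\,\mathit{Evening}_k + \beta_1^{R}\,\mathit{Northern}_k + \beta_2^{R}\,\mathit{Southern}_k + \beta_1^{Y}\,\mathit{Year}_k \\ & + \beta_1^{PI}\,\mathit{Period}_k \times \mathit{CumSelf}_{ikt} + \beta_1^{TI}\,\mathit{Morning}_k \times \mathit{CumSelf}_{ikt} + \beta_2^{TI}\,\mathit{Evening}_k \times \mathit{CumSelf}_{ikt} \\ & + \beta_1^{RI}\,\mathit{Northern}_k \times \mathit{CumSelf}_{ikt} + \beta_2^{RI}\,\mathit{Southern}_k \times \mathit{CumSelf}_{ikt} + \epsilon_{ikt} \end{aligned}$$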
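Model AD3:
$$\begin{aligned} \mathit{WeeklySales}_{ikt} = {} & \beta_0 + \beta_1^{S}\,\mathit{CumSelf}_{ikt} + \beta_2^{S}\,\mathit{CumOther}_{ikt} + \beta_1^{P}\,\mathit{Period}_k + \beta_2^{P}\,\mathit{Week}_{kt} \\ & + \beta_1^{T}\,\mathit{Morning}_k + \beta_2^{T}\,\mathit{Evening}_k + \beta_1^{R}\,\mathit{Northern}_k + \beta_2^{R}\,\mathit{Southern}_k + \beta_1^{Y}\,\mathit{Year}_k \\ & + \beta_1^{D}\,\mathit{Show}_k + \beta_2^{D}\,\mathit{PerformanceId}_k + \beta_1^{PI}\,\mathit{Period}_k \times \mathit{CumSelf}_{ikt} \\ & + \beta_1^{TI}\,\mathit{Morning}_k \times \mathit{CumSelf}_{ikt} + \beta_2^{TI}\,\mathit{Evening}_k \times \mathit{CumSelf}_{ikt} \\ & + \beta_1^{RI}\,\mathit{Northern}_k \times \mathit{CumSelf}_{ikt} + \beta_2^{RI}\,\mathit{Southern}_k \times \mathit{CumSelf}_{ikt} + \epsilon_{ikt} \end{aligned}$$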
In Models A3 and AD3, the coefficients of the added interaction terms are denoted by $\beta_j^{XI}$, where $X \in \{P, T, R\}$.
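To make the dummy coding concrete, the hypothetical helper below builds one row of regressors for Model A3, with Afternoon_k and Central_k as the omitted reference levels. It is a sketch, not the code used in this study. Rows built this way could be stacked and passed to any ordinary least squares routine. The extra fixed effects of Model AD3 would be added in the same way.

```python
def model_a3_row(cum_self, cum_other, period, time, region, year):
    """Regressors of Model A3 for one observation, in the order of the equation.

    Afternoon and Central are the reference levels, so they get no dummy column.
    """
    morning = 1.0 if time == "Morning" else 0.0
    evening = 1.0 if time == "Evening" else 0.0
    northern = 1.0 if region == "Northern" else 0.0
    southern = 1.0 if region == "Southern" else 0.0
    # Intercept, main effects, and control variables.
    main = [1.0, cum_self, cum_other, period,
            morning, evening, northern, southern, year]
    # Interactions of the performance information with CumSelf.
    interactions = [period * cum_self, morning * cum_self, evening * cum_self,
                    northern * cum_self, southern * cum_self]
    return main + interactions

row = model_a3_row(0.5, 0.2, 100, "Evening", "Central", 2010)
print(len(row))  # 14
```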
Table 4.12 shows the same pattern in Models A1, A2, and A3. The cumulated sales of a price band are significantly positively correlated with its sales speed, while the cumulated sales of the other price bands are significantly negatively correlated with it. Hypotheses 1 and 2 are therefore supported. In Model AD3, the coefficient of CumOther_ikt becomes slightly positive and is significant only at the 5% level. For Hypothesis 3, the results of Models A2 and A3 both show that the length of the sales period negatively affects sales speed, which supports Hypothesis 3. However, when the control variables Show_k, PerformanceId_k, and Week_kt are added, the coefficient of Period_k becomes positive. The coefficients of Week_kt are mostly negative relative to the reference level Week 1, which indicates that sales are normally fastest in Week 1.
We also consider that the effects in Hypotheses 1 and 2 may not be purely linear. We therefore add two squared terms, CumSelf_ikt² and CumOther_ikt², to Model AD3 and call the result Model A4.
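Model A4:
$$\begin{aligned} \mathit{WeeklySales}_{ikt} = {} & \beta_0 + \beta_1^{S}\,\mathit{CumSelf}_{ikt} + \beta_2^{S}\,\mathit{CumOther}_{ikt} + \beta_1^{P}\,\mathit{Period}_k + \beta_2^{P}\,\mathit{Week}_{kt} \\ & + \beta_1^{T}\,\mathit{Morning}_k + \beta_2^{T}\,\mathit{Evening}_k + \beta_1^{R}\,\mathit{Northern}_k + \beta_2^{R}\,\mathit{Southern}_k + \beta_1^{Y}\,\mathit{Year}_k \\ & + \beta_1^{D}\,\mathit{Show}_k + \beta_2^{D}\,\mathit{PerformanceId}_k + \beta_1^{PI}\,\mathit{Period}_k \times \mathit{CumSelf}_{ikt} \\ & + \beta_1^{TI}\,\mathit{Morning}_k \times \mathit{CumSelf}_{ikt} + \beta_2^{TI}\,\mathit{Evening}_k \times \mathit{CumSelf}_{ikt} \\ & + \beta_1^{RI}\,\mathit{Northern}_k \times \mathit{CumSelf}_{ikt} + \beta_2^{RI}\,\mathit{Southern}_k \times \mathit{CumSelf}_{ikt} \\ & + \beta_3^{S}\,\mathit{CumSelf}_{ikt}^2 + \beta_4^{S}\,\mathit{CumOther}_{ikt}^2 + \epsilon_{ikt} \end{aligned}$$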
The other variables keep the same effects in Model A4 (Table 4.12).
Regression                   Model A1         Model A2         Model A3         Model AD3
(Intercept)                  1.11×10⁻²***     9.84×10⁻¹        −4.16×10⁻¹       3.16×10¹
CumSelf_ikt                  1.49×10⁻¹***     1.46×10⁻¹***     3.02×10⁻¹***     3.30×10⁻¹***
CumOther_ikt                 −5.66×10⁻²***    −7.41×10⁻²***    −6.56×10⁻²***    1.21×10⁻²*
Period_k                                      −1.85×10⁻⁴***    −7.39×10⁻⁵***    1.38×10⁻⁴**
Year_k                                        −4.61×10⁻⁴       2.19×10⁻⁴        −1.64×10⁻²
Time_k
  Morning_k                                   −2.66×10⁻³*      4.54×10⁻³*       −5.69×10⁻³
  Evening_k                                   −4.14×10⁻³***    −5.76×10⁻³***    −4.74×10⁻³
Region_k
  Northern_k                                  −6.97×10⁻⁴       7.92×10⁻³**      −1.64×10⁻¹***
  Southern_k                                  −2.66×10⁻³       −1.01×10⁻³       −1.44×10⁻¹***
CumSelf_ikt × Period_k                                         −5.08×10⁻⁴***    −6.41×10⁻⁴***
CumSelf_ikt × Time_k
  CumSelf_ikt × Morning_k                                      −2.37×10⁻²***    −2.91×10⁻²***
  CumSelf_ikt × Evening_k                                      6.45×10⁻³        6.51×10⁻³
CumSelf_ikt × Region_k
  CumSelf_ikt × Northern_k                                     −5.58×10⁻²***    −3.90×10⁻²***
  CumSelf_ikt × Southern_k                                     7.81×10⁻³        3.58×10⁻²
Other control variables
  Show_k, PerformanceId_k, Week_kt                                              included
R²                           12.50%           18.54%           20.83%           40.68%
Adjusted R²                  12.49%           18.51%           20.78%           39.60%
***p < 0.001, **p < 0.01, *p < 0.05
Table 4.12: The Results of Part A
If we fix all other variables and keep only the terms involving CumSelf_ikt and CumOther_ikt, the equation becomes, up to a constant,
$$\mathit{WeeklySales}_{ikt} = 0.469\,\mathit{CumSelf}_{ikt} - 0.157\,\mathit{CumSelf}_{ikt}^2 + 0.247\,\mathit{CumOther}_{ikt} - 0.369\,\mathit{CumOther}_{ikt}^2.$$
The squared terms CumSelf_ikt² and CumOther_ikt² capture the nonlinear parts of the effects of CumSelf_ikt and CumOther_ikt.
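Taking the first derivatives of WeeklySales_ikt with respect to CumSelf_ikt and CumOther_ikt gives
$$\frac{\partial \mathit{WeeklySales}_{ikt}}{\partial \mathit{CumSelf}_{ikt}} = 0.469 - 0.314\,\mathit{CumSelf}_{ikt},$$
$$\frac{\partial \mathit{WeeklySales}_{ikt}}{\partial \mathit{CumOther}_{ikt}} = 0.247 - 0.738\,\mathit{CumOther}_{ikt}.$$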
WeeklySales_ikt is a concave function of both CumSelf_ikt and CumOther_ikt. The negative coefficient of CumSelf_ikt² means that the effect of CumSelf_ikt stays positive but with a decreasing margin (Figure 4.3a). Even at the upper bound CumSelf_ikt = 0.9, the derivative is still 0.469 − 0.314 × 0.9 ≈ 0.19 > 0. The relationship between CumOther_ikt and WeeklySales_ikt is different. WeeklySales_ikt first increases with a decreasing margin. Once the cumulated sales of the other price bands are high (beyond CumOther_ikt = 0.247/0.738 ≈ 0.33), it decreases, meaning that customers have less incentive to purchase the price band's ticket (Figure 4.3b). This pattern results from the interaction of two effects. First, when people see that other price bands are selling well, they consider the show worth watching and purchase their most preferred price band of the show. Second, when people see that other price bands are selling well, they switch to the price bands with higher sales. From Figure 4.3b, we conclude that the first effect is stronger at the start of the sales period and the second effect is stronger at the end.
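The marginal effects of the reduced Model A4 equation can be evaluated numerically with the sketch below. It uses only the four coefficients reported above and holds all other regressors fixed. The constant and function names are illustrative.

```python
# Coefficients of the reduced Model A4 equation (other regressors held fixed).
B_SELF, B_SELF_SQ = 0.469, -0.157
B_OTHER, B_OTHER_SQ = 0.247, -0.369

def marginal_self(x):
    """Partial derivative of WeeklySales with respect to CumSelf at x."""
    return B_SELF + 2 * B_SELF_SQ * x

def marginal_other(x):
    """Partial derivative of WeeklySales with respect to CumOther at x."""
    return B_OTHER + 2 * B_OTHER_SQ * x

# CumOther value where the marginal effect changes sign (vertex of the parabola).
turning_point = -B_OTHER / (2 * B_OTHER_SQ)
print(round(turning_point, 3))       # 0.335
print(round(marginal_self(0.9), 3))  # 0.186
```

Because both derivatives are linear and decreasing, checking the sign at the endpoints of the observed range (0 to 0.9) is enough to tell where each effect is positive.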
We check the pairwise correlation coefficients between the variables (Figure 4.4) and find that no two variables are highly correlated.
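A correlation screen of this kind can be sketched with plain Python. The cut-off of 0.8 for "highly correlated" is an assumption, since the text does not state the threshold used. The toy columns are made up for illustration and are not the thesis data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def highly_correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged

columns = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [1, -1, 1, -1]}  # toy data
flagged = highly_correlated_pairs(columns)
print(flagged)
```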
(a) CumSelf_ikt, (b) CumOther_ikt
Figure 4.3: Change of WeeklySales_ikt.
Figure 4.4: Correlation Coefficients of Variables
4.4.2 Testing Hypothesis 4
In Part B, we investigate whether the effect of one price band on another is stronger when their prices are closer to each other. Each price band needs to be examined separately. We categorize the other price bands into two types: Near and Far. For P_1k, the near price band is P_2k and the rest are far. For P_2k, the near price bands are P_1k and P_3k. The cumulated sales outcomes of the Near and Far groups are CumNear_ikt and CumFar_ikt, respectively.
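Model B:
$$\begin{aligned} \mathit{WeeklySales}_{ikt} = {} & \beta_0 + \beta_1^{S}\,\mathit{CumSelf}_{ikt} + \beta_2^{S}\,\mathit{CumNear}_{ikt} + \beta_3^{S}\,\mathit{CumFar}_{ikt} \\ & + \beta_1^{P}\,\mathit{Period}_k + \beta_2^{P}\,\mathit{Week}_{kt} + \beta_1^{T}\,\mathit{Morning}_k + \beta_2^{T}\,\mathit{Evening}_k + \beta_1^{R}\,\mathit{Northern}_k + \beta_2^{R}\,\mathit{Southern}_k \\ & + \beta_1^{Y}\,\mathit{Year}_k + \beta_1^{D}\,\mathit{Show}_k + \beta_2^{D}\,\mathit{PerformanceId}_k + \beta_1^{PI}\,\mathit{Period}_k \times \mathit{CumSelf}_{ikt} \\ & + \beta_1^{TI}\,\mathit{Morning}_k \times \mathit{CumSelf}_{ikt} + \beta_2^{TI}\,\mathit{Evening}_k \times \mathit{CumSelf}_{ikt} \\ & + \beta_1^{RI}\,\mathit{Northern}_k \times \mathit{CumSelf}_{ikt} + \beta_2^{RI}\,\mathit{Southern}_k \times \mathit{CumSelf}_{ikt} + \epsilon_{ikt} \end{aligned}$$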
Table 4.13 shows that the relationships of CumSelf_ikt, CumNear_ikt, and CumFar_ikt with WeeklySales_ikt follow the same pattern as in Models AD3 and A4. Figure 4.5 also shows that Model B follows the same pattern as Model A4: WeeklySales_ikt is concave in CumSelf_ikt, CumNear_ikt, and CumFar_ikt. For both CumNear_ikt and CumFar_ikt, WeeklySales_ikt first increases at a decreasing rate, then reaches a turning point and decreases at an increasing rate. Interestingly, CumFar_ikt has larger effects than CumNear_ikt. Hypothesis 4 is therefore not supported and can be investigated further in future work.